Expanding a wide table into columns (the EAV pattern)


    There is an entity that is characterized by a huge and often variable number of parameters. The task is to store such entities in a way that they can also be searched, preferably with index support.

    I should say right away that I don't know the canonical solution to this problem; I had to solve it quickly, so I thought it over and arrived at the approach described below.

    First: since the number of parameters is variable, we describe the possible properties of an entity in a table of this kind:
    | (int) id | (enum: bool, int, float, string) prop_type | (varchar) prop_name |
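    As a sketch, this property dictionary could look like the following in MySQL (the table name prop_description is taken from later in the text; the key layout is an assumption):

```sql
-- Dictionary of all possible entity properties.
CREATE TABLE prop_description (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    prop_type ENUM('bool', 'int', 'float', 'string') NOT NULL,
    prop_name VARCHAR(64) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY (prop_name)       -- one dictionary row per property name
);
```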

    Now we look at the entity and try to separate its primary properties (the ones used most in searches) from the secondary ones. For the primary properties we create a table:
    | id | metadata - all the primary properties of the entity |

    And how do we store all those 10, 20, 300 secondary characteristics?

    Initially I split the storage by type, because storing a number in a string column and searching it that way felt like blasphemy, so we create three tables:
    int_properties, float_properties, string_properties, each with the columns:
    (int) object_id - foreign key to the objects table
    (int) prop_id - foreign key to the property dictionary (prop_description)
    value - its type matches the table.
    I suppose you see what I mean?
    So when I save an object, I get one record in the objects table, plus as many records spread across int_properties, float_properties and string_properties as the object has properties, but no more than the number of rows in prop_description.
    This kind of data storage is called 3.5 normal form.
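    An illustrative schema for the tables described above (the exact metadata columns of objects are an assumption; float_properties and string_properties differ from int_properties only in the type of value):

```sql
CREATE TABLE objects (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,       -- an example of a primary property
    PRIMARY KEY (id)
);

CREATE TABLE int_properties (
    object_id INT UNSIGNED NOT NULL,  -- FK to objects.id
    prop_id   INT UNSIGNED NOT NULL,  -- FK to prop_description.id
    value     INT NOT NULL,
    PRIMARY KEY (object_id, prop_id),
    KEY (prop_id, value)              -- speeds up searches by property value
);
```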

    I think you can work out for yourself how to write your entity into such a representation; there is no big problem there: loop over the data, pick the right table by type, insert.
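    For example, saving an object with an int property weight = 10 and a string property color = 'red' might produce something like this (all ids are made up for illustration):

```sql
-- Assume prop_description contains (1, 'int', 'weight')
-- and (2, 'string', 'color').
INSERT INTO objects (name) VALUES ('widget');   -- suppose it got id = 42
INSERT INTO int_properties    (object_id, prop_id, value) VALUES (42, 1, 10);
INSERT INTO string_properties (object_id, prop_id, value) VALUES (42, 2, 'red');
```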

    The search problem is much more interesting. Clearly, to find anything in such a structure you have to join the tables somehow; the whole question is how. It turns out not to be that hard.
    I write the further examples in PHP, since its syntax is intuitive enough.
    We write a simple SQL query constructor:
    /* Assume we have an array:
     *   $prop_descr[$properties_name] = array('id' => $prop_id, 'type' => $prop_type)
     * As a rule this assumption holds, because the table describing the
     * object properties is small and can easily be cached. */
    $selectToken = array();
    $selectQuery = ' FROM `objects`';
    foreach ($data as $properties_name => $value) {
        $prop_id   = $prop_descr[$properties_name]['id'];
        $prop_type = $prop_descr[$properties_name]['type'];
        $tableName = 't_' . $properties_name;  // a unique alias for each joined table
        $selectToken[] = '`' . $tableName . '`.`value` AS `' . $properties_name . '`';
        // sometimes you need a LEFT join here instead
        $selectQuery .= ' INNER JOIN `' . $prop_type . '_properties` AS `' . $tableName . '`
            ON (`objects`.`id` = `' . $tableName . '`.`object_id`
            AND `' . $tableName . '`.`prop_id` = ' . $prop_id . ')';
    }
    $selectQuery = 'SELECT ' . implode(', ', $selectToken) . $selectQuery;
    PS The code deliberately contains no validity checks, because they have nothing to do with the logic being shown!

    So we get the result pivoted back into wide rows. Of course, to search over such a resulting table you also need to reference $tableName.`value` in the WHERE clause, but that is easy to achieve by building a lookup array: array($properties_name => '`' . $tableName . '`.`value`');
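    For concreteness, for two properties weight (int, prop_id 1) and color (string, prop_id 2) the constructor above would emit a query shaped roughly like this (aliases and the WHERE conditions are illustrative):

```sql
SELECT `t_weight`.`value` AS `weight`,
       `t_color`.`value`  AS `color`
FROM `objects`
INNER JOIN `int_properties` AS `t_weight`
    ON (`objects`.`id` = `t_weight`.`object_id` AND `t_weight`.`prop_id` = 1)
INNER JOIN `string_properties` AS `t_color`
    ON (`objects`.`id` = `t_color`.`object_id` AND `t_color`.`prop_id` = 2)
WHERE `t_weight`.`value` > 5
  AND `t_color`.`value` = 'red';
```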

    Pros:
    - it is really easy to add and remove characteristics of objects (that is, the schema is easier to maintain);
    - the search is relatively fast (certainly faster than bit masks) because indexes (the primary keys) are used;
    - the tables are properly normalized and easy to draw, explain, and reason about.
    - more in the comment by kmmbvnr.

    Cons:
    - it is somewhat slower than a single mega-wide table for a query known in advance;
    - the SQL is not as readable (but the optimizer copes with it just fine).

    UPD: thanks to Joshua for the name of the pattern.
