Index Types Overview Oracle, MySQL, PostgreSQL, MS SQL

    In one of the comments there was a request to tell you more about indexes, and since there are practically no summary data on supported indexes of various DBMSs in Runet, in this review I will consider what types of indexes are supported in the most popular DBMSs

    B-tree


    The B-Tree index family is the most commonly used type of index, organized as a balanced tree of ordered keys. They are supported by almost all DBMSs, both relational and non-relational, and for almost all data types. Since most probably know them well (or can read about them here , for example ), the only thing that should be noted here is that this type of index is optimal for a set with a good distribution of values ​​and high power (cardinality -number of unique values).





    Spatial indexes


    At the moment, all DBMS data has spatial data types and functions for working with them; for Oracle, it is a set of types and functions in the MDSYS scheme; for PostgreSQL, it is point, line, lseg, polygon, box, path, polygon, circle, in MySQL - geometry, point, linestring, polygon, multipoint, multilinestring, multipolygon, geometrycollection, MS SQL - Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection.
    In the spatial query operation scheme, two stages or two stages of filtering are usually distinguished. DBMSs with poor spatial support work out only the first stage (coarse filtering, MySQL). As a rule, an approximate, approximated representation of objects is used at this stage. The most common type of approximation is the Minimum Bounding Rectangle [MBR] [100].
    For spatial data types, there are special indexing methods based on R-trees (R-Tree index) and grids (Grid-based Spatial index).

    Spatial grid

    Spatial grid index is a tree structure similar to a B-tree, but is used to organize access to Spatial data, that is, to index multidimensional information, such as, for example, geographic data with two-dimensional coordinates (latitude and longitude) ) In this structure, the nodes of the tree are the cells of space. For example, for two-dimensional space: first, the entire parent area will be divided into a strictly defined resolution grid, then each grid cell in which the number of objects exceeds the set maximum of objects in the cell will be divided into the next level subgrid. This process will continue until maximum nesting is achieved (if installed), or until everything is divided up to cells not exceeding the maximum of objects.


    In the case of three-dimensional or multidimensional space, these will be rectangular parallelepipeds (cuboids) or parallelepots.


    Quadtree

    Quadtree is a subset of the Grid-based Spatial index, in which there are always 4 children in the parent cell and the grid resolution varies depending on the nature or complexity of the data.



    R-tree

    R-Tree (Regions Tree) is also a tree-like data structure similar to the Spatial Grid, proposed in 1984 by Antonin Guttman. This data structure also splits the space into many hierarchically nested cells, but which, unlike the Spatial Grid, are not required to completely cover the parent cell and can intersect.
    Various algorithms can be used to split overflowed vertices, which causes the division of R-trees into subtypes: with quadratic and linear complexity (Guttman, of course, described with exponential complexity - Exhaustive Search, but he, of course, is not used anywhere).
    The quadratic subtype is divided into two rectangles with a minimum area covering all objects. Linear - divided by maximum distance.



    Hash


    Hash indexes were proposed by Arthur Fuller, and they do not store the values ​​themselves, but their hashes, which reduces the size (and, accordingly, increases the processing speed) of indexes from large fields. Thus, when querying using HASH indexes, not the one searched for from the field value will be compared, but the hash from the sought value with the field hashes.
    Due to the nonlinear nature of the hash functions, this index cannot be sorted by value, which makes it impossible to use more / less and “is null” comparisons. In addition, since hashes are not unique, collision resolution methods are used for matching hashes.

    Bitmap


    Bitmap index - the method of bit indexes is to create separate bitmaps (a sequence of 0 and 1) for each possible column value, where each bit corresponds to a row with an indexed value, and its value equal to 1 means that the record corresponding to the bit position contains an indexed value for this column or property.

    EmpIDFloor
    1Male
    2Female
    3Female
    4Male
    5Female


    Bitmaps
    ValueStart the end Bit mask
    Male First line addressLast line address10010
    Female First line addressLast line address01101

    The main advantage of bit indexes is that on large sets with low power and good clustering by their values, the index will be less than B * -tree. (Read more here or here )

    Reverse index


    Reverse index is also a B-tree index but with a reversed key, used mainly for monotonically increasing values ​​(for example, auto-incrementing identifier) ​​in OLTP systems in order to remove competition for the last leaf block of the index, because by flipping the value, two adjacent index entries fall into different index blocks. It cannot be used for range searching.
    Example:
    Field in the table (bin) Reverse index key (bin)
    00000001 10,000,000
    ... ...
    00001001 10010000
    00001010 01010000
    00001011 11010000
    As you can see, the value in the index changes much more than the value in the table itself, and therefore in the b-tree structure, they will fall into different blocks.

    Inverted index


    An inverted index is a full-text index that stores for each key token a sorted list of addresses of table entries that contain a given key.
    1 Mom washed the frame
    2 Dad washed the frame
    3 Dad washed the car
    4 Mom polished the car

    In a simplified form, it will look like this:
    Mother 1.4
    Soaps 1
    Ramu 1,2
    Dad 2,3
    Polished 4
    A car 3.4


    Partial index


    Partial index is an index built on the part of a table that satisfies a certain condition of the index itself. This index was created to reduce the size of the index.

    Function-based index


    The most flexible type of indexes are functional indexes, that is, indexes whose keys store the result of user-defined functions. Functional indexes are often built for fields whose values ​​are pre-processed before comparison in the SQL command. For example, when comparing string data without case sensitivity, the UPPER function is often used. Creating a functional index with the UPPER function improves the effectiveness of such comparisons.
    In addition, a functional index can help to implement any other missing type of indexes for a given DBMS (except, perhaps, a bit index, for example, Hash for Oracle)

    Index Type Summary Table


    MySQL PostgreSQL MS SQLOracle
    B-tree indexthere is there is there is there is
    Supported Spatial IndexesQuadratic R-TreeRtree_GiST (uses linear splitting)4-level Grid-based spatial index (separate for geographic and geodetic data)R-Tree with quadratic partition; Quadtree
    Hash indexOnly in tables like Memorythere is Not Not
    Bitmap indexNot there is Not there is
    Reverse indexNot Not Not there is
    Inverted indexthere is there is there is there is
    Partial indexNot there is there isNot
    Function based indexNot there is there is there is

    It is worth mentioning that in PostgreSQL GiST allows you to create an index based on R-Tree for any native data type. To do this, you need to implement all 7 functions of the R-Tree mechanism.

    You can read more here:
    Oracle Spatial User's Guide and Reference
    Spatial data in MS SQL
    MS SQL: Spatial Indexing Overview
    Hilbert R-tree
    Spatial types PostgreSQL
    Spatial functions PostgreSQL
    Indexing spatial data in DBMS Microsoft SQL Server 2000
    Papadias D., Theodoridis T. Spatial Relations , Minimum Bounding Rectangles and Spatial Data Structures // Technical Report KDB-SLAB-TR-94-04,
    Faloutsos C., Kamel I. Hilbert R-Tree: An Improved R-Tree UsingFractals // Department of CS, University of Maryland, TechnicalResearch Report TR-93-19,
    Wikipedia: Hilbert R-tree
    External Search Methods
    Arthur Fuller. Intelligent database design using hash keys

    PS. Perhaps I forgot to mention something, write to the PM or in the comments - I will add.

    Also popular now: