xtender August 27, 2010 at 08:44

Index Types Overview Oracle, MySQL, PostgreSQL, MS SQL

In one of the comments there was a request to tell you more about indexes, and since there are practically no summary data on supported indexes of various DBMSs in Runet, in this review I will consider what types of indexes are supported in the most popular DBMSs

B-tree

The B-Tree index family is the most commonly used type of index, organized as a balanced tree of ordered keys. They are supported by almost all DBMSs, both relational and non-relational, and for almost all data types. Since most probably know them well (or can read about them here , for example ), the only thing that should be noted here is that this type of index is optimal for a set with a good distribution of values and high power (cardinality -number of unique values).

Spatial indexes

At the moment, all DBMS data has spatial data types and functions for working with them; for Oracle, it is a set of types and functions in the MDSYS scheme; for PostgreSQL, it is point, line, lseg, polygon, box, path, polygon, circle, in MySQL - geometry, point, linestring, polygon, multipoint, multilinestring, multipolygon, geometrycollection, MS SQL - Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection.
In the spatial query operation scheme, two stages or two stages of filtering are usually distinguished. DBMSs with poor spatial support work out only the first stage (coarse filtering, MySQL). As a rule, an approximate, approximated representation of objects is used at this stage. The most common type of approximation is the Minimum Bounding Rectangle [MBR] [100].
For spatial data types, there are special indexing methods based on R-trees (R-Tree index) and grids (Grid-based Spatial index).

Spatial grid

Spatial grid index is a tree structure similar to a B-tree, but is used to organize access to Spatial data, that is, to index multidimensional information, such as, for example, geographic data with two-dimensional coordinates (latitude and longitude) ) In this structure, the nodes of the tree are the cells of space. For example, for two-dimensional space: first, the entire parent area will be divided into a strictly defined resolution grid, then each grid cell in which the number of objects exceeds the set maximum of objects in the cell will be divided into the next level subgrid. This process will continue until maximum nesting is achieved (if installed), or until everything is divided up to cells not exceeding the maximum of objects.

In the case of three-dimensional or multidimensional space, these will be rectangular parallelepipeds (cuboids) or parallelepots.

Quadtree

Quadtree is a subset of the Grid-based Spatial index, in which there are always 4 children in the parent cell and the grid resolution varies depending on the nature or complexity of the data.

R-tree

R-Tree (Regions Tree) is also a tree-like data structure similar to the Spatial Grid, proposed in 1984 by Antonin Guttman. This data structure also splits the space into many hierarchically nested cells, but which, unlike the Spatial Grid, are not required to completely cover the parent cell and can intersect.
Various algorithms can be used to split overflowed vertices, which causes the division of R-trees into subtypes: with quadratic and linear complexity (Guttman, of course, described with exponential complexity - Exhaustive Search, but he, of course, is not used anywhere).
The quadratic subtype is divided into two rectangles with a minimum area covering all objects. Linear - divided by maximum distance.

Hash

Hash indexes were proposed by Arthur Fuller, and they do not store the values themselves, but their hashes, which reduces the size (and, accordingly, increases the processing speed) of indexes from large fields. Thus, when querying using HASH indexes, not the one searched for from the field value will be compared, but the hash from the sought value with the field hashes.
Due to the nonlinear nature of the hash functions, this index cannot be sorted by value, which makes it impossible to use more / less and “is null” comparisons. In addition, since hashes are not unique, collision resolution methods are used for matching hashes.

Bitmap

Bitmap index - the method of bit indexes is to create separate bitmaps (a sequence of 0 and 1) for each possible column value, where each bit corresponds to a row with an indexed value, and its value equal to 1 means that the record corresponding to the bit position contains an indexed value for this column or property.

EmpID	Floor
1	Male
2	Female
3	Female
4	Male
5	Female

Bitmaps

Value	Start	the end	Bit mask
Male	First line address	Last line address	10010
Female	First line address	Last line address	01101

The main advantage of bit indexes is that on large sets with low power and good clustering by their values, the index will be less than B * -tree. (Read more here or here )

Reverse index

Reverse index is also a B-tree index but with a reversed key, used mainly for monotonically increasing values (for example, auto-incrementing identifier) in OLTP systems in order to remove competition for the last leaf block of the index, because by flipping the value, two adjacent index entries fall into different index blocks. It cannot be used for range searching.
Example:

Field in the table (bin)	Reverse index key (bin)
00000001	10,000,000
...	...
00001001	10010000
00001010	01010000
00001011	11010000

As you can see, the value in the index changes much more than the value in the table itself, and therefore in the b-tree structure, they will fall into different blocks.

Inverted index

An inverted index is a full-text index that stores for each key token a sorted list of addresses of table entries that contain a given key.

1	Mom washed the frame
2	Dad washed the frame
3	Dad washed the car
4	Mom polished the car

In a simplified form, it will look like this:

Mother	1.4
Soaps	1
Ramu	1,2
Dad	2,3
Polished	4
A car	3.4

Partial index

Partial index is an index built on the part of a table that satisfies a certain condition of the index itself. This index was created to reduce the size of the index.

Function-based index

The most flexible type of indexes are functional indexes, that is, indexes whose keys store the result of user-defined functions. Functional indexes are often built for fields whose values are pre-processed before comparison in the SQL command. For example, when comparing string data without case sensitivity, the UPPER function is often used. Creating a functional index with the UPPER function improves the effectiveness of such comparisons.
In addition, a functional index can help to implement any other missing type of indexes for a given DBMS (except, perhaps, a bit index, for example, Hash for Oracle)

Index Type Summary Table

	MySQL	PostgreSQL	MS SQL	Oracle
B-tree index	there is	there is	there is	there is
Supported Spatial Indexes	Quadratic R-Tree	Rtree_GiST (uses linear splitting)	4-level Grid-based spatial index (separate for geographic and geodetic data)	R-Tree with quadratic partition; Quadtree
Hash index	Only in tables like Memory	there is	Not	Not
Bitmap index	Not	there is	Not	there is
Reverse index	Not	Not	Not	there is
Inverted index	there is	there is	there is	there is
Partial index	Not	there is	there is	Not
Function based index	Not	there is	there is	there is

It is worth mentioning that in PostgreSQL GiST allows you to create an index based on R-Tree for any native data type. To do this, you need to implement all 7 functions of the R-Tree mechanism.

You can read more here:
Oracle Spatial User's Guide and Reference
Spatial data in MS SQL
MS SQL: Spatial Indexing Overview
Hilbert R-tree
Spatial types PostgreSQL
Spatial functions PostgreSQL
Indexing spatial data in DBMS Microsoft SQL Server 2000
Papadias D., Theodoridis T. Spatial Relations , Minimum Bounding Rectangles and Spatial Data Structures // Technical Report KDB-SLAB-TR-94-04,
Faloutsos C., Kamel I. Hilbert R-Tree: An Improved R-Tree UsingFractals // Department of CS, University of Maryland, TechnicalResearch Report TR-93-19,
Wikipedia: Hilbert R-tree
External Search Methods
Arthur Fuller. Intelligent database design using hash keys

PS. Perhaps I forgot to mention something, write to the PM or in the comments - I will add.

Tags: