Data in MarkLogic Server [Part 2]

    A little more about how MarkLogic Server stores data.

    About data formats

    MarkLogic Server is an XML database, but in addition to XML it can store JSON, text, and binary data. JSON documents are converted to XML when they enter the database. Text documents are indexed as XML text nodes without a “parent” node. Binary documents are not indexed by default, but an index can be built over their metadata and extracted content.

    About indexes

    MarkLogic uses indexes everywhere to improve database performance. The text index and the structure index are available out of the box; they cover all XML data and are used when XQuery queries are executed, which keeps those queries efficient. Metadata indexes are also available: Collection Indexes, Directory Indexes, Security Indexes, and Properties Indexes.
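    To make the idea of a text index more concrete, here is a minimal sketch of an inverted index in Python. It is a toy illustration and not MarkLogic's actual implementation: the function names build_text_index and search_all and the document URIs are made up for this example, and the "documents" are plain strings rather than XML.

```python
from collections import defaultdict
import re


def build_text_index(documents):
    """Map each lowercased word to the set of document URIs that contain it."""
    index = defaultdict(set)
    for uri, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(uri)
    return index


def search_all(index, *words):
    """Return the URIs of documents that contain every one of the given words."""
    postings = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents, keyed by made-up URIs.
docs = {
    "/books/1.xml": "MarkLogic stores XML documents",
    "/books/2.xml": "Indexes speed up XQuery queries",
    "/books/3.xml": "XML indexes and XQuery",
}

index = build_text_index(docs)
print(sorted(search_all(index, "xml", "indexes")))  # ['/books/3.xml']
```

    The query never scans the documents themselves: it only intersects the sets stored in the index, which is why lookups stay cheap as the amount of data grows.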

    It is worth noting that indexes in MarkLogic Server can grow to 2 or even 3 times the size of the XML data itself. This only happens when a large number of indexes are enabled, and the ratio is further skewed by the fact that MarkLogic compresses XML data when storing it. With the out-of-the-box settings, the indexes are usually small relative to the source data.

    About the internal view

    Let's take a closer look at how data is stored in MarkLogic Server. The key concepts here are:

    Database is the highest-level abstraction over the internal representation of data in MarkLogic Server. It provides access to the data as a single entity, regardless of the scaling mechanisms and the internal representation.

    The Database object brings together security settings, XML document schemas, a set of triggers, in-memory cache settings, indexes, search options, logging settings, replication options, backup settings, and a set of Forest objects.

    Forest is the object in which data and indexes are stored. A database can have more than one Forest object, and they can be located on one server or on different servers. The “local-disk failover” mechanism works at the level of Forest objects: each Forest object is assigned one or more “replica forest” objects, which improves reliability.

    A Forest has significantly fewer settings than a Database object. For a Forest, you can configure where its data lives on the file system (the “data directory”), where large objects are stored (the “large data directory”), and the location of the so-called “fast data directory”, i.e. a directory on a fast file system. The “fast data directory” is used to store the transaction log and data fragments. This directory must be located on a storage device different from the one that holds the “data directory”. When the “fast data directory” fills up, large objects from it are merged with the data in the “data directory”. Inside a Forest, data is stored in Stand objects.
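    The three directory settings and the rule about keeping the fast data directory separate can be sketched as a small Python data class. This is only an illustration: the ForestSettings class, its problems method, and all paths are hypothetical and are not part of any MarkLogic API. The check is also a simplification, since it compares paths rather than actual storage devices.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass(frozen=True)
class ForestSettings:
    """Hypothetical container for the directory settings of one Forest."""
    name: str
    data_directory: str
    large_data_directory: Optional[str] = None
    fast_data_directory: Optional[str] = None

    def problems(self) -> List[str]:
        """Return a list of human-readable configuration problems."""
        issues = []
        if self.fast_data_directory is not None:
            fast = PurePosixPath(self.fast_data_directory)
            data = PurePosixPath(self.data_directory)
            # Simplification: a real check would compare storage devices;
            # here we only reject identical or nested paths.
            if fast == data or data in fast.parents or fast in data.parents:
                issues.append("fast data directory overlaps the data directory")
        return issues


# Made-up paths: one layout on separate mounts, one nested inside the data directory.
good = ForestSettings("forest-1", "/mnt/hdd/forests", "/mnt/hdd/large", "/mnt/ssd/forests")
bad = ForestSettings("forest-2", "/mnt/hdd/forests", fast_data_directory="/mnt/hdd/forests/fast")

print(good.problems())
print(bad.problems())
```

    The first layout reports no problems, while the second is flagged because its fast data directory sits inside the data directory.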

    Stand is a component of a Forest object. Each Stand is a packed binary file stored in a subdirectory of the Forest object. The Stand itself consists of XML fragments.
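    The containment hierarchy described in this section (Database, Forest, Stand, fragment) can be modeled in a few lines of Python. This is a conceptual sketch only: the class and method names are invented for illustration, fragments are represented as plain strings, and nothing here reflects the real on-disk format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Stand:
    """A unit of storage inside a Forest; holds XML fragments."""
    fragments: List[str] = field(default_factory=list)


@dataclass
class Forest:
    """Stores data as a collection of Stand objects."""
    name: str
    stands: List[Stand] = field(default_factory=list)

    def fragment_count(self) -> int:
        return sum(len(stand.fragments) for stand in self.stands)


@dataclass
class Database:
    """Top-level abstraction that exposes all forests as a single entity."""
    name: str
    forests: List[Forest] = field(default_factory=list)

    def all_fragments(self) -> Iterator[str]:
        """Yield every fragment, regardless of which forest or stand holds it."""
        for forest in self.forests:
            for stand in forest.stands:
                yield from stand.fragments


# Made-up content: two forests, three stands, four fragments in total.
db = Database("Documents", [
    Forest("forest-1", [Stand(["<a/>", "<b/>"]), Stand(["<c/>"])]),
    Forest("forest-2", [Stand(["<d/>"])]),
])

print([forest.fragment_count() for forest in db.forests])  # [3, 1]
print(list(db.all_fragments()))  # ['<a/>', '<b/>', '<c/>', '<d/>']
```

    The caller of all_fragments sees one flat sequence of fragments, which mirrors how a Database hides the split into Forest and Stand objects.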
