BaseX: The Unknown NoSQL Universe

    Far, far away, somewhere on the edge of the Galaxy, I found a very impressive NoSQL solution...

    Love, apathy, hatred, admiration, pride, anger, joy - these are the emotions I went through over a whole year. The more I studied this product, the stronger the feelings became.

    The authors' marketing pitch goes something like this:
    BaseX is a very lightweight, high-performance and scalable XML database with an XPath / XQuery 3.0 processor that has full support for W3C Update and Full Text specifications. An interactive and user-friendly graphical interface makes it easy to examine your XML documents.

    It sounds very tasty, but reality, as always, hits painfully in the most vulnerable spots.

    What's inside

    BaseX is an open source product written in Java and distributed under the BSD license. Out of the box we get a GUI tool (management, analysis, code editing with syntax highlighting), a server, a client, and a web server. In fact we get even more, but first things first.


    The BaseX team has developed its own storage engine, which even has scientific papers dedicated to it. Although internally the data is still stored in a "shredded" form, this does not prevent BaseX from being a pure document-oriented database without a fixed schema. And the very concept of a database is somewhat unusual here.

    In BaseX, a database is a folder in which resources are stored. A resource can be either an ordinary file or an XML document. Files are stored as they are in the file system, while XML documents are converted into an internal representation. One database can hold both XML documents and other resources at the same time. This way of storing data makes it easy to move a database by simply copying its folder.
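
    To make the "transfer by copying" idea concrete, here is a minimal sketch in Python. It assumes that each database lives in its own sub-folder named after the database and that nothing is writing to it during the copy. The function name, folder layout and file names are hypothetical and only serve the illustration.

```python
import shutil
from pathlib import Path


def copy_database(data_dir: Path, db_name: str, target_dir: Path) -> Path:
    """Copy one database folder from data_dir into target_dir.

    Assumption: the database is a plain folder called db_name inside
    data_dir, and no process is modifying it while we copy.
    """
    source = data_dir / db_name
    if not source.is_dir():
        raise FileNotFoundError(f"no database folder: {source}")
    destination = target_dir / db_name
    # copytree refuses to overwrite an existing folder, which is a safe
    # default when moving data between installations.
    shutil.copytree(source, destination)
    return destination
```

    Refusing to overwrite is a deliberate choice here: silently merging two database folders would mix files that belong to different internal states.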

    Tables, so familiar from the relational model, do not exist here at all. But there are XML documents and collections of them. Well, what else would you expect in an XML database? :)


    BaseX can index the XML structure, attributes, and text, and can even build a full-text index (for a limited number of languages). The fly in the ointment is that one of the index types is static, i.e. updating the data invalidates the index, and the second, dynamic one is slow. Insert performance degrades by as much as an order of magnitude.

    A database can have only one index type, static or updatable, and it cannot be changed afterwards. To switch, you have to export the data, create a database with the other index type, and load the data again.

    Overall, indexing is weak. Yes, it is a useful mechanism that can speed up read queries by several orders of magnitude, but its capabilities are very limited. Maybe I am just too used to relational database indexes.
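
    The trade-off between the two index types can be shown with a toy model. This is only an illustration in Python, not BaseX's real data structures: the class name, the word-based index and the fallback to a full scan when the index is stale are all my assumptions.

```python
from collections import defaultdict


class ToyStore:
    """Toy word index over (id, text) records; purely illustrative."""

    def __init__(self, updatable: bool):
        self.updatable = updatable      # fixed at creation, like the index type
        self.records = {}               # record id -> text
        self.index = defaultdict(set)   # word -> ids of records containing it
        self.index_valid = True

    def insert(self, rec_id, text):
        self.records[rec_id] = text
        if self.updatable:
            # Updatable index: pay the maintenance cost on every write.
            for word in text.split():
                self.index[word].add(rec_id)
        else:
            # Static index: any write makes it stale.
            self.index_valid = False

    def rebuild(self):
        """Recreate the index from scratch after a batch of writes."""
        self.index = defaultdict(set)
        for rec_id, text in self.records.items():
            for word in text.split():
                self.index[word].add(rec_id)
        self.index_valid = True

    def find(self, word):
        if self.index_valid:
            return set(self.index.get(word, ()))   # fast indexed lookup
        # Stale index (assumed behaviour): scan every record instead.
        return {i for i, t in self.records.items() if word in t.split()}
```

    In this model a static store is cheap to write to but slow to read until `rebuild()` is called, while an updatable store keeps reads fast at the price of extra work on every insert.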


    BaseX does not have the notion of a transaction that everyone is used to from relational DBMSs. You cannot explicitly start a transaction, perform several actions, and then commit it. A transaction is a single server command or an executed script. BaseX is not multi-versioned, and write transactions block the database. Since version 7.6, lock management has been moved to the server level rather than the file system, which sped up query execution significantly.

    From all of the above, a very simple conclusion follows: BaseX does not like writes. An intense write load increases query latency. On reads, however, it performs very, very well.
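
    The model "one script is one transaction, and a writer blocks everyone else" can be sketched with a readers-writer lock. This is a simplified Python illustration and not BaseX's actual locking code; `DatabaseLock` and `run_script` are hypothetical names.

```python
import threading
from contextlib import contextmanager


class DatabaseLock:
    """Many concurrent readers or exactly one exclusive writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writing:            # readers wait for the writer
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writing or self._readers:   # wait for everyone
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


def run_script(lock, updating, script):
    """Run a whole script as one transaction: lock, execute, release."""
    guard = lock.write() if updating else lock.read()
    with guard:
        return script()
```

    With such a scheme a long updating script stalls every reader until it finishes, which matches the latency growth under write load. Note that this simple variant can also starve writers when readers keep arriving.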


    Backups are available out of the box, but the approach is more brute-force than elegant. As we already know, a database is an ordinary directory. BaseX simply archives it into a regular zip file. As blunt as it gets. Everything works fine on small databases, but once a database holds several gigabytes of data, things become predictably sad. For offline solutions the time needed to create a backup is not a problem, but for a system that runs continuously it can cause a short-term denial of service.

    You can run a backup either for a specific database or for all of them at once. Restoring is as simple as backing up. Since there can be many zip files, you can restore from a specific backup. No server shutdown is required for this.
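
    The "zip the folder" approach is easy to reproduce as a sketch. The Python code below assumes the database is a plain folder and that nothing writes to it during the backup. The timestamped archive name, the function names and the folder layout are my assumptions for illustration, not the exact behaviour of BaseX.

```python
import shutil
import time
from pathlib import Path


def backup_database(data_dir: Path, db_name: str, backup_dir: Path) -> Path:
    """Zip the whole database folder into a timestamped archive."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d-%H-%M-%S")
    base = backup_dir / f"{db_name}-{stamp}"
    # make_archive appends ".zip" and returns the archive path as a string.
    archive = shutil.make_archive(
        str(base), "zip", root_dir=str(data_dir), base_dir=db_name
    )
    return Path(archive)


def restore_database(archive: Path, data_dir: Path, db_name: str) -> None:
    """Replace the database folder with the contents of one chosen archive."""
    target = data_dir / db_name
    if target.exists():
        shutil.rmtree(target)       # drop the current state first
    shutil.unpack_archive(str(archive), extract_dir=str(data_dir), format="zip")
```

    Because every backup is a separate archive, restoring "to a specific backup" is just a matter of choosing which file to pass to `restore_database`. The cost is also visible: the whole folder is read and compressed each time, however little has changed.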

    Replication and Redo Logs

    Here BaseX is in a sad state. I have raised this issue more than once with the project lead, Christian Grün, and he promised to consider adding this functionality in the near future, but so far the question remains open.


    Another sad trouble ...

    Administration Features

    They are very simple. You can create users and grant them four kinds of permissions: read, write, create databases, and administer. Not much, but it is hard to think of what else is needed. Again, since a database is a collection of files, additional "protection" can be set up at the file system level.
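
    A permission model with so few levels fits into a few lines. The Python sketch below assumes that the levels are hierarchical, i.e. each level includes all the lower ones. That ordering, the extra `NONE` level and all names are my assumptions for the example.

```python
from enum import IntEnum


class Permission(IntEnum):
    NONE = 0      # hypothetical "no rights" level, added for completeness
    READ = 1
    WRITE = 2
    CREATE = 3    # create databases
    ADMIN = 4     # administer the server


def allowed(granted: Permission, required: Permission) -> bool:
    """A user may act if the granted level is at least the required one."""
    return granted >= required
```

    For example, `allowed(Permission.WRITE, Permission.READ)` is true, while `allowed(Permission.READ, Permission.CREATE)` is false.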

    Action log

    This exists too, but to be honest, I do not really like its format and structure. My main complaint is about the data storage format. Some information can be extracted from it, but reconstructing the full picture of what happened is rather difficult.

    Client-server architecture

    BaseX is easy to start as a server. No special drivers are required, since clients talk to the server over a network port. To run several servers at once, you just need to assign them different ports. They can easily execute each other's queries; this takes a few lines of code (or a copy from the documentation :)

    The protocol for exchanging data with the server is well documented, so writing a client in another language is not a problem. Ready-made clients currently exist for the following languages and systems: C#, VB, Scala, Java, ActionScript, Perl, PHP, Python, Rebol, Ruby, Haskell, Lisp, node.js, Qt, and, of course, C.


    This, in my humble opinion, is the biggest plus of this product. For me personally, XQuery is much more attractive than SQL, and there are several reasons for this.


    I will not make direct comparisons between XQuery and SQL, but to me the former is much more logical, more consistent, and more readable. I was able to write decent, reasonably complex queries in XQuery on my second day. True, two months later we rewrote them, but the fact remains.


    XQuery supports functions, and you can either define them directly in the query code or import them as modules. In general, XQuery can be called a functional language. Functions let you keep the code as concise as possible.


    BaseX supports modularity at two levels: Java modules and xqm modules (written in XQuery). If you want an analogy, these are essentially stored procedures.


    Another technology that helps you write clean and concise code. XPath is used both for the initial selection of data and as an analogue of JOINs from the SQL world.
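
    Both uses of XPath can be tried without BaseX at all. The sketch below uses Python's standard `xml.etree.ElementTree`, which implements only a limited subset of XPath and is not the BaseX engine. The document and the function name are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: customers and their orders.
DOC = """
<shop>
  <customers>
    <customer id="c1"><name>Ann</name></customer>
    <customer id="c2"><name>Bob</name></customer>
  </customers>
  <orders>
    <order customer="c2" item="lamp"/>
    <order customer="c1" item="desk"/>
    <order customer="c2" item="chair"/>
  </orders>
</shop>
"""

root = ET.fromstring(DOC)


def items_by_customer_name(name):
    """Select a customer by name, then 'join' to that customer's orders."""
    items = []
    # Primary selection: a predicate on the text of a child element.
    for customer in root.findall(f"./customers/customer[name='{name}']"):
        cid = customer.get("id")
        # Join: a second path whose predicate uses the key found above.
        for order in root.findall(f"./orders/order[@customer='{cid}']"):
            items.append(order.get("item"))
    return items
```

    Here the join takes two steps because of the limited XPath subset. In a full XPath or XQuery processor the same lookup can be a single expression such as `//order[@customer = //customer[name = 'Bob']/@id]/@item`.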

    Built-in Modules

    As I mentioned above, BaseX provides a lot of functionality right out of the box. These modules do not need to be imported separately and are available from the query language immediately.

    To keep the list short and to the point, the features look like this: a module for viewing system data (lists of users, sessions, logs), data archiving (zip), a client for connecting to other BaseX servers and executing queries remotely, a module for converting data formats, cryptography, working with CSV files, a module for managing databases (creation, optimization, transfer, recovery), a module for fetching resources by URI, working with files, a module for full-text analysis, hash functions, conversion of HTML documents to XML, working with HTTP requests, querying information about indexes and inspecting databases, a module for parsing and serializing JSON, data mapping, mathematical operations, data formatting, system calls, profiling, a JDBC database connector, a module for working with streaming data, a module for unit testing, document validation against DTD and XSD, and XSL transformation. The full list is here:

    Not bad for a little thing like BaseX!

    Areas of application

    Having weighed all the pros and cons over the year, we can single out several main areas of application for this product:
    • Analytical systems. BaseX works with XML, which has no rigid structure, and uses XQuery as its query language, which makes it an ideal tool for analytics and data mining.
    • Standalone products. Out of the box we get a lot of things. You can even run a web server.
    • Educational systems. All the prerequisites and a full set of functionality are there.
    • Unstructured data storage systems. XML will endure everything :)


    I would like to thank the BaseX team for the impressive product, which opened up a lot of new opportunities and technologies for me!

    I would recommend taking a closer look at this product, even if you do not use it in real projects, at least as a basis for learning XQuery.

    Official project site:
    Download: from here
