Bagri - NoSQL open source database built on top of a distributed cache
Today I want to tell you about an open source project called Bagri . Bagri is a distributed document database, or, as it is now fashionable to say, a NoSQL database written in Java and designed taking into account requirements mainly used in the corporate sector, such as high availability, fault tolerance, scalability and transaction support.
It is good to use the system first of all in those cases when the workflow is based on XML. These are finance, logistics, insurance, medicine, and other industries where the format of documents exchanged between participants is strictly defined by corporate XSD schemes. The system allows you to not parse every incoming document, but put it in the database as is, and then effectively execute any queries on stored documents using the powerful XQuery 3.1 toolkit.
Bagri is built on top of distributed cache products such as Hazelcast, Coherence, Infinispan and other similar systems. It is because of the distributed cache capabilities that Bagri supports the requirements of the corporate sector right out of the box. The distributed cache is used by the system not only as a data warehouse, but also as a distributed processing system for this data, which allows you to efficiently and quickly process any large amounts of loosely structured data. Transaction in the system is solved using an algorithm that implements multi-version concurrency control
Data is delivered to the system as XML or JSON documents. There is also the opportunity to implement your extension to Bagri and register a plug-in for working with new document formats, as well as with external document storage systems. The auxiliary project bagri-extensions contains the extensions developed by the team (the connector to MongoDB is currently implemented). XQuery is
used as a query language , in the future it is planned to also support SQL syntax, this task is available in the project github.
Bagri does not require prior knowledge of the data scheme, but forms a dictionary of data (unique paths in document structures) “on the fly” while parsing incoming documents. T.O. Bagri is completely schemaless and does not need to rebuild tables for new types of documents, i.e. he does not need create table / add column commands in principle.
For communication between the client and the server, Bagri offers two APIs: the standard XQJ API declared in JSR 225 and its own XDM API, which provides additional functionality that is not available in XQJ. In fact, XQJ interfaces are analogous to the functionality provided by the JDBC driver when working with relational databases. Together with the XQJ driver, the official XQJ TCK is delivered with the system, which you can run and make sure that the driver passes all XQJ tests 100%.
All documents in Bagri are stored in schemes, the closest analogue in the relational database (RDBMS) is the database. Currently, Hazelcast is used as the distributed cache on which the system is built, and a separate Hazelcast cluster is allocated for each scheme. Schemes exist independently of each other, i.e. there is no “struggle” for resources between schemes (in Hazelcast, each cluster is configured separately and has its own resource pool).
Document meta data (namespaces, document types, unique paths) are stored in the appropriate caches and are replicated between all cluster nodes. T.O. access to read metadata on work nodes is as fast as possible. The data of the documents themselves is separated from the metadata and stored in distributed caches, while the data related to one document is always stored on the same node. There are also caches for storing indexed values, for compiled queries, for a transaction log, and of course for the results of executed queries.
The Bagri client connects to the server using the internal mechanisms of the client cache software. XQuery client requests are packaged into jobs and executed on server nodes through the distributed ExecutorService provided by the Hazelcast platform.
Results are returned to the client through a dedicated asynchronous channel (Hazelcast queue)
The entire system configuration is stored in two files: access.xml for setting roles and users, and config.xml with settings for schemes and Bagri extensions. A detailed description of the format of these files and all the parameters used in them can be found in the installation and configuration instructions of the system . You can change the schema settings directly in files, or through JMX interfaces for managing schemas deployed on the Bagri administrative server.
Let's move on from theory to practice and see how we can work with Bagri from our Java code through XQJ interfaces.
Inside the spring context, declare the BagriXQDataSource bin and configure its four main parameters: remote server address, schema name, username and password.
Then we read the text file and based on it we create a new document in Bagri:
The request above calls the external store-document function defined in the bgdm namespace. The function accepts 3 parameters as input: uri, under which the document will be saved, the text content of the document and an optional set of options that defines additional parameters of the document saving function. The request is validated on the client side and then sent to the server along with the parameters.
On the server side, the incoming document is assigned a unique identifier. Further, the contents of the document are parsed in accordance with the specified format of the document and divided into pairs path / values, while all unique paths are stored in the replicated directory of document paths. At the end of the parsing procedure, all the contents of the document, divided into such pairs, are stored in the distributed system caches, and the document header with service information is also cached. If indexes are registered in the schema, all indexed values are also stored in the index cache. Confirmation of the successful saving of the document is sent back to the client side.
After we safely saved our document in Bagri, let's look at how we can make requests to documents stored in the system.
Get the XQJ connection:
Prepare the XQuery query:
Set the value of the search parameter:
We execute the request on the server:
And look through the results:
I think it will be just as interesting to talk about what happens on the server when this code is executed: The
query passes through the XQuery processor (currently it is Saxon ), which forms the query execution tree (compiled query, XQueryExpression). Then it is translated into a set of simple requests for cached data along the specified paths:
These simple requests are executed in parallel on all nodes of the distributed system cache. Documents found are delivered to the processor for further processing. If possible, indexes are used if the requested paths have been indexed. After the final processing of the received documents by the processor, the results are transmitted to the client back through a dedicated asynchronous channel.
Bagri provides rich opportunities to expand the behavior of the system. For example, you can connect a trigger to any change in the state of a document (before / after insert / update / delete) and execute additional business logic at these points. To do this, just implement the com.bagri.xdm.cache.api.DocumentTrigger interface, as shown in one of the examples supplied with the system (see samples / bagri-samples-ext):
And then register the trigger in the circuit in the config.xml file:
As shown above, we registered the library (bagri-samples-ext-1.0.0-EA1.jar) containing the trigger implementation. Libraries can also contain additional functions written in Java, which can be called from XQuery queries, as well as extensions for processing new data formats or connecting to external document storage systems.
Bagri can be deployed in the following ways:
Bagri's visual administrative interface is currently implemented as a plug-in for VisualVM and allows you to:
This module is still under active development, the functionality of the plugin is constantly growing. In case of detection of errors, as well as in the case of proposals for missing functionality, they can always (and should be!) Be included in the issues of the project.
Screenshots of the administrative console - see below.
So, we examined the most basic features of a distributed Bagri document database. We hope that you are interested in this project and you will try to use it in your daily work.
For my part, in the near future I’ll try to write some more articles for you that cover the topics of comparing Bagri with other similar products, such as BaseX, MongoDB, Cassandra and the possibility of expanding the system by using the built-in APIs (DataFormat API and DataStore API).
Bagri, like any other open source project, requires Java developers interested in this topic, so if after reading this article you will be interested in the project, welcome to Bagri Github , there are many really interesting tasks.
When does it make sense to use Bagri
It is good to use the system first of all in those cases when the workflow is based on XML. These are finance, logistics, insurance, medicine, and other industries where the format of documents exchanged between participants is strictly defined by corporate XSD schemes. The system allows you to not parse every incoming document, but put it in the database as is, and then effectively execute any queries on stored documents using the powerful XQuery 3.1 toolkit.
Bagri is built on top of distributed cache products such as Hazelcast, Coherence, Infinispan and other similar systems. It is because of the distributed cache capabilities that Bagri supports the requirements of the corporate sector right out of the box. The distributed cache is used by the system not only as a data warehouse, but also as a distributed processing system for this data, which allows you to efficiently and quickly process any large amounts of loosely structured data. Transaction in the system is solved using an algorithm that implements multi-version concurrency control
Data is delivered to the system as XML or JSON documents. There is also the opportunity to implement your extension to Bagri and register a plug-in for working with new document formats, as well as with external document storage systems. The auxiliary project bagri-extensions contains the extensions developed by the team (the connector to MongoDB is currently implemented). XQuery is
used as a query language , in the future it is planned to also support SQL syntax, this task is available in the project github.
Bagri does not require prior knowledge of the data scheme, but forms a dictionary of data (unique paths in document structures) “on the fly” while parsing incoming documents. T.O. Bagri is completely schemaless and does not need to rebuild tables for new types of documents, i.e. he does not need create table / add column commands in principle.
For communication between the client and the server, Bagri offers two APIs: the standard XQJ API declared in JSR 225 and its own XDM API, which provides additional functionality that is not available in XQJ. In fact, XQJ interfaces are analogous to the functionality provided by the JDBC driver when working with relational databases. Together with the XQJ driver, the official XQJ TCK is delivered with the system, which you can run and make sure that the driver passes all XQJ tests 100%.
How distributed cache functionality is used in Bagri
All documents in Bagri are stored in schemes, the closest analogue in the relational database (RDBMS) is the database. Currently, Hazelcast is used as the distributed cache on which the system is built, and a separate Hazelcast cluster is allocated for each scheme. Schemes exist independently of each other, i.e. there is no “struggle” for resources between schemes (in Hazelcast, each cluster is configured separately and has its own resource pool).
Document meta data (namespaces, document types, unique paths) are stored in the appropriate caches and are replicated between all cluster nodes. T.O. access to read metadata on work nodes is as fast as possible. The data of the documents themselves is separated from the metadata and stored in distributed caches, while the data related to one document is always stored on the same node. There are also caches for storing indexed values, for compiled queries, for a transaction log, and of course for the results of executed queries.
The Bagri client connects to the server using the internal mechanisms of the client cache software. XQuery client requests are packaged into jobs and executed on server nodes through the distributed ExecutorService provided by the Hazelcast platform.
Results are returned to the client through a dedicated asynchronous channel (Hazelcast queue)
System configuration
The entire system configuration is stored in two files: access.xml for setting roles and users, and config.xml with settings for schemes and Bagri extensions. A detailed description of the format of these files and all the parameters used in them can be found in the installation and configuration instructions of the system . You can change the schema settings directly in files, or through JMX interfaces for managing schemas deployed on the Bagri administrative server.
Data Examples
Let's move on from theory to practice and see how we can work with Bagri from our Java code through XQJ interfaces.
Inside the spring context, declare the BagriXQDataSource bin and configure its four main parameters: remote server address, schema name, username and password.
${schema.address} ${schema.name} ${schema.user} ${schema.password} context = new ClassPathXmlApplicationContext("spring/xqj-client-context.xml");
XQConnection xqc = context.getBean(XQConnection.class);
Then we read the text file and based on it we create a new document in Bagri:
String content = readTextFile(fileName);
String query = "declare namespace bgdm=\"http://bagridb.com/bagri-xdm\";\n" +
"declare variable $uri external;\n" +
"declare variable $content external;\n" +
"declare variable $props external;\n" +
"let $id := bgdm:store-document($uri, $content, $props)\n" +
"return $id\n";
XQPreparedExpression xqpe = xqc.prepareExpression(query);
xqpe.bindString(new QName("uri"), fileName, xqc.createAtomicType(XQBASETYPE_ANYURI));
xqpe.bindString(new QName("content"), content, xqc.createAtomicType(XQBASETYPE_STRING));
List props = new ArrayList<>(2);
props.add(“xdm.document.data.format=xml"); //can be “json” or something else..
xqpe.bindSequence(new QName("props"), xqConn.createSequence(props.iterator()));
XQSequence xqs = xqpe.executeQuery();
xqs.next();
long id = xqs.getLong();
The request above calls the external store-document function defined in the bgdm namespace. The function accepts 3 parameters as input: uri, under which the document will be saved, the text content of the document and an optional set of options that defines additional parameters of the document saving function. The request is validated on the client side and then sent to the server along with the parameters.
On the server side, the incoming document is assigned a unique identifier. Further, the contents of the document are parsed in accordance with the specified format of the document and divided into pairs path / values, while all unique paths are stored in the replicated directory of document paths. At the end of the parsing procedure, all the contents of the document, divided into such pairs, are stored in the distributed system caches, and the document header with service information is also cached. If indexes are registered in the schema, all indexed values are also stored in the index cache. Confirmation of the successful saving of the document is sent back to the client side.
After we safely saved our document in Bagri, let's look at how we can make requests to documents stored in the system.
Get the XQJ connection:
XQConnection xqc = context.getBean(XQConnection.class);
Prepare the XQuery query:
String query = "declare namespace s=\"http://tpox-benchmark.com/security\";\n" +
"declare variable $sym external;\n" +
"for $sec in fn:collection(\“securities\")/s:Security\n" +
"where $sec/s:Symbol=$sym\n" +
"return $sec\n";
XQPreparedExpression xqpe = xqc.prepareExpression(query);
Set the value of the search parameter:
xqpe.bindString(new QName("sym"), “IBM”, null);
We execute the request on the server:
XQResultSequence xqs = xqpe.executeQuery();
And look through the results:
while (xqs.next()) {
System.out.println(xqs.getItemAsString(null));
}
I think it will be just as interesting to talk about what happens on the server when this code is executed: The
query passes through the XQuery processor (currently it is Saxon ), which forms the query execution tree (compiled query, XQueryExpression). Then it is translated into a set of simple requests for cached data along the specified paths:
[PathExpression [path=/ns2:Security/ns2:Symbol/text(), param=var0, docType=2, compType=EQ]], params={var0=IBM}
These simple requests are executed in parallel on all nodes of the distributed system cache. Documents found are delivered to the processor for further processing. If possible, indexes are used if the requested paths have been indexed. After the final processing of the received documents by the processor, the results are transmitted to the client back through a dedicated asynchronous channel.
System Expansion Options
Bagri provides rich opportunities to expand the behavior of the system. For example, you can connect a trigger to any change in the state of a document (before / after insert / update / delete) and execute additional business logic at these points. To do this, just implement the com.bagri.xdm.cache.api.DocumentTrigger interface, as shown in one of the examples supplied with the system (see samples / bagri-samples-ext):
public class SampleTrigger implements DocumentTrigger {
private static final transient Logger logger = LoggerFactory.getLogger(SampleTrigger.class);
public void beforeInsert(Document doc, SchemaRepository repo) {
logger.trace("beforeInsert; doc: {}; repo: {}", doc, repo);
}
public void afterInsert(Document doc, SchemaRepository repo) {
logger.trace("afterInsert; doc: {}; repo: {}", doc, repo);
}
public void beforeUpdate(Document doc, SchemaRepository repo) {
logger.trace("beforeUpdate; doc: {}; repo: {}", doc, repo);
}
public void afterUpdate(Document doc, SchemaRepository repo) {
logger.trace("afterUpdate; doc: {}; repo: {}", doc, repo);
}
public void beforeDelete(Document doc, SchemaRepository repo) {
logger.trace("beforeDelete; doc: {}; repo: {}", doc, repo);
}
public void afterDelete(Document doc, SchemaRepository repo) {
logger.trace("afterDelete; doc: {}; repo: {}", doc, repo);
}
}
And then register the trigger in the circuit in the config.xml file:
1 2016-09-01T15:00:58.096+04:00 admin sample schema
………
1 2016-09-01T15:00:58.096+04:00 admin /{http://tpox-benchmark.com/security}Security false true 1 trigger_library com.bagri.samples.ext.SampleTrigger 1 2016-09-01T15:00:58.096+04:00 admin bagri-samples-ext-1.0.0-EA1.jar Sample extension trigger Library true
As shown above, we registered the library (bagri-samples-ext-1.0.0-EA1.jar) containing the trigger implementation. Libraries can also contain additional functions written in Java, which can be called from XQuery queries, as well as extensions for processing new data formats or connecting to external document storage systems.
System Deployment Options
Bagri can be deployed in the following ways:
- Standalone Java app - Suitable for small applications that need to process a limited set of data. Everything works within a single JVM with a single circuit, providing maximum performance on limited (memory of one JVM) data volumes.
- Client-server, distributed database - clients communicate with the distributed storage system through the XDM / XQJ driver. Distributed processing of requests in memory, the possibility of on-line processing of an unlimited amount of data
- Administration server - provides additional functionality for collecting statistics and monitoring the status of the working nodes of the system. It is usually deployed as a separate node, but this is not a required component; the system can work without it.
Visual administrative interface
Bagri's visual administrative interface is currently implemented as a plug-in for VisualVM and allows you to:
- configure users and roles
- Connect external Java function libraries and additional XQuery modules for use in queries
- configure schemes and their components: collections, meta-data dictionaries, indexes and triggers, user and role access to schemes
- visually view documents and collections
- execute XQuery queries and get results through the built-in console
This module is still under active development, the functionality of the plugin is constantly growing. In case of detection of errors, as well as in the case of proposals for missing functionality, they can always (and should be!) Be included in the issues of the project.
Screenshots of the administrative console - see below.
So, we examined the most basic features of a distributed Bagri document database. We hope that you are interested in this project and you will try to use it in your daily work.
For my part, in the near future I’ll try to write some more articles for you that cover the topics of comparing Bagri with other similar products, such as BaseX, MongoDB, Cassandra and the possibility of expanding the system by using the built-in APIs (DataFormat API and DataStore API).
Bagri, like any other open source project, requires Java developers interested in this topic, so if after reading this article you will be interested in the project, welcome to Bagri Github , there are many really interesting tasks.