igor_suhorukov April 3, 2019 at 09:01

H2 evolution - window functions, CTE, JSON / XML in an embedded database

There are open source projects that have become commercially successful mainstream, such as PostgreSQL / Elasticsearch. Others, for example RethinkDB, lost the market and stopped development. And the H2 database project of an embedded database written in java is developing and living in its niche.

To demonstrate the functionality of SonarQube, Jira, Confluence, the H2 database is used at the first start. H2 is the base for running SQL tests in memory in almost any JVM project. There is an example of an application less known to users - this is the use of H2 in a distributed ignite-sql and this is already a production ready script for using an embedded database as part of another solution. Less than a month ago, version 1.4.199 was released in which you can now write fairly complex SQL queries.

In projects, I never relied on H2 as a complete database with saving data to disk. Rather, as a module for transforming data in JVM memory, with good SQL support. But for this application, it was greatly limited by the lack of window functions. And now, after more than half a year since the beginning of the development of the functional , now H2database has caught up with SQLite . And this is due to the great merit of Yevgeny Ryazanov from Irkutsk - I have not seen such a development pace like his before in non-profit open source projects. Also, commits of other Russian-speaking contributors regularly appear in the project repository. And at the moments of release - the founder of the project Thomas Mueller .

H2 has support for recursive queries (CTE) . This is the standard way in SQL to work with hierarchical data in tables and query decomposition (here you can step on the rake of the scheduler). Recursive queries are described in the publication with examples.

To work with poorly structured data, an upcoming implementation of the SQL / JSON standard will appear . In the meantime, for its needs, it has expanded H2 with the help of a custom function XQuery 3.1 processor based on BaseX . The code is available in the github H2XQueryAdapter project . This is a table function that can extract data from XML or JSON format using XQuery and check the type and restriction of not null for the values returned by the function. Further, the full power of SQL expressions is available to transform the result of an XQuery transform into the JVM process memory.

The xquery () table function is overloaded and has two options - with one parameter xQuery query and one with xQuery query and the second - sql query string to form the parameters for xQuery itself.

An approach with such transformations proved to be excellent in a project for processing a petabyte volume of raw data in a biomedical data storage project .

create table xresult (GR VARCHAR(500) not null,AR varchar, VER VARCHAR(50)) as 
 select * from xquery('declare variable $getHeader as xs:boolean external := false(); declare variable $getData as xs:boolean external := true(); 

{
if($getHeader) then( 
    VARCHAR(500) not nullvarcharVARCHAR(50)
 ),
 if($getData) then((
    for $row in doc("http://central.maven.org/maven2/org/springframework/spring-context/5.1.4.RELEASE/spring-context-5.1.4.RELEASE.pom")//*:dependency
    return 
    {$row/*:groupId/text()}{$row/*:artifactId/text()}{$row/*:version/text()}
 ))
}
')

This implementation has limitations on the format of the xquery query.

First, you need to declare two external variables getHeader and getData - this is necessary so that the table function does not cause data transformation several times, discarding unnecessary results in order to get only the names and type of columns. Multiple function calls are features of the H2 base working with java functions returning a list of values.

Secondly, the format of the result should be like for csv serialization

<колонка1>тип данных...<колонкаN>тип данных<колонка1>значение...<колонкаN>значение

Thirdly, you need to declare the data type, dimension and required fields, for example: decimal (20,4) not null

With column data type parsing, I tried to reuse the SQL parser from H2, but it turned out to be so strongly connected with other database objects that it refused to work without creating a database and session. I console myself with the thought that the developers did so to simplify the design of the application and not to make a parser for all occasions for any BNF grammars.

You can run this example in java debug mode. New transformation rules can be developed in the familiar xquery editor or in the open source BaseX GUI .

It is possible to turn almost any java collection or POJO into a virtual H2 table. H2POJOTable Project Codeon github. In the example, by reference, the platform-based MemoryManagerMXBeans turns into a H2 table function. Perhaps the approach will be a little comfort to those who are sad about the lack of LINQ and support for operations on sets in Java.

try (Statement statement = connection.createStatement()) {
    String pojoTableAlias = "create alias MemoryManagerMXBeans as $$ \n" +
            "import java.lang.management.ManagementFactory;\n" +
            "import java.lang.management.MemoryManagerMXBean;\n" +
            "import org.h2.expression.function.pojo.*;\n" +
            "import java.sql.*;\n" +
            "import java.util.Collections;\n" +
            "@CODE\n" +
            "    ResultSet getRuntimeStat(Connection connection) throws Exception{\n" +
            "        return H2PojoAdapter.toTable(connection, new CollectionWraper<>(MemoryManagerMXBean.class," +
            "           ManagementFactory::getMemoryManagerMXBeans, Collections.emptyMap()));\n" +
            "    }\n" +
            "\n$$";
    statement.executeUpdate(pojoTableAlias);
}
try (Statement statement = connection.createStatement()) {
    try (ResultSet resultSet = statement.executeQuery("select * from MemoryManagerMXBeans()")) {
        int columnCount = assertResultSet(resultSet, new String[]{"memoryPoolNames", "name", "valid"});
        assertThat(columnCount).isGreaterThan(1);
    }
}

Sometimes the old functionality in H2 breaks down and is not so demanded by users. As an example, in my work projects, programs read data from the AWS S3 URL. Therefore, I hope that the known error will still be fixed by accepting my pull request . On the way to fixing this error are unstable tests for TLS, which also do not work under Java 11.

H2 allows you to use the ODBC PostgreSQL driver emulating a subset of its network protocol. Which also theoretically allows you to bind it through FDW in PostgreSQL.

In addition to the database itself, the delivery of H2 also includes a minimalist web console with support for auto-completion when editing, servlet or standalone startup option. H2 looks like a “Swiss knife” for developers - a compact and versatile tool if your project already uses the JVM. When trying to use this console with a "crooked" jdbc driver, the DBMS Redshift made its first red-eyed pull request into the project. Noel Grandin , one of the project participants, helped me with the code review and accepted the corrections.

If you need an analogue of Berkeley DB Java Edition - the project has MVStore- persistent storage for key-value data and part-time MVCC "engine" by default in the latest versions of the database. It is striking that the database even has basic support for geofunctions and full-text search.

Thanks to H2database contributors, everyone uses this database and reports errors! H2 database has been developing since 2005, and now supports window functions, recursive queries, it is one of the most powerful SQL "engines" for processing data in the JVM memory and is expanded by table functions for working with loosely structured data.

Tags:

H2 evolution - window functions, CTE, JSON / XML in an embedded database

Also popular now: