Popular open source - part one: 3 tools for working with data

    We decided to prepare a series of digests reviewing the most popular open source projects, selected from the open source solutions most discussed on Hacker News. The theme of the first installment is tools and services for working with logs and databases.

    We will talk about FoundationDB, LogDevice and Queryparser. Last year, all three were actively discussed on Hacker News. Part of the interest came from the fact that large IT companies (Apple, Uber and Facebook) were involved in their development. This also suggests that all three tools were built for large-scale, high-load IT infrastructure.


    FoundationDB is a multi-model NoSQL DBMS. It was presented in 2012 by three engineers from Visual Sciences, a company that had worked on a data visualization platform (today part of Adobe Analytics).

    Unlike many similar systems, FoundationDB executes operations according to the ACID principles: atomicity, consistency, isolation and durability. DBMSs that follow this model are considered the most reliable and predictable, but NoSQL systems often sacrifice some ACID guarantees for higher performance.
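
    FoundationDB provides these guarantees through multiversion concurrency control with optimistic conflict checking at commit time. The sketch below is a toy, single-process model of that idea, not the FoundationDB API: the `ToyStore` and `Transaction` classes and the `transfer` function are hypothetical. It models only atomicity and isolation; durability, replication and distribution are out of scope.

```python
class ConflictError(Exception):
    """Raised when a transaction read a key that a later commit has changed."""


class ToyStore:
    """Hypothetical in-memory store with optimistic concurrency (not the FoundationDB API)."""

    def __init__(self):
        self.data = {}      # key -> committed value
        self.versions = {}  # key -> commit version of its last write
        self.version = 0    # incremented on every successful commit

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_version = store.version  # commit version when we started
        self.reads = set()
        self.writes = {}                   # buffered; invisible to others until commit

    def get(self, key):
        self.reads.add(key)
        if key in self.writes:             # see our own uncommitted writes
            return self.writes[key]
        return self.store.data.get(key)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        store = self.store
        # Isolation: abort if any key we read was written by a later commit.
        for key in self.reads:
            if store.versions.get(key, 0) > self.read_version:
                raise ConflictError(key)
        # Atomicity: apply every buffered write under one new version, or none.
        store.version += 1
        for key, value in self.writes.items():
            store.data[key] = value
            store.versions[key] = store.version


def transfer(store, src, dst, amount):
    tr = store.begin()
    balance = tr.get(src)
    if balance < amount:
        raise ValueError("insufficient funds")  # nothing is committed
    tr.set(src, balance - amount)
    tr.set(dst, tr.get(dst) + amount)
    tr.commit()


store = ToyStore()
setup = store.begin()
setup.set("alice", 100)
setup.set("bob", 0)
setup.commit()

transfer(store, "alice", "bob", 30)
print(store.data)  # {'alice': 70, 'bob': 30}

# Two transactions read the same balance; the second one to commit aborts.
t1, t2 = store.begin(), store.begin()
t1.set("alice", t1.get("alice") - 10)
t2.set("alice", t2.get("alice") - 20)
t1.commit()
try:
    t2.commit()
except ConflictError:
    print("t2 aborted; it should be retried")
```

    Because conflicts are detected at commit time rather than prevented with locks, readers never block writers; the price is that a conflicting transaction has to be retried.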

    Another advantage of FoundationDB is its powerful low-level interface, an ordered key-value store that any system can use for distributed data storage. For example, you can build frontends for larger, general-purpose database management systems on top of FoundationDB (the project calls such frontends "layers").
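
    As a rough illustration of how a layer can sit on an ordered key-value store, the sketch below stores each table row as a group of keys of the form (table, row_id, column). Because keys are kept sorted, all columns of one row are adjacent and can be fetched with a single range read. `OrderedKV` and `TableLayer` are hypothetical in-memory stand-ins, not FoundationDB classes.

```python
import bisect


class OrderedKV:
    """Hypothetical in-memory ordered key-value store with range reads."""

    def __init__(self):
        self._keys = []    # kept in sorted order
        self._values = {}

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_range(self, begin, end):
        """Yield (key, value) pairs with begin <= key < end, in key order."""
        i = bisect.bisect_left(self._keys, begin)
        j = bisect.bisect_left(self._keys, end)
        for key in self._keys[i:j]:
            yield key, self._values[key]


class TableLayer:
    """Hypothetical 'layer': maps table rows onto (table, row_id, column) keys."""

    def __init__(self, kv, table):
        self.kv = kv
        self.table = table

    def insert(self, row_id, row):
        for column, value in row.items():
            self.kv.set((self.table, row_id, column), value)

    def get(self, row_id):
        # (table, row_id) sorts before every (table, row_id, column) key,
        # and (table, row_id + 1) sorts after all of them.
        begin, end = (self.table, row_id), (self.table, row_id + 1)
        return {key[2]: value for key, value in self.kv.get_range(begin, end)}


kv = OrderedKV()
users = TableLayer(kv, "users")
users.insert(1, {"name": "Ann", "email": "ann@example.com"})
users.insert(2, {"name": "Bob"})
print(users.get(1))  # {'email': 'ann@example.com', 'name': 'Ann'}
```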

    Thanks to these features, FoundationDB quickly gained popularity. It was adopted by several cloud services, including the Wavefront monitoring service (now part of VMware) and the Snowflake and SkuVault data storage systems. Its popularity was also helped by the fact that the project's source code had been open from the start.

    Everything changed in 2015, when Apple acquired the company. Apple closed access to the FoundationDB code and began using the DBMS in its own online services. This decision created problems for developers who relied on FoundationDB in their projects. But in April 2018, Apple made the DBMS open source again. This benefited not only the IT community but also Apple itself: within two weeks, more than seven thousand developers showed interest in the project, and a hundred new threads were opened on the project's forum.

    Apple decided to stay with this open approach. In November 2018, it presented a new DBMS component, the Document Layer, which lets you build document stores. Additional tools are planned, and anyone can contribute to the product: the official GitHub repository contains detailed instructions.
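
    One plausible way for a document layer to map documents onto key-value pairs is to flatten each document so that every leaf field becomes a key of the form (collection, document id, field path). The sketch below assumes that scheme purely for illustration; it does not describe how the Document Layer actually encodes data, and the `flatten`/`unflatten` helpers and sample data are hypothetical.

```python
def flatten(doc, prefix=()):
    """Yield (path, value) pairs for every leaf of a nested dict."""
    for field, value in doc.items():
        path = prefix + (field,)
        if isinstance(value, dict):
            yield from flatten(value, path)  # note: empty sub-documents are dropped
        else:
            yield path, value


def unflatten(pairs):
    """Rebuild a nested dict from (path, value) pairs."""
    doc = {}
    for path, value in pairs:
        node = doc
        for field in path[:-1]:
            node = node.setdefault(field, {})
        node[path[-1]] = value
    return doc


kv = {}  # stand-in for an ordered key-value store
doc = {"name": "Ann", "address": {"city": "Oslo", "zip": "0150"}}

# Store each leaf under the key (collection, document_id, *path).
for path, value in flatten(doc):
    kv[("people", 7) + path] = value

for key in sorted(kv):
    print(key, "->", kv[key])
# ('people', 7, 'address', 'city') -> Oslo
# ('people', 7, 'address', 'zip') -> 0150
# ('people', 7, 'name') -> Ann

# Reading a document back means reading every key with its (collection, id) prefix.
restored = unflatten(
    (key[2:], value) for key, value in sorted(kv.items()) if key[:2] == ("people", 7)
)
assert restored == doc
```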


    LogDevice is a distributed log storage system created at Facebook. It is optimized for writing data that arrives sequentially: instead of being stored as separate files, all information in the system is appended to a log, a sequence of records. This makes it possible to determine precisely the order in which data arrived.
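
    LogDevice assigns each record a log sequence number (LSN), and readers consume a log in LSN order, resuming from a given position. The toy below models this with consecutive integers in a single process; `ToyLog`, `append` and `read_from` are hypothetical names, and real LogDevice LSNs are more involved than a simple counter.

```python
import threading


class ToyLog:
    """Hypothetical append-only log; each record gets the next sequence number."""

    def __init__(self):
        self._records = []
        self._lock = threading.Lock()

    def append(self, payload):
        # The lock makes concurrent writers receive distinct, increasing numbers.
        with self._lock:
            self._records.append(payload)
            return len(self._records)  # sequence numbers start at 1

    def read_from(self, seq):
        """Return (seq, payload) pairs from sequence number `seq` onward, in order."""
        if seq < 1:
            raise ValueError("sequence numbers start at 1")
        with self._lock:
            tail = self._records[seq - 1:]
        return list(enumerate(tail, start=seq))


log = ToyLog()
for event in ["login", "click", "logout"]:
    log.append(event)

# A reader that has already processed record 1 resumes from record 2.
print(log.read_from(2))  # [(2, 'click'), (3, 'logout')]
```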

    Initially, the project served Facebook's internal needs, but in September 2018 the company open-sourced it. Until then, LogDevice was not well known in the IT community, but some Hacker News readers have already taken an interest in the tool. One of them, for example, noted its potential for data storage systems used in machine learning.

    However, some believe that Facebook's tool will gain popularity slowly. The market already has many similar tools (for example, Apache Kafka) with a large number of integrations, which LogDevice has yet to acquire. The developers are currently working on integrating LogDevice with the container orchestration system Kubernetes.

    Anyone is welcome to contribute: the code requirements are described in a separate document in the GitHub repository.


    Queryparser is a parser for three SQL dialects: Vertica, Hive and Presto. Like LogDevice, Queryparser was originally created for the internal needs of a large IT company, this time Uber.

    In 2015, Uber's engineers decided to change how objects in their databases were identified, replacing integer identifiers with UUIDs. To rewrite all the identifiers, they first had to find every reference to them in the tables. This turned out to be difficult: Uber stored tens of thousands of tables belonging to different departments of the company. To trace these references across many databases, the developers created Queryparser.
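
    The first step in such a migration is finding which tables each query touches. Queryparser does this by fully parsing the query; the sketch below swaps in a much cruder regular-expression heuristic just to show the input and output of the task. The table names are hypothetical, and the regex misses common cases such as CTEs, quoted names, comments and comma-separated FROM lists.

```python
import re

# Crude heuristic (not Queryparser): a table name is the identifier right
# after FROM or JOIN. Subqueries like "FROM (SELECT ...)" are skipped
# because "(" cannot start an identifier.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql):
    """Return the sorted, de-duplicated table names found in a query."""
    return sorted(set(TABLE_REF.findall(sql)))


query = """
SELECT t.trip_id, u.name
FROM trips.completed t
JOIN users.profiles u ON t.user_id = u.id
WHERE t.city = 'SF'
"""
print(referenced_tables(query))  # ['trips.completed', 'users.profiles']
```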

    The tool did the job, and the engineers found other uses for it, such as automatically monitoring changes in databases. Queryparser records queries that combine existing data flows or create new ones, and notifies the database users affected by these changes.

    Queryparser also helped Uber collect statistics on SQL queries and optimize storage: rarely used tables were deleted, and databases that frequently referenced each other were merged.
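
    Once every query in a log has been reduced to the set of tables it references, statistics like these come down to simple counting. The sketch below assumes such a list already exists; the table names, the query log and the "at most one reference" threshold for "rarely used" are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: for each logged query, the set of tables it references
# (for example, as produced by a parser such as Queryparser).
query_tables = [
    {"trips", "users"},
    {"trips", "users"},
    {"trips", "cities"},
    {"trips"},
    {"legacy_promos"},
]
all_tables = {"trips", "users", "cities", "legacy_promos", "old_events"}

# How many queries touch each table.
usage = Counter(t for tables in query_tables for t in tables)

# How often each pair of tables appears in the same query.
pairs = Counter(
    pair for tables in query_tables for pair in combinations(sorted(tables), 2)
)

rarely_used = sorted(t for t in all_tables if usage[t] <= 1)
print(rarely_used)           # ['cities', 'legacy_promos', 'old_events']
print(pairs.most_common(1))  # [(('trips', 'users'), 2)]
```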

    Queryparser's source code has been open since the project began, but Uber published a detailed article about the tool only in 2018. The article can serve as a guide to working with the system, and the repository contains installation instructions and guidelines for anyone who wants to contribute to Queryparser.

    Uber plans to develop the tool further, for example by adding support for more SQL dialects: PostgreSQL, MySQL and SQLite. The company also aims to add type checking for queries and translation of queries from one dialect to another.

    Next time, we will continue our review of popular open source projects of 2018 and cover open cloud management solutions and developer tools.

    More posts are available on the First Corporate IaaS blog.
