Airpal: SQL Web Application
The other day, we released our new tool, Airpal. This is a web-based database application designed to complement Facebook’s PrestoDB when analyzing information. And in this post we would like to talk about its capabilities and features.
Airpal is an open source project. A distinctive feature of the application is that it was originally created with an eye on users who are not technical specialists. However, we wanted to give them a tool that allows them to fully work with data arrays, and at the same time help professionals increase the effectiveness of the analysis.
People who regularly engage in sampling and processing data from SQL know that these processes are not always conveniently organized. It is necessary to remember how the requests were formulated, copy and paste them into the command line, launch numerous terminals - all this slows down the work and is not particularly pleasant. In addition, if the team has newcomers, then for them the learning curve can be pretty steep. And we decided that all these annoying problems could be solved with a good UI.
Airpal works in conjunction with PrestoDB, an open SQL query engine for working with big data. At the same time, not only programmers and data processing specialists can work with Presto in this case, but also employees of other professions. In some cases, such a tandem can serve as an inexpensive alternative to data storage. In addition, Airpal can act as a user-friendly interface for accessing large amounts of data stored in a Hadoop cluster.
Airpal allowed us to make access to big data much more democratic. The application has been used inside our company for about a year now, and about a third of all employees work with databases through this tool. How many bookings were made per day? How many people over the past night spent in places found through our service? How many offers are available in a particular region, city, district? How many users speak German or Spanish? Today, in the form of Hive tables in HDFS , we store about 1.5 petabytes of data - 25 million bookings in 34,000 cities around the world. And with only a relatively small number of important, key tables, we can use Presto as the default request handler.
Previously, we used the Redshift web service to work with the database . But for a number of reasons, he did not satisfy us: he required the installation of a set of ETL tools (extract, transform, load) for preliminary data preparation, he had a limit on the number of simultaneously executed requests. In addition, Redshift has very little error messages, which made debugging very difficult.
In terms of arbitrary queries and iteration during analysis, Presto is much smarter and faster than traditional MapReduce-tools. However, for us, the most useful consequence of implementing Presto was the lack of the need to complicate the process of “interactive” query generation. Since we work with our own Hive storage, we can afford to have a “single source of reliable data”, without heavy copies at a separate storage level. And the fact that we do not need to change the RC format in which the data is stored made Presto the perfect choice for our infrastructure. Key features of Airpal:
- Flexible user access control.
- Ability to search the tables and the tables themselves.
- View metadata, sections, patterns, and sample strings.
- Convenient query creation editor.
- View query execution status.
- Saving the history of all requests with the ability to search on it.
- Access to the interface from a regular web browser.
- Return the results as CSV files.
- Saving queries for future reuse.
- Using Dropwizard to provide a REST service in Java.
- To deliver messages from the server to the client, SSE (Server Sent Events) is used.
Trying to preserve the Presto ideology, we tried to simplify the installation of Airpal so that it can be tested without any difficulties. Details can be found on GitHub .