A march against the RDBMS, or distributed storage projects (key-value stores)
Do you build projects often? If so, you almost certainly use a database, most likely MySQL (or, for some, PostgreSQL). What is interesting, both from experience and simply from reading descriptions of various architectures, is that far from every project needs the key features of a database; in many cases the database serves as little more than a repository for ordinary data. For example, databases are usually not used in caching systems; on the contrary, caching exists precisely to avoid unnecessary queries. And what is used most often for caching? Memcached. And what is it? A distributed, hash-based storage system. In essence, it is simply a repository of key-value pairs that supports only the basic operations: write, read, delete, and check for presence. Yes, that is right: no filters, no selects, no sorting; at most, a tag system for fetching all related records with a single request. And in many cases, that functionality is quite enough.
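Those four basic operations are the whole contract. Here is a minimal sketch of such a store, backed by a plain dict (the class and method names are my own illustration, not any particular product's API):

```python
class KeyValueStore:
    """Minimal key-value store supporting only the four basic operations."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # write
        self._data[key] = value

    def get(self, key, default=None):
        # read
        return self._data.get(key, default)

    def delete(self, key):
        # delete (silently ignores missing keys)
        self._data.pop(key, None)

    def exists(self, key):
        # check for presence
        return key in self._data
```

Everything a Memcached-style system adds on top (networking, hashing keys across nodes, eviction) is machinery around this same tiny interface.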
I am by no means a fanatic: in real projects the best option is a combination of a regular relational database and a specialized data store. More advanced systems, which store not just key-value pairs but also additional meta-information about each object, already approach databases in their capabilities; they are sometimes called document-oriented databases (or stores), since the unit of information they work with is a document and its associated data.
The second criterion, or feature, is distribution. For a DBMS this is often solved in a rather complicated way, or with third-party tools. Data stores built on a DHT (Distributed Hash Table) are ready for distributed operation from the start, providing scalability and tolerating the failure of individual nodes. In some systems this comes from the environment itself (for example, when the store runs on top of the Erlang VM); others use off-the-shelf distribution tools (for example, JGroups for Java systems) or their own solutions, as Memcached does.
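The idea behind a DHT can be pictured with consistent hashing: each node owns an arc of a hash ring, and a key is routed to the first node after its hash point. A toy sketch (the node names are hypothetical, and real systems add virtual nodes and replication on top):

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # A stable hash, so key placement does not change between runs.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: maps keys to nodes so that adding or
    removing a node reassigns only the keys in one arc of the ring."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        h = _hash(key)
        # First hash point after the key's point, wrapping around the ring.
        i = bisect.bisect(self._points, h) % len(self._ring)
        return self._ring[i][1]
```

The payoff is exactly the scalability property described above: when a node joins, only the keys falling into its new arc move, while every other key keeps its owner.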
No less important is how ready such systems are for the cloud environment; it is no accident that Amazon's storage services (S3 and SimpleDB) work this way. Google's well-known BigTable is also, for the most part, a system for storing and processing key/value pairs. Thanks to the simplicity, even triviality, of the API (though not always of the internals, which are still simpler than those of a standard SQL database), such solutions scale extremely well for both reads and writes, including dynamically, without downtime. So if you have, or will have, a cluster, take a look at these solutions. One point is worth mentioning, though: very often such systems keep data only in memory, and when permanent storage is required, a back end is used, sometimes including storage in a regular relational database.
Where can this be applied? Anywhere you need to store a large (almost unlimited) amount of data that can be split into separate, independent blocks: individual articles, photos, videos or other large binary objects, log entries, user profiles, session data. (By the way, we previously announced our experimental open project, a Java session server for distributed storage of PHP application sessions; a similar solution exists in the commercial Zend Platform.) In most cases everything comes down to either a set of binary data or a text string holding data or code in serialized form, so you can either process the data further in your program or hand it straight to the client. That is exactly what the Nginx module that looks into Memcached does: if the requested content is there, it serves it directly, bypassing your script entirely. Right now, for example, I am designing a chat server in which the main data store will be a distributed cache (a Java system that replicates the cache via JGroups), which is essentially the same key-value data store.
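The Nginx-plus-Memcached flow just described is essentially the cache-aside pattern: look the request path up in the cache and serve the value directly on a hit, falling back to the application script only on a miss. A sketch with a plain dict standing in for Memcached (`render_page` is a hypothetical placeholder for your script):

```python
cache = {}  # stands in for Memcached: key = request path, value = rendered page


def render_page(path):
    # Hypothetical stand-in for the application script (PHP, etc.).
    return "<html>content for %s</html>" % path


def handle_request(path):
    page = cache.get(path)
    if page is not None:
        return page, "HIT"      # served straight from the cache, script bypassed
    page = render_page(path)    # miss: run the script...
    cache[path] = page          # ...and populate the cache for next time
    return page, "MISS"
```

In the real setup the hit path never even reaches your application: Nginx answers from Memcached on its own, and only misses are proxied to the script.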
Okay, enough theory; let's see what storage systems exist on the market (open-source ones, of course).
- Project Voldemort is one of the most interesting projects here (I plan to cover it in more detail). It is written in Java and implements both partitioning and data replication. If persistent storage is required, BerkeleyDB or MySQL can be used, and you can add your own back end: the storage layer is plugin-based, so this is fairly simple. Unfortunately, there seems to be an API only for Java applications (other clients can use Facebook's Thrift protocol). As for data, it can store structured data (arrays), blobs (binary data packets), and plain text. Hot scaling presents certain difficulties; adding new nodes to the cluster is not the easiest operation.
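The plugin-based storage layer mentioned above boils down to a small engine interface that any back end can implement. A hypothetical sketch of the idea in Python (Voldemort itself is Java, and these class names are mine, not its API):

```python
class StorageEngine:
    """The minimal interface a pluggable storage back end must implement."""

    def put(self, key, value):
        raise NotImplementedError

    def get(self, key):
        raise NotImplementedError


class InMemoryEngine(StorageEngine):
    """Volatile engine: everything lives in a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class Store:
    """The store talks only to the engine interface, so back ends
    (memory, BerkeleyDB, MySQL, ...) are interchangeable plugins."""

    def __init__(self, engine):
        self._engine = engine

    def put(self, key, value):
        self._engine.put(key, value)

    def get(self, key):
        return self._engine.get(key)
```

Swapping persistence then means passing a different engine to `Store`, with no change to the code that uses it.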
- Scalaris is an Erlang-based transactional storage system that works only with data in memory and does not use persistent storage on disk. For fault tolerance it uses sharding and replication, as well as a non-blocking Paxos commit protocol (something I still need to study in more detail). The server has APIs for Java and Erlang, plus built-in JSON-RPC for interacting with other clients. It can scale quite easily at any moment (the Erlang platform is perfectly suited for this).
- MemcacheDB - we have already written about this system; it offers only replication, with on-disk storage via BerkeleyDB. It is probably the simplest of all these projects, both to install and to use, and if you already run a Memcached-based infrastructure, it fits right in.
- ThruDB is a project built on the Apache Thrift framework (the open-source RPC framework developed at Facebook). In fact it is not even one project but a whole family of services for building an infrastructure (themselves based on other open-source projects): the storage service itself, with back ends for MySQL, Amazon S3, and BerkeleyDB, plus a message-queue service, a scaling service, a document storage system, and even an indexing and search service (which uses CLucene, a C++ port of Java Lucene). There are client libraries for different languages; in principle, a port of the Thrift protocol is enough. A very interesting solution if you need many of these functions at once.
- Apache CouchDB is an Erlang-based document storage system that uses a RESTful HTTP/JSON API to interact with clients. For distribution it relies on incremental bidirectional replication with automatic conflict resolution. Incidentally, it can use JavaScript as the document query language. For durability it stores data on disk (in its own format) and replicates it to other nodes. Because the protocol is plain HTTP, the database can work with any client or platform able to issue a JSON HTTP request, including requests made directly from a web page (in JS).
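Because the interface is plain HTTP and JSON, talking to such a store is just a matter of building requests. A sketch of what a CouchDB-style document create could look like (the database and field names are made up for illustration, and this only constructs the request rather than sending it to a live server):

```python
import json


def build_put_request(db, doc_id, doc):
    """Build the pieces of an HTTP PUT that stores a JSON document
    under /{db}/{doc_id}, CouchDB-style."""
    url = "/%s/%s" % (db, doc_id)
    headers = {"Content-Type": "application/json"}
    body = json.dumps(doc)
    return "PUT", url, headers, body
```

Any HTTP client (or even an XMLHttpRequest from a page) can then send this, which is exactly why no language-specific driver is strictly required.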
The list did not include several more systems, for example: Hadoop HBase, Cassandra, Hypertable, Dynomite, Kai, Ringo.
It is interesting to note that systems of this kind are mostly built either on specialized languages and platforms (Erlang is almost unrivaled here) or on established mainstream platforms like Java, and only rarely on their own C/C++ code.
Are you developing a high-performance system, not necessarily a web application? Do you need specific data storage, accessed in the simplest possible way and able to scale up and down without stopping for even a second? Is there a lot of data, but all of it simple, reducible to strings, serialized structures, or binary blocks? Do you need reliable, distributed, fault-tolerant data storage? If the answer to at least one of these questions is "yes", you should look at a couple of projects from this list; perhaps they will let your project withstand the load and grow with confidence.
P.S. The original article that prompted me to write this includes a good comparison table of these systems.