How FriendFeed uses MySQL to store data without a schema

Original author: Bret Taylor
  • Translation

Background


We use MySQL to store all of FriendFeed's data. Our database grows along with our user base. We now have over 250 million entries: user posts, comments, and ratings ("likes").

As the database grew, we periodically ran into scalability problems. We solved them in the standard ways: read-only slave servers, memcache to increase read throughput, and partitioning to increase write throughput. However, as we grew, these scaling techniques made it harder and harder to add new functionality.

In particular, changing the database schema or adding indexes to existing tables with 10-20 million rows locked the server for several hours. Removing old indexes took just as long, and not removing them hurt performance, because the database kept updating them on every INSERT. There are complex procedures for working around these problems (for example, building the new index on a slave and then swapping the master and slave), but they are so error-prone and heavyweight that they discouraged us from adding anything that required a schema or index change. And since our databases are heavily distributed, relational MySQL features such as JOINs have never worked for us. So we decided to look for a solution to these problems outside of relational databases.

There are many projects designed to store data with a flexible schema and build indexes on the fly (for example, CouchDB). However, apparently none of them are used at scale by large sites. In the tests we read about and ran ourselves, none of these projects proved stable or mature enough for our purposes (see this somewhat outdated article on CouchDB, for example). And all this time, MySQL just worked. It did not corrupt data. Replication worked. We already understood its bottlenecks well. We liked MySQL as storage, outside of the relational paradigm.

After weighing everything, we decided to build a schema-less storage system on top of MySQL rather than adopt a completely new solution. In this article I will try to describe the main details of the system. We are also curious how other sites have solved these problems, and we hope our work will be useful to other developers.

Introduction


Our datastore stores schema-less sets of properties (for example, JSON objects or Python dictionaries). The only required property is id, a 16-byte UUID. The rest of the properties are opaque to the datastore; that is the whole point. We "change" the schema simply by adding new properties.

We index the data in these entries and store the indexes in separate MySQL tables. If we want to index three fields of each entry, we get three MySQL tables, one per index. If we no longer need an index, we stop writing to its table and can drop it if we wish. If a new index is required, we create a new MySQL table for it and start an asynchronous process that populates the index without disturbing the rest of the system.

As a result we have more tables than before, but adding and removing indexes has become easy. We have heavily optimized the process that populates indexes (which we call "The Cleaner") so that it builds indexes quickly without disrupting the site. Now we can add new properties and indexes in days rather than weeks, and we no longer need master/slave swaps or other dangerous operations.

Details


In MySQL, our records are stored as follows:
CREATE TABLE entities (
    added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    id BINARY(16) NOT NULL,
    updated TIMESTAMP NOT NULL,
    body MEDIUMBLOB,
    UNIQUE KEY (id),
    KEY (updated)
) ENGINE=InnoDB;

The added_id column exists because InnoDB physically stores data in primary-key order. An AUTO_INCREMENT key ensures that new records are written to disk after the old ones, which helps both reads and writes (new entries are accessed far more often than old ones, since FriendFeed pages are sorted in reverse chronological order). The body of the entry is stored as a zlib-compressed, Python-pickled dictionary.
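For illustration, a minimal sketch of that serialization (the helper names here are ours, not FriendFeed's actual library code):

import pickle
import uuid
import zlib


def serialize_body(entity):
    """Turn an entity dictionary into the compressed blob stored in `body`."""
    return zlib.compress(pickle.dumps(entity))


def deserialize_body(blob):
    """Decompress and unpickle a stored `body` blob back into a dictionary."""
    return pickle.loads(zlib.decompress(blob))


# Example: the only required property is a 16-byte UUID id.
entity = {"id": uuid.uuid4().bytes, "title": u"Hello, world"}
blob = serialize_body(entity)
assert deserialize_body(blob) == entity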

Indexes are stored in separate tables. To add a new index, we create a table with the attributes by which we want to search. For example, a FriendFeed entry looks something like this:
{
    "id": "71f0c4d2291844cca2df6f486e96e37c",
    "user_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
    "feed_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
    "title": "We just launched a new backend system for FriendFeed!",
    "link": "http://friendfeed.com/e/71f0c4d2-2918-44cc-a2df-6f486e96e37c",
    "published": 1235697046,
    "updated": 1235697046
}

We want to index the user_id property so we can display all the entries a given user has posted. The index table looks like this:
CREATE TABLE index_user_id (
    user_id BINARY(16) NOT NULL,
    entity_id BINARY(16) NOT NULL UNIQUE,
    PRIMARY KEY (user_id, entity_id)
) ENGINE=InnoDB;

Our datastore library maintains these indexes automatically. To start an instance of the datastore that saves entries like the one above with the index described above, we write (in Python):
import binascii

import friendfeed.datastore

user_id_index = friendfeed.datastore.Index(
    table="index_user_id", properties=["user_id"], shard_on="user_id")

datastore = friendfeed.datastore.DataStore(
    mysql_shards=["127.0.0.1:3306", "127.0.0.1:3307"],
    indexes=[user_id_index])

new_entity = {
    "id": binascii.a2b_hex("71f0c4d2291844cca2df6f486e96e37c"),
    "user_id": binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"),
    "feed_id": binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"),
    "title": u"We just launched a new backend system for FriendFeed!",
    "link": u"http://friendfeed.com/e/71f0c4d2-2918-44cc-a2df-6f486e96e37c",
    "published": 1235697046,
    "updated": 1235697046,
}

datastore.put(new_entity)

entity = datastore.get(binascii.a2b_hex("71f0c4d2291844cca2df6f486e96e37c"))
entity = user_id_index.get_all(datastore, user_id=binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"))

The Index class looks at the user_id property in every entity and automatically maintains the index in the index_user_id table. Since our database is sharded, the shard_on argument determines which shard the index row is stored on (in our case, based on entity["user_id"] % num_shards).
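As a rough sketch of how a shard_on property might map to a shard number (the real FriendFeed library is not public, so the exact hashing here is an assumption):

import binascii

NUM_SHARDS = 2  # e.g. one per entry in mysql_shards above


def shard_for(entity, shard_on, num_shards=NUM_SHARDS):
    """Pick the shard that holds the index row for this entity.

    The sharding property is a 16-byte binary value, so we interpret it as
    an integer and take it modulo the number of shards.
    """
    value = entity[shard_on]
    return int(binascii.hexlify(value), 16) % num_shards


entity = {"user_id": binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf")}
print(shard_for(entity, "user_id"))  # shard holding this user's index row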

To query an index, we use the Index instance itself (see user_id_index.get_all above). The datastore code performs the "join" between the index_user_id tables and the entities table in Python: first it reads the index_user_id tables on all shards to collect a list of entity IDs, and then it fetches those entities from the entities table.
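A simplified version of that two-step lookup might look like the following sketch. The run_query callable and the way the entities table itself is sharded are assumptions for illustration, not FriendFeed's real API.

import binascii


def get_entities_by_user(run_query, num_shards, user_id):
    """Fetch entity bodies for one user via the index_user_id tables.

    run_query(shard, sql, params) is assumed to execute a statement on one
    MySQL shard and return the resulting rows.
    """
    # Step 1: collect entity ids from the index table on every shard.
    entity_ids = []
    for shard in range(num_shards):
        rows = run_query(shard,
                         "SELECT entity_id FROM index_user_id WHERE user_id = %s",
                         (user_id,))
        entity_ids.extend(row[0] for row in rows)

    # Step 2: fetch each entity body from the entities table on its own shard
    # (here we assume entities are sharded by their id).
    bodies = []
    for entity_id in entity_ids:
        shard = int(binascii.hexlify(entity_id), 16) % num_shards
        rows = run_query(shard,
                         "SELECT body FROM entities WHERE id = %s",
                         (entity_id,))
        if rows:
            bodies.append(rows[0][0])
    return bodies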

To add a new index, for example on the link attribute, we create a new table:
CREATE TABLE index_link (
    link VARCHAR(735) NOT NULL,
    entity_id BINARY(16) NOT NULL UNIQUE,
    PRIMARY KEY (link, entity_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The datastore initialization code now includes the new index:
user_id_index = friendfeed.datastore.Index(
    table="index_user_id", properties=["user_id"], shard_on="user_id")
link_index = friendfeed.datastore.Index(
    table="index_link", properties=["link"], shard_on="link")
datastore = friendfeed.datastore.DataStore(
    mysql_shards=["127.0.0.1:3306", "127.0.0.1:3307"],
    indexes=[user_id_index, link_index])

And we can populate the index asynchronously, even while the site is live, with the Cleaner process:

./rundatastorecleaner.py --index=index_link

Consistency and atomicity


Because the database is sharded, the index rows for a given entity may live on different shards than the entity itself. What happens if the process dies before it has written all the index rows?

Some of our more ambitious engineers felt this situation called for transactions. However, we wanted to keep the system as simple as possible, so we relaxed the constraints instead:
  • The set of properties stored in the main entities table is canonical.
  • An index may return entities that do not actually match the query.

As a result, we create a new entity in the following order (a sketch of this write path follows the list):
  1. Write the entity to the main entities table, relying on InnoDB's ACID guarantees (Atomicity, Consistency, Isolation, Durability).
  2. Write the index rows to the index tables on all shards.
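A stripped-down put() following that order might look like the sketch below; run_query and the hashing used for shard selection are placeholders for illustration, not FriendFeed's real implementation.

import pickle
import zlib


def shard_of(value, num_shards):
    """Hash a binary property value onto one of the shards."""
    return zlib.crc32(value) % num_shards


def put(run_query, num_shards, index_tables, entity):
    """Write the canonical entity row first, then the index rows.

    index_tables maps an index table name to the property it indexes,
    e.g. {"index_user_id": "user_id"}.
    """
    # Step 1: the canonical write, protected by InnoDB's ACID guarantees.
    run_query(shard_of(entity["id"], num_shards),
              "REPLACE INTO entities (id, body) VALUES (%s, %s)",
              (entity["id"], zlib.compress(pickle.dumps(entity))))

    # Step 2: best-effort index writes on whichever shards they hash to.
    # If the process dies here, the Cleaner repairs the missing rows later.
    for table, prop in index_tables.items():
        run_query(shard_of(entity[prop], num_shards),
                  "REPLACE INTO %s (%s, entity_id) VALUES (%%s, %%s)" % (table, prop),
                  (entity[prop], entity["id"]))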

When reading from the index tables, we know the result may be imprecise (it may contain stale entities if step 2 was never completed). To make sure we return only correct entities, we re-filter the results fetched via the index tables (a sketch of this filtering step follows the list):
  1. Read the entity_id values from every index table involved in the query.
  2. Read all the entities for those IDs.
  3. Filter out (in Python) any entity that does not match the query criteria.
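The final filtering step is straightforward; a minimal sketch (the function name is ours):

def filter_matches(entities, criteria):
    """Drop entities whose canonical properties no longer satisfy the query.

    Index rows can be stale, so the final check is done in Python against the
    property values read from the entities table.
    """
    return [entity for entity in entities
            if all(entity.get(prop) == value for prop, value in criteria.items())]


# Example: only the first entity still matches the queried user_id.
entities = [
    {"id": b"\x01" * 16, "user_id": b"\xaa" * 16},
    {"id": b"\x02" * 16, "user_id": b"\xbb" * 16},  # reached via a stale index row
]
print(filter_matches(entities, {"user_id": b"\xaa" * 16}))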

To repair the indexes, we created the "Cleaner" process mentioned earlier. It runs over the entities table, writing missing index rows, deleting stale ones, and fixing incorrect ones. It starts with the newest entities, so in practice inconsistencies in the indexes are repaired very quickly (within a few seconds).
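A single-index pass of a Cleaner-style repair could be sketched like this, reusing the run_query and shard_of conventions from the put() sketch above (again, an illustration rather than FriendFeed's actual code):

import pickle
import zlib


def clean_index(run_query, num_shards, table, prop, batch_size=1000):
    """Re-write index rows for one index table, newest entities first.

    Walks the entities table backwards by added_id on every shard and
    re-inserts the index row for each entity, so missing or wrong rows
    converge to the correct state. Deleting rows that point at vanished
    entities is omitted for brevity.
    """
    for shard in range(num_shards):
        last_added_id = None
        while True:
            if last_added_id is None:
                rows = run_query(shard,
                                 "SELECT added_id, id, body FROM entities "
                                 "ORDER BY added_id DESC LIMIT %s",
                                 (batch_size,))
            else:
                rows = run_query(shard,
                                 "SELECT added_id, id, body FROM entities "
                                 "WHERE added_id < %s ORDER BY added_id DESC LIMIT %s",
                                 (last_added_id, batch_size))
            if not rows:
                break
            for added_id, entity_id, body in rows:
                entity = pickle.loads(zlib.decompress(body))
                run_query(shard_of(entity[prop], num_shards),
                          "REPLACE INTO %s (%s, entity_id) VALUES (%%s, %%s)" % (table, prop),
                          (entity[prop], entity_id))
                last_added_id = added_id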

Performance


We slightly optimized our primary keys in the new system and are pleased with the result. Below is a graph of FriendFeed page-serving latency for the past month (we launched the new backend a few days ago):



Notably, latency in the new system is stable even during the traffic peaks in the middle of the day. Below is the graph for the past 24 hours:



Compare it with the latency a week ago:



The new system has been much easier to work with. We have already changed indexes several times on the live system, and we are now starting to migrate our biggest tables to the new datastore so we can keep moving forward.

How FriendFeed uses MySQL to store schema-less data, by Bret Taylor • February 27, 2009
I also recommend reading the discussion of this article on the popular mysqlperformanceblog.com
