
Scalability of relational databases
- Translation
Q:
Facebook uses MySQL even though it is known not to scale well (or is there some special magic involved?). I wanted to ask: why did they choose MySQL? Do they use JOINs? And do they plan to switch to another database?
A:
Answer from Adam D'Angelo, former CTO of Facebook, who is now building his startup Quora:
- If you partition data across servers at the application level, MySQL scalability is not that big a problem. In 2008, Facebook [1] had 1,800 MySQL servers, and only two administrators were needed to run them. Of course, you cannot JOIN data that lives on different servers, but NoSQL databases don't let you do that either. There is no evidence that Facebook uses Cassandra as its primary store; as far as I can tell, the only thing it is used for there is inbox search. [2]
- In fact, distributed databases such as Cassandra, MongoDB, and CouchDB [3] are not yet very scalable or stable. For example, the Twitter team has been trying to move from MySQL to Cassandra for a whole year. Of course, if someone reports running one of these databases as the primary store on 1,000 machines for a year, I will change my mind.
- It's a bad idea to bet your primary database on a new technology. Losing or corrupting it would be a disaster, and you might not be able to recover everything. Besides, unless you are one of the developers of these newfangled databases, or one of the few people running them in production, you can only hope that the developers will fix bugs and scalability problems as they come up.
- In fact, you can go very far on a single MySQL server without worrying about partitioning data at the application level. You can easily "scale" the server up to many cores and tons of RAM, and don't forget about replication. Moreover, if you put a memcached layer in front of the database (and memcached scales easily), almost the only thing the database has to do is write new data. For storing large objects, you can use S3 or any other distributed hash table. So as long as you are confident you can scale the database as it grows, you don't need to take on the burden of making it an order of magnitude more scalable than it actually needs to be.
- Most problems arise when you try to split data across many servers by hand. Instead, you can put an intermediate layer between the application and the database that takes care of this partitioning, which is essentially what FriendFeed did. [4]
- I believe the relational model is the right way to structure data in most applications where users create the content. Schemas keep data in a well-defined shape as new versions of the service are developed; they also serve as documentation and help you avoid whole classes of errors. SQL also lets you process data where it lives, instead of pulling tons of raw data that then has to be processed further in the application. I think all the NoSQL hype will die down as soon as someone finally builds a distributed relational database with relaxed semantics.
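The answer's first point, partitioning data across MySQL servers at the application level, can be sketched as a small routing layer. This is a minimal illustration under stated assumptions, not Facebook's actual scheme: the shard names, `shard_for`, and `ShardedStore` are hypothetical, and in-memory lists stand in for separate MySQL servers. It shows why single-user reads touch one server and why combining data from several users has to happen in the application rather than in a JOIN.

```python
import hashlib

# Hypothetical shard names; in a real deployment each would be a separate MySQL server.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id: int, shards=SHARDS) -> str:
    """Map a user id to one shard with a stable hash, so all of a user's rows live together."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


class ShardedStore:
    """In-memory stand-in for a set of servers partitioned by user_id (hypothetical)."""

    def __init__(self, shards=SHARDS):
        self.shards = shards
        self.data = {name: [] for name in shards}

    def insert(self, user_id: int, row) -> None:
        # Writes go only to the shard that owns this user.
        self.data[shard_for(user_id, self.shards)].append((user_id, row))

    def rows_for_user(self, user_id: int) -> list:
        # Single-shard read: only one server needs to be contacted.
        name = shard_for(user_id, self.shards)
        return [row for uid, row in self.data[name] if uid == user_id]

    def rows_for_users(self, user_ids) -> list:
        # Multi-user read: the users may live on different shards, so there is no
        # server-side JOIN; the application queries each shard and merges results.
        result = []
        for uid in user_ids:
            result.extend(self.rows_for_user(uid))
        return result
```

Plain modulo hashing is the simplest choice, but adding a shard remaps most keys; consistent hashing or a lookup directory are common alternatives when the number of servers is expected to change.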
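The answer also argues that with a memcached layer in front of MySQL, the database mostly just writes new data. A common way to get that effect is the cache-aside pattern, sketched below under loud assumptions: a plain Python dict stands in for memcached, SQLite stands in for MySQL, and the `users` table, `CacheAside` class, and `open_users_db` helper are hypothetical. Reads are served from the cache after the first miss; writes go to the database and invalidate the cached copy.

```python
import sqlite3


def open_users_db() -> sqlite3.Connection:
    """Hypothetical demo schema; SQLite stands in for MySQL here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


class CacheAside:
    """Cache-aside reads over a database; a dict stands in for memcached."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.cache = {}    # stand-in for a memcached cluster
        self.db_reads = 0  # counts how often a read actually reached the database

    def get_user(self, user_id: int):
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]  # cache hit: the database is not touched
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None  # missing rows are not cached in this sketch
        self.cache[key] = row[0]
        return row[0]

    def set_user(self, user_id: int, name: str) -> None:
        # Writes always go to the database; the stale cache entry is dropped
        # so the next read refills it from the source of truth.
        self.conn.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name)
        )
        self.conn.commit()
        self.cache.pop(f"user:{user_id}", None)
```

For example, after `c = CacheAside(open_users_db())` and `c.set_user(1, "alice")`, two calls to `c.get_user(1)` leave `c.db_reads` at 1, because the second read is served from the cache. A real deployment would use a memcached client and a MySQL driver instead of these stand-ins.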
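The last point, that schemas keep data in a known shape and that SQL lets the database process data instead of shipping raw rows to the application, can be illustrated with a small comparison. This is a sketch under assumptions: SQLite is used instead of MySQL, and the `posts` table, its sample rows, and the function names are hypothetical. Both functions compute the same totals, but the first returns one row per author while the second transfers every row and aggregates in Python.

```python
import sqlite3
from collections import Counter


def open_posts_db() -> sqlite3.Connection:
    """Hypothetical schema: NOT NULL constraints keep every row in a known shape."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT NOT NULL, likes INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO posts (author, likes) VALUES (?, ?)",
        [("alice", 3), ("bob", 5), ("alice", 4)],
    )
    return conn


def likes_per_author_sql(conn: sqlite3.Connection) -> dict:
    # The database does the grouping; only one row per author comes back.
    return dict(conn.execute("SELECT author, SUM(likes) FROM posts GROUP BY author"))


def likes_per_author_app(conn: sqlite3.Connection) -> dict:
    # Every raw row is sent to the application, which then aggregates it itself.
    totals = Counter()
    for author, likes in conn.execute("SELECT author, likes FROM posts"):
        totals[author] += likes
    return dict(totals)
```

The schema also rejects malformed data at write time: inserting a post with a `NULL` author raises `sqlite3.IntegrityError` instead of silently storing a row the application later has to special-case.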
References:
[1] Facebook Now Running 10,000 Web Servers
[2] What portions of Facebook use Cassandra today?
[3] How scalable is CouchDB in practice, not just in theory?
[4] How FriendFeed uses MySQL to store schema-less data