
Key points of the HighLoad++ 2011 conference

I decided to share the main points from the conference with readers. Since all the information is open and available on the conference page, I figured that gathering all the theses in one place would not be a bad idea. Note right away that this write-up does not contain detailed information about each talk - only the key points are touched on.
So, here is what was said at HighLoad++ 2011.
Designing a cloud web service "the grown-up way" / Sergey Ryzhikov (1C-Bitrix)
1C-Bitrix has three data centers, located in Russia, the USA, and Europe. Connectivity between the data centers can be lost for hours at a time. Asynchronous master replication is configured between the database servers.
The entire architecture is built on Amazon Web Services.
Amazon S3 is used for static content; among its advantages is the low price of this solution compared to EBS.
Elastic Load Balancing, CloudWatch, and Auto Scaling are used.
The database machines run on EC2, each with 17.1 GB of RAM. Software RAID arrays are built from EBS volumes; RAID-10 was chosen as the fastest and most reliable.
InnoDB is used.
Backups are made via snapshots in EC2, using an FS freeze.
Amazon RDS is not used, for the following reasons:
- No full root access to the database server
- Not transparent
- Risk of long downtime
Odnoklassniki's binary data storage architecture / Alexander Khristoforov, Oleg Anastasiev (Odnoklassniki)
Initially, the possibility of using a single database was evaluated. It had the following limitations:
- Poor performance
- Long backups (up to 17 hours)
The current solution consists of a ZooKeeper cluster and a storage cluster.
Each node of the storage cluster hosts N stores. In each of them, the data segments (256 MB) reside on a separate disk, along with a RAID-1 log array. I/O is handled by an NIO socket server (Apache MINA).
Space is reserved (preallocated) using xfs_io.
The index is a hash table built on a plain array, stored in direct (off-heap) memory.
At startup, data integrity is verified. Logs are synchronized and cleaned as needed.
Replication Factor - 3.
Routing is based on partitioning: a hash of the ID is computed, then the remainder of dividing it by the number of partitions. The resulting partition is looked up in the routing table to find the data disk.
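As an illustration, a minimal sketch of this routing scheme (the types and names here are my own assumptions, not the speakers' code):

```java
// A minimal sketch of the described routing: hash the ID, take the remainder
// modulo the partition count, then resolve the partition to a data disk.
import java.util.Map;

public class Router {
    private final Map<Integer, String> routingTable; // partition -> data disk
    private final int partitions;                    // total partition count

    public Router(Map<Integer, String> routingTable, int partitions) {
        this.routingTable = routingTable;
        this.partitions = partitions;
    }

    /** Hash the ID, take the remainder, look the partition up in the table. */
    public String diskFor(long id) {
        int partition = Math.floorMod(Long.hashCode(id), partitions);
        return routingTable.get(partition);
    }
}
```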
The concept of regions is used, so no data movement is required when expanding.
ZooKeeper is used for coordination; its advantages are reliability and performance. ZooKeeper stores the following data (a minimal usage sketch follows the list):
- Available servers and their addresses
- Drive locations and statuses
- Routing tables
- Distributed locks
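To illustrate this coordination role, here is a minimal, hypothetical sketch of publishing a routing table to ZooKeeper with the plain org.apache.zookeeper Java client; the znode path and payload format are invented for the example.

```java
// Publishing a routing table to ZooKeeper (hypothetical znode and payload).
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class RoutingTablePublisher {
    public static void main(String[] args) throws Exception {
        // Host and port are placeholders; the empty watcher ignores events.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 3000, event -> {});

        String path = "/routing"; // hypothetical znode
        byte[] table = "partition0=disk-a;partition1=disk-b"
                .getBytes(StandardCharsets.UTF_8);

        if (zk.exists(path, false) == null) {
            zk.create(path, table, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zk.setData(path, table, -1); // -1 = ignore the znode version
        }
        zk.close();
    }
}
```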
Why you should not use MongoDB / Sergey Tulentsev
Key points of the report:
- Map/Reduce is slow and single-threaded
- Each map/reduce operation takes a read or write lock
- The memory-mapped files problem: poor memory management by the system
- It is not always convenient that all shards are equal, and the built-in auto-sharding is poorly configurable
For the first time in RuNet: a tale of 100M emails a day / Andrey Sas (Badoo)
Asynchronous sending is used. The send queue is implemented on top of files: native tooling, logging, and ease of reading and writing.
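As a rough illustration of such a file-backed queue (the class, format, and crash-recovery level below are my own simplification, not Badoo's code), producers append serialized messages to a spool file and a worker later drains it:

```java
// A minimal sketch of a file-backed send queue.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.List;

public class FileMailQueue {
    private final Path spool; // the queue is just an append-only file

    public FileMailQueue(Path spool) { this.spool = spool; }

    /** Producer side: append one serialized letter per line. */
    public synchronized void enqueue(String serializedLetter) throws IOException {
        Files.write(spool, List.of(serializedLetter), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Worker side: read all queued letters, then truncate the file.
     *  A production queue would truncate only after a successful send. */
    public synchronized List<String> drain() throws IOException {
        if (!Files.exists(spool)) return List.of();
        List<String> letters = Files.readAllLines(spool, StandardCharsets.UTF_8);
        Files.write(spool, new byte[0]); // truncate
        return letters;
    }
}
```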
Instead of sendmail, an SSMTP client is used.
In-memory queues are not used, for fault-tolerance reasons (fear of losing emails).
A local DNS query cache was implemented, and the number of DNS and SMTP workers was increased.
A large book of recipes, or frequently asked questions on managing complex systems / Alexander Titov, Igor Kurochkin (Skype)
The Cobbler tool is suggested for deployment. It supports a wide range of Linux distributions and has convenient interaction mechanisms.
For configuration management - Chef.
Monitoring - Zabbix API.
Backups of statistics, databases and repositories.
Designing large-scale data collection applications / Josh Berkus
The Mozilla Socorro project is recommended for collecting crash statistics. HBase is used for storage as the most scalable solution: the crash data itself (40 TB) sits in HBase on 30 nodes, two PostgreSQL servers hold the metadata (500 GB), and six servers handle load balancing.
For configuration management - Puppet.
12 Redis use cases - in Tarantool / Alexander Kalendarev, Konstantin Osipov (Mail.ru)
Tarantool is a NoSQL DBMS for storing the hottest data. It keeps all data in RAM and logs every change to disk, thus guaranteeing data durability in the event of a failure. Keeping the data in memory allows queries to run at maximum speed - 200-300k queries per second on ordinary modern hardware.
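A rough illustration of the idea (my own simplification, not Tarantool's code): all data lives in a RAM hash map, every change is appended to a log on disk, and the log is replayed on startup to recover state.

```java
// In-memory store with a write-ahead log for durability.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.HashMap;
import java.util.Map;

public class InMemoryStoreWithWal {
    private final Map<String, String> data = new HashMap<>(); // all data in RAM
    private final Path wal; // write-ahead log

    public InMemoryStoreWithWal(Path wal) throws IOException {
        this.wal = wal;
        if (Files.exists(wal)) { // recovery: replay every logged change
            for (String line : Files.readAllLines(wal, StandardCharsets.UTF_8)) {
                String[] kv = line.split("\t", 2);
                data.put(kv[0], kv[1]);
            }
        }
    }

    /** Log first, then apply in memory: the change survives a crash. */
    public synchronized void put(String key, String value) throws IOException {
        // Keys and values must not contain tabs or newlines in this toy format.
        Files.writeString(wal, key + "\t" + value + "\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        data.put(key, value);
    }

    public synchronized String get(String key) {
        return data.get(key);
    }
}
```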
Scaling is proposed to be done via a Tarantool proxy and shards.
Tarantool is soon expected to gain the following features:
- Transaction support
- Master-master replication
- Cluster manager
- Load balancer
Secrets of garbage collection in Java / Alexey Ragozin
Reference counting is not the best tool: it does not cope with cyclic graphs, it works poorly with multithreading, and it adds 15-30% CPU overhead.
According to the generational hypothesis, most objects die young, and while they live they are referenced by a small number of other objects. Thus, by storing "young" and "old" objects separately, we can improve performance.
JVMs such as HotSpot have many tuning flags; information on using them can be found in Alexey's blog.
For pause-sensitive applications, the Concurrent Mark Sweep (CMS) collector is recommended; the relevant flags include -XX:+ExplicitGCInvokesConcurrent.
CMS usually collects garbage while the application is running: the "young" generation is collected in stop-the-world mode (a fairly quick operation), while the "old" generation is collected concurrently over a longer period. Accordingly, the application must satisfy the generational hypothesis.
As a result, pauses can be kept to no more than 150 ms even on a 32 GB heap.
An alternative is to move data off-heap, but that is a considerably harder task.
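For illustration, a minimal sketch of the off-heap idea: data held in a direct ByteBuffer is allocated outside the Java heap, so the garbage collector never scans or moves it (the wrapper class is my own example, not code from the talk).

```java
// A tiny off-heap long array backed by a direct ByteBuffer.
import java.nio.ByteBuffer;

public class OffHeapLongArray {
    private final ByteBuffer buf; // direct = off-heap memory

    public OffHeapLongArray(int length) {
        buf = ByteBuffer.allocateDirect(length * Long.BYTES);
    }

    public void set(int index, long value) {
        buf.putLong(index * Long.BYTES, value); // absolute put, no GC pressure
    }

    public long get(int index) {
        return buf.getLong(index * Long.BYTES);
    }
}
```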
Apache Cassandra - another NoSQL storage / Vladimir Klimontovich
Apache Cassandra = Amazon Dynamo + Google Bigtable.
Data partitioning is based on a token-ring topology, and replication is built on the same topology. In this, Cassandra resembles Amazon Dynamo.
The key/value data model is taken from Bigtable. Complex queries are available, as are indexes (a useless thing).
A LIFO request-caching mechanism is used.
The easiest way to scale is by a factor of 2: the partition segment ranges are simply split in half.
Nodes communicate with each other using the Thrift protocol.
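A minimal sketch of token-ring placement (my own illustration, not Cassandra's code; tokens are plain longs and the ring is assumed non-empty):

```java
// Token-ring placement: each node owns the range up to its token.
import java.util.Map;
import java.util.TreeMap;

public class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node

    public void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** A key belongs to the first node whose token is >= the key's token,
     *  wrapping around to the start of the ring if none is greater. */
    public String nodeFor(long keyToken) {
        Map.Entry<Long, String> e = ring.ceilingEntry(keyToken);
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

Doubling the cluster then amounts to inserting a new node at the midpoint of each existing token range, which is why scaling by a factor of 2 is the easiest option.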
AJAX Layout / Oleg Illarionov (VK)
The page is divided into several iframes that send AJAX requests independently.
HTML5 is actively used - in particular, local storage.
Static assets and content are loaded in parallel.
Pages are cached. Their own stubs are used to work with the browser's History API. When navigating back, the tree is removed from the DOM and the environment variables are copied.
To make content search fast, the search runs on the client.
Building cloud storage for storing and delivering static content by integrating Nginx and OpenStack Swift / Stanislav Bogatyrev, Nikolay Dvas (Clodo)
For use in the cloud, object storage such as Amazon S3 or Rackspace Cloud Files seems the right solution. Clodo's cloud storage is built on Swift (the technology underlying Rackspace Cloud Files) and is aimed primarily at storing and serving content for websites; unlike the more general-purpose S3 and Cloud Files, it is built specifically around this use case. Swift itself is slow when serving content to a large number of clients, so nginx was chosen as a frontend and modified in two respects:
- a multi-zone cache was added (it saves 40% of the space on the expensive disks used for caching);
- a mechanism was added for both per-object and per-container cache purging when running a frontend cluster with an independent nginx on each node.
P.S. I hope the article will not provoke a negative reaction because of the nature of the write-up (extremely concise information: theses from the speakers' slides and talks). Not all the talks are covered here; information about the missing ones can be found on the official website.
P.P.S. Thanks to Oleg Bunin and the Ontiko team for organizing HighLoad++ - we look forward to the next conference in 2012!