NightWind November 9, 2011 at 10:40

Yac 2011: Technical Report

Oh, time, and again,
Yes, and yet another time ...

Not so long ago, the Yandex YaC 2011 conference ended, and now that the speeches have become available, I want to present you a technical report on her visit. In the report, I focused on the information that you can get by looking at the record of a report and decide whether to spend time on it. For some topics, he added additional links to key resources, and also, based on communication with the authors, described the devices of two Yandex NoSQL technologies: Elliptics Network and email storage in Yandex mail.

So, Yac 2011, as it were.

Introduction

The rays of the passing summer determined the beginning of my Monday. A wonderful morning of early autumn, with every step, improved my mood as I walked to the building of the World Trade Center, in which on September 19, 2011 the second technology conference organized by Yandex was held.

Just a year ago, Yandex decided to annually hold a new conference, modestly called, Yet another Conference, and “another conference” turned out to be a cut above the vast majority of those that can be reached in the space around us. On the one hand, the quality of key reports and their direct focus on technical experts held set a very high standard for other technical conferences, and on the other hand, Yandex created a special atmosphere by releasing its key specialists and giving them to “tear to pieces the crowd” . After the performance, many speakers were surrounded by people who wanted to talk, and for hours they answered questions, called a spade a spade and gave out many secrets that are not customary to talk about on stage.

No less important is the fact that the YaC reports are published in the public domain, and everyone can independently view and evaluate them. Of course, the video will not allow you to chat with the speaker directly and find out all the most interesting, but, nevertheless, it provides a valuable resource for self-education.

The purpose of this report is not to retell the reports, because, you yourself can see everything. Instead, it will be more useful to briefly talk about the essence of the reports, share your thoughts and impressions, and recommend those reports that you should spend your time on and watch the recording. However, in the end I’ll move away from this rule a bit in order to share what was not in the record.

Search technology "Spectrum"

The conference was opened by Andrei Plakhov's report, which was dedicated to the new system, launched at the end of 2010, to assess the relevance of the response to a search query. Andrei devoted his story to the formulation of the problem of ambiguous queries and described its solution in general principles.

Yandex processes 5 billion requests per month and many of them are structured in such a way that it’s not easy for a person to understand what exactly the user wanted to know. For example, at the request of the Jaguar, the answer options may relate to an animal, a car, and a drink. Based on the analysis of the query flow, the system finds possible extensions of the current query, for example, “engine size of the Jaguar” or “alcohol content in the Jaguar”. Among all the possible extensions, those that correspond to the category of the request (for example, a drink, something harmful) are selected and the possible extensions are assigned weights based on the potential needs of the category. The system contains a limited number of categories, such as films, books, people, gadgets, etc., which are determined manually.

The recording of the report is worth a look for a general understanding of the problem of ambiguous requests and sample approaches to its solution

Cross-platform development for mobile devices

Dmitry Zhestilevsky, talked about the Platform Abstraction Layer (PAL), which is being developed at Yandex. PAL allows you to create Native C ++ applications for iOS, Android, Symbian and other systems without duplicating code. PAL was based on OpenKODE: a set of open standards that describe the software platform of a mobile device. OpenKODE includes APIs for interacting with the OpenKODE Core operating system (largely copying POSIX), OpenGL ES, OpenSL ES, and others. Yandex implemented some of the OpenKODE standards for each platform, another part was already implemented on the platforms initially (for example, OpenGL ES) and got a universal application development environment. PAL allows you to write and debug code under Windows and immediately compile it under iOS and Android. The OpenKODE set of standards provides rich features, but sometimes tasks appear,

The OpenKODE standard uses the C language and, on the one hand, it is logical to write extensions without using C ++, but on the other hand, over time, it became clear that creating an extension interface in pure C is not very convenient when all the rest of the code uses C ++. Now extensions are full-fledged C ++ objects and there is no additional need to create RAII wrappers over C code.

PAL is not a unique Yandex technology and other companies offer no less interesting analogues. For example, the Marmalade SDK, which was mentioned in the report, provides a completely finished and, from my point of view, the most promising solution. Nevertheless, Yandex began to write its system, primarily because it ensured independence from third-party companies and provided unlimited expansion options.

The report recording is worth a look to understand the problems Yandex has encountered, but to explore approaches to organizing PAL, I recommend that you study sites dedicated to OpenKODE and the Marmalade SDK.

In search of math

The beautiful title of the report could not leave me indifferent, and I, anticipating, made my way closer to the stage. The story was dedicated to the Nigma-Mathematics system, a domestic analogue of Wolfram Alpha. The report was devoted to a general description of the structure of the service, which turned out to be quite typical.

The report is worth looking at to understand the general principles of organizing web services, but if you are interested in the intricacies, it is better to refer to other resources.

Android Application Development in C ++

Yuri Bereza, another PAL developer, spoke about the intricacies of Android development: Java calls from C ++ and C ++ calls from Java, Boost compilation, debugging, assembly features, etc. As a result, he got a very useful report from a practical point of view.

The record, definitely, is worth a look for those who are going to or are already writing in C ++ under Androyd. It was one of the most practical shows.

How we Architected Cloud9 IDE for scale on NodeJS

JavaScript is not one of my interests and in this report I whiled away the time until the first wave of those who wanted to have dinner resolved. But, nevertheless, something I still found out.

Cloud9 has created a JavaScript IDE that runs right in the browser. Indeed, the IDE can highlight syntax, work with version control systems, fully debug: set breakpoints, watch the value of variables, callstack, etc.

I watched, hungry, at all this triumph of HTML 5 and thought about the future of computer technology, the ubiquitous Internet, smart devices, autopilots in cars, thermonuclear fusion for all this electronic happiness.

However, I found one big “drawback” with this IDE: the complete lack of integration with Twitter and Facebook. Therefore, who knows whether the programmer will choose an IDE, torn between such wonderful services?

The report seemed to me largely marketing and therefore I recommend it to be watched only by those who are really interested in the project capabilities.

Why should an ordinary programmer know languages in which almost no one writes

This is not a report, this is an epic, an Odyssey with Ramayana and a Dream in a red tower with the Bhagavad-gita.

No, the truth is, if on the topic of programming languages it would be necessary to create something imperishable and tell in 40 minutes, then it would look something like this.

I advise you to watch, always before bedtime, in a good mood and in a dispersed consciousness.

C ++ 11 is the new C ++ language standard

The path of the standard was thorny and long, but after 8 years we can finally say that this path is completed. Everyone is used to explaining such a long period with the complexities of the language, but maybe this is not the case at all? Maybe the matter is the intervention of otherworldly forces? Judge for yourself, the speaker was simply not allowed into Russia. And if we, here, have long been accustomed to living with our otherworldly forces, then what can they, members of the committee, face against such a disaster?

The Dave Abrams report went over Skype, which in itself was funny. The available technical tricks provided two-way video communication: with a slide show, questions at the end, etc. It was probably strange for Dave to watch a huge audience shot by the camera of a small laptop, and we should look at a person speaking to a large number of people in a home setting.

We talked about the new features of C ++, and I think it’s not worth watching the video, given the technical communication problems. Better spend your time studying the C ++ 11 Wikipedia article.

Unit Testing and Google Mock

Unit tests will only be truly beneficial if they are fast, reliable, and accurate. But how to write a unit test of a system, depending on the database, network or heavy computing? One answer to this question is to test only a specific module, and emulate the work of all dependent modules using Mock objects.

Mock is an object that represents a concrete dummy implementation of an interface intended solely for testing.

Writing a Mock object for each test can often be quite expensive and therefore there is a desire to automate this process in some way. There are many Mock libraries for Java, for which, due to the support of introspection, writing such libraries is much easier. And it seemed to me that the convenient Mock libraries would never appear for C ++, however, Google introduced Google Mock.

Immediately after the report, the library seemed to me "a cadavre, unsatisfied with the stomach." Judge for yourself, with the help of template metaprogramming, your declarative language was created that describes the behavior of the Mock-object. And the question arises: will it take longer to delve into the documentation, in an attempt to explain what exactly Mock should do, instead of quickly writing how it should do it?

My first impression was after I dived into the Google Mock documentation a bit. The library is organized quite reasonably, easy to learn, and now it seems to me, at least, interesting.

It’s hard to say whether or not to watch the video. It seemed to me that you need to get used to the syntax for a while, and the speaker immediately goes into the wilds and spoils the first impression.

Animal Control: Cloudera Distributed System Management and Monitoring Tools

Cloudera specializes in technologies built around Hadoop, the free Apache Software Foundation project designed to store and process large amounts of data on clusters of thousands of individual nodes. Hadoop, in fact, is an ecosystem that combines several separate projects: the distributed HDFS file system, the implementation of the MapReduce paradigm, the non-relational HBase database, etc.

Recently, the popularity of Hadoop is growing rapidly. The number of companies solving their tasks with Hadoop is increasing, the quality of internal implementation is improving, and more and more data is being processed by Hadoop clusters. Clusters with up to 80 petabytes of data are now becoming commonplace.

The report seemed to me somewhat marketing and aimed at specialists directly working with Hadoop, so I think you should not waste your time on it. Instead, I recommend a speech by Konstantin Shvachko from Yahoo, who at the last conference spoke very interestingly about Hadoop.

Facebook Hadoop scalability

The topic of the report again was Hadoop, or rather its modification, aimed at increasing the scalability, which was created at Facebook. The fact is that Facebook uses the fork of the old version of Hadoop and at some point decided to fix some scalability problems of its system, which by that time had already been fixed by the community in the main branch.

The report was devoted to how exactly Facebook fixed its system and will be useful to Hadoop specialists, but those who are superficially familiar with the technology will most likely not find anything useful in the report.

The most sophisticated techniques used by bootkits and polymorphic viruses

The Kaspersky Lab report began with elements of the university course Operating Systems, but quickly moved on to the intricacies of the BIOS device and the OS boot process. Moreover, with a listing of address values of their functionality and tricks that modern malware uses.

Very specific information, at a fairly deep level, so I recommend watching only true connoisseurs who know a lot about such things.

Elliptics Network booth

During the day, a technology exhibition was held in the lobby of the conference, where you could try writing an application using Yandex Mobile MapKit, look at new smartphones, 3D TVs, win some prizes - in general, the classic red corner of marketing. However, among all the stands there were two that really interested me. I will tell you a little more about them since, in fact, there have been no reports on the systems that showed them. At the stands they simply answered the questions of those passing by.

Both stands were devoted to storing and processing large amounts of data in NoSQL storages (the term NoSQL does not mean abandoning the SQL language, but storing information in databases built using a model different from relational). Relational databases have long been the most common way to store and process significant amounts of structured data. Nevertheless, the speed, integrity, as well as the convenience of developing applications using modern SQL DBMSs are achieved due to a compromise with other important factors: scalability and availability. Speaking about SQL DBMS, the term vertical scalability is often used, the essence of which is that from a certain point in the development of the project, the cost of equipment, while increasing productivity by the same amount, is growing rapidly, and, gradually, become prohibitive. The situation with accessibility is even worse: a failure of one node can lead to a server shutdown for a long time and result in large financial losses. To solve the described problems, data storage systems have appeared that, sacrificing functionality, benefit in scalability and accessibility.

One example of alternative SQL repositories is the Elliptics Network project, which Yandex uses to store photos in the Yandex.Fotki service and tiles in Yandex.Maps. Elliptics Network belongs to the class of Distributed Hash Table (DHT), which, compared with relational databases, provides a rather meager set of operations: storing and retrieving unstructured binary data by a unique key (this storage class is called key-value storages), but a number of unique properties. The Elliptics Network cluster is organized according to the principle of a peer-to-peer (P2P) network, in which all nodes perform the same operations and no node stands out among others. An entry key can be an arbitrary entity by which a hash function can be calculated (in this case, SHA512). Using a cryptographic hash function allows you not to consider the probability of its collision and remove the problem of non-unique keys. In addition, the store may not know anything about the true type of key and the method of generating it, that is, perform all operations using only a hash. Each node in the cluster selects several segments from the hash function value ring, for which it will answer and modify the globally accessible (synchronized between nodes) query routing table. When accessing data, the client generates a unique hash value by key, by which a node is searched in the routing table for the range that the value falls into. After receiving the address, the client can directly contact the node and request data from it. In addition, the store may not know anything about the true type of key and the method of generating it, that is, perform all operations using only a hash. Each node in the cluster selects several segments from the hash function value ring, for which it will answer and modify the globally accessible (synchronized between nodes) query routing table. When accessing data, the client generates a unique hash value by key, by which a node is searched in the routing table for the range that the value falls into. After receiving the address, the client can directly contact the node and request data from it. In addition, the store may not know anything about the true type of key and the method of generating it, that is, perform all operations using only a hash. Each node in the cluster selects several segments from the hash function value ring, for which it will answer and modify the globally accessible (synchronized between nodes) query routing table. When accessing data, the client generates a unique hash value by key, by which a node is searched in the routing table for the range that the value falls into. After receiving the address, the client can directly contact the node and request data from it. Each node in the cluster selects several segments from the hash function value ring, for which it will answer and modify the globally accessible (synchronized between nodes) query routing table. When accessing data, the client generates a unique hash value by key, by which a node is searched in the routing table for the range that the value falls into. After receiving the address, the client can directly contact the node and request data from it. Each node in the cluster selects several segments from the hash function value ring, for which it will answer and modify the globally accessible (synchronized between nodes) query routing table. When accessing data, the client generates a unique hash value by key, by which a node is searched in the routing table for the range that the value falls into. After receiving the address, the client can directly contact the node and request data from it.

The lack of centralized elements allows you to proportionally increase productivity by adding new nodes of a similar configuration. This method of scalability is called horizontal. It allows you to significantly save money and create services, the functioning of which would be impossible when using relational databases.

Each node in the cluster can fail at any time, and when there are quite a few nodes, a situation constantly arises in which some nodes are inaccessible. In order for the entire cluster not to stop when the node fails, a replication system is introduced. The operation of such a system is based on the fact that several nodes are responsible for the same key space interval and recording occurs synchronously in all available replicas. If all nodes that are responsible for a certain range of hash values fail, then reading from this range becomes impossible, but writing continues to nodes that are responsible for neighboring ranges. Thus, the failure of an arbitrary number of nodes can make only part of the data inaccessible for reading, and then only those

A year ago, the Elliptics Network supported data versioning, but now it has been cut off as unnecessary. In addition, the Elliptics Network allows the use of interval requests, that is, to receive all the key data which lies in the range from X to Y. Such functionality, obviously, goes beyond the key-value storages and interferes with the well-built DHT architecture. It is implemented as follows: low bytes are discarded from the SHA512 key and are used as a sorting index. Each node stores in memory a red-black tree of hash keys and, naturally, allows for interval requests. The developers say that they do not consider interval requests, the range of which can cover several nodes, although from my point of view it can become a problem of scaling, not to mention that interference with the key generation mechanism prevents the even distribution of data across nodes. And, hypothetically, a sufficiently large and popular data interval, completely falling on one machine, can easily slow it down to a complete stop.

Elliptics Network is centered around several technologies. The Eblob library, which “works even better than it sounds,” allows you to organize on-disk storage of unstructured binary data (BLOBs) and get quick access to them (regular file systems do not handle the load well in situations when the number of files starts to be measured in millions ) The PohmelFS file system, created on the basis of the Elliptics Network, in which the file name is the key and additional mechanisms for maintaining integrity are added, and PohmelFS is compatible with the POSIX standard and can be mounted as a standard Linux file system.

However, my attitude towards the Elliptics Network, albeit with interest, is rather skeptical. In essence, it’s difficult to talk about scalability of the Elliptics Network, since it is used in a cluster consisting of a maximum of a couple of dozens of rather high-performance nodes. An alternative to the project is Hadoop, Apache Casandra, which was already mentioned above (which was very much praised by the author of Elliptics, a year ago, but now the stand was talking about its insufficient performance) and others.

About the Elliptics Network, lively, but superficially, told the author of the project Evgeny Polyakov on the past Yac. Video available.
You can download the sources of the Elliptics Network and read the blog on the project website:
www.ioremap.net
Slides that were spinning in the lobby at the current conference:
www.slideshare.net/AntonKortunov/yac-2011-elliptics-network
In addition, I highly recommend an article on Amazon's other DHT implementation called Dynamo:
s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

Stand Yandex.Mail

The second representative of key-value storages at the conference was the Mulca project, which is a distributed file system used to store letters in Yandex.Mail. The cluster stores 22 trillion records, with a total volume of 6.5 petabytes, on more than a thousand nodes and is constantly growing.

The system, like the Elliptics Network, is built on hash tables, but uses completely different principles. The write key is hashed by two different, non-cryptographic hash functions and each node in memory implements a double hash table. The first level is an array, let's call it H1, pointers of a size equal to several million elements. Each pointer of such an array refers to a list of the second level H2 of a dozen elements, which in turn refers to the place on the disk where the data is. With this design, the search is performed as follows: The value of the first hash function (more precisely, its remainder by dividing by the size H1) is used to obtain the offset in the array H1 and obtain the address of the list H2. In the list H2, a linear search is made for an element that matches the value of the second hash function. After finding such an element, data is read from the disk and only after that the real keys are compared. If the keys do not match, then the search continues on the H2 list, and so on. The H2 list is sorted by time and therefore access to new letters is faster.

Due to the fact that the keys are not stored in memory, the developers managed to reduce the memory requirements from 50 to 12 GB per node. On the other hand, this approach does not lead to a decrease in productivity, since now there are units of double collisions in the system and there is not a single triple collision. Those. the search for the overwhelming number of letters takes place in one reading from the disk and only a few letters require two reads.

Each letter is stored on two nodes in different data centers, and the meta-information on the letters is stored in the Oracle DBMS cluster and the letter key is arranged in such a way that information about the nodes on which the letter is written is added to it. This allows, knowing the letter key, to read directly from the node that stores it.

The record is managed by special nodes, called balancers, which monitor the workload of the remaining nodes and, when generating a new letter key, use the nodes with the lowest load.

The main advantage of the system in Yandex.Mail over the Elliptics Network is that adding a new node does not transfer data between nodes. In the Elliptics Network, by contrast, adding a node results in a redistribution of responsibility for the SHA512 ring ranges between nodes, which in turn leads to mass transfer of data between nodes.

Unfortunately, I did not find any additional materials on the internal structure of Yandex.Mail.

Conclusion

This year, in my subjective opinion, Yandex did not reach the level that he set a year ago: reports were weaker, there was more marketing, and the speakers were less interested in communication. But even with such reservations, YaC remains one of the best technical conferences in Moscow, and its attendance will certainly be useful to technical specialists who are trying to expand their horizons.

References

Records of reports:
yac2011.yandex.ru/archive2011/video1
yac2011.yandex.ru/archive2011/video2
yac2011.yandex.ru/archive2011/video3

Records of last year's reports:
yac2011.yandex.ru/archive2010/materials

Tags: