“Any technical change should answer the question ‘why?’” - Odnoklassniki on Java and more



    How does Odnoklassniki reconcile its use of sun.misc.Unsafe with strict reliability requirements? Why did the Cacti monitoring system have to be modified there? How does work at OK intersect with scientific research? And if the social network is called Odnoklassniki (“classmates”), does its entire Java codebase consist of a single class?

    The answers to these and other questions are in this post. On the eve of Joker, where three OK employees will speak and a fourth sits on the program committee, we interviewed all four of them, and a few others as well. Our questions were answered by:

    • Oleg Anastasiev, Lead Developer (member of the Joker 2016 program committee)
    • Andrey Pangin, Lead Developer (Joker 2016 speaker)
    • Vitaly Khudobakhshov, Leading Analyst (Joker 2016 speaker)
    • Dmitry Bugaychenko, Analyst Engineer (Joker 2016 speaker)
    • Andrey Guba, Deputy Technical Director
    • Kristina Steinberg, Head of Human Resources


    Oleg Anastasiev (Lead Developer)


    - Does Odnoklassniki strive to adopt new technology, or does it prefer not to get ahead of itself? For example, when a new major version of Java is released, do you try to move your servers to it quickly, or do you live quietly with the old one?

    - We start from the task: any technical change should answer the simple question “why?” If it has an answer, we make the change; if not, we don’t.

    So it turns out differently for different Java versions. We adopted Java 8 at an accelerated pace because it brought lambdas. For the web parts of the portal we use our own function-oriented framework, which required writing a lot of anonymous classes. Java 8 fit our task perfectly: lambdas let us cut down the code, and it became both more readable and faster.
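
    To illustrate the kind of reduction described here, the sketch below expresses the same function first as an anonymous class and then as a Java 8 lambda. The class and field names are hypothetical and are not taken from Odnoklassniki's framework.

```java
import java.util.function.Function;

// Hypothetical example: names are illustrative, not from Odnoklassniki's framework.
final class LambdaVsAnonymous {

    // Before Java 8: a function object written as an anonymous class.
    static final Function<String, String> GREET_ANONYMOUS = new Function<String, String>() {
        @Override
        public String apply(String name) {
            return "Hello, " + name;
        }
    };

    // Java 8: the same function written as a lambda expression.
    static final Function<String, String> GREET_LAMBDA = name -> "Hello, " + name;

    public static void main(String[] args) {
        // Both calls produce the same string.
        System.out.println(GREET_ANONYMOUS.apply("OK"));
        System.out.println(GREET_LAMBDA.apply("OK"));
    }
}
```

    Both forms behave identically here; the lambda simply drops the boilerplate around the single method body.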

    With Java 9, though, I don’t yet see how exactly it will improve our lives. Perhaps it will be faster, and then there will be a reason to spend time on the transition, but that will become clear only with the final release. In our case, moving to modules will not bring benefits that justify the effort.

    Moreover, in one respect Java 9 will make our life harder rather than easier: it moves away from sun.misc.Unsafe, which we use. Unsafe lets you conveniently implement a lot of low-level code without leaving Java; without it we would have to write that code in, say, C. Even if JNI were fast (and it is not), development would take a lot more effort.

    In addition, since we are a gigantic, highly loaded project, reliability is of course important to us. So before migrating we must be sure that everything is stable and works no worse than on the previous version. We are certainly not going to install Java 9 on the day of general availability, although we will, of course, begin testing it.

    - Wait, how does “reliability is important” square with using Unsafe in production?

    - Of course, with Unsafe it is easy to break a lot of things. But if you understand very well what you are doing, then you know what you can break and whether that matters in your particular case.

    For example, in certain cases the “write once, run anywhere” principle may break: you get code that will not run correctly everywhere. That matters for Java as a whole, but we have our own specifics. We run the code on very specific servers. We are obviously not going to swap our server fleet for something completely different tomorrow. And our goal is to make the code run as efficiently as possible on these servers.

    In that situation, “write once, run anywhere” stops helping and starts getting in the way: it prevents the programmer from using operating system features that would yield much faster, more efficient code. For example, it does not let you take a memory page directly from the OS. It does not let you advise the OS on how to cache a particular memory region, whether to cache it at all, and for how long. Java has essentially no such built-in capabilities, and with Unsafe this is easy to implement.
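
    As a rough illustration of what “low-level code without leaving Java” can look like, the sketch below allocates memory outside the Java heap through sun.misc.Unsafe, writes and reads it by raw address, and frees it manually. It is not Odnoklassniki's code: the class and method names are hypothetical, and it shows only off-heap allocation, not OS caching hints. Recent JDK releases deprecate these Unsafe memory-access methods, so treat it as illustrative only.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: manual off-heap memory management via sun.misc.Unsafe.
final class OffHeapSum {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Unsafe has no public constructor; the usual workaround is reflection on "theUnsafe".
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Stores 0..count-1 as longs in off-heap memory, then sums them back.
    static long sumOffHeap(int count) {
        long bytes = (long) count * Long.BYTES;
        long address = UNSAFE.allocateMemory(bytes);   // memory the GC does not manage
        try {
            for (int i = 0; i < count; i++) {
                UNSAFE.putLong(address + (long) i * Long.BYTES, i);
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += UNSAFE.getLong(address + (long) i * Long.BYTES);
            }
            return sum;
        } finally {
            UNSAFE.freeMemory(address);                // must be freed manually
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(1000));
    }
}
```

    Nothing checks the addresses here: writing past the allocated block would corrupt memory or crash the JVM, which is exactly the risk discussed above.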

    Oracle’s motivation for moving away from Unsafe is clear: yes, it offers many ways to shoot yourself in the foot, and for many people that is where it all ends. But I want to point out that there are also cases like ours, where giving up Unsafe is not “taking the ax away from a child so that he doesn’t cut his finger” but “depriving adults of a simple and useful working tool.” And VarHandles, its successor, help in only one of our use cases.

    Ideally, I would like to go even beyond Unsafe. I would like Java to integrate more closely with the OS and with the many libraries implemented in other languages such as C and Go, and to allow low-level code with manual memory management where necessary, up to being able to drop into assembler at any point and write code in it where speed is critical.

    - As a member of the Joker program committee, you have already seen many of the talks. Did you particularly like any that you can recommend?

    - I really liked how practical Philip Delgado’s talk is, “DBMS: individual tailoring and fit according to the figure”: it is about how, if you know the capabilities of your DBMS well, you can solve complex problems quickly and avoid complicating the application architecture.

    And I am biased, of course, but Andrey Pangin’s talk is very interesting. It is also worth listening to Dmitry Bugaychenko from Odnoklassniki on how to do streaming analysis of tens of millions of events per second. For a task like that, “just taking Spark” is not an option.

    Andrey Pangin (Lead Developer)


    - What will you talk about at Joker?

    - I had several topics in mind, but the audience itself chose performance myths. So I’ll talk about how Java is slow. Or isn’t slow, whichever you prefer :)

    In general, I will share performance-related JVM specifics and show how easy it is to make mistakes when analyzing performance problems.

    - A year ago, in “Without Slides”, you talked about how everything at OK is set up technically. Has anything changed significantly over the year, whether quantitatively or qualitatively?

    - Of course. The number of servers, storage capacity and traffic are all constantly growing. Traffic alone has doubled over the past year.

    We launched several new independent projects, in particular OK Live and OK Messages. Naturally, they required new technical solutions.
    A year ago we didn’t really have video streaming; now live broadcasts are available to all users on any device.

    We significantly reworked the messaging backend for mobile devices and mobile networks, which required implementing a custom server with a custom protocol.

    We learned how to transcode video on the GPU. According to our measurements, video cards transcode common video formats 3 times faster than CPUs.

    Another major technological breakthrough is the launch of our own “cloud”, so far in experimental mode. Previously, as a rule, each physical machine ran a single application. Deploying services in the cloud will let us use computing resources more efficiently. And developers will not have to wait for admins to install and configure servers: typical tasks of deploying and scaling applications in production will be performed automatically.

    There are many other interesting technical things that are still in the research phase. When they get to launch, I will certainly talk about them.

    - Because of the name Odnoklassniki (“classmates”), we can’t resist the following question: how many Java classes are there in its code?

    - Hard to say. The Odnoklassniki codebase includes more than 300 modules. I have never seen all of them together. I have about four checked out for my work, and that is about 50 thousand classes. The largest module has over 8000 classes.

    - Class!



    Vitaly Khudobakhshov (Leading Analyst)


    - What exactly do you do in Odnoklassniki?

    - I am a leading analyst. I have to do many different things, but for the most part my work involves analyzing large volumes of data with Spark / Scala or similar tools. I process data and build all kinds of models. Sometimes I have to come up with algorithms and implement them at every level, down to serving the data to users through highly loaded Java services, but mostly I develop mathematical models.

    - In your article for “Hacker” you mentioned situations where int addressing is not enough. How often do you actually run into such situations at OK, given its data volumes?

    - When processing big data, I did run into a lack of int addressing several times. I can’t call the problem widespread yet, but it will come up more and more often in practice. And int addressing is only part of the problem. For example, many people say that LinkedList is a poor data structure, but using a large ArrayList is often impossible because of fragmentation or a promotion failure, so the problem runs a bit deeper than people think. I really have used code similar to what I described in Hacker for large calculations, and I would say that data structures with long addressing are in short supply. If I find the time, I will write my own library.
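
    One common way to get long addressing on the JVM is to split a logical array into fixed-size chunks. The sketch below is a minimal, hypothetical example of that idea; it is not the code from the Hacker article, and the class name and chunk size are assumptions. Because each chunk is a modest allocation, no single huge contiguous block is required.

```java
// Hypothetical sketch of a long-addressed array built from int-addressed chunks.
final class BigLongArray {
    private static final int CHUNK_BITS = 20;                // 2^20 longs (8 MiB) per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final long[][] chunks;
    private final long length;

    BigLongArray(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        this.length = length;
        // Round up so that a partially filled last chunk is counted.
        int chunkCount = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new long[chunkCount][];
        long remaining = length;
        for (int i = 0; i < chunkCount; i++) {
            chunks[i] = new long[(int) Math.min(CHUNK_SIZE, remaining)];
            remaining -= chunks[i].length;
        }
    }

    long length() {
        return length;
    }

    long get(long index) {
        checkIndex(index);
        // High bits select the chunk, low bits select the slot inside it.
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    void set(long index, long value) {
        checkIndex(index);
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index + ", length: " + length);
        }
    }
}
```

    With enough heap, the same class can hold more than Integer.MAX_VALUE elements, since only the chunk count and the offset inside a chunk have to fit into an int.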

    - OK obviously has a lot of data to analyze. But looking at it qualitatively rather than quantitatively, does the data have its own unique specifics?

    - In fact, volume is not the only characteristic of big data, even in the general case. And yes, of course we have our own specifics. The most obvious is the social graph, which brings a lot of specifics by itself. Moreover, when content is generated in such volume and is so diverse in type (and language), that creates many different difficulties. Users are quite inventive in some of their tricks and simple-minded in other actions, and all of this together produces interesting problems at different levels.

    - What will you talk about at Joker?

    - I will talk about a very popular topic: functional programming in the context of big data processing, with examples in Scala / Spark. I’ll explain how functional programming lives in practice and why it has become popular right now. Much is known about the main features of OOP and where it applies: there are patterns, there is encapsulation / inheritance / polymorphism. Many think these are special features of OOP, and few can say offhand what the functional paradigm is in general. And, of course, all this is especially interesting in the context of the MapReduce model.
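
    The link between the functional style and the MapReduce model can be shown with the classic word count. The talk itself uses Scala / Spark; the sketch below uses plain Java streams instead, on a single machine, purely as an illustration, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: word count as a "map" step followed by a per-key "reduce" step.
final class WordCount {

    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                // "Map" phase: turn every line into a stream of lower-cased words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // "Reduce" phase: group identical words and count each group.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("to be or", "not to be")));
    }
}
```

    The pipeline contains no mutable shared state, which is what makes this style straightforward to distribute across machines in systems built on the MapReduce idea.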

    Dmitry Bugaychenko (Analyst Engineer)


    - What exactly do you do in OK?

    - Initially I was invited to the company to work on a music recommendation system, which turned out to be more than interesting. After that we built on this experience by implementing recommendation systems in other services (groups, videos, and so on).

    At some point we got to an object as complex as the feed, and there we had to fundamentally change our approach to building the system: large volumes, strict response-time requirements, heterogeneous content types, and so on. As a result, the feed’s needs became a powerful driver for our analytical infrastructure, which I was also developing.

    Now my work has three main components: developing the analytical infrastructure, experimenting with algorithms for building the feed, and helping colleagues and other teams bring data analysis into their processes and products.

    - You have a scientific background. Does it help in your work at Odnoklassniki?

    - Yes, quite a lot. One very useful skill is searching for scientific articles and publications. When we try to solve a problem, we always start by looking at what people have already done in the area. Often there are no industry publications on the topic, but there are many academic ones, and a lot can be learned from them.

    Of course, to be able to use these publications and not just find them, you need to know the language the scientific community speaks. Academic language is quite different from industry language.

    - And is the work itself on a project as large as Odnoklassniki closer to science than work on something smaller?

    - Yes, for three reasons at once. First, smaller companies try to use off-the-shelf solutions to minimize costs. Companies of Odnoklassniki’s scale differ in that they don’t just use what is ready-made but often develop new solutions, moving the whole field forward. And the investment that a company of OK’s level puts into developing both technologies and processing algorithms is incomparable with what a small company can afford, at least in terms of computing resources.

    The second reason is, of course, data. Working at Odnoklassniki, we have access to a very large amount of statistical data that is very hard to obtain in academia.

    The third reason is more technological: you can build an algorithm that works fine in a small company, but to make it work at OK’s scale you also have to solve many new non-trivial problems, both technological and algorithmic. That is very interesting.

    - While working at OK, do you (and your colleagues) continue your research in parallel? Are scientific articles published based on the experience gained at OK?

    - I teach at a university. In addition, we encourage the development of the Russian data science community: we organize contests and hackathons, and make some anonymized datasets built from real data publicly available. So in that respect I do continue.

    As for scientific articles, OK-related ones have appeared: for example, we have a patented music recommendation system, and articles about it were published. But we manage to publish them only infrequently; the most recent one is a year and a half old. We make up for that by speaking at various conferences, both technology ones like Joker and academic ones: there is plenty to say about recommender systems and data analysis.

    - Speaking of Joker: what will you talk about there?

    - I’ll talk about the system Odnoklassniki uses to calculate the CTR of objects in our feed. Also about various peculiarities of the standard storage systems used in the Java ecosystem, and about an alternative approach to data processing: streaming analysis instead of key-value storage.
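
    The interview does not describe how this system works, so the sketch below illustrates only the general idea of streaming aggregation: counters are updated in memory as each event arrives, instead of reading and rewriting a record in a key-value store per event. Everything in it (class, enum and method names, the single-threaded in-memory design) is a hypothetical simplification, not Odnoklassniki's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: click-through rate (clicks / impressions) kept up to date per event.
final class StreamingCtr {

    enum EventType { IMPRESSION, CLICK }

    // Running totals for one object in the feed.
    private static final class Counts {
        long impressions;
        long clicks;
    }

    private final Map<String, Counts> countsByObject = new HashMap<>();

    // Called once per incoming event; updates the in-memory aggregate in place.
    void onEvent(String objectId, EventType type) {
        Counts counts = countsByObject.computeIfAbsent(objectId, id -> new Counts());
        if (type == EventType.IMPRESSION) {
            counts.impressions++;
        } else {
            counts.clicks++;
        }
    }

    // Returns 0.0 for objects that have not been shown yet.
    double ctr(String objectId) {
        Counts counts = countsByObject.get(objectId);
        if (counts == null || counts.impressions == 0) {
            return 0.0;
        }
        return (double) counts.clicks / counts.impressions;
    }
}
```

    A production system would also need partitioning across machines, time windows and fault tolerance, none of which this sketch attempts.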





    Andrey Guba (Deputy Technical Director)


    - What exactly does your range of responsibilities include?

    - I could probably say that I “do operations”. Formally, I oversee system administration and information security. I also lead the API and platform teams. The platform team consists of developers who handle some of the most difficult tasks. They write their own protocols for communicating with our applications and their own storage systems; a new one was launched recently.

    - I’d like to hear more about that launch: why did you need to write this system, and what are its features?

    - We already had our own system in which we stored photos, music and videos. But as volumes grow (the video service in particular is developing very actively), the cost of storage becomes increasingly important. Besides, as the number of servers grows, ease of operation becomes an issue. So we decided to redo the existing system with these two considerations in mind. We compiled a list of what we wanted to achieve and wrote a new system to match it. It is already running in production.

    Here is a clear metric, for instance: it has a replication factor of 2.1, that is, we store 2.1 copies of all data. Before that we had a system distributed across three data centers, with one full copy in each, three in total. Now we store copies plus certain checksums, and we can lose any one of the data centers entirely without losing data or functionality.

    In addition, it is convenient to scale and operate. For example, replacing a disk is a very simple operation: the old failed disk is simply pulled out, the new one is simply inserted, and everything starts automatically and keeps working. The system can be expanded by any number of servers, with disks of different sizes and in different numbers.

    It was similar with our cloud: with data storage we wanted to make it cheaper, and here we want to make rational use of our servers’ resources (CPU, memory, disk space). There are a lot of servers, and not all resources are fully used: one resource is heavily loaded in one place, another somewhere else. So we decided to launch a cloud that lets us use them more efficiently.

    We looked at what is on the market, realized that some products are still immature and others don’t meet our requirements, and decided to write our own. It is now being rolled out: it runs in the alpha phase, and we are conducting production experiments with it. We launch some of the services, look at the statistics and refine the system. According to our plans, next year a significant part of production will run in the cloud.

    - And in system administration, do you also have solutions of your own?

    - Yes, standard tools are not always suitable there either, so in some cases we have to modify existing ones and in others write our own.

    For example, this happened with monitoring and statistics systems. If you use the popular Cacti system on several hundred servers, it works fine. But we have 8500 servers, and at that scale it will not work out of the box; it has to be modified.

    - Apart from your own tools, what is specific about administration at Odnoklassniki?

    - Our goal is to ensure the fault tolerance of a giant project in a distributed environment. There are several data centers, and we have set ourselves the task of keeping everything working even if any one of them is lost. We have to build, launch and maintain services with that in mind. Administrators must be sufficiently trained and have all the tools they need to keep everything working both in normal conditions and when a problem occurs, up to the failure of a data center.

    And an important part of the work is organizational rather than technical: making sure that in a relatively small, distributed team everyone acts in an agreed way and shares information. Standard procedures must be developed, everyone must know them, and when new people join, they must also learn everything they need.

    Kristina Steinberg (Head of Human Resources)


    - How many Java developers does the company have, which cities are they in, and do people relocate between them?

    - At the moment the company has about 120 developers. They are located in three cities: Moscow, St. Petersburg and Riga. Relocations are not very frequent, but from time to time someone moves to another office.

    - Since Joker will be held in St. Petersburg, let us clarify separately: which teams / processes are in the St. Petersburg office?

    - Almost all product development is there: mobile applications, video, music and so on.

    - What return do you get from participating in Java conferences? What feedback do you receive?

    - In the Java community OK is well known for its Java development, and brand research shows this too. Many people come to our speakers for advice and catch them at the booth to clarify points about highly loaded systems. Conference attendees are well aware that no one in Russia has as much experience with Java as OK, so after our speakers’ talks a line several meters long forms of people wishing to ask questions.

    - Some developers are prejudiced against OK. Is there anything you would like to say about that?

    - I want to say that all people are different: some like bicycles, and some like cars :) Developers who are familiar with what we do know that OK has very interesting technical problems. Highly loaded systems bring many challenges to solve.

    - Thanks! We look forward to seeing all of you at Joker 2016. In the meantime, let’s recall some earlier talks by speakers from Odnoklassniki:





