“We can test Java better than Oracle” - an interview with Andrei Pangin from Odnoklassniki

    Today I have a great interview for you with Andrey Pangin, aka apangin, a lead engineer at Odnoklassniki. Andrey worked as a JVM engineer at Sun Microsystems for more than 6 years, including on the HotSpot team, and for the last 5 years he has been at Odnoklassniki, solving JVM and performance problems there. So Andrey is rightfully considered one of the strongest JVM specialists in Russia. He is an expert in systems programming who has worked on storage systems and data transfer systems. He laid the bricks that underlie the Odnoklassniki portal and provide the reliability and speed of its services. Here is what we talked about with Andrey:







    • what the move from Java 7 to Java 8 is like;
    • what is happening to sun.misc.Unsafe;
    • Odnoklassniki architecture;
    • engineering trade-offs, sharding, and GC;
    • storage systems and Cassandra;
    • where Odnoklassniki is ahead of the rest, and what it can still learn from Google;
    • how to become a cool system programmer.


    (I know the beginning drags on. We will work on warming up faster and getting to the point sooner.)

    For those who have no time to watch the video, a transcript of the interview is below the cut.

    About moving from Java 7 to Java 8


    - Working in the depths of Sun, you looked at Java from the inside, and when you moved to Odnoklassniki, you started looking at it from the client side. As an application developer, you have seen Java 6, 7 and 8. What is your feeling: has it become better or worse? For example, do you feel that JDK 8 is clearly a better product than JDK 7, or maybe the other way around?

    - There are bugs in all versions; we are already used to that. It is clear that the JVM is a very complex system, and sometimes you cannot even predict in advance how the JIT compiler will behave in a particular scenario. Most importantly, 99% of our portal runs on all of this, and there have been no major outages since 2013. So we are satisfied with Java.

    - And what percentage of Odnoklassniki, if it is not a secret, runs on Java 7, and what percentage on Java 8?

    - We still have systems even on Java 6. Fortunately, there are very few of them. There are services that have not been restarted for years. What is the point of upgrading them? They just keep working. Everything new that we launch now goes straight to JDK 8. The services that are deployed often have almost all moved to JDK 8 already. Right now probably about 60 percent run on Java 8, and every week something else is migrated.

    - What are the difficulties when migrating from Java 7 to Java 8?

    - Surprisingly, we saw far fewer problems migrating from Java 7 to Java 8 than when we migrated from Java 6 to Java 7. Some things worked differently there: for example, sorting started throwing exceptions where it had not before, there were changes to Unicode support, and so on.
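
    The interview does not say which sorting failures these were. One well-known candidate, offered here only as an assumption, is that Java 7 switched object sorting in `Collections.sort` and `Arrays.sort` to TimSort, which may throw `IllegalArgumentException` ("Comparison method violates its general contract!") when a comparator is inconsistent. A minimal sketch of such a comparator and its fix, with hypothetical class and field names:

```java
import java.util.Comparator;

final class ComparatorContract {
    // Broken: the subtraction overflows when the values are far apart,
    // so the sign of the result can be wrong.
    static final Comparator<Integer> BROKEN = (a, b) -> a - b;

    // Correct: Integer.compare never overflows.
    static final Comparator<Integer> FIXED = (a, b) -> Integer.compare(a, b);

    public static void main(String[] args) {
        int small = Integer.MIN_VALUE;
        int large = 1;
        // small < large, so a valid comparator must return a negative number.
        System.out.println("broken: " + BROKEN.compare(small, large));
        System.out.println("fixed:  " + FIXED.compare(small, large));
    }
}
```

    Older sort implementations silently tolerated comparators like `BROKEN`; a stricter algorithm can detect the inconsistency on some inputs and fail instead.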

    When switching to Java 8, there were no such problems. But we ran into bugs in the JIT compiler.

    In general, it sometimes seems that if a bug exists, we will certainly run into it: we have so many servers and such loads that we can test Java better than Oracle itself.

    For example, we have now disabled tiered compilation in production, which the guys from Oracle worked on so diligently: there is a critical bug in the C1 compiler, so we compile straight with C2.
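
    The interview does not name the exact setting. On HotSpot, tiered compilation is controlled by the `-XX:TieredCompilation` flag, and `-XX:-TieredCompilation` turns it off; assuming that is what was used, a small sketch (hypothetical class name) for checking the value inside a running JVM:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class TieredCompilationCheck {
    // Returns the current value of a HotSpot -XX option as a string.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // Start the JVM with -XX:-TieredCompilation to see "false" here.
        System.out.println("TieredCompilation = " + vmOption("TieredCompilation"));
    }
}
```

    This relies on the HotSpot-specific `com.sun.management` API, so it is not portable to other JVM implementations.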

    - Has this bug been fixed yet?

    - Yes, it seems that JDK 8u60 solved the problem. To give them their due, the guys from Oracle clean up problems in the JIT compiler quite quickly. But there were serious vulnerabilities: you could write simple Java code that would crash the entire virtual machine. By the way, version 8u40 fixed two compiler bugs that I had filed.

    About virtualization, Unsafe and Odnoklassniki


    - And how big is the Odnoklassniki project as a system? How many subsystems and layers does it have?

    - A lot. There are about two hundred modules, and the largest ones contain around ten thousand Java classes each. In the sources you can find comments dated as far back as 2004. But, as you know, Odnoklassniki started in C#, and then a team of several people rewrote everything in just a couple of months of busy days and sleepless nights. So nothing remains of that earlier version of Odnoklassniki.

    - Now Odnoklassniki is a highly loaded Java project, the largest Java project in Russia. Are there many similar projects in the world?

    - I know that in terms of traffic we are in the world Top-100 according to Alexa. In Russia we are now seventh, as I recall. Among Java projects we are probably one of the largest.

    - VKontakte and Facebook, which in some large parts had a lot of PHP code, released their virtual machines for PHP. Where does this trend for virtualization come from?

    - Virtualization is convenient! In a broad sense, it lets you work with the layer you are most familiar with. The application developer should not have to worry about which architecture-level micro-instructions the code is translated into or how the processor executes them.

    - I remember that in 2012 you talked a lot about Unsafe. And now practically everyone in the Java community talks about Unsafe. I want to blame you for promoting unsafe programming techniques.

    - I do not promote them - I use them! Because not everything we need is in Java.

    Java was originally conceived as a hardware-independent platform: write once, run anywhere. But at Odnoklassniki we run everything on 64-bit Linux on Intel processors. We do not need that kind of portability; we want to get the most out of the operating system and the hardware we run on. And for that you have to dig loopholes deep into the operating system.

    Here is an example, offhand. We have a bunch of caches: of the 8 thousand servers that Odnoklassniki has, almost half are caches of one kind or another. When snapshots are written, the data from RAM is dumped to disk. If this is done purely by means of Java, memory consumption grows wildly. Only with a Linux-specific hint can you tell the system: "Do not cache this data, we will not need it in the near future."

    - Can this not be done simply by configuring Linux? I do not quite understand why you have to get into the application; why does this knob sit on its side?

    - To Linux it is all the same, whether the files are written by Java or by other processes in the system. The programmer has to decide what should be cached and what should not. So the initiative has to come from the application side.

    About high loads


    - Which parts of Odnoklassniki are the most heavily loaded?

    - One of the most heavily loaded is the Feed service, the news feed that opens when a user visits the main page. Data from different sources is collected there for each user.

    Another example is the instant delivery of push notifications about "friendships", news, messages and gifts. Each server there handles up to 50,000 requests per second.

    Some push notifications are actually implemented with long polling: the client issues long HTTP requests that hang either for 10 minutes or until data arrives. On the backend side there is a separate service that knows at any given moment on which frontend or frontends each user is connected. A person can have several clients open at once: on a mobile device and in the web version of the portal. To give you an idea, there are about a thousand of these machines.
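
    The long-polling idea described above can be sketched as follows. This is a simplified illustration with hypothetical names (`LongPollHub`, `publish`, `poll`), not the Odnoklassniki implementation; it blocks a thread per waiting request, whereas a frontend handling tens of thousands of requests per second would presumably use non-blocking I/O.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class LongPollHub {
    // One queue of pending events per user ID.
    private final ConcurrentMap<Long, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    private BlockingQueue<String> queueFor(long userId) {
        return queues.computeIfAbsent(userId, id -> new LinkedBlockingQueue<>());
    }

    // Called by the backend when there is something to deliver.
    void publish(long userId, String event) {
        queueFor(userId).add(event);
    }

    // Called by the request handler: waits until an event arrives
    // or the timeout expires. Returns null on timeout or interruption.
    String poll(long userId, long timeout, TimeUnit unit) {
        try {
            return queueFor(userId).poll(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        LongPollHub hub = new LongPollHub();
        hub.publish(42L, "new message");
        // The event is already queued, so this returns immediately.
        System.out.println(hub.poll(42L, 1, TimeUnit.SECONDS));
        // Nothing arrives within the timeout, so the client would simply reconnect.
        System.out.println(hub.poll(42L, 100, TimeUnit.MILLISECONDS));
    }
}
```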

    - It turns out that this is a significant percentage of all eight thousand servers. Why don't you use WebSockets, for example?

    - Unlike WebSockets, long polling is supported almost everywhere. Of course, there are outdated, ancient browsers, Internet Explorer 8, for example. But we want to drop it soon.

    About normalization and denormalization


    - In the Enterprise world, there is usually a lot of Java EE, a lot of Hibernate, Spring, and so on. What technologies do you have?

    - One of our largest modules, which is responsible for most of the business logic, is traditionally called odnoklassniki-ejb. But today absolutely nothing is left of EJB in it. I set myself this task last year: to cut EJB out of the project. We had a three-day hackathon where developers could pick any project that there is usually not enough time for. So I decided to completely get rid of Enterprise in our main module. And now it is a regular Java application. That is how it is.

    As for Spring, we have a lot of it.

    - What about Hibernate?

    - No, no, not that! We need very precise, full control over which queries we execute and how. Some of our systems and some of our storage have traditionally stayed on Microsoft SQL Server, but we are moving more and more towards NoSQL solutions.

    For SQL databases to cope with the load, you have to give up a number of features. In particular, we do not use joins, we do not use triggers, stored procedures.

    - That is, students are taught to “normalize”, then they come to you, and you say “denormalize”?

    - Something like that, yes. Of course, where it is possible to transfer the load from SQL servers to the business logic server, we transfer it: simply because we pay for the processor capacities on which our SQL servers work.

    - Why? And what is wrong with joins?

    - Our databases are distributed and partitioned. How do you join if each server holds 1/16 of the data?

    - Do modern systems work poorly with sharding? Surely there must be some good Enterprise ...

    - Everything that is called enterprise does not, in fact, work under our loads. Yes, it does well in the banking sector and in other places where the load is lower but the reliability requirements are more serious. We are also happy with eventual consistency in many systems: what difference does it make whether your friend sees that you changed your profile picture a second sooner or later?

    About engineering compromises, sharding and a good garbage collector


    - I really like the topic of trade-offs in general. Shall we talk about it? What are they at Odnoklassniki? I understand that performance is paramount.

    - One way or another, even if you are a front-end developer, you have to think about performance.
    Say two lists come in, and you need to combine them into one.

    Yes, it is easier to write a simple quadratic algorithm. But when you know that your service will later gain popularity and grow, and that lists of 10 thousand and 100 thousand elements will come in ... you start to think. After all, the quadratic algorithm will be slow.
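
    The interview does not say how the two lists are combined. Assuming the task is a union without duplicates of two ID lists (each without internal duplicates), the difference between the quadratic and the linear approach might look like this; class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

final class ListMerge {
    // O(n * m): List.contains scans the result list for every element of b.
    static List<Long> mergeQuadratic(List<Long> a, List<Long> b) {
        List<Long> result = new ArrayList<>(a);
        for (Long id : b) {
            if (!result.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    // O(n + m) expected: a hash set answers each membership check in constant time,
    // and LinkedHashSet keeps the insertion order.
    static List<Long> mergeLinear(List<Long> a, List<Long> b) {
        LinkedHashSet<Long> seen = new LinkedHashSet<>(a);
        seen.addAll(b);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Long> a = Arrays.asList(1L, 2L, 3L);
        List<Long> b = Arrays.asList(3L, 4L, 1L, 5L);
        System.out.println(mergeQuadratic(a, b));
        System.out.println(mergeLinear(a, b));
    }
}
```

    With a few dozen elements the two are indistinguishable; at 100 thousand elements per list the quadratic version performs on the order of ten billion comparisons.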

    - But other teams, probably, have different priorities: performance, for example? And, again, the fact that there are no joins: if the data is denormalized, then a lot of disk space is wasted?

    - It depends. This, again, is a trade-off, as you say. Either we accept taking up a little more space and denormalize, storing the data directly in fields of the same table, or we do the join on the business logic side: we get the IDs and then go to other subsystems with them.

    I like to give a simple and understandable example: how do you get a list of all your friends, with names, avatars and so on? First, a request goes to a separate subsystem, the friend graph, which returns your friends' IDs given your ID. The graph holds no information other than IDs. Having received an array of IDs, we execute a second request to the user caches, which return the user information for those IDs.

    This data, of course, is sharded. We have a kind of MapReduce implemented in the remoting module: the system itself can distribute requests across shards, execute them in parallel, and then collect the results. The keys know which shard they are stored on.
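
    The two-step lookup and the scatter-gather over shards described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the class name, the 16 in-memory shards, and the modulo-based key-to-shard mapping; the actual remoting module is not shown in the interview.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

final class ShardedLookup {
    static final int SHARDS = 16;

    // Hypothetical user storage: one in-memory map per shard.
    private final List<Map<Long, String>> shards = new ArrayList<>();

    ShardedLookup() {
        for (int i = 0; i < SHARDS; i++) {
            shards.add(new HashMap<>());
        }
    }

    // The key itself determines the shard it lives on.
    static int shardOf(long userId) {
        return (int) Math.floorMod(userId, (long) SHARDS);
    }

    void put(long userId, String name) {
        shards.get(shardOf(userId)).put(userId, name);
    }

    // Resolves IDs to names with one parallel request per shard involved.
    Map<Long, String> namesOf(List<Long> ids) {
        // Scatter: group the requested IDs by the shard that owns them.
        Map<Integer, List<Long>> byShard = new HashMap<>();
        for (Long id : ids) {
            byShard.computeIfAbsent(shardOf(id), s -> new ArrayList<>()).add(id);
        }
        // Run one asynchronous lookup per shard.
        List<CompletableFuture<Map<Long, String>>> futures = new ArrayList<>();
        for (Map.Entry<Integer, List<Long>> entry : byShard.entrySet()) {
            Map<Long, String> shard = shards.get(entry.getKey());
            List<Long> keys = entry.getValue();
            futures.add(CompletableFuture.supplyAsync(() -> {
                Map<Long, String> part = new HashMap<>();
                for (Long key : keys) {
                    String name = shard.get(key);
                    if (name != null) {
                        part.put(key, name);
                    }
                }
                return part;
            }));
        }
        // Gather: merge the partial results into one map.
        Map<Long, String> result = new HashMap<>();
        for (CompletableFuture<Map<Long, String>> future : futures) {
            result.putAll(future.join());
        }
        return result;
    }

    public static void main(String[] args) {
        ShardedLookup users = new ShardedLookup();
        users.put(1L, "Anna");
        users.put(17L, "Boris");
        users.put(35L, "Vera");
        // Step 1 (stubbed): the friend graph returns nothing but IDs.
        List<Long> friendIds = Arrays.asList(17L, 35L);
        // Step 2: resolve the IDs against the sharded user storage.
        System.out.println(users.namesOf(friendIds));
    }
}
```

    The "join" happens in application code: the database never sees more than a key lookup, which is what keeps each shard's work cheap.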

    - In the scenario you described, we made two requests at least, but this all must somehow run very quickly inside. This is a call to the internal network, and the network is a lot of jumps at once, because one subsystem turns to another ... In such a chain of calls there will be many network requests!

    - Of course. But, in fact, the average time to serve an Ajax request is 5 milliseconds. This does not count the channel from the user to our frontends; it is the time spent inside the portal. Of course, 5 milliseconds is just an average across the board. There are requests that take 50 milliseconds, and there are very short ones.

    - So which percentile are we talking about? The ninetieth?

    - The ninetieth. To give you an idea, one remote request to a server in the same data center takes 300 microseconds, and one millisecond to a server in another data center. Of course, garbage collection happens: pauses can be 100 or 200 milliseconds. But they are rare. We fight long GC pauses. A pause of one second is already critical for most of our systems.

    - What GC is used?

    - For the most part, a well-tuned CMS. It generally works better than even Garbage-First. But G1 is also used here - for example, in our NewSQL self-written solution, which came to replace the SQL server in order to guarantee response time.

    That is, on average G1 works a little worse than a tuned CMS, but it gives more guarantees. CMS, for example, has a pause of 50 milliseconds on average, but sometimes it shoots up to 300-400 milliseconds. G1 perhaps collects more often and has a pause of up to 150 milliseconds on average, but it was given a limit of 200 milliseconds, and it really tries to stay within it.
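
    The interview does not list the flags used. On HotSpot, G1 is enabled with `-XX:+UseG1GC` and its pause goal is set with `-XX:MaxGCPauseMillis=200`; assuming a setup like that, a small sketch (hypothetical class name) for watching collector statistics from inside the JVM. Note that `getCollectionTime` reports approximate accumulated collection time, which is related to but not the same as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {
    // Average time per collection in milliseconds, or 0 if there is no data yet.
    static double averageMillis(long totalMillis, long count) {
        return count > 0 ? (double) totalMillis / count : 0.0;
    }

    public static void main(String[] args) {
        // Example launch: java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 GcStats
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long time = gc.getCollectionTime();
            System.out.printf("%s: %d collections, %.1f ms on average%n",
                    gc.getName(), count, averageMillis(time, count));
        }
    }
}
```

    Averages hide exactly the outliers discussed here, so in practice GC logs or a pause histogram are a better source for the 300-400 millisecond spikes.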

    - Having a rough idea of what Java is, I understand that if we command, "Please, G1, a pause of 200 milliseconds," then in reality it will not be 200 milliseconds. Or rather, it will, but with some probability. How often do garbage collections exceed this nominal 200 milliseconds?

    - Very rarely. Surprisingly. Starting with JDK 8u40, G1 has become a good collector. Before, it could not even unload unused classes without a full collection, but as of this version it is quite production quality.

    - In JDK 9, Garbage-First will be the default collector. JDK 9 comes out in a year, in September 2016. Will you move to Java 9?

    - Most likely we will, in order to keep receiving updates when support for Java 8 ends. That is actually what happened with Java 7: we started moving off it because we want to receive new updates. I personally had to patch our JDK 7 build to backport fairly critical JIT compiler fixes that had been made only in JDK 8.

    - And why not take, say, Azul, which has built most of its business on backports and support of old Java versions?

    - To let Azul make money off us? :) What is the point? We now have specialists who can handle this.

    About Java Performance


    - For a long time there has been no news about performance breakthroughs in the JDK or JRE. Have you ever had a case where performance increased significantly after switching to some version?

    - No, it has never happened just from replacing the version.

    - What is the reason for this? What are the chances that tomorrow some kind of cool optimization will be released in JIT, which will give + 20% performance?

    - This probability is close to zero. Although interesting optimizations do appear: loop vectorization was implemented recently.

    - Addition and multiplication by integers? It would seem a simple thing ...

    - Yes, we are talking about integer operations with an array in a loop. Even now, this does not always work. But at least optimization already exists. I mean, now all these optimizations give a few percent, a fraction of a percent. There is no talk of any sudden jumps.
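
    As an illustration of the kind of loop being discussed (an assumption, since no code is shown in the interview), here is a simple counted loop over int arrays with no dependency between iterations. Whether the JIT compiler actually vectorizes it depends on the JVM version and the CPU.

```java
import java.util.Arrays;

final class VectorizableLoop {
    // Element-wise addition: each iteration is independent of the others,
    // which makes the loop a candidate for SIMD instructions in the C2 compiler.
    static void add(int[] a, int[] b, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        int[] out = new int[4];
        add(a, b, out);
        System.out.println(Arrays.toString(out));
    }
}
```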

    - Question about security. At one point, until 2012, the Java motto was “Compatibility-First.” Since 2012, it has become the “Security-First”. So, as a portal, did you feel this somehow?

    - First of all, for us the problem is bugs that lead to JVM crashes. Otherwise, there have never been cases when we were hacked, knowing some Java vulnerabilities. It’s much easier to find some kind of hole in the API.

    About Storages and Cassandra


    - Let's move away from Java and talk about storage systems. How does it work for you? What file systems are used?

    - Many very different ones, both self-written and standard. Today probably most of our storage is Cassandra-based.

    - Why exactly Cassandra?

    - Others simply do not work: there is no failover, and replication has to be done almost by hand. And replication is an important business requirement. We are now working to ensure that all of the portal's functionality keeps working even if one of the data centers fails.

    - And how many total data centers?

    - Three. If one crashes, users will still be able to log in, but some services may be unavailable. We started the reliability work with the key services, and now we are tidying up the remaining ones.

    - Tell me about the process of transition to a distributed system with three data centers, please. How does it work, who does it?


    - The libraries, storage systems and tools themselves are built by the platform team. For example, for heavy content such as photos, videos and music, self-written storage is used. Recently colleagues from EMC visited our office; they are known for their solutions in the field of information storage. They talked about their solutions and shared their experience. But, as it turned out, they cannot offer us anything new compared to what we already have.

    - Yes, EMC are interesting guys. And what, by the way, in your opinion, is the technical expertise of Odnoklassniki? Is there any world-class experience?

    - The first thing that comes to mind is Cassandra. We modify it heavily and are constantly ahead of what is currently in the Cassandra master branch. The global indexes that are only now being built there are something we already have and use in full.

    We also have strong expertise in recommendation systems. Initially we had a recommendation system for music; now it is attached to many other services as well: video, groups. The main task of the portal is to show people content that interests them and not to show content that does not. This encourages users to stay longer on Odnoklassniki.

    Well, we have strong JVM expertise, of course.

    - And where would you like more expertise? What is missing today, in your opinion?

    - Maybe in terms of image recognition. There's a lot to learn from Google.

    - As I understand it, there is a certain platform team that does everything related to storage, distribution, and caches. And you present your work to other teams in the form of libraries, APIs, services, right?


    - Exactly. We provide ready-made solutions. But a developer from another team, of course, has to know some of the specifics: which requests are light and which are heavy, what can be done and what cannot.

    - Do you describe this to him in the form of javadoc or otherwise transfer this knowledge?

    - There is a practice where the authors of these solutions give lectures and seminars. Not very often, and mainly for beginners, but experienced developers can usually learn something new as well.

    And besides, we have a code review procedure: if a person took my solution and built something on top of it, he will most likely come to me later and ask: "Andrey, is everything okay here?" If something is off, I will point out where and what to fix.

    - And what do you do so that your API cannot be used incorrectly?

    - Personally, I do not do much in this regard: I just try to make the code as concise and simple as possible. And if you use it incorrectly, you have only yourself to blame: go and look at the documentation or the lecture recording to understand how to do it.

    - But the documentation, in principle, is up to date?

    - Yes, I am very glad that at one time I wrote a detailed FAQ on our remoting and serialization.

    We use our own system: at one point we switched to it from JBoss Remoting in order to be able to do online updates without service downtime, and to handle situations where one part of the portal runs new versions of the classes and another part runs the old ones.

    And to this day, when people come to me and ask, "How will this be serialized?" or "Is such a conversion possible?", I just point them to the page in our internal wiki.

    - And JBoss Remoting cannot do that? Or does it do it worse?

    - It is even worse there than in standard serialization, which supports some changes, but to a limited extent: you can delete fields and you can add new ones, which will be initialized with a default value.

    But say you change a type from int to long. There was a flags field, 32 bits were no longer enough, and it was changed to a 64-bit field. This is a typical case, yet standard serialization does not support it.
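
    How the in-house serialization handles this is not described in the interview. As a purely hypothetical sketch of one way such a change can be tolerated, a format can record the field's wire type in the stream, so that a newer reader can widen old 32-bit data. The class name, tag values and layout below are all invented for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WideningReader {
    static final byte TYPE_INT = 1;
    static final byte TYPE_LONG = 2;

    // Old class version: the flags field is a 32-bit int.
    static byte[] writeOldFlags(int flags) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(TYPE_INT);
            out.writeInt(flags);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // New class version: the flags field is a 64-bit long.
    static byte[] writeNewFlags(long flags) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(TYPE_LONG);
            out.writeLong(flags);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // New reader: accepts both encodings and always returns a long.
    static long readFlags(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte type = in.readByte();
            switch (type) {
                case TYPE_INT:
                    return in.readInt();  // sign-extended, like a Java int-to-long cast
                case TYPE_LONG:
                    return in.readLong();
                default:
                    throw new IOException("Unknown field type: " + type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFlags(writeOldFlags(5)));
        System.out.println(readFlags(writeNewFlags(1L << 40)));
    }
}
```

    For a bit-flag field, a real implementation might prefer zero-extension over sign-extension when widening; that choice is left out of this sketch.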



    How to become a cool system programmer?


    - What would you advise reading, and where should a person look who is interested in the Java platform itself and its low-level details?

    - Not much has been written about JVM and HotSpot internals in third-party sources. Fortunately, it is an open source product: you can download it and look. Sometimes the HotSpot code has more comments than code.

    - It is mostly C++, right?

    - Yes. Well, there are general concepts for how JVM virtual machines work. Here I advise you to watch an excellent series of lectures by Oleg Pliss at St. Petersburg JUG.

    - Maybe you would advise some more books?

    - I won’t advise books. Better to ask questions on StackOverflow. I am also sitting there now, and you can ask me.

    I follow the JVM and Java performance related questions there and periodically answer the non-trivial ones. And sometimes I learn something new there myself.

    Of course, there are plenty of poor questions and answers there, but there are various mechanisms to deal with that, reputation for one. And you can meet leading world experts there: Brian Goetz himself periodically drops in and comments on something.

    - And Habr? Now, I see, you rarely write there.

    - It seems to me that the quality of content on Habr is slowly falling ... And I am not only talking about the Java hub.

    Firstly, it is affected by the fact that Habr was split into several subprojects, so the audience of each has shrunk. Secondly, when any unreasonable person can downvote you over a single comment, the motivation to keep writing naturally disappears.

    - Your last article is about how a deadlock arises when class loaders work in parallel. Do you think that the author of this code from Google should have fixed it, or is it "a bug is a bug, everyone has bugs"?

    - They fixed it, and, to give them credit, fairly quickly. It just belongs to a class of bugs that are completely non-obvious. You simply cannot find it by looking at the code.

    - Can a static analyzer really not catch this? It would seem that a simple rule, that this class is a superclass and that one is its subclass, is quite easy to check with static analysis. Basically, this is a rule that anyone can catch.


    - I agree. There simply has not been such a rule so far. Our mutual friend lany has already promised to add the corresponding warnings to FindBugs.

    - But do you still have creative plans: talks, articles, books?

    - There are many ideas, and I want to write about a lot, but there is not enough time. There are a lot of interesting things about the internals of the JIT compiler, how it behaves in completely unpredictable ways, and which strings you can pull there.

    I also want to write about how to work with signals in Linux. Traditionally, signals are intercepted only in case of errors, but the HotSpot JVM uses signals in many interesting ways for its internal purposes. This is quite a good technique for system software, such as the Java runtime, for example.





