Let the Holy War begin: Java vs C++



    In anticipation of Joker 2016, we published a post about Java performance, which caused a storm of emotions among readers. To add more fuel to the fire, and still try to arrive at some kind of common answer, we decided to bring in experts from different "camps":

    • Dmitry Nesteruk. Expert in .NET, C++, and development tools; author of courses on technology and mathematics; a quant.
    • Andrey Pangin. Lead developer at Odnoklassniki, specializing in high-load backends. He knows the JVM like the back of his hand, having spent several years developing the HotSpot virtual machine at Sun Microsystems and Oracle. He likes assembler and low-level system programming.
    • Vladimir Sitnikov. For ten years he has worked on the performance and scalability of NetCracker OSS, software used by telecom operators to automate the management of networks and network equipment. He is interested in Java and Oracle Database performance.
    • Oleg Krasnov. CTO at SEMrush and an ANSI C adept.


    Andrey Pangin


    - Java and C++: which of them, in your opinion, is the more popular language now? Both are grown-up languages, but which is more mature and refined?

    - First of all, I don't think there is any real competition between these languages. Each has its own niche, and they coexist perfectly well. Traditionally, Java's popularity is slightly higher. The Java platform attracts developers with its powerful tools for debugging and maintaining applications. At the same time, it is hard to overstate the significance of C++. Despite being a language with a long history, it continues to develop actively: developers had barely gotten used to C++11 when the C++14 standard arrived with many interesting new features.

    - What can these languages offer in the world of high-load servers? Does it make sense to develop individual system modules in different languages, tailoring them to specific tasks? If you could (wanted to, or simply had the opportunity), would you use C++ for some problems, or would you do everything in a single language?

    - Everyone means something different by high-load servers. For some it's thousands of network requests per second; for others it's parallel computation over large volumes of data. Different tools suit different tasks. At Odnoklassniki we do have modules written in C++, in particular ones related to image and video processing, where SIMD computation and the most efficient possible use of the processor are needed. For most of our systems, however, Java is enough. Moreover, code fragments that had previously been written in C and called through JNI were gradually rewritten in Java, and as a result we even gained performance, because we got rid of unnecessary copying and JNI overhead.

    - Is using Unsafe in Java justified or not? Why not use C++ then?

    - I have a whole talk about why we use Unsafe. There are a number of scenarios where you can't do without it, in particular working with off-heap memory and interacting with native code.
    If we wanted to write an entire application in C++, we would have to re-implement all of our common frameworks and protocols: for collecting statistics, for monitoring, for communication between servers, and so on. As it is, only a small part of our code, the part responsible for low-level operations, uses Unsafe, while the rest of the development is done in ordinary Java, following the best practices for writing simple and understandable code. It is much more convenient when the whole ecosystem lives on a single platform.
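    To give an idea of what this low-level access looks like, here is a minimal sketch (not code from Odnoklassniki, just an illustration) of obtaining Unsafe through reflection and allocating a block of off-heap memory that the garbage collector never sees:

        import sun.misc.Unsafe;
        import java.lang.reflect.Field;

        // Minimal sketch: acquire Unsafe via reflection and work with off-heap memory.
        public class OffHeapExample {
            private static final Unsafe UNSAFE = loadUnsafe();

            private static Unsafe loadUnsafe() {
                try {
                    Field f = Unsafe.class.getDeclaredField("theUnsafe");
                    f.setAccessible(true);
                    return (Unsafe) f.get(null);
                } catch (ReflectiveOperationException e) {
                    throw new ExceptionInInitializerError(e);
                }
            }

            public static void main(String[] args) {
                long size = 1024;
                long address = UNSAFE.allocateMemory(size); // off-heap, invisible to the GC
                try {
                    UNSAFE.setMemory(address, size, (byte) 0); // zero the block
                    UNSAFE.putLong(address, 42L);              // write at an absolute address
                    System.out.println(UNSAFE.getLong(address));
                } finally {
                    UNSAFE.freeMemory(address);                // manual deallocation, as in C
                }
            }
        }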

    - What are the most common performance problems when developing Enterprise systems, and what are the possible solutions?

    - We rarely run into the performance limits of the Java platform itself. Problems can usually be solved either by changing the algorithm or by scaling, that is, by adding hardware. The most common bottleneck is network bandwidth or disk I/O. But if we talk about the JVM, then the garbage collector is sometimes the main or even the only performance problem, because pauses longer than 500 ms are often critical in our case. Therefore we try not to make the heap excessively large: 50 gigabytes at most, and more often even less, from 4 to 8 gigabytes per application. We try to move large volumes of data off-heap: we even built a framework for creating large high-load caches. An additional advantage of such a cache over the heap is persistence, that is, the ability to restart the application without losing data. This is achieved through shared memory: immediately after launch, the application maps the shared memory object into the process address space, and the cache with all its data becomes available instantly.
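    As a simplified illustration of that idea (their framework maps a shared memory object; here a plain memory-mapped file stands in for it, and the class and file path are hypothetical), data placed in the mapped region lives outside the Java heap and is still there after a restart:

        import java.io.IOException;
        import java.nio.MappedByteBuffer;
        import java.nio.channels.FileChannel;
        import java.nio.file.Path;
        import java.nio.file.StandardOpenOption;

        // Simplified sketch: a cache region backed by a memory-mapped file.
        // The contents are not part of the Java heap, so the GC does not scan them,
        // and a restarted process sees the previous data as soon as it remaps the file.
        public class MappedCache {
            private final MappedByteBuffer region;

            public MappedCache(Path file, int sizeBytes) throws IOException {
                try (FileChannel ch = FileChannel.open(file,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.READ,
                        StandardOpenOption.WRITE)) {
                    region = ch.map(FileChannel.MapMode.READ_WRITE, 0, sizeBytes);
                }
            }

            public void putLong(int offset, long value) { region.putLong(offset, value); }

            public long getLong(int offset) { return region.getLong(offset); }
        }

    A real cache would of course add an index, checksums, and concurrency control on top of this raw region.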

    - What can you say about the upcoming release of JDK 9 and its main feature, modularity?

    - There was a lot of talk about modularity; the release date was even postponed several times so that this modularity could finally be finished. Yet among my acquaintances I don't know a single Java developer who really needs this feature. I think it would have been better if JDK 9 had come out earlier; developers would only have been grateful. If anything, modularity is more likely to hurt than to help: one of its side effects is that Unsafe is now hidden deep inside and is not accessible without special flags. But JDK 9 is expected to bring much more pleasant innovations, for which the new version is worth trying at the very least: improvements to G1, Compact Strings, VarHandles, and so on.

    - To put it very roughly, the difference between C++ and Java lies in the runtime layer, which, among other things, performs all kinds of optimizations. Which is preferable: exploiting the architectural features of the machine by hand (C++), or relying on dynamic JVM optimizations? To take a concrete example: is automatic garbage collection better, or manual memory management?

    - Adaptive compilation and automatic memory management are precisely the strengths of Java. This is where the virtual machine has excelled and surpassed static compilers. But that is not even the main thing. We choose the JVM for the safety guarantees it gives us, first of all protection from fatal errors caused by incorrect memory handling. Tracking down problems with pointers or out-of-bounds array access in unmanaged code is an order of magnitude harder. And the cost of fixing such errors more than outweighs the small speed advantage that direct memory access gives. As mentioned above, we sometimes use Unsafe, and in those cases we expose ourselves to the same risks as in C++. Yes, we occasionally have to dig through JVM crash dumps, and that is not a pleasant activity. That is why we still prefer pure Java.

    I will also be giving a talk at Joker on exactly this topic: "Myths and Facts about Java Performance".

    Dmitry Nesteruk


    - Java and C++: which language, in your opinion, is the more popular now? Both are grown-up languages, but which is more mature and refined?

    - If we talk about demand, everything is obvious: Java is, of course, more in demand than other languages. C++ occupies its niche in three main areas (game development, finance, and embedded), plus it is the main language for HPC and scientific computing. So if you are guided by self-interest, Java is certainly the safer skill, unless you are heading purposefully into one of those areas.

    As for maturity, things are more complicated here: you first have to separate the features of the language, the capabilities of the compiler, and the features of the standard libraries.

    Let's start with the first, the languages. There are problems on both sides. The problem with Java is that the language does not evolve as fast as its closest competitor, so features arrive very slowly and not in the form you would like. It is telling that C#, though younger, was the first to get lambdas, along with LINQ (Language Integrated Query, a convenient mechanism for traversing and querying data sets), and C#'s original design decisions (support for properties and delegates, for instance) were also carried out competently and successfully.

    As for C++, the main problem is its 100% compatibility with the C language, which automatically means a huge baggage of language features that nobody needs. On the other hand, the stagnation of C++ in the 2000s did not add to the language's popularity either, because developers constantly need to be fed new features. The situation is better now: C++ has lambdas (more expressive than in C#/Java, by the way), type inference for variables and even for values returned from functions; in general, the language is evolving somehow.

    So much for the languages. Now about the compilers. Here, first of all, the comparison is not entirely fair, because the JVM is a JIT: the idea is that you can take bytecode and turn it into an ideal representation for the current processor, with all the applicable optimizations. That sounds good in theory. I don't know how things are in Java, but in the .NET world this approach, compared with the optimizations of a C++ compiler, gives practically nothing. If you are doing math, or, let me put it this way, if you buy a mathematical .NET library somewhere, it will usually be just a wrapper around C++.

    And regarding the C++ compiler: for computational tasks I use Intel C++, that is, the compiler supplied by the processor manufacturer itself. It has a huge number of disadvantages: fewer language features than MSVC, a bunch of awkward bugs that force you to contact support, but we eat this cactus for one simple reason: optimization. Intel's compiler generates the most efficient code. And not just the compiler: there is all the power of Intel Parallel Studio, with Threading Building Blocks for parallelization (an analogue of the Microsoft Parallel Patterns Library, by the way) and the Intel Math Kernel Library, which you use indirectly through MATLAB and other tools even if you don't use it directly. It should be said that a library like MKL has already been optimized by the folks at Intel: vectorization, parallelization, and even cluster parallelization via MPI (for FFT, for example) come "out of the box"; you just take it and use it. And of course it is worth mentioning the profiling tools, which are also part of Parallel Studio. This is a very powerful toolkit that essentially aims to help the developer optimize the code, both for performance and for correctness: there is a memory profiler in there too, so leaks and the like are easy to find.

    And finally, about the libraries: here everything is simple, Java wins, and in C++ everything is bad. I won't even dwell on the fact that the interface of the C++ standard library itself is a little crazy; I see the problem not only in the fact that everything is "legacy", but in the fact that there are simply very few features! We only just got things like file system support and some kind of stream support. And then, say I have a string and I want to split it into substrings by spaces: this is not in the standard library, so I have to take a third-party library (thankfully there is Boost, which has a lot of useful things). But this really slows development down. Many companies, such as Electronic Arts, write their own STL implementations because the standard one does not make them happy. And off the record many admit that we essentially need a new library written from scratch, a kind of STL2.

    There are plenty of other problems too, for example the lack of a mainstream package manager, and even if there were one, how would libraries be distributed? In Java or .NET you can simply ship binaries, but in C++ you basically have to share sources. Nobody has really solved this problem yet, and it also slows development down: sometimes you take someone else's library and then spend half an hour just getting it to build.

    - How do the languages fare in Enterprise in general, for example in the banking sector? In the HFT (high-frequency trading) world, say, there are heavy loads and high reliability requirements. The financial industry is also quite conservative. How does this affect the choice of technology?

    - Enterprise is one big catch-all label under which any corporate development now falls. Globally it is, of course, C# and Java, with other languages somewhere on the periphery. As for the banking sector, things are somewhat more interesting, and it is especially interesting that C++ pops up in some places; there are even shops like Bloomberg that are entirely C++, but that looks like an anomaly to me. In general, if you do an MFE now, that is, a master's degree in financial engineering, it is mostly taught with C++, although languages such as Python and R are popular these days, and MATLAB also remains relevant.

    As for HFT, this is also a contentious topic, but yes, it mostly gravitates toward C++, and even toward C, with all sorts of FPGAs where SystemC comes in, or people writing in various HDL languages. When speed and performance matter, native code somehow feels closer, although the argument that "Java is slow" seems irrelevant to me. It's simply that manual memory management is sometimes needed, and everyone is afraid of the big bad GC that will come and stop all the threads at exactly the moment you need to make a trade.

    In quantitative finance, C++ stuck around mostly for conservative reasons, because quant finance, unlike ordinary software development, treats programming as a skill akin to knowing English rather than as something foundational. Accordingly, people simply learn C++ and don't suffer, although for analysis Python and R are now somehow even more popular. But in investment banks there is still a trainload of C++.

    - For developing software for embedded devices, which language is better in your opinion? To what extent do these languages let you write portable code?

    - In general, the topic of embedded is too broad. For many, embedded means all sorts of Raspberry Pi or Arduino boards; for me it's FPGAs; for someone else it's something different. But to generalize, embedded is of course mostly C or C++, if we are talking about the application level. For FPGA development, admittedly, I either use VHDL directly or write MATLAB that emits VHDL after conversion; the essence stays the same.

    Specifically about FPGAs, since this is the only topic I understand at least something about: I can say that the languages, and the development approach itself, are a good illustration of how an entire technology can get stuck, far worse than C++, somewhere in old models, old languages, and so on. Working with this technology is very difficult, and you essentially end up either using generators like MATLAB or writing something of your own. That is, for people who work purely at the system level, shifting bits by hand is normal, but I, as someone who wants, for example, to model a set of business rules in hardware, don't like this approach at all, and the language is not expressive enough to state at a high level what I need.

    And I'm just not qualified to talk about Java and embedded.

    - To put it very roughly, the difference between C++ and Java lies in the runtime layer, which, among other things, performs all kinds of optimizations. Which is preferable: exploiting the architectural features of the machine by hand (C++), or relying on dynamic JVM optimizations?

    - Well, I think I have already touched on this question, but here it's different for everyone. Take me: in practice I use all levels of parallelization, that is, SIMD, OpenMP, MPI, not to mention specifics like hardware accelerators. There are some SIMD optimizations in Java, and .NET now seems to be slowly catching up, but in fact C++ still rules when it comes to automatic optimizations, and let's not forget that in C++ you can hand-write assembler blocks. I understand that nobody knows assembler these days and many people have never laid eyes on C++, but the point is that when it comes to pure computation, that is, mathematics, and I want it fast, why not?

    I don't really believe in dynamic optimizations, and here's why: if you have a simple loop, say an array being summed in it, then yes, it can be recognized and parallelized. The problem is: what if you have pulled in some dependency from outside, what do you do then? In OpenMP we have the appropriate annotations for this, and a dynamic optimizer will never be able to solve such problems. So someone will look at CUDA, for example, and say the model is completely unrealistic: why should I rewrite all my algorithms and learn something new on top of that? As for me, this is inevitable, because optimizers work very well on understandable, simple things and do all sorts of inlining, but everything that is performance-critical can be written by hand in native code, and then you don't have to suffer.
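    For comparison, on the JVM side this kind of parallelism is likewise chosen explicitly by the programmer rather than discovered by the JIT. A minimal sketch of the "sum an array" case mentioned above, using parallel streams (an illustration, not a claim about what any particular JIT would or would not do):

        import java.util.concurrent.ThreadLocalRandom;
        import java.util.stream.LongStream;

        // Minimal sketch: the "simple loop that sums an array" case.
        // The parallelism is explicit (parallel()), requested by the programmer,
        // not derived automatically by the JIT compiler.
        public class ParallelSum {
            public static void main(String[] args) {
                long[] data = ThreadLocalRandom.current().longs(10_000_000, 0, 100).toArray();

                long sequential = LongStream.of(data).sum();
                long parallel = LongStream.of(data).parallel().sum();

                System.out.println(sequential == parallel); // same result, different execution
            }
        }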

    - How dynamic are the Java and C++ ecosystems? How often do updates, releases, and standards come out? How lively are the languages (how many new language features appear)?

    - Well, I think we can say that C++ is immortal, unlike Java as a language, around which many new languages with interesting features have appeared: Scala, Kotlin, and others. Another thing is that the language and the platform are two different things. Java as a language does not suit many people, which is exactly why the new languages exist. As a platform everything is fine there; apparently it even has advantages over its nearest competitor (in terms of GC, for example). But as a language there are plenty of grounds for complaint.

    I should probably say something about C++ here. Of course, after the community did essentially nothing for about thirteen years, the new standards and new library features are good, wonderful even, compared with complete apathy. A lot happened in C++11, a lot of genuinely useful advances; I now write a completely different C++ than before. C++14 improved things a little more, but C++17 disappointed the whole world again: the features everyone was waiting for won't be in it. The main feature everyone wanted, and still wants, is modules. Right now C++ compiles very slowly, or rather, a build from scratch does: judging by MSVC, incremental compilation is excellent, but building from scratch is a less-than-average pleasure. Modules should solve this problem, but nobody knows when.

    Again, C++ has the problem that the most basic things are missing from the standard library. And that leads a newcomer who needs to convert a string to lowercase, say, or split it into tokens, straight into a stupor. There are many third-party libraries, of course, but their usability is also a question. In languages that have metadata, you see a function and you know how to use it; the documentation even shows up in autocompletion. In C++ you can have a template argument of type Func, that is, a function, and you may not figure out the function's signature even if you dig into the sources. And it is not clear what to do about that, really.

    In general, to sum up, I would say that both languages are alive, and it all depends on what you actually need. You can write productively in either. As for libraries, C++ loses here, that is clear to everyone, and it is surprising, because the language is much older, yet the libraries, even when they exist, are often not very usable, coming from the C world, or they simply don't exist and you have to find them somewhere outside, download them, compile them, and only then use them.

    Oleg Krasnov


    - Why C?

    - When I joined SEMrush, there were no significant server-side developments in other languages. At the time I programmed mainly in C and decided to build the product in that language. I believed in myself. =)
    For me, C is a simple and convenient language. With sufficient skill and knowledge of the libraries, it is quite suitable for prototyping at the level of a scripting language.
    At SEMrush, the split among server-side programming languages is roughly the following: one third is C and C++, one third is scripting languages, one third is Java.

    - Java and C: which, in your opinion, is the more popular language now? Both are grown-up languages, but which is more mature and refined?

    - My experience suggests that C is the right choice for performance-critical work: for example, working with sockets, data multiplexing, and high-load multithreaded applications, where you can and should fully control the machine's resources.
    SEMrush does not explicitly divide programming languages by application area. When a new product needs to be started, the choice of language depends on the professionalism of the person who begins the architecture and the programming, and also on how communicative he is and how well he can convey to colleagues the ideas he wants to realize.
    Fairly typical tasks for us are data collection and processing. One of the reasons we do not use Java in every product is that we have no deep inheritance between entities. Because of the specifics of our work, our data model is shallower vertically than it is wide horizontally. That is, large volumes of data are more likely to be passed between independent entities than between parents and descendants.

    - Does it make sense to develop individual system modules in different languages, tailoring them to specific tasks?

    - I think it does. In this respect, everything here is arranged very well. Development is carried out by small teams of 5-6 people, each working on its own product, and the interaction between them goes through APIs. Both the user interface and the services should interact "well" with each other. This is done, for example, using data formats such as JSON and binary JSON. So yes, you can use different languages to write the whole system.

    We have our own database written in C, and there have been no major problems operating it. When I designed the architecture of this database, there were no ready-made tools that adequately fit our needs. It works on ordinary files and, with that in mind, is quite reliable. Even setting aside all the measures that ensure its continuous operation (backups, clustering), a "dropped" disk (a hardware fault or some unforeseen technical cause) will not do much harm: no more than 8.5% of the data would be lost. And the likelihood that users will actually suffer is reduced further at the business-logic level. In general, the system is designed so that data is not lost. Everything is very reliable.
    We once ran tests to squeeze the maximum performance out of hardware with 12 disks. With RAID5, the read/write speed was roughly 2.5 times that of a single hard drive. But our system uses the 12 disks separately, at the business-logic level, and this gives us an 11-fold speedup, because each thread works with its own disk.

    - How do the languages fare in Enterprise in general?

    - We have large products. For example, one of them crawls the Internet and builds a database of site pages that link to each other. Its core is written in C, which lets us utilize the hardware at almost 100%. More than 150 servers are involved in this product, and this approach lets us be sure we are not overpaying for the server fleet; as you know, we care about financial efficiency. Separately, I will note that thanks to agile development processes we have time both for delivering new features to users and for polishing the performance of each product.

    - What are the most common performance problems when developing Enterprise systems, and what are the possible solutions?

    - To be honest, I can't even recall any particular problems. If there is suddenly not enough power, then assembler, specialized libraries, and the efforts of top-class programmers can solve the problem very quickly. But in 99% of cases we simply have no performance problems.
    I have no prejudice against Java, but it is more resource-intensive. Yes, it is very convenient for solving problems with complex multi-level business logic and for building interconnected systems. However, tasks such as network interaction, multithreaded programming, or handling large binary data are, in my opinion, better suited to C. They can be done in Java too, but I would choose C.

    - What, in your opinion, should the "ideal" database look like? Is it worth bothering with application performance if all the power of the system is eaten up by transactions against the database, and how can that be fixed?

    - An ideal database is not universal. It is made for a specific application and should match its business logic well. Yes, it is harder to design one than to use ready-made solutions, but this approach lets you increase performance "dramatically".

    - It is very fashionable now to work with big data. For tasks that process large data arrays, which is better suited: Java or C++?

    - Big data is the holy grail for programmers, and it looks different to everyone. For some it's petabytes, for others exabytes. The initial stage of processing such data places high demands on the hardware. That is why C is the most effective language for the primary processing.

    - How dynamic are the Java and C++ ecosystems? How often do updates, releases, and standards come out? How lively are the languages (how many new language features appear)?

    - We use the C99 standard, and since then there has been nothing fundamentally new; improvements to the compiler matter more. Complicating the language is not always a good thing, in my opinion. The latest C++ standards, C++14 and the C++17 draft, have so many new features that it is unclear whether it makes sense to use most of them. I understand that programmers like it when a language evolves. However, features and conveniences must be used correctly. The race for features as an end in itself is ineffective and often harms the main idea of the product.
    Periodically I see C++ used as C with classes, or even as C without classes but with a few C++ containers. This is irrational. The result is a mess. If you plan to use C++, you should study its features carefully and use them to the fullest.

    Vladimir Sitnikov


    - What can these languages offer in the world of high-load servers? Does it make sense to develop individual system modules in different languages, tailoring them to specific tasks? If you could (wanted to, or simply had the opportunity), would you use C++ for some problems, or would you do everything in a single language?

    - In the world of enterprise applications, such a division is not common. One interesting factor is application support. If an application uses 10 different languages and, in effect, 10 different ecosystems, it is very difficult to maintain. So if one company writes the program while another deploys, operates, and supports it, it is extremely important to preserve uniformity and ease of maintenance. Here even the use of two different languages can already raise the entry threshold for support considerably. The second important factor is how convenient it is to analyze problems. For example, if a Java program has decided to read a multi-gigabyte file into memory, you can find out without much trouble which file, why, and so on. In C++, such an analysis is much harder to do. As a result, for enterprise development Java is not only a convenient development platform but also a convenient platform to operate and support.

    - Is using Unsafe in Java justified or not? Why not use C++ then?

    - It is naive to think that Unsafe in Java is needed (or used) merely to allocate and free memory behind the garbage collector's back. It is not from a good life that Java code ends up full of small objects. For example, in OpenJDK, java.util.HashMap uses intermediate Map.Entry objects that do not store the data themselves, only references to the key and the value. Such an implementation not only increases storage overhead but also slows things down, because random memory access is always harder and slower than sequential access. Because of the semantics of the language, in many cases javac and the JIT compiler are forced to keep this pile of small objects.

    If you look at C++, hash tables there are implemented without extra entry objects. The data is stored more compactly in memory, and the value sits right next to the key. C++ lets you be closer to the hardware.
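    To make the contrast concrete, here is a deliberately simplified sketch of that kind of layout in Java terms: a hypothetical open-addressing long-to-long map with keys and values in flat arrays (no resizing, no removal, key 0 reserved as "empty"). It illustrates the data layout being described, not OpenJDK code:

        // Deliberately simplified sketch: an open-addressing long -> long map
        // with keys and values in flat arrays, so there are no per-entry objects
        // and the value sits in memory right next to the key it belongs to.
        // No resizing, no deletion; key 0 is reserved as "empty" for brevity.
        public class FlatLongMap {
            private final long[] keys;
            private final long[] values;
            private final int mask;

            public FlatLongMap(int capacityPowerOfTwo) {
                keys = new long[capacityPowerOfTwo];
                values = new long[capacityPowerOfTwo];
                mask = capacityPowerOfTwo - 1;
            }

            public void put(long key, long value) {
                int i = (int) (mix(key) & mask);
                while (keys[i] != 0 && keys[i] != key) {   // linear probing
                    i = (i + 1) & mask;
                }
                keys[i] = key;
                values[i] = value;
            }

            public long get(long key, long defaultValue) {
                int i = (int) (mix(key) & mask);
                while (keys[i] != 0) {
                    if (keys[i] == key) {
                        return values[i];                  // value stored next to the key
                    }
                    i = (i + 1) & mask;
                }
                return defaultValue;
            }

            private static long mix(long key) {            // cheap bit mixer to spread hashes
                key ^= key >>> 33;
                key *= 0xff51afd7ed558ccdL;
                return key ^ (key >>> 33);
            }
        }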

    What should Java programmers do? Switch to the dark side of Unsafe, off-heap, and the like? First of all, you need to take measurements and make sure that the problem really exists in your case. Most typical Java applications work quite well with a plain HashMap under the hood.

    If you are a "professional on a closed course", you can try these things (Unsafe, VarHandles, memory-mapped files, and so on). If the problem really matters to you, you should take part in the discussion of JEP 169: Value Objects, which is about how to smooth out the problem of redundant objects in Java without sacrificing code reliability and development speed.

    - What are the most common performance problems when developing Enterprise systems, and what are the possible solutions?

    - Typical performance problems are usually caused not by the language itself but by how different applications interact with each other. For example, slow SQL queries, that is, inefficient work with the database. If the system executes thousands of queries, performance depends not on the language but on the algorithms used. A "correct" specification and complete non-functional requirements can head off many potential problems. This problem arises not only with databases, and it does not solve itself when you move to microservices: with the wrong granularity of those microservices' APIs, a thousand calls can appear out of nowhere, and the speed of the system will be determined not by the speed of the languages but by the number and quality of those calls.
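    A minimal sketch of the anti-pattern described above (OrderService and its methods are hypothetical; what matters is the number of round trips, not the technology behind them):

        import java.util.ArrayList;
        import java.util.List;

        // Minimal sketch of the "1000 calls out of nowhere" anti-pattern.
        // OrderService stands for any remote API (database, microservice, etc.).
        interface OrderService {
            List<Long> findOrderIds(String customer);   // 1 remote call
            Order loadOrder(long id);                   // 1 remote call per order
            List<Order> loadOrders(List<Long> ids);     // 1 batched remote call
        }

        class Order {
            final long id;
            Order(long id) { this.id = id; }
        }

        class ReportBuilder {
            // Slow variant: 1 + N remote calls; latency grows with the number of orders.
            static List<Order> naive(OrderService svc, String customer) {
                List<Order> result = new ArrayList<>();
                for (Long id : svc.findOrderIds(customer)) {
                    result.add(svc.loadOrder(id));      // one network round trip per id
                }
                return result;
            }

            // Batched variant: 2 remote calls no matter how many orders there are.
            static List<Order> batched(OrderService svc, String customer) {
                return svc.loadOrders(svc.findOrderIds(customer));
            }
        }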

    - What can you say about the imminent release of JDK 9 and its main feature, modularity? Is it an attempt to solve specific difficulties that Java developers face?

    - One of the inconvenient things about using Java is slow application startup. For example, starting a server application or container can take 30 seconds or more, depending on the hardware and the number of loaded libraries.

    And why does the Java machine recompile all the application classes every time it starts? Why can't it reuse previously generated machine code? Because the set of .class files may have changed, and the JVM has no guarantees about the composition of the application's classes. At first glance, modularity in JDK 9 gives the programmer nothing, but on closer inspection it turns out to give a lot to the Java machine itself. The JVM can make bolder assumptions about what code may be executed, so modularity opens up the possibility of speeding up startup.

    A more prosaic benefit of modularity is reducing the size of the JVM distribution. It is unlikely that developers will rush to build their own JVM images just to save a few megabytes, but in some cases this possibility will be very useful.
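    A sketch of what such trimming might look like (the module name is hypothetical; jlink is the tool that ships with JDK 9 for assembling a custom runtime image):

        // module-info.java: a sketch of a minimal application module for JDK 9.
        // "com.example.app" is a hypothetical module name used for illustration.
        module com.example.app {
            requires java.base;   // implicit anyway, shown here for clarity
            requires java.sql;    // declare only what the application actually uses
        }

        // With the module graph known, a trimmed runtime image can be assembled
        // with jlink, for example:
        //
        //   jlink --module-path $JAVA_HOME/jmods:mods \
        //         --add-modules com.example.app \
        //         --output app-runtime
        //
        // The resulting image contains only the listed modules and their
        // dependencies instead of the whole JDK.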

    - To put it very roughly, the difference between C++ and Java is in the runtime layer, which, among other things, performs all kinds of optimizations?

    - One of the key features of Java is the ability to compile code separately without sacrificing the performance of the final application. For example, a Java library can work with the Iterator interface. Yes, when the library is compiled to bytecode, there will be "virtual" calls (invokeinterface). But that does not prevent good performance if, in fact, the same Iterator implementation is always used at a particular place in our program. The JIT compiler sees which objects are actually in use and generates machine code without the needless lookup of "where does the passed object keep its hasNext method". As a result, the programmer writes convenient code against interfaces, and at execution time the whole iterator collapses into a single processor register holding the "current position".
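    A small illustration of the kind of code in question (whether the JIT really eliminates the iterator depends on the JVM and on which implementations have reached this call site; the class here is just an example):

        import java.util.Arrays;
        import java.util.List;

        // Illustration: code written against the Iterable/Iterator interfaces.
        // If only ArrayList ever reaches this call site, the JIT can devirtualize
        // and inline hasNext()/next(); in the best case the iterator object is
        // eliminated entirely, leaving little more than an index in a register.
        public class IteratorSum {
            static long sum(Iterable<Integer> values) {
                long total = 0;
                for (int v : values) {        // compiles to Iterator + invokeinterface calls
                    total += v;
                }
                return total;
            }

            public static void main(String[] args) {
                List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
                System.out.println(sum(data)); // 15
            }
        }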

    With C++, if a library has already been compiled and that library makes a virtual call, that's it: the user has no way to make the call non-virtual.

    In general, one of the most effective optimizations is eliminating unnecessary work. For C++ libraries to "see" how they are actually used, C++ programmers have to go through various contortions (clang/LLVM, always compiling all libraries together with the application code, and so on).



    Whose side are you on?
