Highload in Java: Things to Remember
Highload is a fashionable and, at the same time, rather hackneyed topic, not least because there is no clear definition of what “highload” actually is. For clarity, let's call a network application that has to handle 1,000 requests per second “highload”, and an application that handles 1 request per second “not highload”. My experience shows that there is a significant difference between the two in architecture, development approach, and the problems you run into. In this article I will try to describe those differences as I understand them.
So…
Many frameworks have performance limits
You just have to accept it: a huge amount of code simply cannot work under heavy load, whether because of excessive memory or CPU requirements, inefficient synchronization, outdated I/O mechanisms, or poor handling of errors caused by resource exhaustion. Even if your favorite library was written by very smart people and has no flaws, it may still be unsuitable simply because efficiency was traded for convenience or ease of use when it was designed. So be prepared either to live with home-grown solutions or to hunt for the few libraries that pass stress tests for your specific application. Take nothing on faith here.
Caching, caching and caching again
Rarely can a highly loaded system do without caches, especially in front of a DBMS. Proper caching in a distributed system is a big and complex topic in its own right, so it makes sense to think about the data from the start: who will update it and how, who will read it and how, and where you can afford to sacrifice the integrity or freshness of the data.
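As one illustration of trading freshness for fewer database round trips, here is a minimal sketch of a read-through cache with time-based expiry. The loader function and TTL are placeholders; in a real system a library such as Caffeine or a distributed cache is usually a better fit, and note that this sketch never evicts expired entries, so it is not production-ready.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A minimal sketch of a read-through cache with time-based expiry in front of a DBMS.
public class TtlCache<K, V> {
    private record Entry<V>(V value, long expiresAtNanos) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // e.g. a DB query (assumption: supplied by the caller)
    private final long ttlNanos;

    public TtlCache(Function<K, V> loader, long ttlNanos) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
    }

    public V get(K key) {
        long now = System.nanoTime();
        Entry<V> cached = entries.get(key);
        if (cached != null && cached.expiresAtNanos() > now) {
            return cached.value();          // serve possibly stale data, saving a DB round trip
        }
        V fresh = loader.apply(key);        // cache miss or expired: hit the DBMS
        entries.put(key, new Entry<>(fresh, now + ttlNanos));
        return fresh;
    }
}
```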
Simplicity
In general, the simpler the system, the faster it works. Strive for maximum simplicity, often at the expense of the generality, conceptual elegance, or beauty of the architecture.
Network bandwidth has limits
When choosing a serialization format, take the time to multiply the average message size by the number of messages per second and compare the result with your network's bandwidth. In this respect JSON is better than XML, Protobuf is better than JSON, and sometimes you have to invent your own format with even tighter packing.
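A back-of-the-envelope check of this kind fits in a few lines; the packet size, request rate, and link speed below are made-up numbers for illustration only.

```java
// Does the chosen serialization format fit into a 1 Gbit/s link? (Illustrative numbers.)
public class BandwidthCheck {
    public static void main(String[] args) {
        long avgPacketBytes = 8_000;             // assumed average serialized message size
        long packetsPerSecond = 20_000;          // assumed request rate
        long linkBitsPerSecond = 1_000_000_000L; // 1 Gbit/s

        long requiredBitsPerSecond = avgPacketBytes * 8 * packetsPerSecond;
        System.out.printf("Required: %.2f Gbit/s, available: %.2f Gbit/s%n",
                requiredBitsPerSecond / 1e9, linkBitsPerSecond / 1e9);
        // 8,000 B * 8 * 20,000 = 1.28 Gbit/s: already over the limit,
        // so a more compact format (or compression) is needed.
    }
}
```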
Do not forget about off-heap
Some things in Java require memory outside the heap, so do not expect that setting -Xmx guarantees the application will fit on the server. For example, each Java thread needs roughly 256 KB to 2 MB of off-heap memory for its stack. Multiply that by 1,000 and it already adds up to quite a lot, so keep an eye on the number of threads your application uses.
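As a rough sketch of the arithmetic, the snippet below multiplies the JVM's live thread count by an assumed per-thread stack size; the actual value depends on -Xss and the platform defaults.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Estimate how much stack memory (not covered by -Xmx) the current threads consume.
public class ThreadFootprint {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long assumedStackBytes = 1024 * 1024;   // assumption: 1 MB per thread stack
        int liveThreads = threads.getThreadCount();
        System.out.printf("%d live threads ~ %d MB of off-heap stack memory%n",
                liveThreads, liveThreads * assumedStackBytes / (1024 * 1024));
    }
}
```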
The GC has limits too
Even if you have no strict latency requirements, keep garbage collection in mind. Try to limit the number of allocations and, especially, the total amount of memory allocated per request. All the necessary metrics are available in the Java Mission Control profiler.
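One common way to cut per-request allocations is to reuse a per-thread buffer instead of creating a new one for every request. The sketch below is illustrative; whether the added complexity pays off should be confirmed with a profiler such as Java Mission Control.

```java
// Reuse a per-thread StringBuilder so each request allocates only the final String.
public class ResponseEncoder {
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(4 * 1024));

    public String encode(String id, long value) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0);                       // reset instead of allocating a new builder
        sb.append("{\"id\":\"").append(id)
          .append("\",\"value\":").append(value).append('}');
        return sb.toString();                  // only the final String is allocated
    }
}
```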
Be careful with logging
The statement “a decent application should always log its input and output data” is true, but in a heavily loaded application logging easily becomes the bottleneck of your system. Contention on the logger, insufficient disk throughput, gigabytes of logs per hour: all of this is harsh reality. So choose the data you log very carefully, and remember about stack traces, which take up a lot of space and, when errors come in large numbers, can bring the application down. Often it is better not to write logs to disk at all, but to ship them over the network to a dedicated logging system (and remember that the network has its limits too).
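One defensive technique under error storms is to sample repeated errors instead of logging every one of them. The sketch below is a minimal, dependency-free illustration; the counter threshold is arbitrary, and real logging frameworks offer asynchronous appenders and rate-limiting filters for the same purpose.

```java
import java.util.concurrent.atomic.AtomicLong;

// Log the first occurrence of an error and then only every N-th one,
// so a flood of identical stack traces cannot saturate the disk or the logging pipeline.
public class SampledErrorLogger {
    private final AtomicLong errorCount = new AtomicLong();
    private final long sampleEvery = 1_000;    // illustrative threshold

    public void logError(String message, Throwable t) {
        long n = errorCount.incrementAndGet();
        if (n == 1 || n % sampleEvery == 0) {
            // plug in your real logger here; System.err keeps the sketch dependency-free
            System.err.println(message + " (occurrence #" + n + ")");
            t.printStackTrace();
        }
    }
}
```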
Behavior when resources run out
What if the application receives more requests than it can chew? What if the database suddenly starts responding twice as slowly? What if the network starts dropping packets? A good application should not hang; it should say “come back tomorrow”. The moral: there must be timeouts on every interaction with external systems, the application must limit the number of requests it processes concurrently, and the client must know what to do when the application is busy or not responding.
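A minimal sketch of such a concurrency limit is shown below; the permit count and wait time are assumptions to be tuned per application, and many frameworks provide equivalent bulkhead or rate-limiting facilities out of the box.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// At most `permits` requests are processed at a time; the rest are rejected quickly
// instead of piling up and hanging the whole application.
public class RequestLimiter {
    private final Semaphore permits = new Semaphore(200);   // assumed limit

    public <T> T handle(Supplier<T> request, Supplier<T> busyResponse) throws InterruptedException {
        if (!permits.tryAcquire(50, TimeUnit.MILLISECONDS)) {
            return busyResponse.get();        // "come back tomorrow": fail fast, do not hang
        }
        try {
            return request.get();             // the actual work, itself bounded by timeouts
        } finally {
            permits.release();
        }
    }
}
```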
Set up proper monitoring
Another complex and extensive topic, but in a nutshell: when the application brings the machine to its knees, there must be graphs somewhere showing memory consumption, swap, disk, CPU, threads, and file descriptors.
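For the JVM-side numbers, the standard management beans already expose most of what is worth graphing. The sketch below just prints a few of them; in practice they would be exported to a monitoring system rather than printed, and the selection of metrics is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

// Read a few basic JVM metrics via the standard MXBeans.
public class JvmMetrics {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        System.out.println("heap used (MB): "
                + memory.getHeapMemoryUsage().getUsed() / (1024 * 1024));
        System.out.println("live threads:   " + threads.getThreadCount());
        System.out.println("load average:   " + os.getSystemLoadAverage());
    }
}
```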
And always test the application under load
An application that behaves decently under low load can easily die under high load. If it has run for weeks at 1 request per second, at 1,000 it can die from a barely noticeable memory or resource leak, network congestion, an overloaded or full disk, and a thousand and one other reasons. So always run the build at full load before releasing it to production.
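Dedicated tools (JMeter, Gatling, wrk, and the like) are usually the right choice for a pre-release load run, but even a home-grown generator is better than nothing. The sketch below hammers a single hypothetical endpoint from a fixed thread pool and counts failures; the URL, concurrency, and duration are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// A tiny load generator: many threads call one endpoint in a loop and count errors.
public class MiniLoadTest {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/ping")).build();
        AtomicLong errors = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(50);   // assumed concurrency
        long deadline = System.currentTimeMillis() + 60_000;       // run for one minute
        for (int i = 0; i < 50; i++) {
            pool.submit(() -> {
                while (System.currentTimeMillis() < deadline) {
                    try {
                        HttpResponse<Void> r = client.send(request, HttpResponse.BodyHandlers.discarding());
                        if (r.statusCode() >= 500) errors.incrementAndGet();
                    } catch (Exception e) {
                        errors.incrementAndGet();                  // timeouts, refused connections, ...
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(2, java.util.concurrent.TimeUnit.MINUTES);
        System.out.println("errors: " + errors.get());
    }
}
```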
That's all. Comment, correct, share your experience.