July 2, 2019 at 18:17
Best talks from HighLoad++ 2018
Friends, we have agreed with Ontico that we will publish the best talks from their conferences on our YouTube channel and share them with you. We want not only to spread knowledge but also to help our readers and viewers grow professionally. Here is a selection of the 15 best talks given at HighLoad++ 2018.
Tarantool Replication: Configuration and Use
Georgy Kirichenko, Mail.ru Group
Tarantool replication is used to provide high availability through standby servers or by clustering servers for load balancing, and it can also be used when performing upgrades. Recent versions of Tarantool have several additional features that make replication easier to configure and use in a cluster.
The talk covered the basic design principles and features of asynchronous replication in Tarantool, with a closer look at the internal structure of the state vector (vclock). It discussed ways to ensure data consistency and the new features. It also covered the basic principles of configuration, where each option applies, the most common mistakes, and ways to solve configuration and operation problems.
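To make the vclock idea more concrete, here is a minimal sketch of a generic vector clock, assuming each component maps a replica id to the last log sequence number (LSN) applied from that replica. The names `vclock_compare` and `vclock_merge` are hypothetical and are not Tarantool's API; the talk itself should be consulted for how Tarantool actually uses its vclock.

```python
from typing import Dict

# Assumption: replica id -> last LSN applied from that replica.
VClock = Dict[int, int]


def vclock_compare(a: VClock, b: VClock) -> str:
    """Compare two vector clocks component-wise.

    Returns "equal", "before" (a is behind b), "after" (a is ahead of b)
    or "concurrent" (neither contains the other). Missing components
    are treated as 0.
    """
    ids = set(a) | set(b)
    a_le_b = all(a.get(i, 0) <= b.get(i, 0) for i in ids)
    b_le_a = all(b.get(i, 0) <= a.get(i, 0) for i in ids)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def vclock_merge(a: VClock, b: VClock) -> VClock:
    """Return the component-wise maximum of two vector clocks."""
    return {i: max(a.get(i, 0), b.get(i, 0)) for i in set(a) | set(b)}
```

A "concurrent" result means each side has applied changes the other has not yet seen, which is the situation where consistency needs extra care in asynchronous replication.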
Technical aspects of blocking the Internet in Russia. Challenges and Prospects
Philip Culin, Deep Forest
Technical details of the blocking. How the blocking mechanism is organized today: who, what, where, when and how. Why it is organized this way. Why Roskomnadzor (RKN) blocks entire networks. What is wrong with the current blocking mechanism from a technical point of view. In which direction it should move, technically, with minimal changes to the current regulatory framework.
Predicting online store sales using gradient boosting (lightGBM)
Alexander Alekseytsev, OZON.RU
This talk is about an automatic warehouse replenishment system. The brain of the system is ML for sales forecasting: formulating the task and choosing a loss function, working with features, building the data set, choosing a model, pitfalls of training lightGBM, and evaluating the results. The skeleton of the system is Spark / Hadoop: daily data delivery and validation, and improving the reliability of the system. Then come the business realities of purchasing goods: supplier selection, safety stock, and dealing with supplier service levels.
Alexander also spoke about using trained lightGBM models to estimate the price elasticity of demand for goods and to plan marketing campaigns and assess their effect. Different forms of the demand-versus-price function for different types of goods, and much more, came as a "side" effect of the main task.
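As an illustration of what estimating price elasticity from a trained model can look like, here is a minimal sketch that probes a demand predictor with a small price change. It is not the method from the talk: `price_elasticity` and `toy_model` are hypothetical names, and `toy_model` is a closed-form stand-in for a trained model, not lightGBM.

```python
from typing import Callable


def price_elasticity(
    predict_demand: Callable[[float], float],
    price: float,
    rel_step: float = 0.01,
) -> float:
    """Estimate elasticity as (% change in demand) / (% change in price).

    Uses a central finite difference around `price`.
    """
    lo = price * (1 - rel_step)
    hi = price * (1 + rel_step)
    demand_change = (predict_demand(hi) - predict_demand(lo)) / predict_demand(price)
    price_change = (hi - lo) / price
    return demand_change / price_change


def toy_model(price: float) -> float:
    """Stand-in for a trained model: demand with constant elasticity -2."""
    return 1000.0 * price ** -2.0


elasticity = price_elasticity(toy_model, price=10.0)
```

Predictions of tree-based models are piecewise constant in each feature, so with a real gradient boosting model a very small step may show no change at all; a larger `rel_step` or averaging over several steps is likely needed.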
How we work on the stability of our Lua implementation
Anton Soldatov, IPONWEB
IPONWEB has been using Lua to describe business logic for over 10 years. In 2015 they forked LuaJIT and have since been working with their own implementation of the language. This component of the technology stack is critical for the business, so its stability gets special attention.
Anton described how they built a test suite for the implementation from scratch, and went through several cases where the tests proved powerless against the complexity of the system under test, so that something broke on production servers "suddenly" and "irregularly". The experience they gained in fixing such errors can be applied to working with LuaJIT. Finally, Anton shared the tools and tricks the company uses when debugging.
Place of row level security in a high-load project
Alexander Tokarev, DataArt
A talk on where and how best to implement row level security in a high-load project. Alexander described how the approach was chosen for a high-load enterprise project (4000 users, 10000 simultaneous requests, transactional and OLAP load at the same time). He analyzed three technologies for implementing row level security in Oracle DBMS and explained why security was placed in the database rather than on the application server. He also talked about the choice made, the problems encountered and the plans for the future.
How we made our own Netfilter with Intel DPDK and prefix trees
Alexander Samoilov, Security Code
Linux Netfilter is at the heart of a huge number of firewalls, both open source and commercial. It is a proven, reliable and, more recently, even fairly fast solution. But in modern conditions, when a firewall often has to pass tens of gigabits of traffic and the number of filtering rules can exceed a thousand, Linux Netfilter becomes the bottleneck.
Alexander talked about how they rewrote the Linux network subsystem. The result is fast (tens of gigabits of stateful and stateless filtering, session tracking, NAT and routing, regardless of the number of filtering rules) and easy to manage: the subsystem was taught to understand the commands of the well-known iproute2 and nftables utilities.
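The talk title mentions prefix trees. As a rough illustration of why such a structure makes lookup cost independent of the number of rules, here is a minimal sketch of a binary trie with longest-prefix matching over IPv4 addresses. The `PrefixTrie` class and the string actions are hypothetical; the actual data structures used in the talk may differ.

```python
import ipaddress
from typing import Optional


class PrefixTrie:
    """Binary trie keyed by IPv4 prefix bits; lookup returns the longest match."""

    def __init__(self) -> None:
        # Each node is a list: [child_for_bit_0, child_for_bit_1, action_or_None].
        self.root = [None, None, None]

    def insert(self, cidr: str, action: str) -> None:
        """Attach an action to the node reached by the prefix bits of `cidr`."""
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        node = self.root
        for depth in range(net.prefixlen):
            bit = (bits >> (31 - depth)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = action

    def lookup(self, addr: str) -> Optional[str]:
        """Walk at most 32 bits and remember the deepest action seen."""
        bits = int(ipaddress.ip_address(addr))
        node = self.root
        best = node[2]
        for depth in range(32):
            node = node[(bits >> (31 - depth)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]
        return best


trie = PrefixTrie()
trie.insert("0.0.0.0/0", "log")
trie.insert("10.0.0.0/8", "drop")
trie.insert("10.1.0.0/16", "accept")
```

A lookup visits at most 32 nodes no matter how many prefixes are stored, whereas a linear rule list is scanned in proportion to its length.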
VShard - horizontal scaling in Tarantool
Vladislav Shpileva, Tarantool
Until 2018, the only means of horizontal scaling for the Tarantool DBMS was Shard, a module that implements sharding, a special case of horizontal scaling. Shard shards by a function of the primary key and supports changing the cluster topology and rebalancing. At the same time, it has three significant drawbacks that prevented the use of Shard in one important project.
At the beginning of the year, development of the new VShard module was completed; it is an alternative implementation of sharding. In VShard, rebalancing is performed in stages, you can set an arbitrary shard function to keep related data local, and the result of the shard function is stored in each record rather than recalculated. Vladislav spoke about the internal design of VShard, its subsystems and implementation, with usage examples, and about the new features of VShard 0.2.
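The properties described above (a shard function chosen for data locality, its result stored in every record, rebalancing done step by step) can be sketched in a few lines. This is a toy in-memory model, not VShard's API: `BUCKET_COUNT`, `bucket_id`, `Cluster` and `move_bucket` are hypothetical names, and CRC32 is an arbitrary choice of shard function.

```python
import zlib
from typing import Dict, List

BUCKET_COUNT = 16  # hypothetical; chosen small to keep the example readable


def bucket_id(shard_key: str) -> int:
    """Hypothetical shard function: map a shard key to a virtual bucket."""
    return zlib.crc32(shard_key.encode("utf-8")) % BUCKET_COUNT


class Cluster:
    def __init__(self, nodes: List[str]) -> None:
        # Routing table: bucket id -> node; buckets start out spread round-robin.
        self.route: Dict[int, str] = {
            b: nodes[b % len(nodes)] for b in range(BUCKET_COUNT)
        }
        self.storage: Dict[str, List[dict]] = {n: [] for n in nodes}

    def insert(self, shard_key: str, payload: dict) -> dict:
        # The bucket id is computed once and stored in the record itself.
        record = {"bucket_id": bucket_id(shard_key), "key": shard_key, **payload}
        self.storage[self.route[record["bucket_id"]]].append(record)
        return record

    def move_bucket(self, bucket: int, dst: str) -> None:
        """One rebalancing step: move a single bucket using the stored ids."""
        src = self.route[bucket]
        if src == dst:
            return
        moved = [r for r in self.storage[src] if r["bucket_id"] == bucket]
        self.storage[src] = [r for r in self.storage[src] if r["bucket_id"] != bucket]
        self.storage.setdefault(dst, []).extend(moved)
        self.route[bucket] = dst
```

Because records that share a shard key share a bucket, related data moves together, and moving one bucket at a time keeps each rebalancing step small.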
BBM's 150M+ users: Oracle to Postgres migration without downtime
Alvaro Hernandez, OnGres (talk in English)
BBM (BlackBerry Messenger) is one of the world's largest instant messengers, with text, voice and video communication and a subscriber base of more than 150 million users. It ran on an on-premise Oracle DBMS. The OnGres team helped migrate it to PostgreSQL running on GCP, using real-time replication, with virtually no downtime. Alvaro described in detail the process and its pitfalls, and the techniques, technologies and best practices for migrating from Oracle to PostgreSQL without downtime. Many people are interested in such a migration today, but it requires a high level of skill and close involvement in a process that runs into many difficulties.
Highly loaded distributed control system of a modern nuclear power plant
Vadim Podolny, Physical Instrument
From this talk you will learn about the new platform of a distributed control system for nuclear power plants and how the most complex automation facilities in the world are managed. It provides real-time control of more than 150 special subsystems responsible for the various technological processes of a nuclear power plant, with more than 100K data sources from sensors, up to 500K calculated parameters, and 5 kinds of physical processes.
When certain deviations occur, the entire system turns into a huge DDoS source of useful diagnostic information that interferes with normal control of the plant. You will learn how we resolve such problems, about the hardware and software architecture of such systems, including redundancy and replication, and why data redundancy and technological diversity are needed; how load management is provided and how QoS works; and what happens if the normal operation system is shut down, as happened, for example, at Fukushima.
A 4K streaming platform for a million online viewers
Alexander Tobol, Odnoklassniki
The Odnoklassniki video service is the second largest site on the Runet by video views: 600 million views a day. The OK streaming platform now lets you run professional broadcasts in 4K, stream from your phone in Full HD, and serve users more than 3 Tb/s of traffic.
Alexander covered:
- the 4K video streaming pipeline for millions of online viewers;
- the architecture of the content delivery system;
- TCP tuning for 4K delivery;
- how and why to move away from ffmpeg, and cutting video on the GPU;
- what to do if capacity runs out and users keep coming;
- the problems of streaming over TCP;
- the future of video streaming.
Recent changes in Linux IO stack from DBA point of view
Ilya Kosmodemyansky, Data Egret
I/O performance issues have been on the daily agenda of database administrators for as long as databases have existed. Linux, probably the most popular operating system for databases, has overhauled its IO stack over the past few years.
Ilya talked about what is happening, why the IO stack needed urgent improvement, and what this can mean for databases, including what the new NVMe drivers and blk-mq improve. As a handy cheat sheet, Ilya offered a checklist of PostgreSQL and Linux settings for getting maximum performance from the I/O subsystem on new kernels.
FAQ on the architecture and operation of VKontakte
Alexey Akulovich, VK
Alexey covered many of the topics and questions that people outside the company ask.
For instance:
- The general architecture of the interaction of our servers.
- Is there "regular" PHP at VKontakte, where and why. And what other programming languages are used?
- How to update code on tens of thousands of servers in seconds.
- Fault tolerance of memcache clusters with constantly breaking servers.
- Why VKontakte has its own engines (databases), how many there are, and how the team lives with them.
- How a binlog differs from a snapshot, and how to "roll back a DELETE".
- How all of this can be monitored.
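The binlog and snapshot question in the list above can be illustrated with a generic sketch: a snapshot is the full state at one moment, a binlog is the ordered list of changes made after it, and a DELETE can be "rolled back" by restoring from the snapshot and replaying the binlog without that operation. This is a textbook model with hypothetical names (`apply_op`, `restore`), not a description of how VKontakte's engines work.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# One binlog entry: (operation, key, value); value is None for "delete".
Op = Tuple[str, str, Optional[str]]


def apply_op(state: Dict[str, Optional[str]], op: Op) -> None:
    """Apply a single logged operation to the in-memory state."""
    kind, key, value = op
    if kind == "set":
        state[key] = value
    elif kind == "delete":
        state.pop(key, None)


def restore(
    snapshot: Dict[str, Optional[str]],
    binlog: List[Op],
    skip: FrozenSet[int] = frozenset(),
) -> Dict[str, Optional[str]]:
    """Rebuild state from a snapshot plus the binlog, skipping chosen entries."""
    state = dict(snapshot)
    for index, op in enumerate(binlog):
        if index not in skip:
            apply_op(state, op)
    return state


snapshot = {"a": "1"}
binlog = [("set", "b", "2"), ("delete", "a", None), ("set", "c", "3")]
current = restore(snapshot, binlog)
without_delete = restore(snapshot, binlog, skip=frozenset({1}))
```

Here `current` no longer contains key "a", while `without_delete` keeps it along with the later changes, which is the effect of undoing a single DELETE.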
Facebook DNS
Oleg Obleukhov, Facebook
Oleg talked about how Facebook balances its load and what the DNS infrastructure has to do with it, how resource records get into Facebook's global infrastructure, and how the company uses DNS to organize dogfooding.
Databases and Kubernetes
Dmitry Stolyarov, Flant
Dmitry shared his experience and explained, with specific examples, in which cases it makes sense to run databases (and stateful applications in general) in Kubernetes, and in which cases it is unjustified or even harmful and dangerous.
This selection is available on our Technostream YouTube channel. We created it to share a variety of educational materials, including lectures from our educational technology projects. We have already written about Technostream on Habr, so if you have not heard of it, read that post. And come back often: something interesting appears there all the time.