INIT April 5, 2015 at 15:05

Technopark Lectures. Semester 3. Designing Highly Loaded Systems


Our regular column "Technopark Lectures" is back on the air. This time we invite you to look through the materials of the course "Designing Highly Loaded Systems". The course aims to give students the skills to design high-performance software systems.

Lecture 1. Introduction to Highload

The lecture opens with a definition of what counts as a highly loaded system and the units in which load is measured. It explains the characteristics of such systems and briefly covers the Slashdot effect. Criteria for high site availability are given in terms of allowed downtime per month and per year. Various web server architectures, the structure of a typical website, and the LAMP stack are described. Next come the methods for serving dynamic content: CGI, FastCGI, uWSGI, mod_perl, mod_php, custom modules, node.js, content_by_lua. The concept of blocking operations and methods of non-blocking processing are also covered.
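The relationship between an availability target and allowed downtime can be illustrated with a little arithmetic. The sketch below is not taken from the lecture; the function name and the 30-day month are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Return the maximum downtime (in hours) that still meets the target."""
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        per_month = allowed_downtime_hours(target, HOURS_PER_MONTH)
        per_year = allowed_downtime_hours(target, HOURS_PER_YEAR)
        print(f"{target}%: {per_month:.2f} h/month, {per_year:.2f} h/year")
```

For example, a 99.9% target leaves roughly 8.76 hours of downtime per year, or about 43 minutes per 30-day month.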

Lecture 2. Network Subsystem

The lecture begins with the factors that affect system throughput: network latency, the speed of light and the distance between data centers, the TCP handshake, packet loss, and TCP retransmits. It explains how to identify bandwidth bottlenecks and presents Looking Glass as one of the tools for diagnosing bandwidth problems. A discussion of the OSI model as applied to TCP/IP and of routing nuances follows. The lecture then covers possible network problems and ways to solve them (UDP, multicast, jumbo frames, socket per process, multi-queue network cards). A substantial part of the lecture is devoted to tuning the TCP/IP stack for high load.
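To see why the speed of light and the TCP handshake matter, here is a rough lower-bound estimate. It is a sketch under stated assumptions, not the lecture's own calculation: light in optical fiber is taken to travel at about two thirds of its speed in vacuum, the cable is assumed to follow a straight line, and all function names are hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumption: signal speed in fiber is ~2/3 of c


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on the round-trip time over a straight fiber link, in ms."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


def min_time_to_response_ms(distance_km: float, round_trips: int = 2) -> float:
    """One round trip for the TCP handshake plus one for request and response."""
    return round_trips * min_rtt_ms(distance_km)


if __name__ == "__main__":
    for km in (100, 1000, 8000):
        print(f"{km} km: RTT >= {min_rtt_ms(km):.1f} ms, "
              f"first response >= {min_time_to_response_ms(km):.1f} ms")
```

Real routes are longer than the straight-line distance and add queuing and processing delays, so measured values will be higher than this bound.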

Lecture 3. HTTP protocol and web optimization

Several web optimization rules are listed, along with the browser features that are used to reduce page load time. The lecture touches on gzip compression, reducing the number of requests, minimizing DNS lookups, and forced caching of static script and CSS files. It explains how to analyze the information obtained with a conditional GET. Next come ways to optimize redirects and the use of CSS sprites. Then keep-alive, chunked transfer encoding, and the proper use of cookies are covered. The lecture also explains the benefits of multiple connections to a domain and of moving long requests into AJAX or an iframe.
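The idea behind a conditional GET can be sketched as follows. This is a simplified illustration rather than a full HTTP implementation: the helper names are hypothetical, the ETag is derived from a hash of the body, and only a single strong validator in If-None-Match is compared (real headers may carry lists and weak validators).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (illustrative scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET that may be conditional."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client's cached copy is still valid: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body


if __name__ == "__main__":
    page = b"body { margin: 0 }"
    status, headers, _ = respond(page, None)
    print(status, headers["ETag"])
    status, _, payload = respond(page, headers["ETag"])
    print(status, len(payload))
```

The second request transfers only headers, which is where the saving comes from.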

Lecture 4. Load Scaling

First, load scaling and its types (vertical and horizontal) are defined. Load balancing algorithms are then covered in detail (random, round-robin, weighted round-robin, least connections, least response time, load-based). The following balancing tools are considered: round-robin DNS, xixi DNS, L4 balancers (Cisco CSS, LVS), L7 balancers (Cisco ACE, LVS, nginx).
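Two of the listed algorithms, weighted round-robin and least connections, can be sketched in a few lines. This is a naive illustration with hypothetical backend names; production balancers such as nginx use smoother weighted schemes than the simple expansion shown here.

```python
import itertools
from typing import Dict, Iterator


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Cycle through backends, repeating each one according to its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the backend that currently has the fewest active connections."""
    return min(active, key=active.get)


if __name__ == "__main__":
    picker = weighted_round_robin({"app1": 2, "app2": 1})
    print([next(picker) for _ in range(6)])
    print(least_connections({"app1": 5, "app2": 2, "app3": 7}))
```

Weighted round-robin needs no runtime feedback from the backends, while least connections requires the balancer to track the state of each one.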

Lecture 5. RAM

The hardware configuration of a typical server is analyzed, and the physical organization of memory and the reasons overall system performance degrades are explained. The lecture describes how caching is organized at the hardware level and discusses practical ways to speed up work with server memory (reading sequentially and ahead of need, reading without jumps between rows, prefetching).

Lecture 6. Databases and disk subsystem

First, the lecture traces the development of hard drives and their current performance under linear, random, and concurrent access. The features, advantages, and disadvantages of different types of disk arrays, including software ones, are compared. The Ext4 and XFS file systems are then examined, and LVM (Logical Volume Manager) is mentioned as a third level of hard drive virtualization. The second part of the lecture is devoted to databases. The advantages and disadvantages of the MySQL and PostgreSQL DBMSs are covered in detail. The cost structure of executing a query, as well as query planning itself, is analyzed. The lecture also covers methods of speeding up database-backed systems: tuning, replication, sharding, minimizing network latency, NoSQL, and writing a specialized database.
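Of the acceleration methods listed, sharding is the easiest to show in miniature. The sketch below assumes the simplest possible scheme, hashing a key and taking the remainder by the number of shards; the function name and key format are hypothetical, and the lecture may describe other schemes.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo shards.
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user_id in ("user:1", "user:2", "user:3"):
        print(user_id, "-> shard", shard_for(user_id, 4))
```

A stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings. Note that with this scheme changing `shard_count` remaps most keys, which is why approaches such as consistent hashing are often preferred when shards are added frequently.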

Lecture 7. Typical architectural solutions

The lecture begins by explaining how frontend and backend servers differ and considers the creation of specialized server groups by load type (by function, importance, stability, and shard). Criteria for the complexity and reliability of various architectural solutions are listed, and tips are given for choosing components, technologies, and programming languages. Optimization methods are then described (replacing hardware, using a different algorithm, rewriting code, parallelizing tasks across servers, etc.). After that, the lecture discusses ways to handle errors when making requests and ways to cache data to reduce load during peak hours. It ends with log recording and processing and with monitoring the load and operation of both the whole system and its components.
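Caching to reduce load during peak hours can be illustrated with a small in-process cache with a time-to-live. This is a minimal sketch under assumptions, not the approach from the lecture: the class name is hypothetical, the cache is not thread-safe, and it never evicts entries other than by overwriting expired ones.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)
    print(cache.get_or_compute("front_page", lambda: "rendered page"))
    print(cache.get_or_compute("front_page", lambda: "not called while fresh"))
```

While an entry is fresh, repeated requests for the same key never reach the backend, which is what flattens the load peak.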
