Experience developing a high-load system in the HighLoad Cup

Mail.Ru has run an interesting championship for backend developers: the HighLoad Cup. It lets you not only win good prizes, but also level up as a backend developer. The experience of development and environment setup is described under the cut.

1. Input data


You need to write a fast server that provides a Web API for a travelers service.

There are three kinds of entities in the server's initial data: User (a traveler), Location (a landmark), and Visit (a visit). Each has its own set of fields.

The following queries must be implemented:

GET /&lt;entity&gt;/&lt;id&gt; to get entity data
GET /users/&lt;id&gt;/visits to get a list of the user's visits
GET /locations/&lt;id&gt;/avg to get the average rating of a location
POST /&lt;entity&gt;/&lt;id&gt; to update an entity
POST /&lt;entity&gt;/new to create a new entity

The maximum penalty per request equals the timeout of the load-testing tank: 2 seconds (2,000 milliseconds).

The solution must fit in a single Docker container.
Hardware used for verification: Intel Xeon x86_64 2 GHz, 4 cores, 4 GB RAM, 10 GB HDD.

So the task is essentially simple, but my knowledge of Docker was zero, and my experience with high-load development was somewhere around 50%.
I chose php7 + nginx + mysql for the implementation, since the accumulated experience could be reused in later work.

2. Docker


Let's see what Docker is.
Docker is software for automating the deployment and management of applications in an operating-system-level virtualization environment. It lets you "pack" an application with all of its environment and dependencies into a container that can be moved to any Linux system with cgroups support in the kernel, and it also provides an environment for managing containers.
Sounds great. In short, we no longer need to configure nginx/php/apache locally for every project or pick up extra dependencies from other projects. For example, say there is a site that is not compatible with php7; to work on it, you have to switch the php module in apache2 to the required version. With Docker everything is simple: we launch the container with the project and develop. Switching to another project, we stop the current container and bring up a new one.

Docker's ideology is one process per container: nginx with php in one container, mysql in its own. To combine and configure them, docker-compose is used.

Example docker-compose.yml file
version: '2'
services:
 mysql:
   image: mysql:5.7   #from the official repository
   environment:
     MYSQL_ROOT_PASSWORD: 12345   #set the root password
   volumes:
     - ./db:/var/lib/mysql #persist the DB files on the host
   ports:
     - 3306:3306  #port mapping - host_machine:container
 nginx:
   build:
     context: ./
     dockerfile: Dockerfile  #build from a Dockerfile
   depends_on: [mysql]   #declare the dependency
   ports:
     - 80:80
   volumes:
     - ./:/var/www/html #mount the source folder; change code without restarting the container


We launch:

docker-compose -f docker-compose.yml up

Everything works, the connection is up. We try to submit the solution for verification and reread the task carefully: everything must be in one container. A container, in turn, lives only while the process started via the CMD or ENTRYPOINT command is alive. Since we have several services, we need a process manager: supervisord.

Dockerfile Configuration
FROM ubuntu:17.10
RUN apt-get update && apt-get -y upgrade \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server mysql-client mysql-common  \
    && rm -rf /var/lib/mysql && mkdir -p /var/lib/mysql /var/run/mysqld \
    	&& chown -R mysql:mysql /var/lib/mysql /var/run/mysqld \
    	&& chmod 777 /var/run/mysqld \
    	&& rm /etc/mysql/my.cnf \
    && 	apt-get install -y curl supervisor nginx \
        php7.1-fpm php7.1-json \
        php7.1-mysql php7.1-opcache \
        php7.1-zip
ADD ./config/mysqld.cnf /etc/mysql/my.cnf
COPY config/www.conf /etc/php/7.1/fpm/pool.d/www.conf
COPY config/nginx.conf 			/etc/nginx/nginx.conf
COPY config/nginx-vhost.conf 		/etc/nginx/conf.d/default.conf
COPY config/opcache.ini 		/etc/php/7.1/mods-available/opcache.ini
COPY config/supervisord.conf 		/etc/supervisord.conf
COPY scripts/ 				/usr/local/bin/
COPY src /var/www/html     #the sources must already be inside the container when verification starts
#Debugging
#RUN mkdir /tmp/data /tmp/db
#COPY data_full.zip /tmp/data/data.zip
ENV PHP_MODULE_OPCACHE on
ENV PHP_DISPLAY_ERRORS on
RUN chmod 755 /usr/local/bin/docker-entrypoint.sh /usr/local/bin/startup.sh
RUN chmod +x /usr/local/bin/docker-entrypoint.sh /usr/local/bin/startup.sh
WORKDIR /var/www/html
RUN service php7.1-fpm start
EXPOSE 80 3306
CMD ["/usr/local/bin/docker-entrypoint.sh"]


The CMD ["/usr/local/bin/docker-entrypoint.sh"] command performs a bit of environment configuration after the container starts and then launches the process manager.

Process Manager Setup
[unix_http_server]
file=/var/run/supervisor.sock
[supervisord]
logfile=/tmp/supervisord.log
logfile_maxbytes=50MB
logfile_backups=10
loglevel=info
pidfile=/tmp/supervisord.pid
nodaemon=false
minfds=1024
minprocs=200
user=root
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///var/run/supervisor.sock
[program:php-fpm]
command=/usr/sbin/php-fpm7.1
autostart=true
autorestart=true
priority=5
stdout_events_enabled=true
stderr_events_enabled=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:nginx]
command=/usr/sbin/nginx -g "daemon off;"
autostart=true
autorestart=true
priority=10
stdout_events_enabled=true
stderr_events_enabled=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:mysql]
command=mysqld_safe
autostart=true
autorestart=true
priority=1
stdout_events_enabled=true
stderr_events_enabled=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:startup]
command=/usr/local/bin/startup.sh
startretries=0
priority=1100
stdout_events_enabled=true
stderr_events_enabled=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0


Using the priority parameter you can change the startup order, and stdout_logfile / stderr_logfile let you forward service logs to the container log. The very last to start is the startup.sh script, which fills the database with data from the archive.
Now we can finally send our brainchild off for its first test. The docker commands are similar to git; to submit we use:

docker tag &lt;your solution container&gt; stor.highloadcup.ru/travels/&lt;your repository&gt;
docker push stor.highloadcup.ru/travels/&lt;your repository&gt;

You can also register on the official site https://cloud.docker.com and push the container there. You can configure automatic builds there whenever a branch is updated on GitHub or Bitbucket, and then use the prepared image as a base in other projects.

3. Service development


To ensure high performance, I decided to drop all frameworks and use bare php + PDO. A framework greatly eases development, but it pulls in a pile of dependencies that eat into script execution time.
The starting point is the index.php script, which routes requests and returns results (Router + Controller). URLs of the form:

/&lt;entity&gt;/&lt;id&gt;/&lt;action&gt;

naturally suggest using regular expressions to determine the route and its parameters. That approach is very flexible and makes the service easy to extend, but the plain if variant turned out to be faster (although it leaves room for error; why? Read on).
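For comparison, a regex-based router might have looked roughly like this (a hypothetical sketch, not code from my solution):

//Hypothetical regex-based routing, shown only for comparison: it is
//flexible, but preg_match costs time on every single request.
$uri = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (preg_match('#^/(users|locations|visits)/(\d+|new)(?:/(visits|avg))?$#', $uri, $m)) {
    $entity = $m[1];            //users | locations | visits
    $id     = $m[2];            //numeric id or "new"
    $action = $m[3] ?? null;    //visits | avg | null
    //...dispatch to the matching entity class and method
} else {
    header('HTTP/1.0 404 Not Found');
    die();
}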

index.php
$uri = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$routes = explode('/', $uri);  //get the entity and parameters
$entity = $routes[1] ?? 0;
$id = $routes[2] ?? 0;
$action = $routes[3] ?? 0;
$className = __NAMESPACE__.'\\'.ucfirst($entity);
if (!class_exists($className)) {  //check that such an entity exists
    header('HTTP/1.0 404 Not Found');
    die();
}
$db = new \PDO(
    'mysql:dbname=travel;host=localhost;port=3306', 'root', null, [
        \PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES \'UTF8\'',
        \PDO::ATTR_PERSISTENT         => true
    ]
);  //connect to the DB
/** @var \Travel\AbstractEntity $class */
$class = new $className($db);
//Handle POST requests (adding/updating data)
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    //accept only JSON data; note that PHP exposes this header as CONTENT_TYPE
    if (isset($_SERVER['CONTENT_TYPE'])) {
        $type = trim(explode(';', $_SERVER['CONTENT_TYPE'])[0]);
        if ($type !== 'application/json') {
            header('HTTP/1.0 400 Bad Values');
            die();
        }
    }
    $inputJSON = file_get_contents('php://input');
    $input = json_decode($inputJSON, true);
    //Update
    if ($input && $class->checkFields($input, $id !== 'new')) {
        $itemId = (int)$id;
        if ($itemId > 0 && $class->hasItem($itemId)) {
            $class->update($input, $itemId);
            header('Content-Type: application/json; charset=utf-8');
            header('Content-Length: 2');
            echo '{}';
            die();
        }
        //Create a new item
        if ($id === 'new') {
            $class->insert($input);
            header('Content-Type: application/json; charset=utf-8');
            header('Content-Length: 2');
            echo '{}';
            die();
        }
        //otherwise nothing matched, so error
        header('HTTP/1.0 404 Not Found');
        die();
    }
    //or bad data was sent
    header('HTTP/1.0 400 Bad Values');
    die();
}
//Handle GET requests
if ((int)$id > 0) {
    if (!$action) { //no extra actions, just return the entity
        $res = $class->findById($id);
        if ($res) {
            $val = json_encode($class->hydrate($res));
            header('Content-Type: application/json; charset=utf-8');
            header('Content-Length: '.strlen($val));
            echo $val;
            die();
        }
        header('HTTP/1.0 404 Not Found');
        die();
    }
    //otherwise, extra actions on the entity
    $res = $class->hasItem($id);
    if (!$res) {
        header('HTTP/1.0 404 Not Found');
        die();
    }
    $filter = [];
    if (!empty($_GET)) {   //apply a filter
        $filter = $class->getFilter($_GET);
        if (!$filter) {
            header('HTTP/1.0 400 Bad Values');
            die();
        }
    }
    header('Content-Type: application/json; charset=utf-8');
    echo json_encode([$action => $class->{$action}($id, $filter)]);
    die();
}
header('HTTP/1.0 404 Not Found');
die();


It looks clumsy, but it works fast. Next comes the main data-handling class, AbstractEntity. I will not include it here, since everything in it is simple: insert/update/select (all the source code can be viewed on GitHub). The entity classes are derived from it. As an example, take the Users entity.
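Still, for context, here is a minimal sketch of what such a base class might look like (my reconstruction, not the original code; the real implementation is in the repository):

namespace Travel;

//A reconstruction sketch of the base class, not the original code.
//$_table is assumed to be set by each concrete entity class.
abstract class AbstractEntity
{
    protected $_db;
    protected $_table;

    public function __construct(\PDO $db)
    {
        $this->_db = $db;
    }

    public function hasItem(int $id): bool
    {
        $res = $this->_db->query('select id from '.$this->_table.' where id = '.$id);
        return $res && $res->fetch() !== false;
    }

    public function findById(int $id)
    {
        $res = $this->_db->query('select * from '.$this->_table.' where id = '.$id);
        return $res ? $res->fetch(\PDO::FETCH_ASSOC) : false;
    }
}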

Filter
It checks the data from the GET request for validity and builds a filter for the database query. The code below has no checks or escaping against injections and the like; do not repeat this at home on production projects.

public function getFilter(array $data)
{
    $columns = [
        'fromDate'   => 'visited_at > ',
        'toDate'     => 'visited_at < ',
        'country'    => 'country = ',
        'toDistance' => 'distance < ',
    ];
    $filter = [];
    foreach ($data as $key => $datum) {
        if (!isset($columns[$key])) {
            return false;
        }
        if (($key === 'fromDate' || $key === 'toDate' || $key === 'toDistance') && !is_numeric($datum)) {
            return false;
        }
        $filter[] = $columns[$key]."'".$datum."'";
    }
    return $filter;
}
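On a production project the same filter would be built with bound parameters instead of string concatenation. A minimal sketch of such a variant (hypothetical, not part of the solution; the method name is made up):

//Injection-safe variant using placeholders; $params is meant to be
//passed to PDOStatement::execute() later.
public function getFilterSafe(array $data)
{
    $columns = [
        'fromDate'   => 'visited_at > ?',
        'toDate'     => 'visited_at < ?',
        'country'    => 'country = ?',
        'toDistance' => 'distance < ?',
    ];
    $where  = [];
    $params = [];
    foreach ($data as $key => $datum) {
        if (!isset($columns[$key])) {
            return false;
        }
        $where[]  = $columns[$key];
        $params[] = $datum;
    }
    return [$where, $params];
}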

Getting places visited by the user
It returns the places and marks for a specific user; the filter built above can also be applied.

public function visits(int $id, array $filter = [])
{
    $sql = 'select mark, visited_at, place from visits LEFT JOIN locations ON locations.id = visits.location where user = '.$id;
    if (count($filter)) {
        $sql .= ' and '.implode(' and ', $filter);
    }
    $sql .= ' order by visited_at asc';
    $rows = $this->_db->query($sql);
    if (!$rows) {
        return false;
    }
    $items = $rows->fetchAll(\PDO::FETCH_ASSOC);
    foreach ($items as &$item) {
        $item['mark'] = (int)$item['mark'];
        $item['visited_at'] = (int)$item['visited_at'];
    }
    return $items;
}
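The GET /locations/&lt;id&gt;/avg handler is built the same way. A hypothetical sketch of such a method (the method name, column alias, and rounding precision are my assumptions, not the original code):

//A sketch of the average-mark calculation for a location; rounding to
//5 decimal places is an assumption about the contest spec.
public function avg(int $id, array $filter = [])
{
    $sql = 'select avg(mark) as avg_mark from visits where location = '.$id;
    if (count($filter)) {
        $sql .= ' and '.implode(' and ', $filter);
    }
    $rows = $this->_db->query($sql);
    if (!$rows) {
        return 0;
    }
    $res = $rows->fetch(\PDO::FETCH_ASSOC);
    return round((float)$res['avg_mark'], 5);
}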

Calculation of age
This was probably the most discussed topic in the Telegram chat. A user's date of birth is given as a timestamp (seconds since the start of the Unix epoch), for example 12333444. But the count starts in 1970, and there are people who were born before the 70s; for them the timestamp is negative, for example -123324. Users can be filtered by age, for example, to select everyone over 18. To avoid computing the age on every database query, I computed it before inserting the user into the database and stored it in an extra field.

Age calculation function:

public static function getAge($y, $m, $d)
{
    if ($m > date('m', TEST_TIMESTAMP) || ($m == date('m', TEST_TIMESTAMP) && $d > date('d', TEST_TIMESTAMP))) {
        return (date('Y', TEST_TIMESTAMP) - $y - 1);
    }
    return (date('Y', TEST_TIMESTAMP) - $y);
}

The "crutch" with TEST_TIMESTAMP is needed to pass the tests, since the data and the expected answers are generated once and do not change over time. PHP's date function converts a negative timestamp to a date perfectly well, leap years included.
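A quick sanity check of that behavior (the TEST_TIMESTAMP value and placing getAge on the Users class are illustrative assumptions):

//Negative timestamps resolve to dates before 1970 (assuming UTC).
echo date('Y-m-d', -123324);           //1969-12-30

//Illustrative call; TEST_TIMESTAMP would be the moment the test data
//was generated (the value below is made up for this example).
define('TEST_TIMESTAMP', 1503000000);  //some point in August 2017
echo Users::getAge(1969, 12, 30);      //47: the birthday has not yet come in 2017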

The database
The database was created to match the entities exactly, with all field sizes taken from the spec. The engine was InnoDB. Indexes were added to the fields used in filtering or sorting.
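For illustration, the visits table could be created roughly like this (a hypothetical sketch; the exact field sizes followed the contest spec):

//Hypothetical DDL for the visits table; the composite index covers the
//columns used in the visits filter and its order by visited_at.
$db->exec('
    CREATE TABLE visits (
        id         INT UNSIGNED NOT NULL PRIMARY KEY,
        location   INT UNSIGNED NOT NULL,
        user       INT UNSIGNED NOT NULL,
        visited_at INT NOT NULL,
        mark       TINYINT UNSIGNED NOT NULL,
        KEY idx_user_visited (user, visited_at),
        KEY idx_location (location)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
');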

Setting up the web server and database
To improve performance I used settings found on the Internet; they were meant as the starting point from which to turn the knobs when fine-tuning the services.

4. Report processing and tuning of service settings


The php source code turned out to be tiny, and it quickly became clear that I was turning from a backend developer into a system administrator. The quick tests run on a small amount of data and serve more to verify the correctness of the answers than to load-test the application. Full tests could be run only twice per 12 hours. Testing on my own computer did not always lead to understandable results: a solution could run quickly for me, yet fall over with 502 errors during the official test. Because of this, I never managed to configure memcached, which should have sped up server responses.

The only clear win was using the MyISAM engine instead of InnoDB: the tests gave 133 penalty seconds instead of 250 on InnoDB.

Now about what made it impossible to tune nginx/mysql/php-fpm well: a significant spread in the results of the same solution at different times of day. This thoroughly upset me, since the same solution scored differently in the evening and in the morning. I do not know how the "combat" verification infrastructure was arranged, but evidently something could interfere and load the machine (possibly the preparation of the next solution's run). And when the rating comes down to milliseconds, fine-tuning the server becomes impossible.

Below are the configurations I settled on:

mysql
[mysqld_safe]
socket		= /var/run/mysqld/mysqld.sock
nice		= 0
[mysqld]
#
# * Basic Settings
#
user		= mysql
pid-file	= /var/run/mysqld/mysqld.pid
socket		= /var/run/mysqld/mysqld.sock
port		= 3306
basedir		= /usr
datadir		= /var/lib/mysql
tmpdir		= /tmp
lc-messages-dir	= /usr/share/mysql
skip-external-locking
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address		= 127.0.0.1
#
# * Fine Tuning
#
key_buffer_size		= 16M
max_allowed_packet	= 16M
thread_stack		= 192K
thread_cache_size       = 32
sort_buffer_size = 256K
read_buffer_size = 128K
read_rnd_buffer_size = 256K
myisam_sort_buffer_size = 64M
myisam_use_mmap = 1
myisam-recover-options  = BACKUP
table_open_cache       = 64
#
# * Query Cache Configuration
#
query_cache_limit	= 10M
query_cache_size        = 64M
query_cache_type = 1
join_buffer_size = 4M
#
# Error log - should be very few entries.
#
log_error = /var/log/mysql/error.log
expire_logs_days	= 10
max_binlog_size   = 100M
#
# * InnoDB
#
innodb_buffer_pool_size = 2048M
innodb_log_file_size = 256M
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 2
innodb_thread_concurrency = 8
innodb_read_io_threads = 64
innodb_write_io_threads = 64
innodb_io_capacity = 50000
innodb_flush_method = O_DIRECT
transaction-isolation = READ-COMMITTED
innodb_support_xa = 0
innodb_commit_concurrency = 8
innodb_old_blocks_time = 1000


nginx
user  www-data;
worker_processes  auto;
error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;
events {
    worker_connections  2048;
    multi_accept on;
    use epoll;
}
http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    sendfile        on;
    tcp_nodelay on;
    tcp_nopush     on;
    access_log off;
    client_max_body_size 50M;
    client_body_buffer_size 1m;
    client_body_timeout 15;
    client_header_timeout 15;
    keepalive_timeout 2 2;
    send_timeout 15;
    open_file_cache          max=2000 inactive=20s;
    open_file_cache_valid    60s;
    open_file_cache_min_uses 5;
    open_file_cache_errors   off;
    gzip_static on;
    gzip  on;
    gzip_vary  on;
    gzip_min_length     1400;
    gzip_buffers        16 8k;
    gzip_comp_level   6;
    gzip_http_version 1.1;
    gzip_proxied any;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";
    gzip_types  text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript application/json image/svg+xml svg svgz;
    include /etc/nginx/conf.d/*.conf;
}


nginx-vhost
server {
    listen 80;
    server_name _;
    chunked_transfer_encoding off;
    root /var/www/html;
    index index.php index.html index.htm;
    error_log /var/log/nginx/error.log crit;
    location / {
        try_files $uri $uri/ /index.php?$args;
    }
    location ~ \.php$ {
        try_files $uri =404;
        include /etc/nginx/fastcgi_params;
        fastcgi_pass    unix:/var/run/php/php7.1-fpm.sock;
        fastcgi_index   index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_read_timeout 3s;
    }
}


Tuning php-fpm achieved nothing concrete.

Before the final the data volume was increased; unfortunately, I did not have enough time to optimize my solution further. But after the final a sandbox was opened, where you can still run your solutions and compare the results with the top.

5. Conclusions


I am glad that I participated in this championship. I came to understand how Docker works and learned to configure servers for high load in more depth. I also liked the competitive spirit and the Telegram chat. Throughout the championship, C++ and Go programmers held the top spots. One could follow suit and write in either of those languages as well, but I wanted to see my results with what I know and work with. Thanks to Mail.Ru for this.

6. References


1. Source code
2. highloadcup.ru, the first round of the championship
