Increasing container density on a node using PFCACHE technology



    One of the goals of any hosting provider is to utilize its existing hardware as fully as possible while still delivering a quality service to end users. The resources of a physical server are always finite, yet the number of client services hosted on it, in our case VPSes, can vary considerably. Read on to find out how we manage to have our cake and eat it too.

    Consolidating VPSes on a node so tightly that clients do not notice it at all does a lot for the economics of any hosting provider. Of course, a node must not burst at the seams when containers are packed into it to the brim, or clients will immediately feel every spike in load.

    How many VPSes can be placed on a single node depends on many factors, the most obvious being:

    1. Hardware specifications of the node itself
    2. VPS size
    3. VPS load pattern
    4. Software technologies that help optimize density

    In this article we will share our experience with pfcache, a Virtuozzo technology.
    We use the 6.x branch, but everything said here also holds for 7.x.

    Pfcache is a Virtuozzo mechanism that deduplicates IOPS and RAM across containers by moving files that are identical in several containers into a separate common area.

    In fact, it consists of:

    1. Kernel code
    2. User-space daemon
    3. User-space utilities

    On the node side, we allocate a whole partition in which the files shared by all VPSes on the node will be created. A ploop block device is mounted on this partition. At start, each container receives a reference to it:

    [root@pcs13 ~]# cat /proc/mounts
    ...
    /dev/ploop62124p1 /vz/pfcache ext4 rw,relatime,barrier=1,data=ordered,balloon_ino=12 0 0
    ...
    /dev/ploop22927p1 /vz/root/418 ext4 rw,relatime,barrier=1,data=ordered,balloon_ino=12,pfcache_csum,pfcache=/vz/pfcache 0 0
    /dev/ploop29642p1 /vz/root/264 ext4 rw,relatime,barrier=1,data=ordered,balloon_ino=12,pfcache_csum,pfcache=/vz/pfcache 0 0
    ...
    
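    A quick way to see which containers on a node are attached to the common area is to filter the mount table for the pfcache option, ordinary grep over the same /proc/mounts shown above:

    [root@pcs13 ~]# grep pfcache=/vz/pfcache /proc/mounts

    Every line returned is a container root wired to the shared /vz/pfcache partition.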

    Here are sample statistics on the number of files on one of our nodes:

    [root@pcs13 ~]# find /vz/pfcache -type f | wc -l
    45851
    [root@pcs13 ~]# du -sck -h /vz/pfcache
    2.4G    /vz/pfcache
    2.4G    total
    

    The principle of pfcache is as follows:

    • The user-space pfcached daemon writes the SHA-1 hash of a file into the file's extended attribute (xattr). Not every file is processed, only files in the directories /usr, /bin, /usr/sbin, /sbin, /lib, /lib64;
    • Files in these directories are the most likely to be “shared” and used by several containers;
    • pfcached periodically collects file-read statistics from the kernel, analyzes them, and adds files to the cache when they are used frequently;
    • The set of directories may differ and is configured in the configuration files;
    • When a file is opened for reading, the kernel checks whether its extended xattr attributes contain the hash. If they do, the “common” file is opened instead of the container's own file. The substitution happens inside the kernel and is invisible to code in the container;
    • When a file is written to, its hash is invalidated, so on the next open the container's own file is opened directly rather than its cached copy (see the sketch after the /bin/bash example below).

    By keeping the shared files from /vz/pfcache in the page cache, we save that cache as well as IOPS: instead of reading ten identical files from disk, we read one, and it goes straight into the page cache. On the kernel side, the link between a container's file and its “common” peer lives in struct inode and struct address_space:

    struct inode {
    ...
            /* the "common" peer file from /vz/pfcache backing this inode */
            struct file             *i_peer_file;
    ...
    };
    struct address_space {
    ...
            /* links mappings that share the peer file's pages */
            struct list_head        i_peer_list;
    ...
    };
    

    The file ends up with a single VMA list (memory is deduplicated) and is read from disk less often (IOPS are saved). Our “common pool” is hosted on an SSD, which gives an additional gain in speed.
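    As a hedged spot-check of that claim, the fincore utility from util-linux (assuming a build recent enough to include it, which is not a given on the 6.x userland) shows how much of a shared file is resident in the page cache; the path here is the hash from the /bin/bash example below:

    [root@pcs13 ~]# fincore /vz/pfcache/8e/3aa19fdc42e87659746f6dc8ea3af74ab30362

    Whatever fincore reports as resident is a single copy serving every container that maps this file.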

    An example of the cached /bin/bash file:

    [root@pcs13 ~]# ls -li /vz/root/2388/bin/bash
    524650 -rwxr-xr-x 1 root root 1021112 Oct  7  2018 /vz/root/2388/bin/bash
    [root@pcs13 ~]# pfcache dump /vz/root/2388 | grep 524650
    8e3aa19fdc42e87659746f6dc8ea3af74ab30362 i:524650      g:1357611108  f:CP
    [root@pcs13 ~]# sha1sum /vz/root/2388/bin/bash
    8e3aa19fdc42e87659746f6dc8ea3af74ab30362  /vz/root/2388/bin/bash
    [root@pcs13 /]# getfattr -ntrusted.pfcache /vz/root/2388/bin/bash
    # file: vz/root/2388/bin/bash
    trusted.pfcache="8e3aa19fdc42e87659746f6dc8ea3af74ab30362"
    [root@pcs13 ~]# sha1sum /vz/pfcache/8e/3aa19fdc42e87659746f6dc8ea3af74ab30362
    8e3aa19fdc42e87659746f6dc8ea3af74ab30362  /vz/pfcache/8e/3aa19fdc42e87659746f6dc8ea3af74ab30362
    
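    To observe the invalidation described above, it is enough to write into the container's copy. A hedged sketch follows; the write is destructive for the file, so this is purely an illustration, not something to run on a real binary:

    # Any write should drop the trusted.pfcache hash, so later opens
    # bypass the common pool. Do not run this on a binary you need.
    [root@pcs13 ~]# echo >> /vz/root/2388/bin/bash
    [root@pcs13 ~]# getfattr -n trusted.pfcache /vz/root/2388/bin/bash

    After the write, we expect getfattr to report that the trusted.pfcache attribute is gone, and the container's own file will be opened directly from then on.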

    We measure the effectiveness of the cache with a ready-made script.

    The script walks all the containers on the node and tallies each container's cached files.

    [root@pcs16 ~]# /pcs/distr/pfcache-examine.pl
    ...
    Pfcache cache uses 831 MB of memory
    Total use of pfcached files in containers is 39837 MB of memory
    Pfcache effectiveness: 39006 MB
    

    Thus, roughly 40 gigabytes' worth of files in containers do not need their own memory: they are served from the shared cache.
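    For readers without access to that script, here is a minimal sketch of the same idea. It is a hypothetical reimplementation, assuming the standard vzlist and getfattr utilities, and it simply counts a file as cached when it carries the trusted.pfcache attribute shown earlier:

    #!/bin/bash
    # Hypothetical sketch: tally the size of pfcache-tagged files per container.
    total=0
    for ctid in $(vzlist -H -o ctid); do
        root="/vz/root/${ctid}"
        ct_bytes=0
        # Only the directories that pfcached processes by default
        for dir in usr bin sbin lib lib64; do
            [ -d "${root}/${dir}" ] || continue
            while IFS= read -r -d '' f; do
                # A file counts as cached if it carries the pfcache hash
                if getfattr -n trusted.pfcache "$f" >/dev/null 2>&1; then
                    ct_bytes=$((ct_bytes + $(stat -c %s "$f")))
                fi
            done < <(find "${root}/${dir}" -type f -print0)
        done
        echo "CT ${ctid}: $((ct_bytes / 1024 / 1024)) MB of pfcached files"
        total=$((total + ct_bytes))
    done
    echo "Total: $((total / 1024 / 1024)) MB served from the common pool"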

    For this mechanism to work even better, the node should host as many “identical” VPSes as possible, for example ones whose users have no root access and which run an environment deployed from the same prepared image.

    You can tune the operation of pfcache through its config file /etc/vz/pfcache.conf:

    MINSIZE, MAXSIZE - minimum / maximum size of a file to be cached
    TIMEOUT - pause between caching attempts
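    For illustration, here is a hypothetical snippet of that file. The parameter names are the ones above, but the values and their units are invented for the example, so consult the Virtuozzo documentation for the real defaults:

    # /etc/vz/pfcache.conf - illustrative values only
    MINSIZE=1024        # skip files smaller than this (units assumed to be bytes)
    MAXSIZE=268435456   # skip files larger than this (units assumed to be bytes)
    TIMEOUT=60          # pause between caching passes (units assumed to be seconds)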

    A complete list of parameters can be found at the link.
