A Linux From Scratch distribution for building Docker images: our experience with dappdeps


    Building Docker images on top of a base image typically involves invoking commands in that base image's environment: for example, calling apt-get, which ships with the base image, to install new packages.

    There is often a need to install a set of utilities into the base system that are only used to build or install files required in the final image. For example, to build a Go application you need to install the Go compiler, put the application's source code into the base image, and compile the program. The final image, however, needs only the compiled binary, not the whole toolset used to produce it.

    The problem is well known, and one way to solve it is to build an auxiliary image and transfer files from it to the resulting one. For this purpose Docker introduced multi-stage builds, and dapp introduced artifact images (updated August 13, 2019: the dapp project has since been renamed to werf, its code has been rewritten in Go, and its documentation has been significantly improved). This approach neatly solves problems like copying compilation results into a final image. However, it does not solve all possible problems...

    Here is another example: Chef in local mode is used to build the image. To do this, chefdk is installed into the base image, recipes are mounted or added and then run; they configure the image, install new components, packages, config files, and so on. Another configuration management system, such as Ansible, could be used in the same way. However, the installed chefdk takes about 500 MB and significantly increases the size of the final image; there is no point in leaving it there.

    But Docker multi-stage builds will not solve this problem. What if the user does not want to know what side effects the program has, in particular which files it creates? For example, to avoid maintaining explicit lists of all the paths exported from the image. We just want to run the program and get some result in the image, while the program itself and the whole environment needed for its work remain outside the final image.

    In the case of chefdk, we could mount the directory containing chefdk into the build image at build time. But this solution has its own problems:

    1. Not every program needed for the build is installed into a single directory that is easy to mount into a build image. In the case of Ansible, Python would have to be mounted in a non-standard location to avoid conflicting with the system Python, which may itself cause problems.
    2. The mounted program depends on the base image used. If the program is built for Ubuntu, it may not start in an environment it was not intended for, such as Alpine. Even chefdk, an omnibus package that bundles all of its dependencies, still depends on the system glibc and will not work in Alpine, which uses musl libc.

    But what if we could prepare a static, unchanging set of all possibly useful utilities, linked so cleverly that it would work in any base image, even scratch? After mounting such images into the base one, only the empty mount-point directories where these utilities were attached would remain in the final image.

    In search of adventures


    Theory


    We need an image that contains a set of programs in a statically defined custom directory, e.g. /myutils. Any program in /myutils should depend only on libraries in /myutils.

    A dynamically compiled program on Linux depends on the location of the ld-linux dynamic linker on the system. For example, the bash binary in ubuntu:16.04 is compiled so that it depends on the linker /lib64/ld-linux-x86-64.so.2:

    $ ldd /bin/bash
            linux-vdso.so.1 =>  (0x00007ffca67d8000)
            libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007fd8505a6000)
            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd8503a2000)
            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd84ffd8000)
            /lib64/ld-linux-x86-64.so.2 (0x00007fd8507cf000)

    Moreover, this dependency is static and compiled into the binary itself:

    $ grep "/lib64/ld-linux-x86-64.so.2" /bin/bash
    Binary file /bin/bash matches

    Thus, it is necessary: a) to compile our hypothetical /myutils/bin/bash so that it uses the linker /myutils/lib64/ld-linux-x86-64.so.2; b) to configure the linker /myutils/lib64/ld-linux-x86-64.so.2 to dynamically link libraries from /myutils/{lib64,lib}.

    The first step is to build a toolchain image that contains everything needed to build, and subsequently to run, other programs in a non-standard root directory. The instructions from the Linux From Scratch project come in handy here.


    Building the dappdeps distribution


    Why is this set of images of our “distribution” called dappdeps? Because these images are used by the dapp builder: they are assembled for the needs of that project.

    So, our final goal :

    • The dappdeps/toolchain image with the GCC compiler for building other applications, and the glibc library.
    • The dappdeps/base image with a set of programs and all their dependent libraries: bash, gtar, sudo, coreutils, findutils, diffutils, sed, rsync, shadow, termcap.
    • The dappdeps/gitartifact image with the Git utility and all its dependencies.
    • The dappdeps/chefdk image with the chefdk omnibus package, which contains all Chef dependencies, including the Ruby interpreter.
    • The dappdeps/ansible image with the Ansible utility and all its dependencies, including the Python interpreter.

    Dappdeps images may depend on each other. For example, building dappdeps/base requires the toolchain and glibc from the dappdeps/toolchain image; and after all the utilities in dappdeps/base are compiled, files from dappdeps/toolchain are also required to run them at runtime.

    The main condition is that the utilities from these images must live in a non-standard location, namely /.dapp/deps/, and must not depend on any utilities or libraries in the standard system paths. Dappdeps images must also contain no files other than /.dapp/deps.

    Such images let us create containers from them with volumes containing the utilities, and mount those volumes into other containers using Docker's --volumes-from option.

    Building dappdeps/toolchain


    Chapter 5, “Constructing a Temporary System,” of the Linux From Scratch book describes exactly this: building a temporary chroot environment in /tools with a minimal set of utilities, which is then used to build the main target distribution.

    In our case we slightly change the chroot directory: during compilation we pass /.dapp/deps/toolchain/0.1.1 in the --prefix parameter. This is the directory that will appear in the build container when dappdeps/toolchain is mounted into it; it contains all the necessary utilities and libraries. All we need is GNU binutils, GCC, and glibc.

    The image is built using Docker multi-stage builds. In a stage based on ubuntu:16.04, the whole environment is prepared and the programs are compiled and installed into /.dapp/deps/toolchain/0.1.1. This directory is then copied into the scratch-based image dappdeps/toolchain:0.1.1. The Dockerfile can be found here.
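    The shape of that multi-stage build can be sketched roughly as follows. This is a simplified sketch, not the actual Dockerfile: the build-stage commands are elided.

```dockerfile
# Simplified sketch of the multi-stage build (not the actual Dockerfile):
# stage 1 prepares the environment and installs everything under the
# target prefix; stage 2 copies only that prefix into an empty image.
FROM ubuntu:16.04 AS builder
# ... install build dependencies, then build binutils, GCC and glibc
#     with --prefix=/.dapp/deps/toolchain/0.1.1 ...

FROM scratch
COPY --from=builder /.dapp/deps/toolchain/0.1.1 /.dapp/deps/toolchain/0.1.1
```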

    The final dappdeps/toolchain image is the “temporary system” in LFS terminology. GCC in this system is still tied to the system library paths, but we are not going to make GCC work in any base image: dappdeps/toolchain is an auxiliary image that will be used later, among other things, to build programs that really are independent of the common system libraries.

    Using Omnibus with dappdeps/toolchain



    Omnibus is used to build projects like chefdk and GitLab. It allows you to create self-contained bundles containing a program and all its dependent libraries, except for the system linker and libc. All instructions are described in readable, convenient Ruby recipes, and the Omnibus project also offers omnibus-software, a library of ready-made recipes. So let's try to describe the build of the remaining dappdeps images using Omnibus.

    However, in order to get rid of the dependency on the system linker and libc, we will build all the programs in Omnibus with the compiler from dappdeps/toolchain. The programs will then be tied to the glibc that also lives in dappdeps/toolchain.

    To do this, save the contents of dappdeps/toolchain as an archive:

    $ docker pull dappdeps/toolchain:0.1.1
    $ docker save dappdeps/toolchain:0.1.1 -o dappdeps-toolchain.tar

    Add this archive via the Dockerfile ADD directive and unpack its contents to the root of the build container:

    ADD ./dappdeps-toolchain.tar /dappdeps-toolchain
    RUN tar xf /dappdeps-toolchain/**/layer.tar -C /

    Before starting the build via Omnibus, prepend /.dapp/deps/toolchain/0.1.1/bin to the PATH variable so that GCC from dappdeps/toolchain takes priority.
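    In shell terms this boils down to a one-liner (a sketch; the real build scripts may set PATH differently):

```shell
# Sketch: put the toolchain's bin directory first in PATH so its GCC
# shadows any system compiler during the Omnibus build.
export PATH="/.dapp/deps/toolchain/0.1.1/bin:$PATH"

# The first PATH entry is now the toolchain directory:
echo "$PATH" | cut -d: -f1   # prints /.dapp/deps/toolchain/0.1.1/bin
```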

    The result of an Omnibus build is a package (in our case a DEB) whose contents are unpacked and transferred to /.dapp/deps/{base|gitartifact|...} using Docker multi-stage builds, just as with dappdeps/toolchain.

    Building dappdeps/base


    The Omnibus project is described in the project file omnibus/config/projects/dappdeps-base.rb:

    name 'dappdeps-base'
    license 'MIT'
    license_file 'LICENSE.txt'
    DOCKER_IMAGE_VERSION = "0.2.3"
    install_dir "/.dapp/deps/base/#{DOCKER_IMAGE_VERSION}"
    build_version DOCKER_IMAGE_VERSION
    build_iteration 1
    dependency "dappdeps-base"

    This file specifies all the dependencies of the dappdeps-base Omnibus package and the target installation directory. Dependencies can live either in a separate repository (for example, omnibus-software) or in the omnibus/config/software directory; each file there describes the instructions for building one component. For dappdeps-base we wrote software recipes for the components missing from the standard omnibus-software repository: acl, attr, coreutils, diffutils, findutils, gtar, rsync, sed, shadow, sudo, termcap.

    Let's take rsync as an example of what an Omnibus software recipe looks like:

    name 'rsync'
    default_version '3.1.2'
    license 'GPL-3.0'
    license_file 'COPYING'
    version('3.1.2') { source md5: '0f758d7e000c0f7f7d3792610fad70cb' }
    source url: "https://download.samba.org/pub/rsync/src/rsync-#{version}.tar.gz"
    dependency 'attr'
    dependency 'acl'
    dependency 'popt'
    relative_path "rsync-#{version}"
    build do
      env = with_standard_compiler_flags(with_embedded_path)
      command "./configure --prefix=#{install_dir}/embedded", env: env
      command "make -j #{workers}", env: env
      command 'make install', env: env
    end

    The source directive specifies the URL from which to download the source code, the name directive specifies the name of the component being built, and dependencies on other components are declared by name with the dependency directive. Each software recipe may, in turn, declare dependencies on other components. The build block contains the standard commands for building from source.

    The Omnibus project and Dockerfile for dappdeps/base can be found here.

    Building dappdeps/gitartifact


    In the case of dappdeps-gitartifact, only a Git build recipe is needed, and it already exists in omnibus-software; all that remains is to hook it into our Omnibus project. Everything else is the same.

    The Omnibus project and Dockerfile for dappdeps/gitartifact can be found here.

    Building dappdeps/chefdk


    For chefdk there is already a ready-made Omnibus project. It only remains to add it to the build container via the Dockerfile and replace the standard chefdk installation path /opt/chefdk with /.dapp/deps/chefdk/2.3.17-2 (our installation path includes the Chef version).

    The Dockerfile for building dappdeps/chefdk can be found here.

    Building dappdeps/ansible


    To build Ansible, we also create an Omnibus project in which we install the Python interpreter and pip, and describe a software recipe for Ansible:

    name "ansible"
    ANSIBLE_GIT_TAG = "v2.4.4.0+dapp-6"
    dependency "python"
    dependency "pip"
    build do
      command "#{install_dir}/embedded/bin/pip install https://github.com/flant/ansible/archive/#{ANSIBLE_GIT_TAG}.tar.gz"
      command "#{install_dir}/embedded/bin/pip install pyopenssl"
    end

    As you can see, the image with Ansible consists of an embedded Python, pip, and Ansible (with its dependencies) installed via pip.

    The Omnibus project and Dockerfile for dappdeps/ansible can be found here.

    How to use the dappdeps distribution?


    To use dappdeps images via mounted volumes, you must first create a container from each image, specifying which volume that container stores. This is what Docker currently requires.

    $ docker create --name dappdeps-toolchain --volume /.dapp/deps/toolchain/0.1.1 dappdeps/toolchain:0.1.1 no-such-cmd
    13edda732176a44d7d822202d8327565b78f4a2190368bb1df46cdad1e127b6e
    $ docker ps -a | grep dappdeps-toolchain
    13edda732176        dappdeps/toolchain:0.1.1      "no-such-cmd"       About a minute ago   Created                                 dappdeps-toolchain

    The container is named dappdeps-toolchain: by this name, all the declared volumes of this container can be mounted into other containers with --volumes-from. Docker requires some command parameter, hence the arbitrary text no-such-cmd, but this container will never be launched; it will remain in the Created state.

    Create the remaining containers:

    $ docker create --name dappdeps-base --volume /.dapp/deps/base/0.2.3 dappdeps/base:0.2.3 no-such-cmd
    20f524c5b8b4a59112b4b7cb85e47eee660c7906fb72a4935a767a215c89964e
    $ docker create --name dappdeps-ansible --volume /.dapp/deps/ansible/2.4.4.0-10 dappdeps/ansible:2.4.4.0-10 no-such-cmd
    cd01ae8b69cd68e0611bb6c323040ce202e8e7e6456a3f03a4d0a3ffbbf2c510
    $ docker create --name dappdeps-gitartifact --volume /.dapp/deps/gitartifact/0.2.1 dappdeps/gitartifact:0.2.1 no-such-cmd
    2c12a8743c2b238d90debaf066e29685b41b138c10f2b893a815931df866576d
    $ docker create --name dappdeps-chefdk --volume /.dapp/deps/chefdk/2.3.17-2 dappdeps/chefdk:2.3.17-2 no-such-cmd
    4dffe74c49c8e4cdf9d749177ae9efec3bdae6e37c8b6df41b6eb527a5c1d891

    So we have reached the climax that all this fuss was about. As a demonstration of the possibilities, let's install the nginx and tree packages into an Alpine image by launching Ansible from dappdeps/ansible via Bash from dappdeps/base:

    $ docker run -ti --name mycontainer --volumes-from dappdeps-toolchain --volumes-from dappdeps-base --volumes-from dappdeps-gitartifact --volumes-from dappdeps-ansible --volumes-from dappdeps-chefdk alpine:latest /.dapp/deps/base/0.2.3/embedded/bin/bash -lc '/.dapp/deps/ansible/2.4.4.0-10/embedded/bin/ansible localhost -m apk -a "name=nginx,tree update_cache=yes"'
     [WARNING]: Unable to parse /etc/ansible/hosts as an inventory source
     [WARNING]: No inventory was parsed, only implicit localhost is available
     [WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
    localhost | SUCCESS => {
        "changed": true, 
        "failed": false, 
        "msg": "installed nginx tree package(s)", 
        "packages": [
            "pcre", 
            "nginx", 
            "tree"
        ], 
        "stderr": "", 
        "stderr_lines": [], 
        "stdout": "(1/3) Installing pcre (8.41-r1)\n(2/3) Installing nginx (1.12.2-r3)\nExecuting nginx-1.12.2-r3.pre-install\n(3/3) Installing tree (1.7.0-r1)\nExecuting busybox-1.27.2-r7.trigger\nOK: 6 MiB in 14 packages\n", 
        "stdout_lines": [
            "(1/3) Installing pcre (8.41-r1)", 
            "(2/3) Installing nginx (1.12.2-r3)", 
            "Executing nginx-1.12.2-r3.pre-install", 
            "(3/3) Installing tree (1.7.0-r1)", 
            "Executing busybox-1.27.2-r7.trigger", 
            "OK: 6 MiB in 14 packages"
        ]
    }

    The final chord: we create an image from the resulting container and... see that all that remains of dappdeps in it are empty mount-point directories!

    $ docker commit mycontainer myimage
    sha256:9646be723b91daeaf538b7d92bb8844578abc7acd3028394f543e883eeb382bb
    $ docker run -ti --rm myimage tree /.dapp
    /.dapp
    └── deps
        ├── ansible
        │   └── 2.4.4.0-10
        ├── base
        │   └── 0.2.3
        ├── chefdk
        │   └── 2.3.17-2
        ├── gitartifact
        │   └── 0.2.1
        └── toolchain
            └── 0.1.1
    11 directories, 0 files


    It would seem, what more could one dream of?..

    Further work and problems


    What are the problems with dappdeps?


    Work is needed to reduce the size of dappdeps/toolchain. To do this, the toolchain should be split into two parts: the part needed to build new utilities for dappdeps, and the part with basic libraries like glibc that must be mounted at runtime to run those utilities.

    For the Ansible apt module to work in dappdeps/ansible, we had to add the contents of Ubuntu's python-apt package directly to the image, without rebuilding it. The apt module then works without problems in DEB-based base images, but it requires a specific glibc version. Since apt is a distribution-specific module anyway, this is acceptable.

    What is missing in the Dockerfile?


    To use the volume from the dappdeps/toolchain image, you first have to create an archive of this image and then add it to another image via the Dockerfile ADD directive (see the section “Using Omnibus with dappdeps/toolchain”). The Dockerfile lacks functionality that would let you simply mount a directory of another image as a VOLUME during the build, i.e. an analogue of the --volumes-from option for the Dockerfile.
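    Purely as an illustration of the missing feature, such a directive might look something like the following. This is hypothetical syntax that no Docker version supports; the real workaround remains docker save plus ADD.

```dockerfile
# HYPOTHETICAL syntax -- Docker does not support this; it only
# illustrates the kind of directive the workaround substitutes for:
FROM alpine:latest
# VOLUMES-FROM dappdeps/toolchain:0.1.1 /.dapp/deps/toolchain/0.1.1
```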

    Conclusions


    We have verified that the idea works: it lets you use GNU and other CLI utilities in build instructions, run the Python or Ruby interpreter, and even run Ansible or Chef in Alpine or scratch images. The author of the build instructions does not need to know the side effects of the commands being run, or explicitly list which files need to be imported, as is the case with Docker multi-stage builds.

    The results of this work are also applied in practice: dapp uses dappdeps images in its build containers. For example, Git from dappdeps/gitartifact is used to work with patches, and this Git behaves, with some guarantee, the same way in all base images. However, the way dapp uses dappdeps is beyond the scope of this article.

    The purpose of this article was to convey the idea itself and to show, with a real practical example, how it can be applied.

    P.S. All the dappdeps images mentioned above are available on hub.docker.com: dappdeps/toolchain:0.1.1, dappdeps/base:0.2.3, dappdeps/gitartifact:0.2.1, dappdeps/ansible:2.4.4.0-10, dappdeps/chefdk:2.3.17-2. Feel free to use them.
