
A Linux From Scratch distribution for building Docker images: our experience with dappdeps

Building Docker images on top of a base image typically involves running commands in the environment of that base image: for example, calling apt-get, which ships with the base image, to install new packages.
Often you also need to install an extra set of utilities into the base system, which are then used to install or build files required in the final image. For example, to build a Go application, you need to install the Go compiler, put all the application source code into the base image, and compile the program. However, the final image needs only the compiled program, not the entire set of utilities used to compile it.
The problem is well known. One way to solve it is to build an auxiliary image and transfer files from it to the resulting image. For this purpose, multi-stage builds appeared in Docker and artifact images appeared in dapp (Updated August 13, 2019: the dapp project has now been renamed to werf, its code has been rewritten in Go, and the documentation has been significantly improved). This approach is ideal for tasks like transferring the results of compiling source code into a final image. However, it does not solve every possible problem...
Here is another example: Chef in local mode is used to build the image. To do this, chefdk is installed in the base image, recipes are mounted or added, and these recipes are run to configure the image and install new components, packages, config files, and more. Another configuration management system, such as Ansible, can be used in the same way. However, the installed chefdk takes about 500 MB and significantly increases the size of the final image, so leaving it there makes no sense.
Multi-stage builds in Docker will not solve this problem. What if the user does not want to know what side effects the program has, in particular, what files it creates, for example, to avoid maintaining an explicit list of every path exported from the image? You just want to run the program and get some result in the image, while the program and the entire environment it needs stay outside the final image.
In the case of chefdk, it would be possible to mount the directory containing chefdk into the build container at build time. But this solution has problems:
- Not every program needed for the build is installed in a separate directory that is easy to mount into a build container. In the case of Ansible, you would have to mount Python in a non-standard location so as not to conflict with the system Python, which can itself cause problems.
- The mounted program will depend on the base image used. If the program is built for Ubuntu, it may not start in an environment it was not intended for, such as Alpine. Even chefdk, which is an omnibus package bundling all its dependencies, still depends on the system glibc and will not work in Alpine, which uses musl libc.
But what if we could prepare a static, unchanging set of all the useful utilities, linked so cleverly that it works in any base image, even scratch? After mounting such an image (or images) into the base one, only the empty mount-point directory where these utilities were attached will remain in the final image.
In search of adventures
Theory
We need an image that contains a set of programs in a fixed, non-standard directory, e.g. /myutils. Any program in /myutils should depend only on the libraries in /myutils. A dynamically linked program on Linux depends on the location of the ld-linux dynamic linker on the system. For example, the bash binary in ubuntu:16.04 is compiled so that it depends on the linker /lib64/ld-linux-x86-64.so.2:
$ ldd /bin/bash
linux-vdso.so.1 => (0x00007ffca67d8000)
libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007fd8505a6000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd8503a2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd84ffd8000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd8507cf000)
Moreover, this dependency is fixed at build time: the linker path is embedded in the binary itself:
$ grep "/lib64/ld-linux-x86-64.so.2" /bin/bash
Binary file /bin/bash matches
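The embedded linker path can also be read from the ELF program headers instead of grepping the binary. The following is a minimal sketch that assumes readelf from GNU binutils is available; the helper name elf_interp_from_readelf is hypothetical and not part of the original article.

```shell
# Hypothetical helper: extract the interpreter (dynamic linker) path from
# `readelf -l` output supplied on stdin. Prints nothing if the binary has
# no interpreter entry (for example, a statically linked program).
elf_interp_from_readelf() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)]/\1/p'
}

# Usage (assumes readelf from GNU binutils is installed):
#   readelf -l /bin/bash | elf_interp_from_readelf
```

The parsing is kept separate from the readelf call so that the helper can be checked against a saved readelf listing.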
Thus, we need: a) to compile our hypothetical /myutils/bin/bash so that it uses the linker /myutils/lib64/ld-linux-x86-64.so.2; b) to configure the linker /myutils/lib64/ld-linux-x86-64.so.2 so that it loads shared libraries from /myutils/{lib64,lib}. The first step is to build a toolchain image, which will contain everything needed to build, and later run, other programs in a non-standard root directory. The Linux From Scratch project instructions come in handy here.
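For a single program, the two requirements above can be approximated with ordinary GNU linker options, without a custom toolchain. This is a minimal sketch under stated assumptions, not the approach taken by LFS or dappdeps: the helper name private_ldflags and the /myutils prefix are illustrative, and an rpath embedded in the binary stands in for configuring the linker's search path.

```shell
# Hypothetical helper: print GCC options that pin a program to a private
# dynamic linker and private library directories under PREFIX.
#   --dynamic-linker  sets the interpreter path embedded in the binary
#   -rpath            embeds the directories searched for shared libraries
private_ldflags() {
    prefix=$1
    printf '%s %s\n' \
        "-Wl,--dynamic-linker=${prefix}/lib64/ld-linux-x86-64.so.2" \
        "-Wl,-rpath,${prefix}/lib64:${prefix}/lib"
}

# Usage (assumes gcc and a glibc already installed under /myutils):
#   gcc hello.c -o hello $(private_ldflags /myutils)
```

A binary linked this way starts only if the linker and libraries really exist under the given prefix, which is why a complete toolchain has to be built there first.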
Building the dappdeps distribution
Why is the set of images in our “distribution” called dappdeps? Because these images are used by the dapp builder: they are built for the needs of that project.
So, our final goal:
- A dappdeps/toolchain image with the GCC compiler for building other applications and the glibc library.
- A dappdeps/base image with a set of programs and all the libraries they depend on: bash, gtar, sudo, coreutils, findutils, diffutils, sed, rsync, shadow, termcap.
- A dappdeps/gitartifact image with the Git utility and all its dependencies.
- A dappdeps/chefdk image with the chefdk omnibus package, which contains all Chef dependencies, including the Ruby interpreter.
- A dappdeps/ansible image with the Ansible utility and all its dependencies, including the Python interpreter.
Dappdeps images may depend on each other. For example, building dappdeps/base requires the toolchain and glibc from the dappdeps/toolchain image. Once all the utilities in dappdeps/base are compiled, files from dappdeps/toolchain are still required to run them at runtime.
The main condition is that the utilities from these images must be located in a non-standard place, namely /.dapp/deps/, and must not depend on any utilities or libraries in standard system paths. Also, dappdeps images must not contain any files other than /.dapp/deps. Such images allow you to create containers from them with volumes containing the utilities, and to mount those volumes into other containers using Docker's --volumes-from option.
Building dappdeps/toolchain
Chapter 5 of the Linux From Scratch manual, “Constructing a Temporary System”, describes exactly this process: building a temporary chroot environment in /tools with a set of utilities that are then used to build the main target distribution. In our case, we slightly change the directory of the chroot environment: during compilation we pass /.dapp/deps/toolchain/0.1.1 as the --prefix parameter. This is the directory that will appear in the build container when dappdeps/toolchain is mounted into it, and it contains all the necessary utilities and libraries. All we need is GNU binutils, GCC, and glibc.
The image is built using Docker multi-stage builds. In an image based on ubuntu:16.04, the whole environment is prepared and the programs are compiled and installed into /.dapp/deps/toolchain/0.1.1. Then this directory is copied into the scratch-based image dappdeps/toolchain:0.1.1. The Dockerfile can be found here.
The final dappdeps/toolchain image is the “temporary system” in LFS terminology. GCC in this system is still tied to the system library paths, but we do not aim to make GCC work in any base image. The dappdeps/toolchain image is auxiliary: it will be used later, among other things, to build programs that are truly independent of the common system libraries.
Using Omnibus with dappdeps/toolchain

Omnibus is used to build projects like chefdk or GitLab. It lets you create self-contained bundles containing a program and all the libraries it depends on, except for the system linker and libc. All instructions are described in readable, convenient Ruby recipes. The Omnibus project also has a library of ready-made recipes, omnibus-software. So, let's try to describe the build of the remaining dappdeps images using Omnibus. However, to get rid of the dependence on the system linker and libc, we will build all the programs in Omnibus using the compiler from dappdeps/toolchain. The programs will then be linked against glibc, which is also in dappdeps/toolchain.
To do this, save the contents of dappdeps/toolchain as an archive:
$ docker pull dappdeps/toolchain:0.1.1
$ docker save dappdeps/toolchain:0.1.1 -o dappdeps-toolchain.tar
Add this archive with the Dockerfile ADD directive, which unpacks a local tar archive automatically, and then unpack the layer it contains into the root of the build container:
ADD ./dappdeps-toolchain.tar /dappdeps-toolchain
RUN tar xf /dappdeps-toolchain/**/layer.tar -C /
Before starting the build with Omnibus, prepend /.dapp/deps/toolchain/0.1.1/bin to the PATH variable so that GCC from dappdeps/toolchain takes priority. The result of an Omnibus run is a package (in our case, a DEB), whose contents are unpacked and transferred to /.dapp/deps/{base|gitartifact|...} using Docker multi-stage builds, just as for dappdeps/toolchain.
Building dappdeps/base
The Omnibus project is described by the project file omnibus/config/projects/dappdeps-base.rb:
name 'dappdeps-base'
license 'MIT'
license_file 'LICENSE.txt'
DOCKER_IMAGE_VERSION = "0.2.3"
install_dir "/.dapp/deps/base/#{DOCKER_IMAGE_VERSION}"
build_version DOCKER_IMAGE_VERSION
build_iteration 1
dependency "dappdeps-base"
This file contains all the dependencies of the dappdeps-base Omnibus package and the target installation directory. Dependencies can be located either in a separate repository (for example, omnibus-software) or in the omnibus/config/software directory. Each file in this directory describes the instructions for installing one package or component. For dappdeps-base, we wrote Omnibus software recipes for the components missing from the standard omnibus-software repository: acl, attr, coreutils, diffutils, findutils, gtar, rsync, sed, shadow, sudo, termcap. Let's take rsync as an example of what an Omnibus software recipe looks like:
name 'rsync'
default_version '3.1.2'
license 'GPL-3.0'
license_file 'COPYING'
version('3.1.2') { source md5: '0f758d7e000c0f7f7d3792610fad70cb' }
source url: "https://download.samba.org/pub/rsync/src/rsync-#{version}.tar.gz"
dependency 'attr'
dependency 'acl'
dependency 'popt'
relative_path "rsync-#{version}"
build do
  env = with_standard_compiler_flags(with_embedded_path)
  command "./configure --prefix=#{install_dir}/embedded", env: env
  command "make -j #{workers}", env: env
  command 'make install', env: env
end
The source directive specifies the URL from which to download the source code. Dependencies on other components are specified by name with the dependency directive. The name of the component being built is specified with the name directive. Each software recipe may, in turn, declare dependencies on other components. Inside the build block are the standard commands for building from source. The Omnibus project and Dockerfile for dappdeps/base can be found here.
Building dappdeps/gitartifact
In the case of dappdeps-gitartifact, only a Git build recipe is needed, and it already exists in omnibus-software: all that remains is to connect it to the current Omnibus project. Otherwise, everything is the same.
The Omnibus project and Dockerfile for dappdeps/gitartifact can be found here.
Building dappdeps/chefdk
For chefdk, a ready-made Omnibus project already exists. All that remains is to add it to the build container via the Dockerfile and replace the standard chefdk installation path /opt/chefdk with /.dapp/deps/chefdk/2.3.17-2 (our installation path includes the version number). The Dockerfile for building dappdeps/chefdk can be found here.
Building dappdeps/ansible
To build Ansible, we also create an Omnibus project in which we install the Python interpreter and pip, and describe a software recipe for Ansible:
name "ansible"
ANSIBLE_GIT_TAG = "v2.4.4.0+dapp-6"
dependency "python"
dependency "pip"
build do
  command "#{install_dir}/embedded/bin/pip install https://github.com/flant/ansible/archive/#{ANSIBLE_GIT_TAG}.tar.gz"
  command "#{install_dir}/embedded/bin/pip install pyopenssl"
end
As you can see, the Ansible image consists of an embedded Python and pip, plus Ansible with its dependencies installed via pip.
The Omnibus project and Dockerfile for dappdeps/ansible can be found here.
How to use the dappdeps distribution?
To use dappdeps images by mounting volumes, you must first create a container from each image and declare which volume is stored in that container. This is what Docker currently requires.
$ docker create --name dappdeps-toolchain --volume /.dapp/deps/toolchain/0.1.1 dappdeps/toolchain:0.1.1 no-such-cmd
13edda732176a44d7d822202d8327565b78f4a2190368bb1df46cdad1e127b6e
$ docker ps -a | grep dappdeps-toolchain
13edda732176 dappdeps/toolchain:0.1.1 "no-such-cmd" About a minute ago Created dappdeps-toolchain
The container is named dappdeps-toolchain: under this name, all volumes declared for the container can be mounted into other containers with --volumes-from. Docker requires a command argument, so we pass an arbitrary string, no-such-cmd, but this container will never be started: it will remain in the Created state. Create the remaining containers:
$ docker create --name dappdeps-base --volume /.dapp/deps/base/0.2.3 dappdeps/base:0.2.3 no-such-cmd
20f524c5b8b4a59112b4b7cb85e47eee660c7906fb72a4935a767a215c89964e
$ docker create --name dappdeps-ansible --volume /.dapp/deps/ansible/2.4.4.0-10 dappdeps/ansible:2.4.4.0-10 no-such-cmd
cd01ae8b69cd68e0611bb6c323040ce202e8e7e6456a3f03a4d0a3ffbbf2c510
$ docker create --name dappdeps-gitartifact --volume /.dapp/deps/gitartifact/0.2.1 dappdeps/gitartifact:0.2.1 no-such-cmd
2c12a8743c2b238d90debaf066e29685b41b138c10f2b893a815931df866576d
$ docker create --name dappdeps-chefdk --volume /.dapp/deps/chefdk/2.3.17-2 dappdeps/chefdk:2.3.17-2 no-such-cmd
4dffe74c49c8e4cdf9d749177ae9efec3bdae6e37c8b6df41b6eb527a5c1d891
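Since all five commands follow the same pattern, they can be generated from a list of name:version pairs. This is a minimal sketch: the variable DAPPDEPS_IMAGES and the helper dappdeps_create_cmd are hypothetical names, and the script only prints the commands rather than calling Docker.

```shell
# Image name:version pairs taken from the commands above.
DAPPDEPS_IMAGES="toolchain:0.1.1 base:0.2.3 gitartifact:0.2.1 ansible:2.4.4.0-10 chefdk:2.3.17-2"

# Hypothetical helper: print the `docker create` command for one pair.
dappdeps_create_cmd() {
    name=${1%%:*}      # part before the first colon
    version=${1#*:}    # part after the first colon
    echo "docker create --name dappdeps-${name} --volume /.dapp/deps/${name}/${version} dappdeps/${name}:${version} no-such-cmd"
}

# Dry run: print one command per image.
for image in $DAPPDEPS_IMAGES; do
    dappdeps_create_cmd "$image"
done
```

Piping the printed lines to sh would execute them, assuming Docker is installed and the images can be pulled.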
So we have reached the climax, the point of all this fuss. As a demonstration, let's install the nginx and tree packages in an Alpine image by running Ansible from dappdeps/ansible via Bash from dappdeps/base:
$ docker run -ti --name mycontainer --volumes-from dappdeps-toolchain --volumes-from dappdeps-base --volumes-from dappdeps-gitartifact --volumes-from dappdeps-ansible --volumes-from dappdeps-chefdk alpine:latest /.dapp/deps/base/0.2.3/embedded/bin/bash -lc '/.dapp/deps/ansible/2.4.4.0-10/embedded/bin/ansible localhost -m apk -a "name=nginx,tree update_cache=yes"'
[WARNING]: Unable to parse /etc/ansible/hosts as an inventory source
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
localhost | SUCCESS => {
    "changed": true,
    "failed": false,
    "msg": "installed nginx tree package(s)",
    "packages": [
        "pcre",
        "nginx",
        "tree"
    ],
    "stderr": "",
    "stderr_lines": [],
    "stdout": "(1/3) Installing pcre (8.41-r1)\n(2/3) Installing nginx (1.12.2-r3)\nExecuting nginx-1.12.2-r3.pre-install\n(3/3) Installing tree (1.7.0-r1)\nExecuting busybox-1.27.2-r7.trigger\nOK: 6 MiB in 14 packages\n",
    "stdout_lines": [
        "(1/3) Installing pcre (8.41-r1)",
        "(2/3) Installing nginx (1.12.2-r3)",
        "Executing nginx-1.12.2-r3.pre-install",
        "(3/3) Installing tree (1.7.0-r1)",
        "Executing busybox-1.27.2-r7.trigger",
        "OK: 6 MiB in 14 packages"
    ]
}
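The long run of --volumes-from options in the docker run command can be generated from a list of container names. This is a minimal sketch; the helper name volumes_from_flags is hypothetical.

```shell
# Hypothetical helper: print one --volumes-from option per container name,
# separated by single spaces.
volumes_from_flags() {
    flags=""
    for container in "$@"; do
        # Add a separating space only when flags is already non-empty.
        flags="${flags}${flags:+ }--volumes-from ${container}"
    done
    printf '%s\n' "$flags"
}

# Usage (left unquoted on purpose so the shell splits it into arguments):
#   docker run -ti --name mycontainer \
#       $(volumes_from_flags dappdeps-toolchain dappdeps-base dappdeps-ansible) \
#       alpine:latest /.dapp/deps/base/0.2.3/embedded/bin/bash
```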
The final chord: we create an image from the resulting container and... see that all that is left of dappdeps in it is empty mount-point directories!
$ docker commit mycontainer myimage
sha256:9646be723b91daeaf538b7d92bb8844578abc7acd3028394f543e883eeb382bb
$ docker run -ti --rm myimage tree /.dapp
/.dapp
└── deps
    ├── ansible
    │   └── 2.4.4.0-10
    ├── base
    │   └── 0.2.3
    ├── chefdk
    │   └── 2.3.17-2
    ├── gitartifact
    │   └── 0.2.1
    └── toolchain
        └── 0.1.1

11 directories, 0 files
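The same check can be scripted by counting regular files under the mount-point tree. This is a minimal sketch; the helper name count_files is hypothetical.

```shell
# Hypothetical helper: count regular files anywhere below a directory.
# `tr` strips the padding that some `wc` implementations add.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Usage on a local directory:
#   count_files /.dapp
```

For the image built above, one option is to run find /.dapp -type f inside a container started from myimage and confirm that it prints nothing.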

What more could one wish for?..
Further work and problems
What are the problems with dappdeps?
The size of dappdeps/toolchain needs to be reduced. To do this, the toolchain has to be split into two parts: the part needed to build new utilities in dappdeps, and the part with basic libraries such as glibc, which must be mounted at runtime to run these utilities.
To make the Ansible apt module work in dappdeps/ansible, we had to add the contents of Ubuntu's python-apt package directly to the image without rebuilding it. As a result, the apt module works without problems in DEB-based base images, but it requires a specific version of glibc. Since apt is itself a distribution-specific module, this is acceptable.
What is missing in the Dockerfile?
To use the volume from the dappdeps/toolchain image, you first have to create an archive of this image and then add it to another image with the Dockerfile ADD directive (see the section “Using Omnibus with dappdeps/toolchain”). The Dockerfile lacks functionality that would let you simply attach a directory from another image as a VOLUME during the build, i.e. an analogue of the --volumes-from option for the Dockerfile.
Conclusions
We have confirmed that the idea works: it lets you use GNU and other CLI utilities in build instructions, run the Python or Ruby interpreter, and even run Ansible or Chef in Alpine or scratch images. The author of the build instructions does not need to know the side effects of the commands being run or explicitly list which files need to be imported, as is the case with Docker multi-stage builds.
The results of this work are also applied in practice: dapp uses dappdeps images in its build containers. For example, Git from dappdeps/gitartifact is used to work with patches, so Git behaves the same in all base images with a reasonable degree of certainty. However, how dapp uses dappdeps is beyond the scope of this article.
The purpose of this article was to convey the idea itself and to show, with a real practical example, that it can be applied.
P.S. All the dappdeps images mentioned above are available on hub.docker.com and can be used: dappdeps/toolchain:0.1.1, dappdeps/base:0.2.3, dappdeps/gitartifact:0.2.1, dappdeps/ansible:2.4.4.0-10, dappdeps/chefdk:2.3.17-2.