silentvick August 28, 2014 at 14:09

Docker Image Optimization


Docker images can be very large. Many exceed 1 GB in size. How do they become like that? Should they be like that? Can we make them smaller without sacrificing functionality?

At CenturyLink Labs, we have been building a lot of Docker images lately. When we started experimenting with image creation, we found that our builds ballooned very quickly (it was common to end up with an image of 1 GB or more). Size is not much of a problem when a couple of gigabytes of images are simply sitting on your local machine. It becomes one when you are constantly pushing and pulling those images over the Internet.

I decided it was worth digging a little deeper into how Docker images are built, in order to understand what can be done to reduce the size of our builds.

As a small digression, Adriaan de Jonge recently published an article titled “Creating the smallest possible Docker container”, in which he described how to build an image that contains nothing but the statically linked Go binary that the container runs. His image is strikingly small: 3.6 MB. I will not consider such extremes here. As someone who is used to working with languages like Python and Ruby, I need a somewhat higher level of support from the OS, and I will gladly sacrifice a hundred megabytes to be able to run Debian and use apt-get install to add my dependencies. So although I envy Adriaan's tiny image, I need to support a wider range of applications, which makes his approach impractical for me.

Layers

Before we get to reducing the size of your images, we need to talk about layers. The concept of layers touches on various low-level technical details such as the root file system (rootfs), the copy-on-write mechanism, and union mounts. Fortunately, this topic is well covered elsewhere, so I will not retell it here. For our purposes, it is important to understand that each instruction in the Dockerfile creates a new image layer.

Let's take a look at an example Dockerfile to see this in action:

FROM debian:wheezy
RUN mkdir /tmp/foo
RUN fallocate -l 1G /tmp/foo/bar

A completely useless image, but it will help us illustrate the point. Here we use debian:wheezy as the base image, create the directory /tmp/foo, and allocate a 1 GB file named bar inside it.

Let's build this image:

$ docker build -t sample .
Sending build context to Docker daemon  2.56 kB
Sending build context to Docker daemon 
Step 0 : FROM debian:wheezy
 ---> e8d37d9e3476
Step 1 : RUN mkdir /tmp/foo
 ---> Running in 3d5d8b288cc2
 ---> 9876aa270471
Removing intermediate container 3d5d8b288cc2
Step 2 : RUN fallocate -l 1G /tmp/foo/bar
 ---> Running in 6c797329ee43
 ---> 3ebe08b36733
Removing intermediate container 6c797329ee43
Successfully built 3ebe08b36733

If you look at the output of the docker build command, you can see what Docker does to build our image:

Using the value of the FROM instruction, Docker starts a container based on the debian:wheezy image (container ID: 3d5d8b288cc2)
Inside this container, Docker executes the command mkdir /tmp/foo
The container is stopped, committed (which creates a new image with ID 9876aa270471) and then deleted
Docker launches another container, this time from the image saved in the previous step (this container has ID 6c797329ee43)
Inside the running container, Docker executes the command fallocate -l 1G /tmp/foo/bar
The container is stopped, committed (which creates a new image with ID 3ebe08b36733) and then deleted

We can see the final result by running the command docker images --tree (unfortunately, the --tree flag is deprecated and will most likely be removed in a future release):

$ docker images --tree
Warning: '--tree' is deprecated, it will be removed soon. See usage.
└─511136ea3c5a Virtual Size: 0 B Tags: scratch:latest
  └─59e359cb35ef Virtual Size: 85.18 MB
    └─e8d37d9e3476 Virtual Size: 85.18 MB Tags: debian:wheezy
      └─9876aa270471 Virtual Size: 85.18 MB
        └─3ebe08b36733 Virtual Size: 1.159 GB Tags: sample:latest

Here you can see the image tagged as debian:wheezy, followed by the two layers described above (one for each instruction in the Dockerfile).

We often talk about layers and images as if they were different things. In fact, each layer is an image, and an image's layers are just a collection of other images.

Just as we can run:

docker run -it sample:latest /bin/bash

We can easily launch one of the unnamed layers:

docker run -it 9876aa270471 /bin/bash

Both are images from which containers can be started. The only difference is that the first is named and the second is not. This ability to run a container from any layer can be very useful when debugging your Dockerfile.

Image size

Knowing that an image is nothing more than a collection of other images, we arrive at an obvious conclusion: the size of an image is equal to the sum of the sizes of the images that make it up.

Let's look at the output of the command docker history:

$ docker history sample
IMAGE         CREATED        CREATED BY                               SIZE
3ebe08b36733  3 minutes ago  /bin/sh -c fallocate -l 1G /tmp/foo/bar  1.074 GB
9876aa270471  3 minutes ago  /bin/sh -c mkdir /tmp/foo                0 B
e8d37d9e3476  4 days ago     /bin/sh -c #(nop) CMD [/bin/bash]        0 B
59e359cb35ef  4 days ago     /bin/sh -c #(nop) ADD file:1e2ba3d9379f  85.18 MB
511136ea3c5a  13 months ago                                           0 B

We can see all the layers of the sample image along with the commands that created them and their sizes (note that docker history lists the layers in the reverse of the order displayed by docker images --tree).

Only two instructions contribute anything meaningful to the size of our image: the ADD instruction (inherited from debian:wheezy) and our fallocate command.
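The sum can be checked directly against the docker history listing. The following is a minimal sketch, not part of the original article: the function name sum_layer_sizes is hypothetical, and it assumes that every row ends with a number and a unit (B, kB, MB or GB, in decimal units) exactly as in the listing above.

```shell
# sum_layer_sizes: read `docker history` text on stdin, print the total in MB.
# Assumption: the last two fields of every row are "<number> <unit>".
sum_layer_sizes() {
  awk '
    NR == 1 { next }                        # skip the header row
    {
      value = $(NF - 1)
      unit = $NF
      if (unit == "GB") value *= 1000       # sizes are reported in decimal units
      else if (unit == "kB") value /= 1000
      else if (unit == "B") value /= 1000000
      total += value                        # MB rows are added unchanged
    }
    END { printf "%.2f MB\n", total }
  '
}
```

Piping the listing above into it (docker history sample | sum_layer_sizes) gives 1159.18 MB, which matches the 1.159 GB virtual size reported for sample:latest.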

Let's save our image to a tar archive and see how large it is:

$ docker save sample > sample.tar
$ ls -lh sample.tar 
-rw-r--r-- 1 core core 1.1G Jul 26 02:35 sample.tar

When an image is saved in this way to a tar file, various metadata about each layer is also placed there, so the final size will be slightly larger than the sum of the sizes of all layers.

Add another instruction to the Dockerfile:

FROM debian:wheezy
RUN mkdir /tmp/foo
RUN fallocate -l 1G /tmp/foo/bar
RUN rm /tmp/foo/bar

The new instruction deletes the file immediately after fallocate creates it.

If we run docker build for the updated Dockerfile and look at the history again, we will see the following:

$ docker history sample
IMAGE         CREATED         CREATED BY                               SIZE
9d9bdb929b00  8 seconds ago   /bin/sh -c rm /tmp/foo/bar               0 B
3ebe08b36733  24 minutes ago  /bin/sh -c fallocate -l 1G /tmp/foo/bar  1.074 GB
9876aa270471  24 minutes ago  /bin/sh -c mkdir /tmp/foo                0 B
e8d37d9e3476  4 days ago      /bin/sh -c #(nop) CMD [/bin/bash]        0 B
59e359cb35ef  4 days ago      /bin/sh -c #(nop) ADD file:1e2ba3d9379f  85.18 MB
511136ea3c5a  13 months ago                                            0 B

Notice that the rm call added a new layer (of 0 bytes), but everything else remains as before. If we save our updated image, we should see that the size has barely changed (there will be a slight difference due to the metadata of the added layer):

$ docker save sample > sample.tar
$ ls -lh sample.tar
-rw-r--r-- 1 core core 1.1G Jul 26 02:55 sample.tar

If we ran docker run for this image and looked into the directory /tmp/foo, we would find it empty (after all, the file was deleted). However, since our Dockerfile generated a layer containing a 1 GB file, that layer remains a permanent part of the image.

Each additional instruction in your Dockerfile can only ever add to the overall image size; it can never reduce it.

Of course, this example is contrived. But understanding that an image is the sum of the layers it is composed of is important when looking for ways to make images smaller. Below I will describe several ways to do this.

Choose your base

This is fairly obvious advice, but the choice of base image can significantly affect the final size of your image. Here, for example, is a list of popular base images and their sizes:

$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED         VIRTUAL SIZE
scratch      latest   511136ea3c5a   13 months ago   0 B
busybox      latest   a9eb17255234   7 weeks ago     2.433 MB
debian       latest   e8d37d9e3476   4 days ago      85.18 MB
ubuntu       latest   ba5877dc9bec   4 days ago      192.7 MB
centos       latest   1a7dc42f78ba   2 weeks ago     236.4 MB
fedora       latest   88b42ffd1f7c   10 days ago     373.7 MB

We used to use ubuntu as our base image, mainly because most of us were already familiar with it. However, after playing with debian for a while, we concluded that it fully satisfies our needs while saving 100+ MB of space.

The list of suitable base images depends on your needs, but it is definitely worth reviewing. If you use Ubuntu where BusyBox would be enough, you are wasting a lot of space.

I wish the Docker registry displayed image sizes. Unfortunately, for now you have to download an image to find out how large it is.

Reuse your base

One of the advantages of the layered approach is the ability to reuse layers across different images. The example below shows three images that all use debian:wheezy as a base:

$ docker images --tree
Warning: '--tree' is deprecated, it will be removed soon. See usage.
└─511136ea3c5a Virtual Size: 0 B Tags: scratch:latest
    └─e8d37d9e3476 Virtual Size: 85.18 MB Tags: debian:wheezy
      ├─22a0de5ea279 Virtual Size: 85.18 MB
      │ └─057ac524d834 Virtual Size: 85.18 MB
      │   └─bd30825f7522 Virtual Size: 106.2 MB Tags: creeper:latest
      ├─d689af903018 Virtual Size: 85.18 MB
      │ └─bcf6f6a90302 Virtual Size: 85.18 MB
      │   └─ffab3863d257 Virtual Size: 95.67 MB Tags: enderman:latest
      └─9876aa270471 Virtual Size: 85.18 MB
        └─3ebe08b36733 Virtual Size: 1.159 GB
          └─9d9bdb929b00 Virtual Size: 1.159 GB Tags: sample:latest

Each one is built on debian:wheezy, but there are not three copies of Debian. Instead of copying, each image holds a reference to a single instance of the Debian layer (one of the reasons I like docker images --tree is that it clearly shows the relationships between layers).

This means that once you have downloaded debian:wheezy, you never have to pull those layers again, and its contents take up disk space only once no matter how many images use them.

So you can save a considerable amount of space and network traffic by using a common base for different images.
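As a rough worked example of that saving, the virtual sizes from the tree above can be plugged into a small calculation. This is only a sketch: shared_base_usage is a hypothetical helper, the sizes are the rounded figures Docker printed (with 1.159 GB taken as 1159 MB), and the "separate" figure imagines a world in which every image carried its own copy of the base.

```shell
# shared_base_usage BASE_MB IMAGE_MB...
# Compares the disk space needed when every image carries its own copy of the
# base with the space needed when the base layers are stored only once.
shared_base_usage() {
  base=$1
  shift
  printf '%s\n' "$@" | awk -v base="$base" '
    {
      separate += $1          # full virtual size, base included
      unique += $1 - base     # only the layers added on top of the base
    }
    END {
      printf "separate: %.2f MB\n", separate
      printf "shared: %.2f MB\n", unique + base
    }
  '
}
```

Calling shared_base_usage 85.18 106.2 95.67 1159 for creeper, enderman and sample reports 1360.87 MB for separate copies and 1190.51 MB with the shared base, that is, two copies of the 85.18 MB base are saved.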

Chain your commands

In the example above, we created a file and then deleted it immediately. That example is contrived, but something similar often happens when building images. Let's look at something more realistic:

FROM debian:wheezy
WORKDIR /tmp
RUN wget -nv  
RUN tar -xvf someutility-v1.0.0.tar.gz
RUN mv /tmp/someutility-v1.0.0/someutil /usr/bin/someutil
RUN rm -rf /tmp/someutility-v1.0.0 
RUN rm /tmp/someutility-v1.0.0.tar.gz

We download the tar archive, unpack it, move a file into place and clean up after ourselves.

As we saw earlier, each of these instructions creates a separate layer. Despite the fact that we delete the archive and the extracted files, they still remain part of the image.

$ docker history someutility
IMAGE    CREATED         CREATED BY                                     SIZE
33f4a99  16 seconds ago  /bin/sh -c rm /tmp/someutility-v1.0.0.tar.gz   0 B
fec7b5e  17 seconds ago  /bin/sh -c rm -rf /tmp/someutility-v1.0.0      0 B
0851974  18 seconds ago  /bin/sh -c mv /tmp/someutility-v1.0.0/someuti  12.21 MB
5b6b996  19 seconds ago  /bin/sh -c tar -xvf someutility-v1.0.0.tar.gz  99.91 MB
0eebad5  20 seconds ago  /bin/sh -c wget -nv http://centurylinklabs.com  55.34 MB
d6798fc  8 minutes ago   /bin/sh -c #(nop) WORKDIR /tmp                 0 B
e8d37d9  5 days ago      /bin/sh -c #(nop) CMD [/bin/bash]              0 B
59e359c  5 days ago      /bin/sh -c #(nop) ADD file:1e2ba3d9379f7685a1  85.18 MB
511136e  13 months ago                                                  0 B

Running wget produces a layer of 55 MB, and unpacking the archive produces another of 99 MB. We do not need these files in the final image, which means we are wasting 150+ MB for nothing.

We can fix this by doing a little refactoring of our Dockerfile:

FROM debian:wheezy
WORKDIR /tmp
RUN wget -nv  && \
  tar -xvf someutility-v1.0.0.tar.gz && \
  mv /tmp/someutility-v1.0.0/someutil /usr/bin/someutil && \
  rm -rf /tmp/someutility-v1.0.0 && \
  rm /tmp/someutility-v1.0.0.tar.gz

Instead of running each command in a separate RUN instruction, we chained them together using the && operator. Although the Dockerfile becomes a little less readable, this lets us remove the tarball and the extracted directory before the layer is committed.

Here is the result:

$ docker history someutility
IMAGE   CREATED        CREATED BY                                     SIZE
8216b5f 7 seconds ago  /bin/sh -c wget -nv http://centurylinklabs.com  12.21 MB
d6798fc 17 minutes ago /bin/sh -c #(nop) WORKDIR /tmp                 0 B
e8d37d9 5 days ago     /bin/sh -c #(nop) CMD [/bin/bash]              0 B
59e359c 5 days ago     /bin/sh -c #(nop) ADD file:1e2ba3d9379f7685a1  85.18 MB
511136e 13 months ago                                                 0 B

Note that we end up with the same image contents, while getting rid of several extra layers and saving 150 MB of space.

I would not advise you to rush off and rewrite every command in your Dockerfile as a single line. However, if you notice a similar pattern somewhere, where files are created and later deleted, combining those instructions into one will help you keep the image size to a minimum.
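If the chained RUN line becomes too hard to read, one possible alternative (an assumption of this sketch, not something the original article proposes) is to move the steps into a small shell function or script and call it from a single RUN instruction. The name install_from_tarball and its three parameters are hypothetical, and the download URL is deliberately left to the caller.

```shell
# install_from_tarball URL PATH_IN_ARCHIVE DEST
# Downloads a tarball, extracts it, moves one file out of it, and removes all
# temporary files before returning, so that only DEST is left behind.
install_from_tarball() {
  url=$1        # where to fetch the archive from (supplied by the caller)
  binary=$2     # path of the wanted file inside the archive
  dest=$3       # final location, for example /usr/bin/someutil
  workdir=$(mktemp -d) || return 1
  (
    # Work inside a throwaway directory; stop at the first failing step.
    cd "$workdir" &&
    wget -nv -O archive.tar.gz "$url" &&
    tar -xf archive.tar.gz &&
    mv "$binary" "$dest"
  )
  status=$?
  rm -rf "$workdir"     # clean up even if one of the steps failed
  return "$status"
}
```

Because the download, extraction and cleanup all happen inside one invocation, a single RUN that calls this function commits only the moved file. Note that adding the script itself to the image creates one more (very small) layer.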

Flatten your images

All of the strategies above assume that you are creating your own image, or at least have access to its Dockerfile. However, you may have an image created by someone else that you want to make a little smaller.

In this case, we can take advantage of the fact that creating a container merges all layers into one.

Let's go back to our sample image (the one with fallocate and rm) and run it:

$ docker run -d sample
7423d238b754e6a2c5294aab7b185f80be2457ee36de22795685b19ff1cf03ec

Since our image does not actually do anything, the container exits immediately. This leaves us with a stopped container that is the result of merging all layers of the image (I used the -d flag simply to get the container ID printed).

If we export this container and pipe the output to docker import, we can turn it back into an image:

$ docker export 7423d238b | docker import - sample:flat
3995a1f00b91efb016250ca6acc31aaf5d621c6adaf84664a66b7a4594f695eb
$ docker history sample:flat
IMAGE               CREATED             CREATED BY          SIZE
3995a1f00b91        12 seconds ago                          85.18 MB

Note that the history of our new sample:flat image shows only one layer of 85 MB. The layer containing the 1 GB file is gone.

Although this is a pretty neat trick, it has significant disadvantages:

By merging all layers together, you lose the previously described advantage of sharing layers between images. Our sample:flat image now contains its own embedded copy of debian:wheezy.
All metadata normally stored with the image is lost during the run / export / import process. Exposed ports, environment variables, the default command: everything declared in the original image is gone.

Therefore, I would definitely not advise you to rush to flatten all your images. But it can sometimes be useful, for example when you are trying to optimize someone else's image or just want to find out how much you could shrink your own.
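The export / import steps can be wrapped in a small helper. This is a sketch under stated assumptions, not the article's own procedure: flatten_image is a hypothetical name, and it uses docker create (available in Docker releases newer than the one used in this article) instead of docker run -d, so the container never has to start.

```shell
# flatten_image SOURCE_IMAGE TARGET_IMAGE
# Creates a stopped container from SOURCE_IMAGE, exports its merged file
# system, and imports it as a new single-layer image named TARGET_IMAGE.
flatten_image() {
  src=$1
  dst=$2
  # The container is never started; /bin/true is only a placeholder command
  # so that images without a default command are accepted as well.
  cid=$(docker create "$src" /bin/true) || return 1
  docker export "$cid" | docker import - "$dst"
  status=$?
  docker rm "$cid" > /dev/null     # remove the temporary container
  return "$status"
}
```

For the example above this would be called as flatten_image sample sample:flat. The disadvantages listed earlier still apply: the resulting image carries none of the original metadata. Newer Docker releases accept a --change option on docker import for re-applying instructions such as CMD or EXPOSE, but check the documentation for your version before relying on it.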

- Source: Optimizing Docker Images
