Learn Docker, Part 3: Dockerfile files

https://towardsdatascience.com/learn-enough-docker-to-be-useful-b0b44222eef5
In this translation of the third part of a series on Docker, we continue to draw inspiration from baking, this time from bagels. Today's main topic is working with Dockerfiles. We will go through the instructions used in these files.

Part 1: Basics
Part 2: Terms and Concepts
Part 3: Dockerfile Files


The bagels are the instructions in a Dockerfile.

Docker Images


Recall that a Docker container is a Docker image brought to life. It is a self-contained operating system that holds only the essentials and the application code.

Docker images are the result of the build process, and Docker containers are running images. At the heart of Docker are Dockerfiles. These files tell Docker how to build the images from which containers are created.

Each Docker image corresponds to a file named Dockerfile. The name is written exactly like that, with no extension. When you run docker build to create a new image, Docker assumes that the Dockerfile is in the current working directory. If the file is located elsewhere, its location can be specified with the -f flag.

Containers, as we saw in the first part of this series, consist of layers. Every layer except the last one, which sits on top of all the others, is read-only. A Dockerfile tells Docker which layers to add to the image and in what order.

Each layer is really just a file that describes how the image changed compared with its state after the previous layer was added. In Unix, by the way, almost everything is a file.

The base image provides the initial layer (or layers) of the image being created. The base image is also called the parent image.


The base image is where the Docker image begins.

When an image is downloaded from a remote repository to a local computer, only the layers that are not on this computer are physically downloaded. Docker aims to save space and time by reusing existing layers.

Dockerfile files


Dockerfiles contain instructions for creating an image. Each line of the file begins with an instruction, written in capital letters, followed by its arguments. When an image is built, the instructions are processed from top to bottom. Here is what it looks like:

FROM ubuntu:18.04
COPY . /app

Only the FROM, RUN, COPY, and ADD instructions create layers in the final image. The other instructions configure something, describe metadata, or tell Docker that something needs to happen while the container is running, for example opening a port or executing a command.
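
As a rough illustration of this split, here is a small annotated sketch. The base image and the COPY line repeat the example above; the label value, the directory, and the port number are made up for illustration.

```dockerfile
# Brings in the layers of the base image
FROM ubuntu:18.04
# Metadata only (hypothetical value); adds no files to the image
LABEL description="layer demo"
# Creates a layer containing the copied files
COPY . /app
# Creates a layer containing the changes made by the command
RUN mkdir /data
# Documents a port (hypothetical number); adds no files to the image
EXPOSE 8000
# Default command for the running container; adds no files to the image
CMD ["ls", "/app"]
```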

Here we assume that a Docker image based on a Unix-like OS is used. You can also use a Windows-based image, but that is less common and such images are harder to work with. So if you have the choice, use Unix.

To begin with, here is a list of Dockerfile instructions with brief comments.

A dozen Dockerfile instructions


  1. FROM - sets the base (parent) image.
  2. LABEL - describes metadata, for example who created and maintains the image.
  3. ENV - sets persistent environment variables.
  4. RUN - executes a command and creates an image layer. Used to install packages into the container.
  5. COPY - copies files and folders into the container.
  6. ADD - copies files and folders into the container; can unpack local .tar files.
  7. CMD - describes a command with arguments to be executed when the container is launched. The arguments can be overridden when the container is started. Only one CMD instruction per file takes effect.
  8. WORKDIR - sets the working directory for the instructions that follow.
  9. ARG - defines variables that can be passed to Docker while the image is being built.
  10. ENTRYPOINT - provides a command with arguments to call when the container runs. The arguments are not overridden.
  11. EXPOSE - indicates that a port needs to be opened.
  12. VOLUME - creates a mount point for working with persistent storage.

Now let's talk about these instructions.

Instructions and examples of their use


▍Simple Dockerfile


A Dockerfile can be extremely short and simple. For example:

FROM ubuntu:18.04

▍FROM Instruction


A Dockerfile must begin with a FROM instruction, or with an ARG instruction followed by a FROM instruction.

The FROM keyword tells Docker to use the base image that matches the given name and tag when building the image. As mentioned earlier, the base image is also called the parent image.

In this example, the base image is stored in the ubuntu repository. ubuntu is the name of the official Docker repository that provides a basic version of the popular Linux distribution Ubuntu.

Note that the Dockerfile in question includes the tag 18.04, which states exactly which base image we need. This is the image that will be downloaded when our image is built. If the tag is omitted, Docker assumes that the most recent image from the repository (the latest tag) is required. To make their intentions clear, Dockerfile authors are advised to specify exactly which image they need.
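
A minimal sketch of a pinned tag combined with an ARG placed before FROM might look like this. The variable name UBUNTU_TAG is an assumption made for illustration; it does not come from the original article.

```dockerfile
# Build-time variable declared before FROM (UBUNTU_TAG is a hypothetical name)
ARG UBUNTU_TAG=18.04
# Pinned tag: the build starts from Ubuntu 18.04 unless the variable is overridden
FROM ubuntu:${UBUNTU_TAG}
# Without a tag, "FROM ubuntu" would resolve to ubuntu:latest
```

With such a file, another tag could be chosen at build time, for example with docker build --build-arg UBUNTU_TAG=20.04 .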

When the Dockerfile above is used to build an image on the local machine for the first time, Docker downloads the layers defined by the ubuntu image. You can picture them as stacked on top of one another. Each subsequent layer is a file that describes how the image differs from its state after the previous layer was added.

When creating a container, a layer in which you can make changes is added on top of all other layers. The data in the remaining layers can only be read.


The structure of a container (taken from the documentation)

For the sake of efficiency, Docker uses a copy-on-write strategy. If a file exists in a lower layer of the image and another layer needs to read it, Docker uses the existing file. Nothing needs to be downloaded or copied.

When a container is running and a file from a lower layer needs to be modified, that file is copied to the topmost, writable layer. To learn more about the copy-on-write strategy, take a look at this material from the Docker documentation.

We will continue discussing Dockerfile instructions using an example file with a more complex structure.

▍More complicated Dockerfile


The Dockerfile we just looked at is neat and easy to understand, but it is too simple: it uses only one instruction. It also contains no instructions that are invoked while the container is running. Take a look at another file, which builds a small image. It includes mechanisms that define the commands invoked during container execution.

FROM python:3.7.2-alpine3.8
LABEL maintainer="jeffmshale@gmail.com"
ENV ADMIN="jeff"
RUN apk update && apk upgrade && apk add bash
COPY . ./app
ADD https://raw.githubusercontent.com/discdiver/pachy-vid/master/sample_vids/vid1.mp4 \
/my_app_directory
RUN ["mkdir", "/a_directory"]
CMD ["python", "./my_script.py"]

At first glance this file may seem rather complicated, so let's break it down.

The base of this image is the official Python image with the tag 3.7.2-alpine3.8. If you inspect its source, you can see that this base image includes Linux, Python, and not much else. Alpine images are very popular in the Docker world because they are small, fast, and secure. However, Alpine images do not come with the wide range of capabilities typical of ordinary operating systems. So to build something useful on top of such an image, its author has to install the necessary packages.

▍LABEL Instruction



Labels

The LABEL instruction lets you add metadata to an image. In the file we are looking at, it holds the contact details of the image's creator. Declaring labels does not slow down the build and does not increase the image size. Labels simply carry useful information about the Docker image, so it is recommended to include them. Details about working with metadata in a Dockerfile can be found here.
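
For illustration, several labels can be set in one instruction. In this sketch the maintainer value comes from the example file, while the version and description values are invented.

```dockerfile
FROM python:3.7.2-alpine3.8
# Several key/value pairs can go into a single LABEL instruction
# (version and description are hypothetical values)
LABEL maintainer="jeffmshale@gmail.com" \
      version="1.0" \
      description="Example image for this article"
```

The labels of a built image can then be viewed with docker inspect.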

▍ENV Instruction



Environment

The ENV instruction lets you set persistent environment variables that will be available in the container while it runs. In the previous example, the ADMIN variable can be used once the container has been created.

The ENV instruction is well suited for defining constants. If you use a certain value several times in a Dockerfile, say, in commands that run in the container, and you suspect that you may need to change it later, it makes sense to store it in such a constant.
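
A possible sketch of ENV used as a constant is shown below. The variable name APP_HOME and its value are assumptions made for illustration only.

```dockerfile
FROM python:3.7.2-alpine3.8
# APP_HOME is a hypothetical constant that is reused in the lines below
ENV APP_HOME="/usr/src/app"
# The value is substituted at build time in later instructions
WORKDIR ${APP_HOME}
COPY . ${APP_HOME}
# The variable is also visible to processes inside the running container
CMD ["sh", "-c", "echo Running from $APP_HOME"]
```

Changing the path then requires editing only the ENV line.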

Note that a Dockerfile often offers several ways to solve the same task. Which one to choose depends on following established Docker practices and on keeping the solution transparent and fast. For example, the RUN, CMD, and ENTRYPOINT instructions serve different purposes, but all of them are used to execute commands.

▍RUN Instruction



RUN instruction

The RUN instruction creates a layer while the image is being built. After it executes, a new layer is added to the image and its state is committed. RUN is often used to install additional packages into an image. In the previous example, the instruction RUN apk update && apk upgrade tells Docker to update the packages that came with the base image. These two commands are followed by && apk add bash, which installs bash into the image.

The apk in these commands stands for Alpine Linux package manager. If your base image uses a different Linux distribution, for example Ubuntu, you may need a command of the form RUN apt-get to install packages. Later we will talk about other ways to install packages.

The RUN instruction and similar instructions, such as CMD and ENTRYPOINT, can be written either in exec form or in shell form. The exec form uses a syntax that resembles a JSON array. For example: RUN ["my_executable", "my_first_param1", "my_second_param2"].

In the previous example, we used the shell form of the RUN instruction: RUN apk update && apk upgrade && apk add bash.

Later in our Dockerfile, the exec form of RUN is used, as RUN ["mkdir", "/a_directory"], to create a directory. When using this form, remember that strings must be wrapped in double quotes, as is customary in JSON.
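
The two forms can be compared side by side in a short sketch. The first two RUN lines are taken from the example file; the third line and the file name home.txt are added here purely for illustration.

```dockerfile
FROM python:3.7.2-alpine3.8
# Shell form: executed through /bin/sh -c, so && and variable expansion work
RUN apk update && apk upgrade && apk add bash
# Exec form: a JSON array run directly, without a shell; double quotes are required
RUN ["mkdir", "/a_directory"]
# Exec form that calls a shell explicitly when shell features are needed
# (home.txt is a hypothetical file name)
RUN ["/bin/sh", "-c", "echo $HOME > /a_directory/home.txt"]
```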

▍COPY Instruction



COPY Instruction

The COPY instruction appears in our file as COPY . ./app. It tells Docker to take the files and folders from the local build context and copy them into the app directory under the image's current working directory. If the target directory does not exist, the instruction creates it.
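
Here is a small sketch of relative and absolute destinations. The WORKDIR value and the file name config.yml are assumptions; such a file would have to exist in the build context.

```dockerfile
FROM python:3.7.2-alpine3.8
WORKDIR /usr/src
# Relative destination: the build context lands in /usr/src/app, created if missing
COPY . ./app
# Absolute destination: a single file (hypothetical name) is copied into /etc/my_app/
COPY config.yml /etc/my_app/
```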

▍ADD Instruction


The ADD instruction does the same job as COPY, but it has a couple of additional use cases. With it you can add files downloaded from remote sources to the container and unpack local .tar files.

In this example, the ADD instruction is used to copy a file available at a URL to my_app_directory in the container. Strictly speaking, because the destination has no trailing slash, Docker saves the download as a file with that name rather than placing it inside a directory. Note, however, that the Docker documentation does not recommend fetching remote files this way, because they cannot be deleted afterwards and they increase the size of the image.

The documentation also suggests using COPY instead of ADD wherever possible, to keep Dockerfiles clearer. In my opinion, the Docker team should merge ADD and COPY into a single instruction so that image authors have fewer instructions to remember.

Note that the ADD instruction contains a line continuation character, \. Such characters improve the readability of long commands by splitting them across several lines.
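
The difference between ADD and COPY for local archives can be sketched as follows. The archive name vendor_libs.tar.gz and the target paths are hypothetical.

```dockerfile
FROM python:3.7.2-alpine3.8
# ADD unpacks a local tar archive (hypothetical file) into the target directory
ADD vendor_libs.tar.gz /opt/vendor/
# COPY places the same archive in the image as-is, without unpacking it
COPY vendor_libs.tar.gz /tmp/
```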

▍CMD Instruction



CMD instruction

The CMD instruction gives Docker the command to execute when the container starts. The results of this command are not added to the image at build time. In our example, it runs the script my_script.py while the container is running.

Here is what else you need to know about the CMD instruction:

  • Only one CMD instruction per Dockerfile takes effect. If the file contains several, all but the last one are ignored.
  • The CMD instruction can be written in exec form. If it does not name an executable, the file must also contain an ENTRYPOINT instruction. In that case, both instructions should be written in JSON format.
  • Command line arguments passed to docker run override the arguments given by CMD in the Dockerfile.
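
These rules can be seen in a short sketch. The script path below is an assumption chosen to match the COPY destination; it is not taken from the original file.

```dockerfile
FROM python:3.7.2-alpine3.8
COPY . ./app
# Ignored: only the last CMD in a Dockerfile takes effect
CMD ["python", "--version"]
# Default command when the container starts without extra arguments
CMD ["python", "./app/my_script.py"]
```

Assuming the image was built under the hypothetical name my_image, docker run my_image would run the script, while docker run my_image python --version would replace the default command entirely.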

▍An even more complicated Dockerfile


Consider another Dockerfile that uses some new instructions.

FROM python:3.7.2-alpine3.8
LABEL maintainer="jeffmshale@gmail.com"
# Install dependencies
RUN apk add --update git
# Set the current working directory
WORKDIR /usr/src/my_app_directory
# Copy the code from the local context into the image's working directory
COPY . .
# Set a default value for the variable
ARG my_var=my_default_value
# Configure the command to run when the container starts
ENTRYPOINT ["python", "./app/my_script.py", "my_var"]
# Expose ports
EXPOSE 8000
# Create a volume for storing data
VOLUME /my_volume

In this example you can see, among other things, comments that begin with the # character.
One of the main things a Dockerfile does is install packages. As already mentioned, there are various ways to install packages using the RUN instruction.

Packages in an Alpine Docker image can be installed using apk. As we have seen, this is done with a command of the form RUN apk update && apk upgrade && apk add bash.

In addition, Python packages can be installed into an image using pip, wheel, and conda. For other programming languages, other package managers can be used to prepare the corresponding images.

For an installation to be possible, the underlying layer must provide a suitable package manager to the layer in which the packages are installed. So if you run into problems installing packages, make sure the package manager is installed before you try to use it.

For example, a RUN instruction in a Dockerfile can install a list of packages with pip. If you do this, combine all the commands into one instruction and split it across lines using the \ character. The file will look neat, and fewer layers will be added to the image than with several RUN instructions.

Alternatively, to install multiple packages you can list them in a file and pass that file to the package manager in a RUN instruction. Such files are usually named requirements.txt.
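
Both approaches can be sketched in one file. The package names are arbitrary examples, and the sketch assumes that a requirements.txt file exists in the build context.

```dockerfile
FROM python:3.7.2-alpine3.8
WORKDIR /usr/src/my_app_directory
# Option 1: several packages in a single RUN, split across lines with \
# (the package names are examples only)
RUN pip install --no-cache-dir \
    requests \
    flask
# Option 2: packages listed in requirements.txt, which is copied in first
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```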

▍WORKDIR Instruction



Working directories

The WORKDIR instruction changes the working directory of the container. The COPY, ADD, RUN, CMD, and ENTRYPOINT instructions that follow WORKDIR operate relative to this directory. Here are some things to keep in mind about this instruction:

  • It is better to give WORKDIR absolute paths than to navigate the file system with cd commands in the Dockerfile.
  • The WORKDIR instruction automatically creates the directory if it does not exist.
  • You can use several WORKDIR instructions. If they are given relative paths, each one changes the current working directory relative to the previous one.
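
As a sketch of how relative paths accumulate, consider the following file. The directory names reuse the ones from the example Dockerfile.

```dockerfile
FROM python:3.7.2-alpine3.8
# Absolute path: sets the working directory and creates it if needed
WORKDIR /usr/src
# Relative path: resolved against /usr/src, giving /usr/src/my_app_directory
WORKDIR my_app_directory
# Prints /usr/src/my_app_directory during the build
RUN pwd
```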

▍ARG Instruction


The ARG instruction defines a variable whose value can be passed from the command line to the image while it is being built. A default value for the variable can be given in the Dockerfile. For example: ARG my_var=my_default_value.

Unlike ENV variables, ARG variables are not available while the container is running. However, ARG variables can be used to set default values for ENV variables from the command line during the image build. The ENV variables, in turn, will be available in the container while it runs. Details about this way of working with variables can be found here.
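
A minimal sketch of passing a build-time value on to the running container could look like this. The ARG line comes from the example file, while the ENV name MY_VAR is an assumption made for illustration.

```dockerfile
FROM python:3.7.2-alpine3.8
# Build-time variable with a default value
ARG my_var=my_default_value
# Copy it into an ENV variable so that it survives into the running container
# (MY_VAR is a hypothetical name)
ENV MY_VAR=${my_var}
# Prints the value when the container starts
CMD ["sh", "-c", "echo MY_VAR is $MY_VAR"]
```

The default can be replaced at build time, for example with docker build --build-arg my_var=other_value .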

▍ENTRYPOINT Instruction



An entry point to somewhere.

The ENTRYPOINT instruction specifies a command with arguments that must be executed when the container starts. It is similar to CMD, but the parameters given in ENTRYPOINT are not overridden when the container is started with command line parameters.

Instead, command line arguments passed in commands of the form docker run my_image_name are appended to the arguments given in ENTRYPOINT. For example, after running docker run my_image bash, the argument bash is added to the end of the argument list specified in ENTRYPOINT. When preparing a Dockerfile, do not forget to include a CMD or ENTRYPOINT instruction.

The Docker documentation offers several recommendations on which instruction, CMD or ENTRYPOINT, to choose for executing commands when the container starts:

  • If the same command must be executed every time the container starts, use ENTRYPOINT.
  • If the container will be used as an executable application, use ENTRYPOINT.
  • If you know that you will need to pass arguments at container start that may override the arguments specified in the Dockerfile, use CMD.

In our example, the instruction ENTRYPOINT ["python", "./app/my_script.py", "my_var"] makes the container run the Python script my_script.py with the argument my_var when it starts. The script can then read this value using argparse. Note that the Dockerfile also declares a variable named my_var and gives it a default value using ARG. However, ARG variables exist only during the build, and the exec form performs no variable substitution, so the script receives the literal string my_var. To make a build-time value available at run time, it has to be copied into an ENV variable.

The Docker documentation recommends using the exec form of ENTRYPOINT: ENTRYPOINT ["executable", "param1", "param2"].
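
How ENTRYPOINT and CMD work together can be sketched as follows. This is a simplified illustration, not the file from the example above.

```dockerfile
FROM python:3.7.2-alpine3.8
# Fixed part of the command: the container always runs the Python interpreter
ENTRYPOINT ["python"]
# Default arguments, replaced by anything passed after the image name
CMD ["--version"]
```

Assuming the image is built under the hypothetical name my_image, docker run my_image would execute python --version, while docker run my_image -c "print(1)" would execute python -c "print(1)".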

▍EXPOSE Instruction



EXPOSE instruction

The EXPOSE instruction indicates which ports are intended to be opened so that a running container can be reached through them. The instruction itself does not open any ports. Rather, it serves as documentation for the image, a way for the person who builds the image to communicate with the person who runs the container.

To actually open a port (or ports) and set up port forwarding, run docker run with the -p flag. If you use the -P flag (with a capital P), all ports listed in EXPOSE instructions are opened.
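
A small sketch shows EXPOSE next to a process that actually listens on the port. Python's built-in HTTP server stands in for a real application here; it is an assumption, not part of the original example.

```dockerfile
FROM python:3.7.2-alpine3.8
# Documents that the application listens on port 8000; does not publish the port
EXPOSE 8000
# Python's built-in HTTP server stands in for a real application
CMD ["python", "-m", "http.server", "8000"]
```

Assuming the hypothetical image name my_image, docker run -p 8000:8000 my_image would map the port to the host explicitly, while docker run -P my_image would map every exposed port to a host port chosen by Docker.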

▍VOLUME Instruction



VOLUME instruction

The VOLUME instruction specifies a location that the container will use to store files persistently and to work with them. We will talk about this later.

Results


Now you know a dozen instructions used to create images with a Dockerfile. The list does not end there. In particular, we have not covered instructions such as USER, ONBUILD, STOPSIGNAL, SHELL, and HEALTHCHECK. Here is a quick reference to the Dockerfile instructions.

Dockerfiles are arguably a key component of the Docker ecosystem, and anyone who wants to feel confident in this environment needs to learn to work with them. We will return to them next time, when we discuss ways to reduce image size.

Dear readers! If you use Docker in practice, please tell us how you write your Dockerfiles.

