Acribia April 21, 2019 at 18:19

Security Cheat Sheets: Docker

Docker is the most popular containerization technology. Initially it was used mainly in dev and test environments, and over time it moved into production. Docker containers began to multiply in production like mushrooms after rain, but few of those who use the technology have thought about how to run Docker containers securely.

Based on OWASP, we have prepared a list of rules that, when followed, will significantly improve the security of an environment built on Docker containers.

Rule 0

The host machine and Docker must have all current updates installed.

Installing all patches for the host OS, Docker Engine, and Docker Machine is extremely important: it protects against known vulnerabilities that allow an escape from the container to the host system, which usually results in privilege escalation on the host.

In addition, containers (unlike virtual machines) share the kernel with the host, so a kernel exploit launched inside a container runs directly against the host kernel. For example, a kernel privilege escalation exploit (such as Dirty COW) launched inside even a well-isolated container will result in root access on the host.

Rule 1

Do not expose the Docker daemon socket

The Docker service (daemon) listens for incoming API connections on the UNIX socket /var/run/docker.sock. This resource must be owned by the root user, with no exceptions. Loosening the access rights on this socket is essentially equivalent to granting root access to the host system.
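A quick way to audit this is to look at the socket's owner and mode. The sketch below assumes GNU stat (as found on most Linux hosts); the function name socket_world_accessible is hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) if "others" have any permission
# on the given path. Assumes GNU stat, which supports -c with %a (octal mode).
socket_world_accessible() {
  mode=$(stat -c '%a' "$1") || return 2
  # The last octal digit holds the permissions for "others".
  [ "${mode#"${mode%?}"}" != "0" ]
}

# Example, on a Docker host:
#   stat -c '%U:%G %a' /var/run/docker.sock
#   socket_world_accessible /var/run/docker.sock && echo "WARNING: socket is open to all users"
```

A mode ending in 0 (for example 660) means that only the owner and the group can reach the socket; membership in that group should be treated as root-equivalent.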

Also, do not share the /var/run/docker.sock socket with containers that can do without it, because in that case compromising the service in the container gives the attacker complete control over the host system. If you have containers started with something like this:

-v /var/run/docker.sock:/var/run/docker.sock

or for docker-compose:

volumes:
- "/var/run/docker.sock:/var/run/docker.sock"

then this urgently needs to change.
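To find such mounts across a project, a plain text search is often enough. The sketch below is a rough heuristic, not a parser: it only flags lines that mention docker.sock in the files you pass in. The function name and the file names in the example are hypothetical.

```shell
# Hypothetical helper: print file:line:content for every mention of
# docker.sock in the given files (compose files, scripts, unit files, ...).
# Exits non-zero when nothing is found, like grep itself.
find_docker_sock_mounts() {
  grep -n -H 'docker\.sock' "$@"
}

# Example:
#   find_docker_sock_mounts docker-compose.yml deploy/*.sh
```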

And finally: never, ever use the Docker TCP socket unless you are absolutely certain you need it, and especially not without additional protection (at least authentication). By default, the Docker TCP socket opens a port on all interfaces, 0.0.0.0:2375 (2376 for HTTPS), and gives full control over the containers and, through them, potentially over the host system.
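One way to check whether the daemon is exposed over TCP is to look for listeners on ports 2375 and 2376. The sketch below filters the output of ss -ltn (from iproute2); the function name is hypothetical, and it assumes the local address is the fourth column, as in current ss output.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and print the local
# addresses listening on the Docker TCP ports (2375 plain, 2376 TLS).
docker_tcp_listeners() {
  awk '$1 == "LISTEN" && $4 ~ /:(2375|2376)$/ { print $4 }'
}

# Example, on a Docker host:
#   ss -ltn | docker_tcp_listeners
```

Any address other than a loopback one in the output deserves a closer look.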

Rule 2

Configure an unprivileged user inside the container

Configuring a container to use an unprivileged user is the best way to avoid a privilege escalation attack. This can be done in various ways:

1. Using the -u option of the docker run command:

docker run -u 4000 alpine

2. During image build:

FROM alpine
# Alpine ships BusyBox tools: addgroup/adduser instead of groupadd/useradd
RUN addgroup -S myuser && adduser -S -G myuser myuser
# Commands that need root (for example, installing packages) can still run here
USER myuser

3. Enabling user namespace support in the Docker daemon:

--userns-remap=default

Read more about this in the official documentation.

In Kubernetes, running as a non-root user is enforced in the Security Context via the runAsNonRoot option:

kind: ...
apiVersion: ...
metadata:
  name: ...
spec:
  ...
  containers:
  - name: ...
    image: ....
    securityContext:
          ...
          runAsNonRoot: true
          ...

Rule 3

Limit container capabilities

Since kernel 2.2, Linux has provided a way to control what privileged processes may do, called Linux kernel capabilities (for details, see the capabilities(7) man page).

By default, Docker starts containers with a predefined set of these capabilities, and it lets you change that set with the following options:

--cap-drop: drops a kernel capability
--cap-add: adds a kernel capability

The most secure approach is to first drop all capabilities (--cap-drop all) and then add back only the ones you need. For example:

docker run --cap-drop all --cap-add CHOWN alpine
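To confirm what a container actually ended up with, you can read the effective capability mask from /proc and test individual bits. In the sketch below the helper name has_cap is hypothetical; the bit numbers come from the kernel's capability list (CAP_CHOWN is 0, CAP_NET_RAW is 13, CAP_SYS_ADMIN is 21).

```shell
# Hypothetical helper: succeeds if capability bit $2 is set in hex mask $1.
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Inside a container, the mask of the current process is the CapEff line:
#   grep CapEff /proc/self/status
# With "--cap-drop all --cap-add CHOWN" and a root process, the mask is
# expected to be 0000000000000001: only bit 0 (CAP_CHOWN) is set.
has_cap 0000000000000001 0 && echo "CAP_CHOWN present"
has_cap 0000000000000001 21 || echo "CAP_SYS_ADMIN absent"
```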

And most important (!): avoid running containers with the --privileged flag!

In Kubernetes, the Linux Kernel Capabilities constraint is configured in the Security Context via the capabilities option:

kind: ...
apiVersion: ...
metadata:
  name: ...
spec:
  ...
  containers:
  - name: ...
    image: ....
    securityContext:
          ...
          capabilities:
            drop:
              - ALL
            add:
              - CHOWN
          ...

Rule 4

Use the no-new-privileges flag

When starting a container, it is useful to pass the --security-opt=no-new-privileges flag, which prevents processes inside the container from gaining additional privileges, for example through setuid binaries.

In Kubernetes, this restriction is configured in the Security Context via the allowPrivilegeEscalation option:
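The kernel exposes this setting per process, so it can be verified from inside the container. A minimal sketch, assuming a kernel recent enough to report the NoNewPrivs field in /proc/<pid>/status (Linux 4.10 or later); the function name is hypothetical.

```shell
# Hypothetical helper: succeeds if the status file ($1, default: the current
# process) reports NoNewPrivs set to 1.
no_new_privs_enabled() {
  [ "$(awk '$1 == "NoNewPrivs:" { print $2 }' "${1:-/proc/self/status}")" = "1" ]
}

# Example: the following is expected to report a NoNewPrivs value of 1:
#   docker run --rm --security-opt=no-new-privileges alpine grep NoNewPrivs /proc/self/status
```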

kind: ...
apiVersion: ...
metadata:
  name: ...
spec:
  ...
  containers:
  - name: ...
    image: ....
    securityContext:
          ...
          allowPrivilegeEscalation: false
          ...

Rule 5

Turn off inter-container communication

By default, inter-container communication is enabled in Docker, which means that all containers on the default bridge network (docker0) can communicate with each other. This can be disabled by starting the Docker daemon with the --icc=false flag.

Rule 6

Use Linux security modules (seccomp, AppArmor, SELinux)

By default, Docker already applies profiles for the Linux security modules. Therefore, never disable the security profiles! The most you should do with them is tighten the rules.

The default seccomp profile is available in the Docker (Moby) source repository.

Docker also uses AppArmor for protection, and the Docker Engine itself generates a default AppArmor profile when the container starts. In other words, the command:

$ docker run --rm -it hello-world

is effectively run as:

$ docker run --rm -it --security-opt apparmor=docker-default hello-world

The documentation also provides an example AppArmor profile for nginx, which you can (and should!) use:


#include <tunables/global>
profile docker-nginx flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>
  network inet tcp,
  network inet udp,
  network inet icmp,
  deny network raw,
  deny network packet,
  file,
  umount,
  deny /bin/** wl,
  deny /boot/** wl,
  deny /dev/** wl,
  deny /etc/** wl,
  deny /home/** wl,
  deny /lib/** wl,
  deny /lib64/** wl,
  deny /media/** wl,
  deny /mnt/** wl,
  deny /opt/** wl,
  deny /proc/** wl,
  deny /root/** wl,
  deny /sbin/** wl,
  deny /srv/** wl,
  deny /tmp/** wl,
  deny /sys/** wl,
  deny /usr/** wl,
  audit /** w,
  /var/run/nginx.pid w,
  /usr/sbin/nginx ix,
  deny /bin/dash mrwklx,
  deny /bin/sh mrwklx,
  deny /usr/bin/top mrwklx,
  capability chown,
  capability dac_override,
  capability setuid,
  capability setgid,
  capability net_bind_service,
  deny @{PROC}/* w,   # deny write for all files directly in /proc (not in a subdir)
  # deny write to files not in /proc/<number>/** or /proc/sys/**
  deny @{PROC}/{[^1-9],[^1-9][^0-9],[^1-9s][^0-9y][^0-9s],[^1-9][^0-9][^0-9][^0-9]*}/** w,
  deny @{PROC}/sys/[^k]** w,  # deny /proc/sys except /proc/sys/k* (effectively /proc/sys/kernel)
  deny @{PROC}/sys/kernel/{?,??,[^s][^h][^m]**} w,  # deny everything except shm* in /proc/sys/kernel/
  deny @{PROC}/sysrq-trigger rwklx,
  deny @{PROC}/mem rwklx,
  deny @{PROC}/kmem rwklx,
  deny @{PROC}/kcore rwklx,
  deny mount,
  deny /sys/[^f]*/** wklx,
  deny /sys/f[^s]*/** wklx,
  deny /sys/fs/[^c]*/** wklx,
  deny /sys/fs/c[^g]*/** wklx,
  deny /sys/fs/cg[^r]*/** wklx,
  deny /sys/firmware/** rwklx,
  deny /sys/kernel/security/** rwklx,
}

Rule 7

Limit container resources

This rule is quite simple: to keep containers from devouring all of the server's resources during the next DoS / DDoS attack, set resource limits for each container individually. You can limit the amount of memory, the CPU, and the number of container restarts.

So let's go in order.

Memory

Option -m or --memory

The maximum amount of memory that the container can use. The minimum value is 4m (4 megabytes).

Option --memory-swap

Configures the swap available to the container. The rules are subtle:

If --memory-swap is greater than 0, the --memory flag must also be set. In this case, --memory-swap is the total amount of memory available to the container, including swap.
A simple example: with --memory="300m" and --memory-swap="1g", the container can use 300 MB of memory and about 700 MB of swap (1g - 300m).
If --memory-swap=0, the setting is ignored and treated as unset.
If --memory-swap is set to the same value as --memory, the container has no swap.
If --memory-swap is not set but --memory is, the container can use as much swap as memory, so the combined total is twice the --memory value. For example, with --memory="300m" and no --memory-swap, the container can use 300 MB of memory and 300 MB of swap.
If --memory-swap=-1, the container can use as much swap as is available on the host system.
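These rules reduce to a little arithmetic. The sketch below follows the Docker documentation, where --memory-swap is the combined memory-plus-swap total and an unset --memory-swap lets the container use as much swap as --memory. The function name is hypothetical, and values are plain megabytes for simplicity.

```shell
# Hypothetical helper: print the swap (in MB) a container may use, given
# --memory ($1) and --memory-swap ($2), both in MB. An empty or 0 second
# argument means "--memory-swap not set"; -1 means unlimited swap.
container_swap_mb() {
  mem=$1
  total=${2:-0}
  if [ "$total" -eq -1 ]; then
    echo unlimited
  elif [ "$total" -eq 0 ]; then
    echo "$mem"                # unset: as much swap as memory
  else
    echo $(( total - mem ))    # --memory-swap is memory + swap combined
  fi
}

container_swap_mb 300 1024   # 724 (1g is 1024m)
container_swap_mb 300 300    # 0
container_swap_mb 300        # 300
container_swap_mb 300 -1     # unlimited
```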

A useful tip: the free utility run inside a container does not show the swap actually available to the container; it shows the host's swap.

Option --oom-kill-disable

Disables the OOM (out-of-memory) killer for the container.

Attention! Disable the OOM killer only when the --memory option is also set. Otherwise, when the container runs out of memory, the host itself can run out, and the kernel will start killing host system processes.

Other memory management configuration options, such as --memory-swappiness, --memory-reservation, and --kernel-memory, are more for tuning the container's performance.

Processor

Option --cpus

The option caps how much of the available processor resources the container can use. For example, on a host with two CPUs, --cpus="1.5" guarantees that the container uses at most one and a half CPUs.
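Under the hood, --cpus is a ceiling enforced by the CFS scheduler: per the Docker documentation, --cpus="1.5" is equivalent to --cpu-period="100000" together with --cpu-quota="150000". The conversion can be sketched as follows; the function name is hypothetical and the default 100000-microsecond period is assumed.

```shell
# Hypothetical helper: convert a --cpus value into the equivalent CFS quota
# in microseconds, assuming the default period of 100000 microseconds.
cpus_to_quota() {
  awk -v cpus="$1" 'BEGIN { printf "%d\n", cpus * 100000 + 0.5 }'
}

cpus_to_quota 1.5   # 150000
```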

Option --cpuset-cpus

Restricts the container to specific cores or CPUs. The value can be written with a hyphen or with commas: a hyphen gives a range of allowed cores (for example, 0-3), and commas list individual cores (for example, 1,3).
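To make the two notations concrete, the sketch below counts how many cores a given value selects. The function name is hypothetical, and the mixed form in the last example assumes the value follows the kernel's cpuset list format, where ranges and single cores can be combined.

```shell
# Hypothetical helper: count the CPUs named by a --cpuset-cpus value.
cpuset_count() {
  echo "$1" | tr ',' '\n' | awk -F- '
    NF == 1 { n += 1 }             # single core, e.g. "3"
    NF == 2 { n += $2 - $1 + 1 }   # inclusive range, e.g. "0-2"
    END { print n + 0 }'
}

cpuset_count 0-2     # 3
cpuset_count 1,3     # 2
cpuset_count 0-2,8   # 4
```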

Number of container restarts


--restart=on-failure:<number_of_restarts>

This setting defines how many times Docker will try to restart the container if it crashes unexpectedly. The counter is reset once the container's state changes to running.

It is recommended to set a small positive number, for example, 5, which will avoid endless restarts of a non-working service.

Rule 8

Use read-only file systems and volumes

If the container does not need to write anything, use a read-only file system wherever possible. This will greatly complicate the life of a potential intruder.

An example of starting a container with read-only file system:

docker run --read-only alpine

An example of mounting a volume in read-only mode:

docker run -v volume-name:/path/in/container:ro alpine
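Read-only mode combines naturally with the flags from the earlier rules. As a sketch, the hypothetical helper below only prints a docker run command line that applies several of them at once; the concrete values (user 4000, 300m, 1.5 CPUs, 5 restarts) are illustrative and should be adapted to the workload.

```shell
# Hypothetical helper: print (not execute) a docker run command line that
# applies the restrictions discussed above. All values are illustrative.
hardened_run_cmd() {
  image=$1
  shift
  echo docker run \
    --user 4000 \
    --cap-drop all \
    --security-opt=no-new-privileges \
    --read-only \
    --memory 300m --memory-swap 300m \
    --cpus 1.5 \
    --restart=on-failure:5 \
    "$image" "$@"
}

# Example: review the command first, then run it by hand.
hardened_run_cmd alpine
```

Printing the command instead of executing it keeps the helper safe to experiment with; a service that needs to write somewhere will also need a writable volume for that path.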

Rule 9

Use container security analysis tools

Use tools that detect containers with known vulnerabilities. There are not many of them yet, but they exist:

• Free:

Clair.

• Commercial:

Snyk (there is a free version);
Anchore (there is a free version);
JFrog Xray;
Qualys.

And for Kubernetes, there are tools for detecting configuration errors:
