[Ubuntu 18.04] Docker 18.09.0 on LXD 3.0.2/3.6 is broken

Both the host and the container are running up-to-date Ubuntu 18.04.
A privileged container (with or without security.nesting set to true) on LXD 3.0.2/3.6 will not run the latest Docker 18.09. It runs fine with Docker 18.06.1.

Starting the docker service fails because modprobe cannot find the kernel modules in /lib/modules:

Nov 08 13:11:59 gitlab-runner-LXC-70-113 modprobe[672]: modprobe: ERROR: …/libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file ‘/lib/modules/4.15.0-38-generic/modules.dep.bin’
Nov 08 13:11:59 gitlab-runner-LXC-70-113 modprobe[672]: modprobe: FATAL: Module overlay not found in directory /lib/modules/4.15.0-38-generic
Nov 08 13:11:59 gitlab-runner-LXC-70-113 systemd[1]: containerd.service: Control process exited, code=exited status=1
Nov 08 13:11:59 gitlab-runner-LXC-70-113 systemd[1]: containerd.service: Failed with result ‘exit-code’.
Nov 08 13:11:59 gitlab-runner-LXC-70-113 systemd[1]: Failed to start containerd container runtime.

The Docker snap fails as well, in its containerd service:

Nov 08 13:34:03 gitlab-runner-LXC-70-110 modprobe[120050]: modprobe: ERROR: …/libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.15.0-29-generic/modules.dep.bin

Any idea how to resolve this?

Workaround:
apt-get install docker-ce=18.06.1~ce~3-0~ubuntu

This looks like a Docker regression.
I can reproduce it with Docker 18.09 and get:

Nov 08 16:51:59 docker modprobe[2816]: modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.15.0-38-generic/modules.dep.bin'
Nov 08 13:51:59 docker modprobe[2816]: modprobe: FATAL: Module overlay not found in directory /lib/modules/4.15.0-38-generic

Inside the container, the directory /lib/modules/ is empty, yet Docker looks into that directory in order to load the overlay kernel module. The module may or may not already be loaded on the host. Even when it is loaded on the host, Docker 18.09 appears to search for it instead of simply using it.

So, what do you do?

ubuntu@docker:~$ sudo mkdir /lib/modules/4.15.0-38-generic/
ubuntu@docker:~$ exit
$ lxc file push /lib/modules/4.15.0-38-generic/modules.dep.bin docker//lib/modules/4.15.0-38-generic/modules.dep.bin
$ lxc restart docker
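The same steps can be done from the host in one command. This is a sketch that assumes your LXD version supports the -p (--create-dirs) flag of lxc file push, which makes the separate mkdir inside the container unnecessary. It relies on containers sharing the host kernel, so the host's uname -r is also the version modprobe looks for inside the container. push_moddep is a hypothetical helper name.

```shell
# Hypothetical helper, run on the LXD host: copy the host's module dependency
# index into the container so that modprobe inside it can resolve module names.
# Containers share the host kernel, so the host's `uname -r` is also the
# kernel version modprobe looks for inside the container.
push_moddep() {
  container=$1
  kver=$(uname -r)
  # -p (--create-dirs) creates /lib/modules/<kver>/ in the container if missing.
  lxc file push -p "/lib/modules/$kver/modules.dep.bin" \
    "$container/lib/modules/$kver/modules.dep.bin"
}
```

Usage on the host: push_moddep docker && lxc restart docker. Because the path contains the kernel version, the copy has to be repeated whenever the host boots into a new kernel.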

Now, let’s launch a Docker container in this docker LXD container.

ubuntu@docker:~$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
d1725b59e92d: Pull complete 
Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971e499788
Status: Downloaded newer image for hello-world:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"could not create session key: function not implemented\"": unknown.
ERRO[0004] error waiting for container: context canceled 
ubuntu@docker:~$

What’s going on? It looks like this issue:
https://bugs.chromium.org/p/chromium/issues/detail?id=860565

As described there, this second issue is related to runC (the runtime Docker uses), and the workaround they describe requires a change in runC.

Bitten by the same issue. The culprit is this line in /lib/systemd/system/containerd.service:

ExecStartPre=/sbin/modprobe overlay

This fails inside an LXC container. Unfortunately, as far as I’m aware, there’s no way to create a drop-in that removes this specific line. Overriding the whole unit file may break on upgrades, so I removed the line directly from the unit file above. That’s not ideal, of course.

Is anybody aware of a better workaround, one that would not break when the package is reinstalled/upgraded?
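One possible answer, as a sketch that has not been verified against the 18.09 packages in this thread: the systemd.service documentation states that assigning an empty value to ExecStartPre= resets the list of commands set earlier (including those from the packaged unit), and that a "-" prefix makes a command's failure non-fatal. A drop-in under /etc/systemd/system is not touched by package upgrades. The helper name write_containerd_dropin and the file name no-modprobe.conf are hypothetical.

```shell
# Hypothetical helper: write a systemd drop-in for containerd.service that
# replaces the packaged "ExecStartPre=/sbin/modprobe overlay" line.
# Defaults to the standard drop-in directory; pass another directory to test.
write_containerd_dropin() {
  dir=${1:-/etc/systemd/system/containerd.service.d}
  mkdir -p "$dir" || return 1
  cat > "$dir/no-modprobe.conf" <<'EOF'
[Service]
# An empty assignment clears every ExecStartPre= inherited from the unit file.
ExecStartPre=
# Re-add the call with a "-" prefix so a failing modprobe no longer aborts startup.
ExecStartPre=-/sbin/modprobe overlay
EOF
}
```

Run it as root, then systemctl daemon-reload and systemctl restart containerd. The empty assignment also drops any other ExecStartPre= lines the unit may have, so it is worth checking systemctl cat containerd first; the same command shows the unit together with the drop-in afterwards.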

It is an upstream issue at heart.
Docker should not call modprobe blindly. It should first check whether overlay is already loaded and only try to load it if not. Look for an upstream bug report on this; if none exists, you can file one requesting it.
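The check-then-load behaviour suggested above could look like this in shell. This is a sketch that assumes /proc/filesystems, which inside a container reflects the shared host kernel, lists overlay whenever the module is loaded or built in. The function names are hypothetical, and this is not what the Docker packages ship.

```shell
# Hypothetical helpers: only call modprobe when the kernel does not already
# support overlayfs. /proc/filesystems lists every filesystem type the running
# kernel can mount, whether built in or provided by an already-loaded module.
overlay_available() {
  grep -qw overlay "${1:-/proc/filesystems}"
}

ensure_overlay() {
  # The optional argument lets a test point the check at a different file.
  if overlay_available "$@"; then
    return 0
  fi
  modprobe overlay
}
```

Checking /proc/filesystems rather than /proc/modules also covers kernels where overlay is compiled in and therefore never appears as a module.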

Regarding workarounds, you can replace /sbin/modprobe with /bin/true in the container :)

I totally agree. I was not implying it is something LXC has to fix; I was just trying to share a workaround with fellow users and figure out how to cope until it gets fixed upstream (if ever).

I know it was tongue-in-cheek, but I actually considered that. I’m not sure whether it’s better for the workaround to break when kmod is reinstalled or when docker-ce is.

Maybe a package diversion could work.

Now that I think about it, modprobe does not appear to have any practical use in an LXD container.
There is modprobe -c, which works and may be used by some unknown scripts, but apart from that it does not appear to be needed (there are no kernel modules in the container).

If you want to be extra cautious, I would suggest creating a test container and replacing modprobe with a simple program that logs all invocations. Then install Docker as usual (and any other services), restart the container a few times, and finally check the log to see whether any scripts have invoked modprobe.
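A minimal sketch of such a logging stand-in. The helper name install_modprobe_logger and the log format are hypothetical. The generated script reads /proc/$PPID/cmdline to record which process called it and always exits 0, so callers keep running as if the module had loaded.

```shell
# Hypothetical helper for a throwaway test container: write a stand-in for
# modprobe that appends each invocation (arguments and calling process) to a
# log file and always exits successfully.
install_modprobe_logger() {
  target=$1
  log=$2
  {
    echo '#!/bin/sh'
    printf "log='%s'\n" "$log"
    cat <<'EOF'
# /proc/$PPID/cmdline holds the caller's command line, NUL-separated.
caller=$(tr '\000' ' ' < "/proc/$PPID/cmdline" 2>/dev/null)
printf '%s args=[%s] caller=[%s]\n' "$(date '+%F %T')" "$*" "$caller" >> "$log"
exit 0
EOF
  } > "$target" && chmod 755 "$target"
}
```

To put it in place of the real binary in a way that survives kmod upgrades, the package diversion mentioned above could be used: dpkg-divert --local --rename --add /sbin/modprobe moves the original to /sbin/modprobe.distrib, after which install_modprobe_logger /sbin/modprobe /var/log/modprobe-calls.log writes the stand-in (the log path is only an example). To undo it, delete the stand-in first, then run dpkg-divert --rename --remove /sbin/modprobe.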

For the interested:

As a workaround, loading the overlay kernel module on the host should do the trick. You can have LXD do this with lxc config set <container> linux.kernel_modules overlay.
