Both host and container are running an up-to-date Ubuntu 18.04.
A privileged container (with or without security.nesting set to true) on LXD 3.0.2/3.6 cannot run the latest Docker 18.09. It runs OK with Docker 18.06.1.
Trying to start the Docker service fails because it cannot find modules in /lib/modules:
Nov 08 13:11:59 gitlab-runner-LXC-70-113 modprobe[672]: modprobe: ERROR: …/libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file ‘/lib/modules/4.15.0-38-generic/modules.dep.bin’
Nov 08 13:11:59 gitlab-runner-LXC-70-113 modprobe[672]: modprobe: FATAL: Module overlay not found in directory /lib/modules/4.15.0-38-generic
Nov 08 13:11:59 gitlab-runner-LXC-70-113 systemd[1]: containerd.service: Control process exited, code=exited status=1
Nov 08 13:11:59 gitlab-runner-LXC-70-113 systemd[1]: containerd.service: Failed with result ‘exit-code’.
Nov 08 13:11:59 gitlab-runner-LXC-70-113 systemd[1]: Failed to start containerd container runtime.
The snap is failing too but it’s the containerd service there.
Nov 08 13:34:03 gitlab-runner-LXC-70-110 modprobe[120050]: modprobe: ERROR: …/libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.15.0-29-generic/modules.dep.bin
This looks like a Docker regression.
I can reproduce this with Docker 18.09, and I get:
Nov 08 16:51:59 docker modprobe[2816]: modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.15.0-38-generic/modules.dep.bin'
Nov 08 13:51:59 docker modprobe[2816]: modprobe: FATAL: Module overlay not found in directory /lib/modules/4.15.0-38-generic
Inside the container, the directory /lib/modules/ is empty. Docker looks into that directory in order to load the overlay kernel module. This module might, or might not, be loaded on the host. Even if it is loaded on the host, the new Docker 18.09 appears to search for it instead of just using it.
Bit by the same issue. The culprit is this line in /lib/systemd/system/containerd.service:
ExecStartPre=/sbin/modprobe overlay
This fails inside an LXC container. Unfortunately there’s no way to create a drop-in to remove this specific line (as far as I’m aware). Overriding the whole unit file may break on upgrades, so I decided to remove this line directly from the unit file above. That’s not ideal, of course.
Is anybody aware of a better workaround, one that would not break when the package is reinstalled/upgraded?
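One option that may survive package upgrades: systemd resets list-type settings such as ExecStartPre= when a drop-in assigns them an empty value, so a drop-in can clear the modprobe call without touching the packaged unit. A minimal sketch, assuming the unit has no other ExecStartPre= lines you want to keep. The function name, the override.conf file name and the UNIT_DIR variable are my own choices for illustration:

```shell
#!/bin/sh
# Sketch: write a drop-in that clears containerd's ExecStartPre list.
# UNIT_DIR is a hypothetical override for trying this outside /etc.
write_containerd_dropin() {
    dropin_dir="${UNIT_DIR:-/etc/systemd/system}/containerd.service.d"
    mkdir -p "$dropin_dir"
    # An empty assignment resets the list of ExecStartPre commands.
    printf '[Service]\nExecStartPre=\n' > "$dropin_dir/override.conf"
    # Print the path of the file that was written.
    echo "$dropin_dir/override.conf"
}

# Inside the container, as root:
#   write_containerd_dropin
#   systemctl daemon-reload
#   systemctl restart containerd
```

Note that this clears every ExecStartPre= entry of the unit, not only the modprobe one, so it is worth checking the packaged unit file again after an upgrade.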
It is an upstream issue at heart.
Docker should not modprobe blindly. It should first check whether overlay is already loaded and try to load it only if it is not. Try to find an upstream bug report on this; if none exists, you can file one requesting this change.
Regarding workarounds, you can replace /sbin/modprobe with /bin/true in the container.
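To illustrate the "check first, load only if needed" idea, here is a rough sketch in shell. It is not what Docker or containerd actually do; it assumes that a loaded (or built-in) overlay filesystem shows up in /proc/filesystems. The function names and the FS_LIST variable are made up for this example:

```shell
#!/bin/sh
# Sketch: only call modprobe when the overlay filesystem is not already
# registered with the kernel. FS_LIST is a hypothetical override so the
# check can be tried against a sample file instead of /proc/filesystems.
overlay_available() {
    grep -qw overlay "${FS_LIST:-/proc/filesystems}"
}

ensure_overlay() {
    if overlay_available; then
        echo "overlay already available, skipping modprobe"
    else
        echo "overlay not registered, trying modprobe"
        modprobe overlay
    fi
}
```

With a check like this, a container whose host already has overlay loaded would never reach the modprobe call and so would not trip over the empty /lib/modules directory.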
I totally agree. I was not implying it is something LXC has to fix; I was just trying to share a workaround with fellow users and figure out how to cope until it gets fixed upstream (if ever).
I know it was tongue-in-cheek, but I actually considered that… I’m not sure if it’s better to break when kmod is reinstalled, or docker-ce.
Now that I think about it, modprobe does not appear to have any practical use in a LXD container.
modprobe -c does work and may be used by some unknown scripts, but apart from that modprobe does not appear to be needed (there are no kernel modules in the container).
If you want to be extra cautious, I would suggest creating a test container and replacing modprobe with a simple program that logs all invocations. Then install Docker as usual (and other services?), restart the container a few times, and finally check the log to see whether any scripts have invoked modprobe.
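Such a logging stand-in could look roughly like the sketch below. The log path, the MODPROBE_LOG variable and the function name are assumptions for illustration; it always reports success, just like /bin/true would:

```shell
#!/bin/sh
# Sketch of a stand-in for /sbin/modprobe that records every call and
# reports success. MODPROBE_LOG is a hypothetical path override.
log_modprobe() {
    log="${MODPROBE_LOG:-/var/log/modprobe-calls.log}"
    # One line per invocation: timestamp followed by the arguments.
    printf '%s modprobe %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$log"
    return 0
}

# When installed as the modprobe replacement in a test container,
# the script would end with:
#   log_modprobe "$@"
```

After a few container restarts, an empty or missing log would suggest that nothing other than the containerd unit depends on modprobe.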
As a workaround, loading the overlay kernel module on the host should do the trick. You can do this with: lxc config set <container> linux.kernel_modules overlay