LXD 5.0.0: Error creating nested container on devuan chimaera: Failed to open 6/net_prio, Failed to initialize cgroups

Dear all!

We’re evaluating the full LXD/LXC/LXCFS 5.0.0 LTS chain and it seems to work fine in most cases. But when creating a nested container, we’re running into the error below. I’m out of ideas as to what I am missing.

~# lxc info --show-log local:c2
[...]
lxc c2 20220503094300.251 ERROR    cgfsng - cgroups/cgfsng.c:__initialize_cgroups:3274 - Not a directory - Failed to open 6/net_prio
lxc c2 20220503094300.251 ERROR    cgfsng - cgroups/cgfsng.c:initialize_cgroups:3434 - Not a directory - Failed to initialize cgroups
lxc c2 20220503094300.252 ERROR    cgroup - cgroups/cgroup.c:cgroup_init:33 - Bad file descriptor - Failed to initialize cgroup driver
lxc c2 20220503094300.252 ERROR    start - start.c:lxc_init:865 - Failed to initialize cgroup driver
lxc c2 20220503094300.252 ERROR    start - start.c:__lxc_start:2008 - Failed to initialize container "c2"
lxc c2 20220503094327.883 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:869 - No such file or directory - Failed to receive the container state
lxc 20220503094327.884 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220503094327.884 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"
lxc 20220503094327.884 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220503094327.884 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

For reporting purposes I mostly followed the instructions in Nested containers in LXD | Ubuntu, but I had to switch to a privileged first container or I run into another issue (which we can look at separately afterwards; an unprivileged nested container would be nice).

I’m fairly sure it worked fine with 4.24 in this exact configuration, but I can’t prove that right now. If needed, I could make an effort to test it.

So, the host:

~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Devuan
Description:	Devuan GNU/Linux 4 (chimaera)
Release:	4
Codename:	chimaera

~$ uname -a
Linux manderinli 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux

~$ cat /etc/subuid /etc/subgid
root:500000:196608
root:500000:196608

The full chain is built using the Ubuntu deb source packages as reference, where applicable and still maintained (hello, lxd), with sysvinit support mostly patched back in. See the bottom of the post for more details if needed.

So we end up with those packages installed on the host:

~$ dpkg -l | grep lx
ii  liblxc-common                                               1:5.0.0-2xxx1~4.1                  amd64        Linux Containers userspace tools (common tools)
ii  liblxc1                                                     1:5.0.0-2xxx1~4.1                  amd64        Linux Containers userspace tools (library)
ii  lxcfs                                                       5.0.0-1xxx1~4.1                    amd64        FUSE based filesystem for LXC
ii  lxd                                                         5.0.0-1xxx1~4.1                    amd64        Container hypervisor based on LXC - daemon
ii  lxd-client                                                  5.0.0-1xxx1~4.1                    amd64        Container hypervisor based on LXC - client
~$ dpkg -l | grep cgroup
ii  cgroupfs-mount                                              1.4+devuan1                        all          Light-weight package to set up cgroupfs mounts
~$ lxc launch ubuntu:jammy c1privubuntu -c security.nesting=true -c security.privileged=true
Creating c1privubuntu
Starting c1privubuntu

~$ lxc exec c1privubuntu bash
# using the snap's lxd; I did not modify /etc/subuid or /etc/subgid, I just use what ships in ubuntu:jammy
root@c1privubuntu:~# lxd init --auto
root@c1privubuntu:~# lxc launch ubuntu:jammy c2
Creating c2
Starting c2                                 
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart c2 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c2/lxc.conf: 
Try `lxc info --show-log local:c2` for more info

The error is the one shown at the beginning of the post. I’m out of ideas as to what the culprit is.


Details to the packages if needed:

LXD is built based on GitHub - lxc/lxd-pkg-ubuntu at dpm-bionic, but with a lot of updates to account for all the changes in the (inline) dependencies, compilation and new binaries since then. I have been updating this through several 4.x releases and I’m fairly familiar with it - but there could be mistakes! (Re-)added the sysvinit bits to make it work with Devuan.

LXC is usually built based on GitHub - lxc/lxc-pkg-ubuntu at dpm-jammy, but for the jammy release you did a sneaky 5.0.0 prerelease build :slight_smile: , so that source is based on the real source package of Ubuntu – Details of source package lxc in jammy. Patched to ship with sysvinit and a dependency on cgroupfs-mount, since we don’t have systemd here to do the cgroup setup.

LXCFS is built based on lxc / lxcfs · GitLab as referenced by Ubuntu – Details of source package lxcfs in jammy. Also shipping the old sysvinit stuff.

You should not use security.nesting=true -c security.privileged=true together as it increases the ability of the workload to escape the container.

What does lxc info --show-log local:c2 show?

Hi @tomp, thank you for your time.

You should not use security.nesting=true -c security.privileged=true together as it increases the ability of the workload to escape the container.

That’s true, although so far we have only run this with privileged, and it worked fine in earlier versions. We use this only for gitlab-runner (with an lxd executor) running only our own pipelines, so escaping is currently not critical.

Also, as mentioned, I did try it unprivileged but ran into a different issue. To rule out restrictions in c1, I figured this report is better made with privileged.

What does lxc info --show-log local:c2 show?

This was in the first code block of the post; I copied it here again for you and added back the part I had cut out. This is now the full output:

~# lxc info --show-log local:c2
Name: c2
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/05/03 09:42 UTC
Last Used: 2022/05/03 09:43 UTC

Log:

lxc c2 20220503094300.251 ERROR    cgfsng - cgroups/cgfsng.c:__initialize_cgroups:3274 - Not a directory - Failed to open 6/net_prio
lxc c2 20220503094300.251 ERROR    cgfsng - cgroups/cgfsng.c:initialize_cgroups:3434 - Not a directory - Failed to initialize cgroups
lxc c2 20220503094300.252 ERROR    cgroup - cgroups/cgroup.c:cgroup_init:33 - Bad file descriptor - Failed to initialize cgroup driver
lxc c2 20220503094300.252 ERROR    start - start.c:lxc_init:865 - Failed to initialize cgroup driver
lxc c2 20220503094300.252 ERROR    start - start.c:__lxc_start:2008 - Failed to initialize container "c2"
lxc c2 20220503094327.883 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:869 - No such file or directory - Failed to receive the container state
lxc 20220503094327.884 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220503094327.884 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"
lxc 20220503094327.884 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220503094327.884 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

We are just finishing off the meson build changes, then it will be released.

OK, so it looks like the cgroups are not being set up correctly. Normally systemd does this, but if you’re not running systemd then something else needs to.

Please see LXC - Gentoo Wiki
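Roughly speaking, on a non-systemd host something (cgroupfs-mount, an init script, …) has to create the v1 hierarchy itself. A dry-run sketch of the mounts involved (this is my paraphrase of what a cgroupfs-mount-style script does, not its actual code; it only prints the commands rather than running them):

```shell
# Dry-run sketch: print the mounts a cgroupfs-mount-style init script
# would perform on a non-systemd host. Does NOT actually mount anything.
# /proc/cgroups columns: subsys_name hierarchy num_cgroups enabled
echo "mount -t tmpfs -o mode=755 cgroup /sys/fs/cgroup"
awk '!/^#/ && $4 == 1 {print $1}' /proc/cgroups | while read -r ctrl; do
  echo "mkdir -p /sys/fs/cgroup/$ctrl"
  echo "mount -t cgroup -o $ctrl cgroup /sys/fs/cgroup/$ctrl"
done
```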

Thanks, I also thought it was cgroup related. I’ll look into it.

And the previous host was systemd based! So all my remarks about “it did work before” are probably related to that :smiley:

To clarify the documentation in the link, though: systemd containers are fine as long as I don’t use nesting.

OK, it seems the link doesn’t help. My host already has that named systemd cgroup on Devuan chimaera:

~$ mount | grep cgroup
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,mode=755)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,relatime,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_prio type cgroup (rw,relatime,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,relatime,pids)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,relatime,rdma)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,relatime,name=systemd)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/elogind type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/elogind/elogind-cgroups-agent,name=elogind)

(I also verified this after a reboot with autostart disabled for all containers, just in case they would interfere.)
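For reference, a quick way to see which cgroup mode a host is in (assuming util-linux `stat` is available): the filesystem type of /sys/fs/cgroup is `cgroup2fs` on a pure v2 unified hierarchy, and `tmpfs` on a v1 or hybrid layout like the mount output above.

```shell
# Filesystem type of /sys/fs/cgroup:
#   cgroup2fs -> pure cgroup v2 (unified hierarchy)
#   tmpfs     -> cgroup v1 or hybrid layout, as in the mount output above
stat -fc %T /sys/fs/cgroup
```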

I also encountered this problem on WSL2.

In my environment, WSL2 runs Ubuntu 22.04 (host) with LXD 5.19 installed, and I configured a ZFS 2.2 backend to support delegation.

I can run an LXD container (level 1) on the host. But running a container (level 2) nested inside the container (level 1) gives the following error message:

lxc tester 20231128102529.969 ERROR    cgfsng - ../src/src/lxc/cgroups/cgfsng.c:__initialize_cgroups:3672 - Not a directory - Failed to open 6/net_prio
lxc tester 20231128102529.969 ERROR    cgfsng - ../src/src/lxc/cgroups/cgfsng.c:initialize_cgroups:3832 - Not a directory - Failed to initialize cgroups
lxc tester 20231128102529.969 ERROR    cgroup - ../src/src/lxc/cgroups/cgroup.c:cgroup_init:34 - Bad file descriptor - Failed to initialize cgroup driver
lxc tester 20231128102529.969 ERROR    start - ../src/src/lxc/start.c:lxc_init:862 - Failed to initialize cgroup driver
lxc tester 20231128102529.969 ERROR    start - ../src/src/lxc/start.c:__lxc_start:2027 - Failed to initialize container "tester"
lxc tester 20231128102530.572 ERROR    lxccontainer - ../src/src/lxc/lxccontainer.c:wait_on_daemonized_start:870 - No such file or directory - Failed to receive the container state
lxc 20231128102530.572 ERROR    af_unix - ../src/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20231128102530.572 ERROR    commands - ../src/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

The error message shows that opening net_prio failed because it is “Not a directory”.
I found the source code here: https://github.com/lxc/lxc/blob/cb8e38aca27a23964941f0f011a8919aab8bebab/src/lxc/cgroups/cgfsng.c#L3668C32-L3668C32

After investigating, I found that on the host, ls -l /sys/fs/cgroup shows net_prio as a directory:

total 0
dr-xr-xr-x 11 root root  0 Nov 28 17:53 blkio
dr-xr-xr-x 11 root root  0 Nov 28 17:53 cpu
drwxr-xr-x  2 root root 40 Nov 28 17:53 cpu,cpuacct
dr-xr-xr-x  4 root root  0 Nov 28 17:53 cpuacct
dr-xr-xr-x  4 root root  0 Nov 28 17:53 cpuset
dr-xr-xr-x 11 root root  0 Nov 28 17:53 devices
dr-xr-xr-x  5 root root  0 Nov 28 17:53 freezer
dr-xr-xr-x  4 root root  0 Nov 28 17:53 hugetlb
dr-xr-xr-x 11 root root  0 Nov 28 17:53 memory
dr-xr-xr-x  4 root root  0 Nov 28 17:53 misc
dr-xr-xr-x  4 root root  0 Nov 28 17:53 net_cls
drwxr-xr-x  2 root root 40 Nov 28 17:53 net_cls,net_prio
dr-xr-xr-x  4 root root  0 Nov 28 17:53 net_prio
dr-xr-xr-x  4 root root  0 Nov 28 17:53 perf_event
dr-xr-xr-x 11 root root  0 Nov 28 17:53 pids
dr-xr-xr-x  4 root root  0 Nov 28 17:53 rdma
dr-xr-xr-x 12 root root  0 Nov 28 17:53 systemd
dr-xr-xr-x 12 root root  0 Nov 28 18:03 unified

But in the container (level 1), ls -l /sys/fs/cgroup/ shows net_prio as a symlink to net_cls,net_prio:

total 0
drwxrwxr-x 5 nobody root  0 Nov 28 10:03 blkio
lrwxrwxrwx 1 root   root 11 Nov 28 10:03 cpu -> cpu,cpuacct
drwxr-xr-x 2 root   root 40 Nov 28 10:03 cpu,cpuacct
lrwxrwxrwx 1 root   root 11 Nov 28 10:03 cpuacct -> cpu,cpuacct
drwxrwxr-x 2 nobody root  0 Nov 28 10:03 cpuset
drwxrwxr-x 5 nobody root  0 Nov 28 10:03 devices
drwxrwxr-x 3 nobody root  0 Nov 28 10:03 freezer
drwxrwxr-x 2 nobody root  0 Nov 28 10:03 hugetlb
drwxrwxr-x 5 nobody root  0 Nov 28 10:03 memory
drwxrwxr-x 2 nobody root  0 Nov 28 10:03 misc
lrwxrwxrwx 1 root   root 16 Nov 28 10:03 net_cls -> net_cls,net_prio
drwxr-xr-x 2 root   root 40 Nov 28 10:03 net_cls,net_prio
lrwxrwxrwx 1 root   root 16 Nov 28 10:03 net_prio -> net_cls,net_prio
drwxrwxr-x 2 nobody root  0 Nov 28 10:03 perf_event
drwxrwxr-x 5 nobody root  0 Nov 28 10:03 pids
drwxrwxr-x 2 nobody root  0 Nov 28 10:03 rdma
drwxrwxr-x 5 nobody root  0 Nov 28 10:03 systemd
drwxrwxr-x 6 nobody root  0 Nov 28 10:03 unified

I don’t know if it is the root cause.
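It does look consistent with the errno, at least: per the open(2) man page, when O_DIRECTORY and O_NOFOLLOW are combined and the final path component is a symlink, the open fails with ENOTDIR (“Not a directory”) rather than ELOOP. Assuming LXC opens the per-controller directories with those flags (I have not traced the exact call), a small repro outside LXC would be (the python3 heredoc is only there to issue the raw open(2) call with those flags):

```shell
# Hypothetical repro (not from the thread): open(2) with
# O_DIRECTORY|O_NOFOLLOW on a symlink fails with ENOTDIR, the same
# "Not a directory" error LXC logs for the net_prio symlink.
tmp=$(mktemp -d)
mkdir "$tmp/net_cls,net_prio"
ln -s "net_cls,net_prio" "$tmp/net_prio"
python3 - "$tmp/net_prio" <<'PY'
import errno, os, sys
try:
    os.open(sys.argv[1], os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
except OSError as e:
    print(errno.errorcode[e.errno])  # ENOTDIR on Linux
PY
rm -rf "$tmp"
```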

I have identified that the /sys/fs/cgroup hierarchy contains symlinks because of cgroup v1. The solution is to disable it. For instance, on my WSL2 setup, I resolved the issue by adding the line kernelCommandLine = cgroup_no_v1=all to .wslconfig to switch to cgroup v2 only.
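To make that concrete, this is the kind of fragment involved (in %UserProfile%\.wslconfig on the Windows side; run `wsl --shutdown` afterwards so the new kernel command line takes effect):

[wsl2]
kernelCommandLine = cgroup_no_v1=all

After the restart, /sys/fs/cgroup should be a single cgroup2 mount with no per-controller symlinks.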