LXD 5.0.0: Error creating nested container on Devuan Chimaera: Failed to open 6/net_prio, Failed to initialize cgroups

Dear all!

We’re evaluating the full LXD/LXC/LXCFS 5.0.0 LTS chain and it seems to work fine in most cases. But when creating a nested container, we run into the error quoted in the title, and I’m out of ideas as to what I’m missing.

~# lxc info --show-log local:c2
[...]
lxc c2 20220503094300.251 ERROR    cgfsng - cgroups/cgfsng.c:__initialize_cgroups:3274 - Not a directory - Failed to open 6/net_prio
lxc c2 20220503094300.251 ERROR    cgfsng - cgroups/cgfsng.c:initialize_cgroups:3434 - Not a directory - Failed to initialize cgroups
lxc c2 20220503094300.252 ERROR    cgroup - cgroups/cgroup.c:cgroup_init:33 - Bad file descriptor - Failed to initialize cgroup driver
lxc c2 20220503094300.252 ERROR    start - start.c:lxc_init:865 - Failed to initialize cgroup driver
lxc c2 20220503094300.252 ERROR    start - start.c:__lxc_start:2008 - Failed to initialize container "c2"
lxc c2 20220503094327.883 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:869 - No such file or directory - Failed to receive the container state
lxc 20220503094327.884 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220503094327.884 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"
lxc 20220503094327.884 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220503094327.884 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

For reporting purposes I mostly followed the instructions in Nested containers in LXD | Ubuntu, but I had to switch to a privileged first container or I ran into another issue (which we can look at separately afterwards; an unprivileged nested container would be nice).
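For reference, this is what the two variants look like; the first is what I use in this report, the second (with a hypothetical container name) is the unprivileged variant I would prefer, but it currently fails with a different error:

# privileged + nesting, used for this report:
~$ lxc launch ubuntu:jammy c1privubuntu -c security.nesting=true -c security.privileged=true
# nesting only (unprivileged; hypothetical name) - what I’d prefer:
~$ lxc launch ubuntu:jammy c1unprivubuntu -c security.nesting=true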

I strongly believe it worked fine with 4.24 in this exact configuration, but I can’t prove that right now. If needed, I could make the effort to test.

So, the host:

~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Devuan
Description:	Devuan GNU/Linux 4 (chimaera)
Release:	4
Codename:	chimaera

~$ uname -a
Linux manderinli 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux

~$ cat /etc/subuid /etc/subgid
root:500000:196608
root:500000:196608
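(Aside: the format here is name:first_id:count, so with this range an unprivileged container’s uid 0 maps to host uid 500000. A quick sanity check I use, as a sketch with a hypothetical unprivileged container c0:)

~$ lxc launch ubuntu:jammy c0
~$ pid=$(lxc info c0 | awk '/^PID:/ {print $2}')
~$ grep '^Uid:' /proc/"$pid"/status   # expect 500000 for unprivileged, 0 for privileged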

The full chain is built using the Ubuntu deb source packages as reference, where applicable and still maintained (hello, lxd), but with sysvinit support patched in. See the bottom of the post for more details if needed.

So we end up with these packages installed on the host:

~$ dpkg -l | grep lx
ii  liblxc-common                                               1:5.0.0-2xxx1~4.1                  amd64        Linux Containers userspace tools (common tools)
ii  liblxc1                                                     1:5.0.0-2xxx1~4.1                  amd64        Linux Containers userspace tools (library)
ii  lxcfs                                                       5.0.0-1xxx1~4.1                    amd64        FUSE based filesystem for LXC
ii  lxd                                                         5.0.0-1xxx1~4.1                    amd64        Container hypervisor based on LXC - daemon
ii  lxd-client                                                  5.0.0-1xxx1~4.1                    amd64        Container hypervisor based on LXC - client
~$ dpkg -l | grep cgroup
ii  cgroupfs-mount                                              1.4+devuan1                        all          Light-weight package to set up cgroupfs mounts
~$ lxc launch ubuntu:jammy c1privubuntu -c security.nesting=true -c security.privileged=true
Creating c1privubuntu
Starting c1privubuntu

~$ lxc exec c1privubuntu bash
# using the snap’s lxd; I did not modify /etc/subuid or /etc/subgid, just using what ships in ubuntu:jammy
root@c1privubuntu:~# lxd init --auto
root@c1privubuntu:~# lxc launch ubuntu:jammy c2
Creating c2
Starting c2                                 
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart c2 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c2/lxc.conf: 
Try `lxc info --show-log local:c2` for more info

The error is the one shown at the beginning of the post. I’m out of ideas as to what the culprit is.
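If more verbose logs would help, I can raise liblxc’s log level in the nested daemon; lxc.log.level is a stock liblxc key, so passing it through raw.lxc should work (an untested sketch):

root@c1privubuntu:~# lxc config set c2 raw.lxc 'lxc.log.level = TRACE'
root@c1privubuntu:~# lxc start c2
root@c1privubuntu:~# lxc info --show-log local:c2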


Details on the packages, if needed:

LXD is built based on GitHub - lxc/lxd-pkg-ubuntu at dpm-bionic, but with a lot of updates to account for all the changes in the (inline) dependencies, the compilation, and the new binaries since then. I have been updating this through several 4.x releases and I’m fairly familiar with it - but there could be mistakes! (Re-)added the sysvinit bits to make it work with Devuan.

LXC is usually built based on GitHub - lxc/lxc-pkg-ubuntu at dpm-jammy, but for the jammy release you did a sneaky 5.0.0 prerelease build :slight_smile: , so this source is based on the real Ubuntu source package – Details of source package lxc in jammy. Patched to ship with sysvinit and a dependency on cgroupfs-mount, since we don’t have systemd here to do the cgroup setup.

LXCFS is built based on lxc / lxcfs · GitLab as referenced by Ubuntu – Details of source package lxcfs in jammy. Also shipping the old sysvinit stuff.

You should not use security.nesting=true and security.privileged=true together, as it increases the ability of the workload to escape the container.

What does lxc info --show-log local:c2 show?

Hi @tomp, thank you for your time.

You should not use security.nesting=true and security.privileged=true together, as it increases the ability of the workload to escape the container.

That’s true, although we have so far only used it with privileged, and that worked fine in earlier versions. We use this only for gitlab-runner (with an lxd executor) running only our own pipelines, so escaping is currently not critical.

Also, as mentioned above, I did try it without privileged but ran into a different issue. To rule out restrictions in c1, I figured this report is better made with the privileged case.

What does lxc info --show-log local:c2 show?

That was in the first code block of the post; I’ve copied it here again and added back the part I had cut out. This is now the full output:

~# lxc info --show-log local:c2
Name: c2
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/05/03 09:42 UTC
Last Used: 2022/05/03 09:43 UTC

Log:

lxc c2 20220503094300.251 ERROR    cgfsng - cgroups/cgfsng.c:__initialize_cgroups:3274 - Not a directory - Failed to open 6/net_prio
lxc c2 20220503094300.251 ERROR    cgfsng - cgroups/cgfsng.c:initialize_cgroups:3434 - Not a directory - Failed to initialize cgroups
lxc c2 20220503094300.252 ERROR    cgroup - cgroups/cgroup.c:cgroup_init:33 - Bad file descriptor - Failed to initialize cgroup driver
lxc c2 20220503094300.252 ERROR    start - start.c:lxc_init:865 - Failed to initialize cgroup driver
lxc c2 20220503094300.252 ERROR    start - start.c:__lxc_start:2008 - Failed to initialize container "c2"
lxc c2 20220503094327.883 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:869 - No such file or directory - Failed to receive the container state
lxc 20220503094327.884 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220503094327.884 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"
lxc 20220503094327.884 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220503094327.884 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

We are just finishing off the meson build changes, then it will be released.

OK, so it looks like the cgroups are not being set up correctly. Normally systemd does this, but if you’re not running systemd then something else needs to do it.

Please see LXC - Gentoo Wiki
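On a non-systemd host that boils down to mounting the controllers plus a named systemd hierarchy yourself, roughly along these lines (a sketch of the idea only; the wiki has the full steps, and cgroupfs-mount may already cover the controller part):

mount -t tmpfs -o mode=755 cgroup /sys/fs/cgroup
for c in cpuset cpu cpuacct blkio memory devices freezer net_cls perf_event net_prio hugetlb pids; do
    mkdir -p /sys/fs/cgroup/$c
    mount -t cgroup -o $c cgroup /sys/fs/cgroup/$c
done
# the named hierarchy that systemd-based guests look for:
mkdir -p /sys/fs/cgroup/systemd
mount -t cgroup -o none,name=systemd cgroup /sys/fs/cgroup/systemd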

Thanks, I also suspected it’s cgroup related. I’ll look into it.

And the previous host was systemd based! So all my remarks that “it did work before” are probably explained by that :smiley:

To clarify against the documentation in the link: systemd containers are fine as long as I don’t use nesting.

OK, it seems the link doesn’t help. My Devuan Chimaera host already has that named systemd cgroup hierarchy mounted:

~$ mount | grep cgroup
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,mode=755)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,relatime,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_prio type cgroup (rw,relatime,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,relatime,pids)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,relatime,rdma)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,relatime,name=systemd)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/elogind type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/elogind/elogind-cgroups-agent,name=elogind)

(Also verified after a reboot, with autostart turned off for all containers, just in case they would interfere.)
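Next step on my side: compare what c1 itself sees, since the failing net_prio lookup happens inside it. Roughly these checks (a sketch):

~$ lxc exec c1privubuntu -- cat /proc/self/cgroup
~$ lxc exec c1privubuntu -- sh -c 'mount | grep cgroup'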