LXD - 4.23 - unable to start nested containers

I am still unable to start the nested container, so there is no findmnt output from there. Sorry.

Thank you! Can you show me the contents of /etc/rc.conf for both the outer and nested container, please?

Sorry, I mean the host and the outer container.

In the first container please do:

mount -t cgroup cgroup -o none,name=openrc /sys/fs/cgroup/openrc

and then try to start nested containers and report whether that fixes it for you.

The problem that I see is that openrc doesn’t mount the openrc cgroup by itself. I suspect it needs to be told to do so explicitly.
When the nested container starts, LXC looks at /proc/<pid>/cgroup, finds name=openrc, and assumes that name=openrc is mounted at /sys/fs/cgroup/openrc. But while that directory exists, the name=openrc controller isn’t actually mounted there. So when LXC tries to move the container into this cgroup, it won’t find cgroup.procs in there and fails.
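
To see which of these states the outer container is in, something like the following can be run inside it. This is a minimal sketch, not LXC's actual logic: the function names are made up for illustration, and it assumes that a missing cgroup.procs file at the root of /sys/fs/cgroup/openrc means the hierarchy is not mounted there.

```shell
#!/bin/sh
# Sketch: report whether a named cgroup v1 hierarchy that a process belongs to
# is really mounted where a nested container would look for it.
# Both paths are parameters so the check can be pointed at test fixtures.
# All function names here are hypothetical helpers, not part of LXC or OpenRC.

# has_named_hierarchy NAME CGROUP_FILE
# Succeeds if CGROUP_FILE (e.g. /proc/self/cgroup) lists "name=NAME".
has_named_hierarchy() {
    # Lines look like "1:name=openrc:/"; field 2 is the controller list.
    awk -F: -v want="name=$1" '$2 == want { found = 1 } END { exit !found }' "$2"
}

# hierarchy_is_mounted DIR
# A mounted cgroup hierarchy exposes cgroup.procs at its root.
hierarchy_is_mounted() {
    [ -e "$1/cgroup.procs" ]
}

# check_openrc_cgroup [CGROUP_FILE] [MOUNT_DIR]
# Prints one of: not-listed, mounted, listed-but-not-mounted.
check_openrc_cgroup() {
    cgroup_file=${1:-/proc/self/cgroup}
    mount_dir=${2:-/sys/fs/cgroup/openrc}
    if ! has_named_hierarchy openrc "$cgroup_file"; then
        echo "not-listed"
    elif hierarchy_is_mounted "$mount_dir"; then
        echo "mounted"
    else
        echo "listed-but-not-mounted"
    fi
}
```

Calling check_openrc_cgroup with no arguments inspects the current process. A result of listed-but-not-mounted corresponds to the failure described above.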

I have sent a patch to LXC that verifies that the fd we opened is indeed a cgroup fd; if it is not, we’ll skip that controller (which is what we usually do).
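
For anyone who wants to run a similar check by hand, one way to tell a real cgroup mount from an ordinary directory is to compare the filesystem magic number. The sketch below is only a shell approximation of that idea. Whether the LXC patch uses this exact test is an assumption, and the function names are hypothetical. It relies on `stat -f -c %t` printing the filesystem type in hex.

```shell
#!/bin/sh
# Sketch only: the real check lives in LXC's C code and operates on an open fd.
# Here the idea is approximated from the shell using the filesystem type
# that `stat -f` reports for a path.

# is_cgroup_magic HEX
# Succeeds for the cgroup v1 (0x27e0eb) and cgroup v2 (0x63677270) magic numbers.
is_cgroup_magic() {
    case "$1" in
        27e0eb|63677270) return 0 ;;
        *) return 1 ;;
    esac
}

# path_is_cgroup PATH
# Succeeds if PATH sits on a cgroup filesystem rather than, say, a plain
# directory that merely has the expected name.
path_is_cgroup() {
    magic=$(stat -f -c %t "$1" 2>/dev/null) || return 1
    is_cgroup_magic "$magic"
}
```

For example, `path_is_cgroup /sys/fs/cgroup/openrc || echo "not a cgroup mount"` would flag the situation in which the directory exists but nothing is mounted on it.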

In the meantime, until that patch is pushed, one way to fix this is to set:

raw.lxc: lxc.mount.auto = cgroup:rw:force

which will make sure that LXC mounts the cgroups it detects on the host. This ensures that the outer container will have the name=openrc controller mounted at the expected location (if it is mounted on the host), meaning that the inner container will work.
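
From the host, that key can be set on the outer container with `lxc config set`. The snippet below is a small sketch: the container name "outer" and the helper name are placeholders, and the LXC variable exists only so the commands can be previewed with LXC=echo before running them for real.

```shell
#!/bin/sh
# Sketch: apply the workaround to the outer container and restart it so the
# new raw.lxc value takes effect.
# Note: this replaces any existing raw.lxc value on that container.
# LXC can be overridden (e.g. LXC=echo) to preview the commands.

# force_cgroup_automount CONTAINER   (hypothetical helper name)
force_cgroup_automount() {
    container=$1
    "${LXC:-lxc}" config set "$container" raw.lxc "lxc.mount.auto = cgroup:rw:force" &&
    "${LXC:-lxc}" restart "$container"
}
```

Usage would be `force_cgroup_automount outer`, with "outer" replaced by the actual name of the first-level container.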

I think this is the root of the problem.

/etc/rc.conf on both HOST and CONTAINER:

rc_shell=/sbin/sulogin
rc_tty_number=12
unicode="YES"

This fixes it when run inside the container. Now I can start nested containers:

mount -t cgroup cgroup -o none,name=openrc /sys/fs/cgroup/openrc

THANK YOU!

I am going to check whether something can be done about this inside openrc.

So, the nested container now starts, but I cannot access it with exec:

lxc exec test bash
Error: Failed to retrieve PID of executing child process
| test                                                  | RUNNING | 10.92.140.9 (eth0)   | fd42:3732:2a0d:9b9e:216:3eff:feb9:d055 (eth0) | CONTAINER | 0         |
lxc console --show-log test
                                               
Console log:                
                                               
INIT: version 3.01 booting
                                               
   OpenRC 0.44.10 is starting up LiGurOS Linux (x86_64) [LXC]

 * /proc is already mounted
 * Mounting /run ... * /run/openrc: creating directory
 * /run/lock: creating directory
 * /run/lock: correcting owner
 * Caching service dependencies ... [ ok ]
 * Mounting cgroup filesystem ... [ ok ]
mount: /sys/fs/cgroup/openrc: wrong fs type, bad option, bad superblock on openrc, missing codepage or helper program, or other error.
 * Remounting devtmpfs on /dev ... [ ok ]
 * Mounting /dev/shm ... [ ok ]
 * Configuring kernel parameters ...sysctl: permission denied on key "fs.protected_symlinks"
sysctl: permission denied on key "fs.protected_hardlinks"
 * Unable to configure some kernel parameters
 [ !! ]
 * ERROR: sysctl failed to start
 * Creating user login records ... [ ok ]
 * Wiping /tmp directory ... [ ok ]
 * Detecting local filesystems ... [ ok ]
 * Bringing up network interface lo ...RTNETLINK answers: File exists
 [ ok ]
 * Updating /etc/mtab ... * Creating mtab symbolic link
 [ ok ]
INIT: Entering runlevel: 3
 * Configuring kernel parameters ...sysctl: permission denied on key "fs.protected_symlinks"
sysctl: permission denied on key "fs.protected_hardlinks"
 * Unable to configure some kernel parameters
 [ !! ]
 * ERROR: sysctl failed to start
 * Network udhcpc eth0 up ...udhcpc: started, v1.34.1
Clearing IP addresses on eth0, upping it
udhcpc: broadcasting discover
udhcpc: broadcasting select for 10.92.140.9, server 10.92.140.1
udhcpc: lease of 10.92.140.9 obtained from 10.92.140.1, lease time 3600
Setting IP address 10.92.140.9 on eth0
Deleting routers
SIOCDELRT: No such process
Adding router 10.92.140.1
Recreating /etc/resolv.conf
 Adding DNS server 10.92.140.1
 [ ok ]
 * Mounting network filesystems ... [ ok ]
 * Starting sshd ... [ ok ]
 * Starting local ... [ ok ]
INIT: no more processes left in this runlevel

I had the same problem and it’s unrelated to the openrc cgroup issue. @stgraber were there any changes in lxd that could explain this? Otherwise I’ll take a look tomorrow morning.

openrc inside the container tries to mount cgroups, but fails with this error:

 * Mounting cgroup filesystem ... [ ok ]
mount: /sys/fs/cgroup/openrc: wrong fs type, bad option, bad superblock on openrc, missing codepage or helper program, or other error.

I will try to look into openrc’s cgroups.init to see why that is failing.

Thank you for looking into it. Let me know if you need more information.

Not that I can think of. We’ve had some folks reporting this on particular versions of the 5.15 kernel.

I am using kernel-5.16.7

There were some security fixes to how permission checking works on cgroups in the kernel. It might be that this is related.

On a different server, I have installed lxd-4.0.9/lxc-4.0.12 on the HOST and also in the CONTAINER. The same problem occurred: the NESTED-CONTAINER did not start. After I mounted the openrc cgroup manually, the NESTED-CONTAINER started, but I wasn’t able to connect to the shell/console.

I then downgraded the kernel to 5.10.92. After a reboot, everything works, even without the manual cgroup mount.

Feel free to ask for any sort of debug/trace messages. If it helps to solve this on kernels newer than 5.10, I am ready to help.

Thanks for the support.

I think I have a fix for the issue but the code is a hot mess right now. I’ll likely put up a PR tomorrow.

Nice, @brauner! Let me know if I can help in any way (e.g. testing).

This should fix it:

I can confirm that this patch fixes it for me on the 5.16.7 kernel with lxd-4.23 and lxc-4.0.12 plus the patch.

Thank you very much!