Unprivileged container does not work in Ubuntu 22.04

I recently upgraded my system from Ubuntu 20.04 to 22.04. After that, my (unprivileged) LXC container stopped working. I also tried downloading new ones, which didn’t work either.

Things that I have checked/tried

  • privileged containers worked well
  • I did not see any messages about AppArmor in /var/log/syslog
  • I downloaded and tried to start new containers of ubuntu and archlinux, neither worked.
  • I tried another user to run containers, didn’t work.
  • Modified init command to /bin/bash, didn’t work.

Things I did not try

  • Rebooting the host with cgroup v1/hybrid; I want to make this work with v2

Output of lxc-checkconfig

LXC version 5.0.0~git2209-g5a7b9ce67
Kernel configuration not found at /proc/config.gz; searching...
Kernel configuration found at /boot/config-5.15.0-46-generic
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled

--- Control groups ---
Cgroups: enabled
Cgroup namespace: enabled

Cgroup v1 mount points: 


Cgroup v2 mount points: 
/sys/fs/cgroup

Cgroup v1 systemd controller: missing
Cgroup v1 freezer controller: missing
Cgroup ns_cgroup: required
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled, loaded
Macvlan: enabled, not loaded
Vlan: enabled, not loaded
Bridges: enabled, loaded
Advanced netfilter: enabled, loaded
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, not loaded
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, not loaded
FUSE (for use with lxcfs): enabled, not loaded

--- Checkpoint/Restore ---
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities: 

Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig

Output of lxc-start

Command: systemd-run --unit=my-unit --user --scope -p "Delegate=yes" -- lxc-start -l DEBUG --logfile=/tmp/lxc.log -L /tmp/lxc1.log -F my-container

Running scope as unit: my-unit.scope
lxc-start: my-container: cgroups/cgfsng.c: __cgfsng_delegate_controllers: 2953 Device or resource busy - Could not enable "+memory +pids" controllers in the unified cgroup 8
lxc-start: my-container: mount_utils.c: fs_attach: 255 Permission denied - Failed to finalize filesystem context 19
lxc-start: my-container: cgroups/cgfsng.c: __cgroupfs_mount: 1539 Permission denied - Failed to mount cgroup2 filesystem onto 18((null))
lxc-start: my-container: cgroups/cgfsng.c: cgfsng_mount: 1708 Permission denied - Failed to force mount cgroup filesystem in cgroup namespace
lxc-start: my-container: conf.c: lxc_mount_auto_mounts: 851 Permission denied - Failed to mount "/sys/fs/cgroup"
lxc-start: my-container: conf.c: lxc_setup: 4396 Failed to setup remaining automatic mounts
lxc-start: my-container: start.c: do_start: 1275 Failed to setup container "my-container"
lxc-start: my-container: sync.c: sync_wait: 34 An error occurred in another process (expected sequence number 4)
lxc-start: my-container: start.c: __lxc_start: 2074 Failed to spawn container "my-container"
lxc-start: my-container: tools/lxc_start.c: main: 306 The container failed to start
lxc-start: my-container: tools/lxc_start.c: main: 311 Additional information can be obtained by setting the --logfile and --logpriority options

If I understood correctly, something must be going wrong before the container’s init runs, since changing the init command has no effect.

I Googled the error message __cgfsng_delegate_controllers: 2953 Device or resource busy - Could not enable "+memory +pids" controllers in the unified cgroup 8, but didn’t find anything particularly helpful.
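For context on the "Device or resource busy" part: cgroup v2 generally refuses to enable controllers in a cgroup's cgroup.subtree_control while that cgroup still has processes of its own (the "no internal processes" rule). That is one possible source of this error; whether it is what happened here is an assumption. A minimal sketch for checking a cgroup directory, with a made-up function name:

```shell
# Hypothetical helper, not part of LXC.
# Return 0 if the cgroup directory "$1" lists at least one member process.
# cgroupfs files report size 0, so test the content instead of using [ -s ].
cgroup_has_procs() {
    grep -q . "$1/cgroup.procs" 2>/dev/null
}
```

It could be pointed at the cgroup the current shell lives in, for example `cgroup_has_procs "/sys/fs/cgroup$(sed -n 's/^0:://p' /proc/self/cgroup)" && echo "has member processes"`.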

DEBUG log

You could try this:

Hi Thomas, thanks for your reply!

That post does look similar, I also upgraded Ubuntu from 20.04 to 22.04.

I also reckon that going back to cgroup v1/hybrid might work, but I want to make it work in v2. I have seen this workaround in several places, but I have not seen an explanation of “what’s wrong with the cgroup v2 setup”.

Update

My local build can set “+memory +pids” to subtree_control, but it still failed to mount cgroup.

And I do see “supports mount api” with my local build.

Original message

More context: could it be related to mount api?

I’m trying to build LXC from source. It doesn’t fully work, but it seems to get past the “mounting cgroup” step.

According to the error log, there was an error from here, which uses FSCONFIG_CMD_CREATE. I also saw “kernel supports mount api” in TRACE log.

However, I saw FSCONFIG_CMD_CREATE is not supported during meson setup, and I don’t see “kernel supports mount api” in the log.

Any thoughts @stgraber @brauner ?

Sorry I updated my previous message several times.

Now I think it is not a false alarm. I was confused because I had messed up my PATH.

I tried running my local build with and without systemd-run, no matter the case, I saw the following in the log.

TRACE    cgfsng - ../src/lxc/cgroups/cgfsng.c:__cgfsng_delegate_controllers:3336 - Enabled "+memory +pids" controllers in the unified cgroup 11

Maybe related: I’ve been using sudo machinectl shell my-dev-user@ to enter a dev shell, then I’d run my local build in that shell, with or without systemd-run

I was able to narrow down a bit:

  • distro-installed lxc-start shows error about “add +memory +pids to subtree_control”, then another error about mounting cgroup
  • my locally built lxc-start shows only the mounting error
  • mount api is not relevant, nothing changed if I force enable/disable it.
  • I can see lxc.init.cmd running if I set an empty lxc.mount.auto
    • but of course it’d complain about missing directories.
    • lxc.init.cmd=/bin/bash worked without mounting cgroup.

So I think the problem is something like “no permission to mount cgroup (v2) directories”.

Relevant question (since I’m not very familiar with cgroup):

How does “mounting cgroup v2 to child process/namespace” work? Can I do it without root? Is there a sample command that I can try?
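A rough way to try this by hand, assuming util-linux unshare is installed and unprivileged user namespaces are enabled on the host. The function name is made up for illustration, and this only approximates what LXC does: it creates new user, mount, and cgroup namespaces as a non-root user and attempts a cgroup2 mount inside them.

```shell
# Hypothetical helper: try to mount cgroup2 inside fresh user, mount and
# cgroup namespaces. Prints "mounted" on success and "failed" otherwise.
try_cgroup2_mount() {
    target=$(mktemp -d) || return 1
    if unshare --user --map-root-user --mount --cgroup \
        sh -c 'mount -t cgroup2 none "$1" && ls "$1" >/dev/null' sh "$target" 2>/dev/null
    then
        echo "mounted"
    else
        echo "failed"
    fi
    # The mount lives only in the temporary mount namespace, so the
    # directory is empty again once unshare has exited.
    rmdir "$target" 2>/dev/null
    return 0
}

try_cgroup2_mount
```

If this prints "failed", removing the 2>/dev/null shows the underlying error from unshare or mount.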

@tomp I have a theory and a hacky solution.

I think the issue is indeed about “mounting cgroup2 in rootless containers”. I was able to get a shell by:

In the shell I tried mount -t cgroup2 none somewhere, but I always got the error “cannot mount … read only”. I’m not sure about the root cause though.

As for the hack, I found this bug and this PR relevant. Both are for runc.

I followed the same idea and applied the bind-mount in __cgroupfs_mount, which worked.

I’m not sure whether this would be a proper fix though.

One more note: systemd creates the directories in /sys/fs/cgroup automatically, so I also needed to grant the container root permissions on them; otherwise the bind-mount would fail.

Interestingly, it seems that “setting permissions of cgroups directories” fixes the whole thing, I no longer need the bind-mount hack.

For example, the cgroup directory looks like this:

/sys/fs/cgroup/user.slice/user-1006.slice/user@1006.service/app.slice/lxc-my-container-0.scope

In my case I needed to:

  • manually call chmod o+x app.slice
  • in cgfsng.c apply o+x to lxc-my-container-0.scope, because this part is dynamically generated.

Then lxc-start just works, I don’t need any other hacks.
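To find which directories along such a path are missing the bit, something like the sketch below may help. Both function names are made up for illustration. It assumes GNU stat, and it assumes the container root maps to a host UID that falls under "other" for these directories, which is why only the last octal digit is inspected.

```shell
# Hypothetical helpers, not part of LXC.

# Print the cgroup v2 path from a /proc/<pid>/cgroup style file.
# The unified hierarchy is the line that starts with "0::".
cgroup2_path() {
    sed -n 's/^0:://p' "${1:-/proc/self/cgroup}"
}

# Walk from directory "$2" up to (but not including) "$1" and print every
# directory that lacks the execute (search) bit for "other".
missing_other_x() {
    root=$1
    dir=$2
    while [ "$dir" != "$root" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        mode=$(stat -c '%a' "$dir") || return 1
        # The last octal digit holds the permissions for "other";
        # an odd digit means the x bit is set.
        other=${mode#"${mode%?}"}
        case $other in
            1|3|5|7) ;;
            *) echo "$dir" ;;
        esac
        dir=$(dirname "$dir")
    done
}
```

For example, `missing_other_x /sys/fs/cgroup "/sys/fs/cgroup$(cgroup2_path)"` checks the cgroup of the current shell; the scope directory that lxc-start creates would have to be passed explicitly.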

So does it sound like a proper fix or a hack? Any security concerns?

I’m not sure, I’m afraid.

I found a potential culprit: I have umask set to 0027 via pam_umask. Everything seems to work if I remove it.

Meanwhile I’m also looking for a better solution without disabling pam_umask.
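The effect of that umask on newly created directories can be seen with an ordinary directory standing in for a cgroup directory (an assumption for illustration; the function name is made up and GNU stat is assumed):

```shell
# Hypothetical helper: create a directory under the given umask and print
# its octal mode. The subshell keeps the umask change local.
mode_under_umask() {
    (
        umask "$1"
        d=$(mktemp -d) || exit 1
        mkdir "$d/scope"
        stat -c '%a' "$d/scope"
        rm -r "$d"
    )
}

mode_under_umask 0022   # prints 755: "other" can traverse the directory
mode_under_umask 0027   # prints 750: "other" cannot traverse the directory
```

With 0027 the new directory has no permission bits at all for "other", which matches the need for chmod o+x described earlier.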

I think this could be a potential explanation of the post that you mentioned.

Looking ahead, I think it’d be great if we had at least one of:

  1. LXC checks that the container root has access to all cgroup directories, just like LXC checks the setuid bit.
  2. LXC shows a hint upon cgroup mounting errors.
  3. Maybe mention this in some relevant wiki/manual.

Otherwise this issue could be very cryptic.

With “umask” I was able to find a couple of existing issues. E.g. #2277 and #3100.

Apparently this also happened for cgroup v1, and it was “fixed” in pam-cgfs.

Now that cgroup v2 is handled by systemd (if I understood correctly), I wonder if this is a bug of LXC, or systemd, or maybe not a bug in the first place?

Are you launching your unprivileged container as the host root user or as an unprivileged user?

unprivileged user

If you have a reproducer, please could you log the issue in the lxc/lxc issue tracker on GitHub?

Thanks

Sure, will do. :ok_hand:

Done. #4186

@tomp

You said:

It was a CGROUP2 bug/problem

Was there an actual BugID for this?