I’m trying to run a privileged guest with systemd and v1 cgroups (Ubuntu Xenial) on a host with v2 cgroups (Debian 11, kernel 5.10.0-26-amd64). I’m fairly sure this used to work on older kernels by simply mounting a tmpfs on /sys/fs/cgroup inside the container. Now it fails with:
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
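To confirm which layout the host actually uses, the filesystem type at /sys/fs/cgroup is a quick indicator: `cgroup2fs` means a pure v2 (unified) hierarchy, `tmpfs` means a v1 or hybrid layout with separate hierarchies mounted underneath. The helper names `classify_cgroup_fs` and `cgroup_mode` are hypothetical, and the sketch assumes GNU coreutils `stat`:

```shell
#!/bin/sh
# Hypothetical helper: map a filesystem type name to a cgroup layout.
#   cgroup2fs -> pure v2 (unified) hierarchy
#   tmpfs     -> v1 (legacy) or hybrid layout with per-hierarchy mounts below it
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1-or-hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

# Hypothetical helper: report the layout of /sys/fs/cgroup (or of the path in $1).
cgroup_mode() {
    # stat -f queries the filesystem rather than the file; %T prints its type by name.
    # If stat fails (e.g. the path does not exist), fall back to an empty type.
    fstype=$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null) || fstype=""
    classify_cgroup_fs "$fstype"
}
```

On a host like the one described here, running `cgroup_mode` without arguments should print `v2`.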
While debugging, I ended up experimenting with unshare. I can mount the v1 cgroup hierarchy outside the container:
~$ sudo mount -t cgroup -o none,name=systemd none mnt
~$ ls mnt
cgroup.clone_children cgroup.sane_behavior release_agent
cgroup.procs notify_on_release tasks
Can you try doing that mount on the host, keeping it there and then making a new mount in the namespace?
I know that some kernel versions need a cgroup controller to be set up by the root user on the host before it can be used in a container. In your case, though, the container appears to be privileged, so it shouldn’t really be affected by this.
~$ sudo mount -t cgroup -o none,name=systemd none cg
~$ ls cg
cgroup.clone_children cgroup.sane_behavior release_agent
cgroup.procs notify_on_release tasks
~$ sudo unshare -C
/home/timo# ls cg
cgroup.clone_children cgroup.sane_behavior release_agent
cgroup.procs notify_on_release tasks
/home/timo# mount -t cgroup -o none,name=systemd none cg
mount: /home/timo/cg: none already mounted on /sys/fs/bpf.
/home/timo# ls cg
cgroup.clone_children cgroup.sane_behavior release_agent
cgroup.procs notify_on_release tasks
The host mount is also visible in the namespace, but the mount command fails. I don’t know how I would test this inside the container, but I would expect systemd to fail in the same way when it tries to mount the cgroup.
The error message is very confusing: “already mounted on /sys/fs/bpf”.
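The message identifies the conflicting mount only by its source string, "none", which is not unique: the cgroup mount uses it, and the bpf filesystem at /sys/fs/bpf apparently does too on this host. So the message likely says nothing about where the name=systemd hierarchy really lives (this reading of the mount(8) message is an assumption, not something confirmed in the thread). A sketch of a hypothetical helper, `find_named_systemd_cgroup`, that lists the actual mount points of v1 hierarchies carrying `name=systemd` by reading /proc/self/mounts:

```shell
#!/bin/sh
# Hypothetical helper: print the mount point of every cgroup v1 hierarchy whose
# mount options include name=systemd. Reads /proc/self/mounts by default, or any
# file in the same format given as $1 (fields: source target fstype options ...).
# Paths containing spaces are printed escaped (\040), as the kernel writes them.
find_named_systemd_cgroup() {
    awk '$3 == "cgroup" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "name=systemd") { print $2; break }
    }' "${1:-/proc/self/mounts}"
}
```

Where util-linux is available, `findmnt -t cgroup -o TARGET,SOURCE,OPTIONS` gives a similar overview.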
OK, I found out that if I run unshare -Cm and unmount the mount point inside the namespace, the mount then succeeds in the namespace. Alternatively, I can mount onto a different mount point.
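The unmount-then-remount workaround just described can be sketched as a single function. It must run as root; the function name is hypothetical, and the claim that the remount succeeds is only what this thread reports on kernel 5.10, not something verified independently:

```shell
#!/bin/sh
# Hypothetical helper reproducing the workaround above. Must run as root.
#   $1: mount point that already holds the name=systemd hierarchy on the host
#       (e.g. /home/timo/cg in this thread)
# unshare -C creates a new cgroup namespace and -m a new mount namespace.
# util-linux unshare makes mount propagation private by default, so the umount
# below only affects the new mount namespace, not the host.
# The new mount disappears when the shell (and thus the namespace) exits.
remount_named_systemd_in_ns() {
    unshare -Cm sh -c '
        umount "$1" &&
        mount -t cgroup -o none,name=systemd none "$1" &&
        ls "$1"
    ' sh "$1"
}
```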
Now I’m not sure how to apply this to the container setup…
I mounted the cgroup hierarchy on the host at an arbitrary location (/home/timo/cg), and the container started working. Thanks a lot for steering me onto the right track!
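One way to keep such a host-side mount across reboots is an /etc/fstab entry equivalent to the `mount -t cgroup -o none,name=systemd none ...` command used earlier. Whether fstab is the "elegant" answer is not established in this thread; the helper name `add_named_systemd_fstab_entry` is hypothetical, and the sketch only appends the entry once:

```shell
#!/bin/sh
# Hypothetical helper: append an fstab entry for a named cgroup v1 hierarchy
# (no controllers attached) unless an identical line is already present.
#   $1: fstab file to edit (normally /etc/fstab; writing it requires root)
#   $2: mount point on the host, e.g. /home/timo/cg as used in this thread
add_named_systemd_fstab_entry() {
    line="none $2 cgroup none,name=systemd 0 0"
    # -q quiet, -x match whole line, -F fixed string; append only if absent
    grep -qxF "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}
```

As root, after creating the mount point directory, calling it with /etc/fstab and the mount point and then running `mount /home/timo/cg` mounts the hierarchy immediately; the fstab entry covers later boots.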
Now I wonder how to apply this in a ‘correct’ or ‘elegant’ way. Should this work out of the box? Is something missing or wrong in my installation?
It works nicely now, but I’m a little worried about container separation. When systemd in the container adds entries to the cgroup tree, e.g. user.slice, they are visible on the host side too. Will this cause problems if I run multiple v1 containers? Is there a way to confine the mounts somehow?