Hi all!
I’m working on some changes to the snap-confine program, but for a couple of days I’ve been stuck on a problem that I fail to understand: snap-confine needs to create a directory under /sys/fs/cgroup/freezer/, but the mkdirat syscall fails with EACCES.
snap-confine is setuid root: when it starts, in my branch I drop to the ordinary user but I retain a few capabilities, including DAC_OVERRIDE.
Modifying the program so that it does not drop to the ordinary user and keeps running as setuid root does not help either: cap_to_text() shows =ep (that is, all capabilities are effective), yet I get the same error.
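As a cross-check that does not depend on libcap, the effective capability set can also be read from the `CapEff:` line of /proc/self/status. The following is only a minimal sketch: the function names and the `SKETCH_CAP_DAC_OVERRIDE` macro are hypothetical and not part of snap-confine; the value 1 is the number assigned to CAP_DAC_OVERRIDE in `<linux/capability.h>`.

```c
#include <stdio.h>

/* CAP_DAC_OVERRIDE is capability number 1 in <linux/capability.h>.
 * The macro name is local to this sketch. */
#define SKETCH_CAP_DAC_OVERRIDE 1

/* Return 1 if bit `cap` is set in the capability mask, 0 otherwise. */
int mask_has_cap(unsigned long long mask, int cap) {
    return (int)((mask >> cap) & 1ULL);
}

/* Parse one line of /proc/<pid>/status. On a "CapEff:" line, store the
 * hexadecimal mask in *mask and return 1; on any other line return 0. */
int parse_capeff_line(const char *line, unsigned long long *mask) {
    return sscanf(line, "CapEff: %llx", mask) == 1;
}

/* Read the effective capability mask of the calling process.
 * Returns 0 on success, -1 if the file or the CapEff line is missing. */
int read_effective_caps(unsigned long long *mask) {
    char line[256];
    int found = 0;
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_capeff_line(line, mask);
    fclose(f);
    return found ? 0 : -1;
}
```

Calling `read_effective_caps(&mask)` right before the failing mkdirat call and then `mask_has_cap(mask, SKETCH_CAP_DAC_OVERRIDE)` would confirm whether the capability is really effective at that point.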
/sys/fs/cgroup/freezer/ has this (weird, IMHO) ownership:
```
# lxd.lxc exec my-ubuntu -- ls -ld /sys/fs/cgroup/freezer/
drwxrwxr-x 3 nobody root 0 Nov 17 07:09 /sys/fs/cgroup/freezer/
```
No AppArmor or seccomp denials are visible in the logs.
snap-confine is executed as an ordinary user.
The test code (where you can see how the machine is initialized) is here.
(snap-confine is executed as part of the /snap/bin/test-snapd-sh.sh command).
The reason it currently works in the snapd master branch is that, before making that mkdirat call, we also change our effective group to root. The effective group then matches the group set on /sys/fs/cgroup/freezer/, so DAC permissions allow us to create child items. But this should not be needed, since we have DAC_OVERRIDE.
So the question is: why can’t we create a directory under /sys/fs/cgroup/freezer/, even when running as root or holding DAC_OVERRIDE?
It will fail with permission denied. Note that the effective user when running this program will be root, and as such it will have CAP_DAC_OVERRIDE set (I can also make another test where I renounce the root effective user as well and just retain the CAP_DAC_OVERRIDE capability, if you’d rather see that). The directory creation fails because the group does not match, yet the same operation succeeds on the host machine.
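A comparable test can be sketched as follows. This is not the program discussed above, only an illustration of the failing call: the helper name `try_mkdirat` and the mode `0755` are assumptions made for the sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to create `name` inside the directory `parent` with mkdirat().
 * Returns 0 on success, or the errno value of the call that failed
 * (EACCES is the case described in this thread). */
int try_mkdirat(const char *parent, const char *name) {
    int result;
    int dirfd = open(parent, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd < 0)
        return errno;
    result = (mkdirat(dirfd, name, 0755) == 0) ? 0 : errno;
    close(dirfd);
    return result;
}
```

Running `try_mkdirat("/sys/fs/cgroup/freezer/", "some-name")` both on the host and inside the container, and printing the result with `strerror()`, would show whether the two environments really differ for the same call.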
Please try adding printf("%d %d %d\n", getpid(), geteuid(), getuid()); at the beginning of the main function and check that geteuid() returns 0. I played with that reproducer on my system and noticed that in a user namespace without a root mapping you will not get an effective UID of 0. This may be the reason for the behavior you are seeing.
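The check suggested above can be extended to also look at the directory's owner. An owner displayed as nobody inside a container usually means the real UID has no mapping in the user namespace and is reported as the overflow UID from /proc/sys/kernel/overflowuid. The kernel honours inode-related capabilities such as CAP_DAC_OVERRIDE only when the inode's UID and GID are both mapped into the caller's user namespace, so an unmapped owner may explain the EACCES. The sketch below is a diagnostic under that assumption; all function names are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the kernel's overflow UID; fall back to the default 65534
 * if /proc/sys/kernel/overflowuid cannot be read. */
long read_overflow_uid(void) {
    long value = 65534;
    FILE *f = fopen("/proc/sys/kernel/overflowuid", "r");
    if (f != NULL) {
        if (fscanf(f, "%ld", &value) != 1)
            value = 65534;
        fclose(f);
    }
    return value;
}

/* Return 1 if the owner of `path` is reported as the overflow UID
 * (a hint that it is unmapped), 0 if not, -1 if stat() fails. */
int owner_looks_unmapped(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return ((long)st.st_uid == read_overflow_uid()) ? 1 : 0;
}

/* Print pid, effective UID and real UID, then the owner check. */
void print_ids_and_owner(const char *path) {
    printf("%d %d %d\n", (int)getpid(), (int)geteuid(), (int)getuid());
    printf("%s: owner looks unmapped: %d\n", path,
           owner_looks_unmapped(path));
}
```

Calling `print_ids_and_owner("/sys/fs/cgroup/freezer/")` at the start of main, inside the container, would show the effective UID and the owner check side by side. A result of 1 is only a hint, since a real user with the same numeric UID as the overflow UID would look identical.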