[3.1.0] binfmt-support.service in unprivileged guest requires write access on host's /proc/sys/fs/binfmt_misc

In an unprivileged guest, binfmt-support.service (Enable support for additional executable binary formats) fails, as expected, with:

unable to open /proc/sys/fs/binfmt_misc/status for writing: Permission denied
unable to open /proc/sys/fs/binfmt_misc/register for writing: Permission denied

On the host, setfacl -m u:100000:w /proc/sys/fs/binfmt_misc/register produces:

setfacl: /proc/sys/fs/binfmt_misc/register: Operation not supported

How do I get the guest the necessary write permission on the host?

I’d expect the kernel to stop you from doing that, at least until we have proper namespacing for binfmt (which I know at least one person was working on).

If the kernel wasn’t preventing your container from registering handlers in binfmt, that container would be allowed to take over the execution of any binary in any container or even on the host.

In the worst case scenario, you could have the container register a binfmt handler for the current native architecture’s ELF binaries and therefore intercept the execution of every single binary on the entire system.
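To make that worst case concrete, here is a minimal Python sketch of what such a registration string would look like. It assumes the `:name:type:offset:magic:mask:interpreter:flags` rule format described in the kernel's binfmt_misc documentation; the handler name and the interpreter path are hypothetical, and the sketch only builds and prints the string, it never writes to the register file.

```python
def build_binfmt_rule(name, magic, interpreter, offset=0, mask=b"", flags=""):
    """Build a binfmt_misc rule of type 'M' (match on magic bytes)."""
    def esc(data):
        # Magic and mask bytes are written as \xHH escapes.
        return "".join(f"\\x{b:02x}" for b in data)
    return f":{name}:M:{offset}:{esc(magic)}:{esc(mask)}:{interpreter}:{flags}"

# Every ELF binary starts with these four bytes.
ELF_MAGIC = b"\x7fELF"

# Hypothetical handler name and wrapper path, for illustration only.
rule = build_binfmt_rule(
    "hypothetical-elf", ELF_MAGIC, "/usr/local/bin/hypothetical-wrapper"
)
print(rule)
```

A rule this broad matches on the ELF magic alone, which is why the ability to write it has to stay with root on the host.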

Anyway, the short answer is that the Permission denied errors came from simple permission checks in procfs, which you tried to work around with setfacl. The Operation not supported error most likely comes directly from the binfmt module in the kernel, as it sees a non-root user trying to reconfigure the execution of binaries on the entire system.


I understand the security implications and thus my general preference is to basically deploy only unprivileged containers.

In this case, though, it presents a bit of a dilemma (a sort of catch-22) if I am required to switch to a privileged container instead, since binfmt-support.service is an essential dependency in the guest for compiling some stuff…


Any idea when that might become available?

Assuming that all you need is something like the qemu-static binaries, installing the package which sets that up on the host and then copying the static binaries into the container should work fine. That is what we used to do when offering arm containers on x86 (though that was horribly broken for other reasons).
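To see which static binaries would need copying, a small Python sketch like the one below could be run on the host. It assumes the usual layout of the entries under /proc/sys/fs/binfmt_misc, where each registered handler is a file containing an `interpreter <path>` line; the function names are hypothetical helpers, not part of any existing tool.

```python
from pathlib import Path

def parse_binfmt_entry(text):
    """Return the interpreter path from one binfmt_misc entry, or None."""
    for line in text.splitlines():
        if line.startswith("interpreter "):
            return line.split(" ", 1)[1].strip()
    return None

def list_interpreters(root="/proc/sys/fs/binfmt_misc"):
    """Map each registered handler name to its interpreter path."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for entry in sorted(base.iterdir()):
        # 'register' and 'status' are control files, not handlers.
        if entry.name in ("register", "status") or not entry.is_file():
            continue
        try:
            interp = parse_binfmt_entry(entry.read_text())
        except OSError:
            continue
        if interp:
            result[entry.name] = interp
    return result

if __name__ == "__main__":
    for name, interp in list_interpreters().items():
        print(f"{name}: {interp}")
```

Each interpreter listed would then have to exist at the same path inside the container, for example by pushing it there with lxc file push.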

No idea for timeline on the binfmt namespace. The last attempt to upstream was late last year. I don’t know if @brauner saw more chatter about it.

I already have those in the guest container, but that does not help binfmt-support.service start or work, and I suppose the service is there for a purpose. I will have to see how the compilation works out (or not).


Please pardon my ignorance in this matter: where is the binfmt namespace being developed (lxc(fs)?), and does upstream mean Linux kernel development?

Developed in the Linux kernel. Once that’s done, it will just work for unprivileged containers. No need for anything to be done at the LXC/LXD level.


#posterity

binfmt support is more or less needed because we are using qemu to chroot into the different architectures from amd64 or x86 systems.