In an unprivileged guest, the service binfmt-support.service - Enable support for additional executable binary formats fails (understandably) with:
unable to open /proc/sys/fs/binfmt_misc/status for writing: Permission denied
unable to open /proc/sys/fs/binfmt_misc/register for writing: Permission denied
On the host, setfacl -m u:100000:w /proc/sys/fs/binfmt_misc/register produces:
setfacl: /proc/sys/fs/binfmt_misc/register: Operation not supported
How do I give the guest the necessary write permission on the host?
I’d expect the kernel to stop you from doing that, at least until we have proper namespacing for binfmt (which I know at least one person was working on).
If the kernel weren’t preventing your container from registering handlers in binfmt, that container would be allowed to take over the execution of any binary in any container or even on the host.
In the worst-case scenario, you could have the container register a binfmt handler for the current native architecture’s ELF binaries and therefore intercept the execution of every single binary on the entire system.
Anyway, the short answer is that the Permission denied came from simple permission checks in procfs, which you tried to work around with setfacl. The Operation not permitted most likely comes directly from the binfmt module in the kernel, as it sees a non-root user trying to reconfigure the execution of binaries on the entire system.
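For reference, a binfmt_misc handler is registered by writing a single line of the form :name:type:offset:magic:mask:interpreter:flags to the register file. The sketch below only builds such a line for a magic-based (type M) handler and does not write it anywhere. The function name, the handler name "demo" and the interpreter path are made up for illustration.

```python
def build_register_line(name, magic, interpreter, mask=b"", offset=0, flags=""):
    """Return a binfmt_misc registration line for a magic-based (type M) handler.

    Field order: :name:type:offset:magic:mask:interpreter:flags
    This helper is hypothetical and only formats the string.
    """
    def esc(data):
        # Encode every byte as \xNN, which the register file accepts.
        return "".join(f"\\x{b:02x}" for b in data)

    return f":{name}:M:{offset}:{esc(magic)}:{esc(mask)}:{interpreter}:{flags}"


if __name__ == "__main__":
    # Hypothetical handler matching the 4-byte ELF magic at offset 0.
    print(build_register_line("demo", b"\x7fELF", "/usr/local/bin/demo-wrapper"))
```

Because the magic in this example matches the start of every ELF file, a handler like this would apply to far more than one architecture's binaries, which is why the kernel restricts who may write to register.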
I understand the security implications and thus my general preference is to basically deploy only unprivileged containers.
In this case, though, it presents a bit of a dilemma (a sort of catch-22) if I am required to switch the container to a privileged one, since binfmt-support.service is an essential dependency in the guest for compiling some stuff…
Assuming that all you need is something like the qemu-static binaries, installing the package which sets that up on the host and then copying the static binaries into the container should work fine. That is what we used to do when offering arm containers on x86 (though that was horribly broken for other reasons).
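To check whether this approach is in place, the following sketch can be run inside the container. It assumes the handlers registered on the host are readable under /proc/sys/fs/binfmt_misc in the guest and that each entry uses the usual layout (an enabled/disabled line followed by interpreter, flags, offset, magic and mask lines). The function names are made up for illustration.

```python
from pathlib import Path


def parse_binfmt_entry(text):
    """Parse the contents of one /proc/sys/fs/binfmt_misc/<name> file into a dict."""
    entry = {"enabled": False}
    for line in text.splitlines():
        line = line.strip()
        if line in ("enabled", "disabled"):
            entry["enabled"] = line == "enabled"
        elif line.startswith("flags:"):
            entry["flags"] = line[len("flags:"):].strip()
        elif " " in line:
            # Lines such as "interpreter /usr/bin/qemu-arm-static" or "offset 0".
            key, value = line.split(None, 1)
            entry[key] = value
    return entry


def list_handlers(root="/proc/sys/fs/binfmt_misc"):
    """Return {handler name: parsed entry}, noting whether the interpreter exists here."""
    handlers = {}
    base = Path(root)
    if not base.is_dir():
        return handlers
    for path in sorted(base.iterdir()):
        if path.name in ("register", "status"):
            continue  # control files, not handlers
        try:
            entry = parse_binfmt_entry(path.read_text())
        except OSError:
            continue
        interp = entry.get("interpreter")
        entry["interpreter_present"] = bool(interp) and Path(interp).exists()
        handlers[path.name] = entry
    return handlers


if __name__ == "__main__":
    for name, entry in list_handlers().items():
        print(name, entry.get("interpreter"), "present:", entry["interpreter_present"])
```

A handler whose interpreter is reported as not present points at a path that does not exist in the container, which would suggest the static binary still needs to be copied to that location.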
No idea about the timeline for the binfmt namespace. The last attempt to upstream it was late last year. I don’t know if @brauner saw more chatter about it.
I have those in the guest container, but that does not help binfmt-support.service to start or work, and I suppose the service is there for a purpose. I will have to see how the compilation works out (or not).
Please pardon my ignorance in this matter: where is the binfmt namespace being developed (lxc(fs)?), and does upstream mean Linux kernel development?
Developed in the Linux kernel. Once that’s done, it will just work for unprivileged containers. No need for anything to be done at the LXC/LXD level.