Disabling unprivileged user namespace cloning on an LXD host

fwaggle · March 16, 2022, 4:21am

To avoid grave-digging a resolved thread, I’m starting a new one (I’m not sure it would be the faux pas the warning on the side suggested, but better safe). It’s related to this:

Are you able to shed a bit more light on when/why this would be required by LXD?

We’ve been looking at doing some general hardening, and the consistent advice is that if you don’t need this feature you should turn it off. The only thing I’ve been able to find that reliably uses it is browsers, for sandboxing purposes (not in my use case in the slightest).

From my testing, LXD seems to function fine without this set - on Ubuntu, LXD itself runs as “root” (albeit in a snap container?), and all the functions that I have tested seem to work (I haven’t tested anything CRIU related, but honestly CRIU has never worked right for me anyway).

But for some reason, LXD’s Snap startup script specifically resets this to 1 if it’s set to zero: see snapcraft/commands/daemon.start at commit 6a8b5bee9c78bfef92bd89d4810bc74b6e670d69 in canonical/lxd-pkg-snap on GitHub.
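For anyone wanting to confirm whether the value has been flipped back after the daemon starts, here is a minimal sketch. It assumes the setting under discussion is the Debian/Ubuntu `kernel.unprivileged_userns_clone` sysctl (the thread never names it), and the function name `read_userns_clone` is made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumption: the knob in question is the Debian/Ubuntu-specific sysctl below.
# Kernels without that distro patch do not expose this file at all.
USERNS_CLONE = Path("/proc/sys/kernel/unprivileged_userns_clone")


def read_userns_clone(path: Path = USERNS_CLONE) -> Optional[int]:
    """Return the sysctl value (0 or 1), or None if the kernel does not expose it."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    value = read_userns_clone()
    if value is None:
        print("sysctl not present on this kernel")
    else:
        print(f"kernel.unprivileged_userns_clone = {value}")
```

Running it once before and once after restarting the LXD snap should show whether the startup script has reset the value.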

I’ve yet to find anything on my home machine that doesn’t work when this is disabled, but is there something I’m missing? If it’s just the case of “anything that needs this will fail on the host” I can live with that, I’m just wondering if there’s something specific about LXD that requires it.

Thanks a lot!

stgraber · March 16, 2022, 2:31pm

LXD containers may themselves need to run tasks inside of a user namespace. Some systemd units do that, some of LXD’s own helpers also do it, as would running any kind of nested container.

That’s because root inside of a LXD container is still seen by the kernel as an unprivileged user on the host, so this feature must be enabled for root in a LXD container to be able to unshare a user namespace.
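One rough way to see this from inside a container is to try the unshare directly. The following is only a sketch, assuming Linux and Python 3 with ctypes; the function name `can_unshare_userns` is hypothetical. It forks first so the calling process stays in its original namespace.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # flag value from <linux/sched.h>


def can_unshare_userns() -> bool:
    """Fork a child and report whether it managed to unshare a user namespace."""
    libc = ctypes.CDLL(None, use_errno=True)
    pid = os.fork()
    if pid == 0:
        # Child: attempt the unshare and report the outcome via the exit status.
        rc = libc.unshare(CLONE_NEWUSER)
        os._exit(0 if rc == 0 else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("user namespace unshare allowed:", can_unshare_userns())
```

Run as root inside a LXD container (or as a normal user on the host), a False result with the feature turned off would match the behaviour described above. A failure can have other causes too, such as `user.max_user_namespaces` being 0 or a seccomp/AppArmor policy blocking the call, so treat it as a hint rather than proof.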

fwaggle · March 16, 2022, 11:33pm

Thanks Stéphane!

Do you happen to know which LXD helpers would need this? So far I’ve not been able to find any problems: exec, copy, etc. all seem to work as intended.

I think we can find workarounds for any systemd units that use this (I haven’t been able to find any on ubuntu-minimal with a fairly small number of packages installed, e.g. Nginx, MariaDB, etc.).

The reason I bring it up again is that it’s becoming a fairly common theme that having this feature off mitigates several recent LPEs and container escapes over the last year or so. IIRC Debian defaults to off, while Ubuntu defaults to “on” for some reason, which makes sense in a desktop environment but not really in a server one from what I can see.

I know that patching is the preferred solution, but I would like to avoid the exposure between when a vulnerability is found and when patches are available (with some of them being rather slow from Ubuntu). If the feature isn’t used at all, then turning it off seems to make the most sense.

So that was the question: what do we give up to turn this off? From what I can tell I think we can live without this feature, but I’m not sure if I’m missing something.