The following is strongly suggested on this page: Linux Containers - LXC - Getting started
“So should something go very wrong and an attacker manages to escape the container,
they’ll find themselves with about as many rights as a nobody user.”
What leads up to that first word “So” is: “That means that uid 0 (root) in the container is actually something like uid 100000 outside the container.”
Wouldn’t that normally suggest that the escape leads to the user that started the container, not a nobody user?
I also tried:
user@user:/$ unshare --user bash
nobody@user:/$ unshare --map-root-user bash
unshare: unshare failed: Operation not permitted
This works though:
user@user:/$ unshare --map-root-user bash
Which makes me think that normally unprivileged containers can’t be made from nobody users.
Does LXC circumvent this and still make unprivileged containers from nobody users?
Or is the first quote just false/very loose language?
If it’s true I don’t think I need separate users for all of my containers.
The user that spawned the container matters relatively little. What matters is the uid/gid map which is applied to the container. For unprivileged users, this map is restricted by what’s in /etc/subuid and /etc/subgid, for privileged users, it doesn’t have to be.
The result is that the vast majority of unprivileged containers these days are actually spawned by root (as that’s how LXD does it) but they’re still unprivileged containers as real root isn’t mapped to a user inside of the container.
In any case, what the statement is trying to convey is that because the uids and gids used for the container aren’t meaningful on the host, should the user be able to escape from the container as any of the uids/gids in that container, they’d find themselves on the host with about as many rights as a nobody/nogroup user would (actually slightly less, as nobody/nogroup may own some things).
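A minimal sketch of that shift, assuming the common base of 100000 from /etc/subuid (the actual base and range depend on your configuration):

```shell
# Assumed map: container uids 0..65535 -> host uids 100000..165535
base=100000   # hypothetical first host uid taken from /etc/subuid
range=65536   # hypothetical size of the allocated range

# Container root (uid 0) as seen from the host:
echo $((base + 0))      # -> 100000

# A regular container user (uid 1000) as seen from the host:
echo $((base + 1000))   # -> 101000
```

Neither of those host uids belongs to any real user on the host, which is why an escapee ends up with roughly a nobody user’s rights.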
I now realize it’s actually properly written in the 2nd quote,
I failed to properly understand the sentence.
Do I understand correctly that,
for example on LXD as root (or any user with LXC),
an unused subuid/subgid range is mapped for the container
before something analogous to “unshare --map-root-user” is done?
So the difference with my example that didn’t work is that instead of doing it as nobody, you do it as the unused subuid/subgid? Is this done separately for each container?
And this is configured with:
MS_UID="$(grep "$(id -un)" /etc/subuid | cut -d : -f 2)"
ME_UID="$(grep "$(id -un)" /etc/subuid | cut -d : -f 3)"
MS_GID="$(grep "$(id -un)" /etc/subgid | cut -d : -f 2)"
ME_GID="$(grep "$(id -un)" /etc/subgid | cut -d : -f 3)"
echo "lxc.idmap = u 0 $MS_UID $ME_UID" >> ~/.config/lxc/default.conf
echo "lxc.idmap = g 0 $MS_GID $ME_GID" >> ~/.config/lxc/default.conf
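For reference, each line in /etc/subuid and /etc/subgid has the form `user:start:count`, so field 2 is the first host id and field 3 is the length of the range (a count, not a last id). A small sketch with a made-up entry:

```shell
# Hypothetical /etc/subuid line: user "user" owns host uids 100000..165535
line="user:100000:65536"

start="$(echo "$line" | cut -d : -f 2)"   # first host uid in the range
count="$(echo "$line" | cut -d : -f 3)"   # number of uids in the range

echo "$start $count"   # -> 100000 65536

# "lxc.idmap = u 0 $start $count" then maps container uids 0..count-1
# onto host uids start..start+count-1
```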
Is there any risk of the subuids and subgids that LXC uses overlapping? I would assume that as long as only LXC is making use of this feature, that should not happen, but what if another program on the same host is also making use of subuids and subgids? Should you then just split the ranges between them?
Or is that unnecessary?
And is any escape from this unused subuid/subgid range guaranteed to be an upstream kernel security issue, rather than an issue with LXC? Is that correct?
For LXC, it’s manual and you need to figure out what you set for each container.
For LXD, it generally ignores subuid/subgid, but on systems where it uses them, it figures out the largest contiguous range for the root user and uses that for all containers unless containers are set up to be isolated from each other (security.idmap.isolated).
There is nothing that prevents two containers from using the same or slightly overlapping maps, just like there’s nothing preventing two container managers from using the same maps either. So it’s really a matter of keeping things properly configured.
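Keeping the ranges disjoint is purely a matter of how the files are written; a hypothetical /etc/subuid giving two users non-overlapping 65536-id allocations (made-up values) might look like:

```
root:100000:65536
someuser:1000000:65536
```

As long as every consumer of these files sticks to its own allocation, the ranges never collide.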
There’s however generally less of a concern with multiple containers sharing some uids/gids than there is with containers sharing uids/gids with active host uids/gids.
And yeah, LXC/LXD set up the namespaces in the kernel; after that, all enforcement is done by the kernel. So a way to escape such confinement is normally a kernel security bug, and it can most often be used outside of a container too as a way to gain privileges (so usually pretty nasty security issues when one of those comes around).