Container root has access to disk device that it shouldn't

Objective: Give read/write access to a part of the host filesystem to a non-root user in a container.

host /etc/subuid and /etc/subgid:
root:1000000:65536

What I did:

lxc config device add testmachine testdisk disk path=/testmount source=/tank/testmount recursive=true
This allowed root in the container to access the host’s /tank/testmount at /testmount, but only read-only. I’m assuming this was because container root was mapped, as advertised, to an unprivileged host user who only had world (read/execute, no write) access to /tank/testmount.

Next thing I tried was to create a user (called ‘nas’) on both the host and the container whose uids/gids I could map together.
useradd -rMd /var/empty nas
On the host, this user had uid 994 and gid 992; on the container, it had uid 997 and gid 996. I also chowned /tank/testmount to nas:nas, chmod 755.
lxc config set testmachine raw.idmap "uid 994 997\ngid 992 996"
Having done this, after bringing down the container, it wouldn’t come up again until I added these lines:

  • host’s /etc/subuid: root:994:1
  • host’s /etc/subgid: root:992:1

After this, the container came up fine, and as ‘nas’ on the container, I was able to create a file in container’s /testmount, which appeared on the host as being owned by nas:nas. However, as root on the container, I was also able to create a file here (which I couldn’t do before the raw.idmap/subuid/subgid change) which appeared on the host as being owned by 1000000:1000000. This I did not expect.

The source of my confusion is this. It’s easy enough to understand that my actions with raw.idmap/subuid/subgid set it up so that the container’s ‘nas’ user was the host’s ‘nas’ user and had the same privileges on the container as on the host. However, root on the container ostensibly maps to uid 1000000 on the host, which should have no privileges to modify /tank/testmount, and yet it’s able to.
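To make the mapping described above concrete, here is a small sketch that translates a container uid into a host uid. The table uses the "inside outside count" layout of /proc/<pid>/uid_map, but its contents are an assumption derived from the subuid entries and the raw.idmap line in this post, not output captured from the real container; running cat /proc/self/uid_map inside the container would show the actual table.

```shell
# Assumed uid map after the raw.idmap change: "<id-inside> <id-outside> <count>".
# Container uid 997 (nas) is carved out of the default range and points at host uid 994.
UID_MAP="0 1000000 997
997 994 1
998 1000998 64538"

# Print the host uid for container uid $1, or "unmapped" if no range covers it.
to_host_uid() {
    local cuid=$1 inside outside count
    while read -r inside outside count; do
        if [ "$cuid" -ge "$inside" ] && [ "$cuid" -lt $((inside + count)) ]; then
            echo $((outside + cuid - inside))
            return 0
        fi
    done <<EOF
$UID_MAP
EOF
    echo "unmapped"
    return 1
}

to_host_uid 0     # container root -> 1000000
to_host_uid 997   # container nas  -> 994
to_host_uid 998   # next container uid -> 1000998
```

Under these assumptions, container root still resolves to host uid 1000000, which matches the ownership seen on the file root created.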

Where does this unexpected privilege come from, and am I doing something wrong to accomplish my goal?

I appreciate anyone’s help. =)

Yeah, that looks a bit odd. You’d expect root to have to switch to the nas user before being able to do that.

@brauner

I’ll take a look soon, but it’s unlikely I’ll manage to do it this week.

I’ll be around! Let me know if I can provide any more information to assist.

@brauner Ping? Don’t want to rush you, but also want to keep this from falling through the cracks. =)

I think this is expected behavior:

  • If the nas user is not mapped and you try to create a file in a directory that is owned by your nas user, the kernel checks, on the directory inode, whether you have the necessary capabilities within the user namespace and whether the owning ids of the inode are mapped within that user namespace. They aren’t, so you’re not allowed to create that file.
  • If the nas user is mapped, the kernel hits capable_wrt_inode_uidgid() as well. For the directory you’re trying to create the file in, it checks that the current user (userns root in your question) has the necessary capabilities within the user namespace (it has) and that the relevant ids are mapped within that user namespace (they are, since you’ve mapped the ids of the nas user). So you can create a file.
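
The two cases above can be sketched as a toy model in shell. This is not kernel code: the function names are made up for illustration, the maps are assumptions built from the subuid/subgid and raw.idmap values quoted in this thread, and container root is simply assumed to hold the needed capability in its own user namespace. What is left is the question of whether the directory's owning uid and gid (host 994:992 after the chown) are mapped.

```shell
# Maps use the /proc/<pid>/uid_map layout: "<id-inside> <id-outside> <count>".
MAP_BEFORE="0 1000000 65536"   # default map only, nas not mapped
UID_MAP_AFTER="0 1000000 997
997 994 1
998 1000998 64538"
GID_MAP_AFTER="0 1000000 996
996 992 1
997 1000997 64539"

# Succeeds if host id $1 falls inside the host side of some range in map $2.
host_id_is_mapped() {
    local hid=$1 inside outside count
    while read -r inside outside count; do
        if [ "$hid" -ge "$outside" ] && [ "$hid" -lt $((outside + count)) ]; then
            return 0
        fi
    done <<EOF
$2
EOF
    return 1
}

# args: dir_uid dir_gid uid_map gid_map
# Capability is assumed present, so only the "ids are mapped" half is modelled.
root_may_write() {
    if host_id_is_mapped "$1" "$3" && host_id_is_mapped "$2" "$4"; then
        echo allowed
    else
        echo denied
    fi
}

root_may_write 994 992 "$MAP_BEFORE" "$MAP_BEFORE"         # prints "denied"
root_may_write 994 992 "$UID_MAP_AFTER" "$GID_MAP_AFTER"   # prints "allowed"
```

In this model, mapping the nas ids into the namespace is what flips the result for container root, even though root itself still maps to host uid 1000000.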