Incus isolation - unexpected behaviour

When running an isolated container and mounting a filesystem from the host, root in the container appears to bypass filesystem permissions on the host, much to my surprise. Can somebody confirm whether this is expected behaviour, and if so, why?

Incus:

  kernel_version: 6.1.0-21-amd64 (debian-12)
  server_version: 6.0.0
  storage: zfs
  storage_version: 2.2.3-2~bpo12+1
  driver: lxc
  driver_version: 5.0.2

Container:

config:
  raw.idmap: |
    both 1000 1000
  security.idmap.base: "1000000"
  security.idmap.isolated: "true"
  security.idmap.size: "6553600"
devices:
  host:
    path: /test
    source: /test
    type: disk

Host:

ls -ld /test
drwx------ 8 user user 22 Jun  2 20:44 /test

Container (incus exec as root):

cd /test
touch file
ls -l file
-rw-r--r-- 1 root    root     0 Jun  2 21:18 file

Back on the host:

ls -l /test/file
-rw-r--r-- 1 1000000 1000000 0 Jun  2 21:18 /test/file

How on earth is UID 1000000 able to write to /test on the host, given that it is owned by user:user (1000:1000) with mode 0700?

I’m not sure of the exact security logic here, but assuming that user:user on the host matches the 1000:1000 mapping, you have now exposed that user/group to the container as part of its map. So while the container is isolated, that particular uid/gid is not; it’s mapped straight through.

Root in the container has privileges over all uids/gids that are available in its namespace, which in this case would also extend to 1000/1000 due to raw.idmap.
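To make that a bit more concrete, here is a rough Python model of the uid map I would expect this config to produce, plus the kernel rule that root's override capabilities inside a user namespace only apply to files whose owner uid and gid are both mapped into that namespace. The exact map layout and all the function names are my own illustration, not anything Incus exposes, so check /proc/self/uid_map inside the container for the real values.

```python
# Simplified model of the container's id map, assuming the config above.
# Each entry is (container_start, host_start, count), like /proc/<pid>/uid_map.
# The layout is an assumption: raw.idmap punches a one-id hole into the
# isolated range that starts at security.idmap.base.
BASE = 1_000_000     # security.idmap.base
SIZE = 6_553_600     # security.idmap.size
PASSTHROUGH = 1000   # raw.idmap: both 1000 1000
OVERFLOW_ID = 65534  # what unmapped ids are displayed as (nobody/nogroup)

ID_MAP = [
    (0, BASE, PASSTHROUGH),         # container 0..999 -> host 1000000..1000999
    (PASSTHROUGH, PASSTHROUGH, 1),  # container 1000   -> host 1000
    (PASSTHROUGH + 1, BASE + PASSTHROUGH + 1, SIZE - PASSTHROUGH - 1),
]


def to_host(container_id):
    """Translate a container id to the host id, or None if unmapped."""
    for c_start, h_start, count in ID_MAP:
        if c_start <= container_id < c_start + count:
            return h_start + (container_id - c_start)
    return None


def to_container(host_id):
    """Translate a host id to the container id, or None if unmapped."""
    for c_start, h_start, count in ID_MAP:
        if h_start <= host_id < h_start + count:
            return c_start + (host_id - h_start)
    return None


def displayed_id(host_id):
    """The id the container sees for a file owned by host_id."""
    mapped = to_container(host_id)
    return OVERFLOW_ID if mapped is None else mapped


def container_root_can_override(owner_uid, owner_gid):
    """Container root may bypass permission bits only if both the owner
    uid and gid (host values) are mapped into the container's namespace.
    'both' in raw.idmap means the same map is used for uids and gids."""
    return (to_container(owner_uid) is not None
            and to_container(owner_gid) is not None)
```

In this model, container_root_can_override(1000, 1000) is true, which is why the 0700 mode on /test doesn't stop root in the container, and to_host(0) is 1000000, which is why the new file shows up on the host owned by 1000000. A directory owned by host root gives displayed_id(0) == 65534 (nobody) and cannot be overridden.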

Hmm… I was hoping I was doing something wrong. It somewhat blows my carefully laid plans out of the water :)

Quite honestly, the observed behaviour is really unexpected.

If I now have a nested mount on the host, e.g. /test/subdir, with /test and /test/subdir both owned by 1000:1000 on the host, and keep the idmap above but only add a disk device for /test, I see this in the container:

drwxr-x--- 3 1000 1000 3 Jun  2 21:08 /test
drwxr-x--- 3 nobody nogroup 3 Jun  2 21:08 /test/subdir

which makes sense. And root in the container can’t write to subdir, which again makes perfect sense.

However, in the container, rmdir /test/subdir does work, and it unmounts /test/subdir on the host. Now that, again, is really, really surprising.

I guess I’ll have to check the mount propagation settings…

Setting recursive: true on the disk device fixes that.