Unable to add lxc rootfs mount options (context=)

Hi,

I have tried a lot of things, but so far I have failed to add additional rootfs mount options when starting an LXC container.

I am developing a SELinux module for incus/incusd.

I can set for example

raw.lxc: |
    lxc.selinux.context = system_u:system_r:spc_t:s0:c100

which works so far: the container process runs in this domain.

But the rootfs and all files below it are labeled unlabeled_t, which is bad. What other container systems do today is add a mount option for the rootfs like context=system_u:object_r:container_file_t:s0:c100.

Is there any way to do that in the current version (6.1)?

I am using the ZFS storage driver.

What I tried already:

  • lxc.rootfs.options
raw.lxc: |
    lxc.rootfs.options = idmap=container,context=system_u:system_r:container_file_t:s0:c100

but this ends up as a second lxc.rootfs.options line and seems to be ignored.

  • lxc.hook.pre-start
raw.lxc: |
    lxc.hook.pre-start = /usr/local/bin/lxc_rootfs_label.sh

where /usr/local/bin/lxc_rootfs_label.sh contains:

#!/usr/bin/env bash
sed -i '/^lxc.rootfs.options/ s/$/,context=system_u:object_r:container_file_t:s0:c100/' "$LXC_CONFIG_FILE"

The hook is executed, and the resulting lxc.conf has context= appended just as I want.
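For anyone following along, the sed expression in the hook can be sanity-checked outside of LXC against a throwaway config file (the path here is just an example):

```shell
# Write a minimal config line like the one incus generates, then apply the
# same sed expression the hook uses and show the result.
printf 'lxc.rootfs.options = idmap=container\n' > /tmp/lxc_demo.conf
sed -i '/^lxc.rootfs.options/ s/$/,context=system_u:object_r:container_file_t:s0:c100/' /tmp/lxc_demo.conf
cat /tmp/lxc_demo.conf
```

This confirms the option string itself is well-formed; as discussed below, the problem is not the config line but what the kernel does with it.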

But in any case, the result is the same:

+ incus exec first -- ls -lZ /
total 96
lrwxrwxrwx.   1 root   root    system_u:object_r:unlabeled_t:s0              7 Jul  3 07:43 bin -> usr/bin
drwxr-xr-x.   2 root   root    system_u:object_r:unlabeled_t:s0              2 Apr 18  2022 boot
drwxr-xr-x.   8 root   root    system_u:object_r:initrc_state_t:s0         480 Jul  3 20:12 dev
drwxr-xr-x.  62 root   root    system_u:object_r:unlabeled_t:s0            126 Jul  3 07:44 etc
drwxr-xr-x.   3 root   root    system_u:object_r:unlabeled_t:s0              3 Jul  3 07:44 home
lrwxrwxrwx.   1 root   root    system_u:object_r:unlabeled_t:s0              7 Jul  3 07:43 lib -> usr/lib
lrwxrwxrwx.   1 root   root    system_u:object_r:unlabeled_t:s0              9 Jul  3 07:43 lib64 -> usr/lib64
drwxr-xr-x.   2 root   root    system_u:object_r:unlabeled_t:s0              2 Jul  3 07:43 media
drwxr-xr-x.   2 root   root    system_u:object_r:unlabeled_t:s0              2 Jul  3 07:43 mnt
drwxr-xr-x.   2 root   root    system_u:object_r:unlabeled_t:s0              2 Jul  3 07:43 opt
dr-xr-xr-x. 575 nobody nogroup system_u:object_r:proc_t:s0                   0 Jul  3 20:12 proc
drwx------.   2 root   root    system_u:object_r:unlabeled_t:s0              4 Jul  3 07:43 root
drwxr-xr-x.   5 root   root    system_u:object_r:container_tmpfs_t:s0:c100 100 Jul  3 20:12 run
lrwxrwxrwx.   1 root   root    system_u:object_r:unlabeled_t:s0              8 Jul  3 07:43 sbin -> usr/sbin
drwxr-xr-x.   2 root   root    system_u:object_r:unlabeled_t:s0              2 Jul  3 07:43 srv
dr-xr-xr-x.  13 nobody nogroup system_u:object_r:sysfs_t:s0                  0 May 10 13:29 sys
drwxrwxrwt.   2 root   root    system_u:object_r:unlabeled_t:s0              2 Jul  3 07:43 tmp
drwxr-xr-x.  12 root   root    system_u:object_r:unlabeled_t:s0             12 Jul  3 07:43 usr
drwxr-xr-x.  12 root   root    system_u:object_r:unlabeled_t:s0             13 Jul  3 07:43 var
+ ls -lZ /var/lib/incus/containers/first/rootfs/
total 130
lrwxrwxrwx.  1 root root system_u:object_r:unlabeled_t:s0   7  3. Jul 09:43 bin -> usr/bin
drwxr-xr-x.  2 root root system_u:object_r:unlabeled_t:s0   2 18. Apr 2022  boot
drwxr-xr-x.  2 root root system_u:object_r:unlabeled_t:s0   2  3. Jul 09:48 dev
drwxr-xr-x. 62 root root system_u:object_r:unlabeled_t:s0 126  3. Jul 09:44 etc
drwxr-xr-x.  3 root root system_u:object_r:unlabeled_t:s0   3  3. Jul 09:44 home
lrwxrwxrwx.  1 root root system_u:object_r:unlabeled_t:s0   7  3. Jul 09:43 lib -> usr/lib
lrwxrwxrwx.  1 root root system_u:object_r:unlabeled_t:s0   9  3. Jul 09:43 lib64 -> usr/lib64
drwxr-xr-x.  2 root root system_u:object_r:unlabeled_t:s0   2  3. Jul 09:43 media
drwxr-xr-x.  2 root root system_u:object_r:unlabeled_t:s0   2  3. Jul 09:43 mnt
drwxr-xr-x.  2 root root system_u:object_r:unlabeled_t:s0   2  3. Jul 09:43 opt
drwxr-xr-x.  2 root root system_u:object_r:unlabeled_t:s0   2 18. Apr 2022  proc
drwx------.  2 root root system_u:object_r:unlabeled_t:s0   4  3. Jul 09:43 root
drwxr-xr-x.  2 root root system_u:object_r:unlabeled_t:s0   2  3. Jul 09:44 run
lrwxrwxrwx.  1 root root system_u:object_r:unlabeled_t:s0   8  3. Jul 09:43 sbin -> usr/sbin
drwxr-xr-x.  2 root root system_u:object_r:unlabeled_t:s0   2  3. Jul 09:43 srv
drwxr-xr-x.  2 root root system_u:object_r:unlabeled_t:s0   2 18. Apr 2022  sys
drwxrwxrwt.  2 root root system_u:object_r:unlabeled_t:s0   2  3. Jul 09:43 tmp
drwxr-xr-x. 12 root root system_u:object_r:unlabeled_t:s0  12  3. Jul 09:43 usr
drwxr-xr-x. 12 root root system_u:object_r:unlabeled_t:s0  13  3. Jul 09:43 var

Am I missing something?

I am quite new to Incus and LXC, so there might be something I misunderstood or am wrongly assuming…

Is context an option that you can set on a simple bind-mount or is it an option that’s part of the superblock?

Not 100% sure yet, but with bind-mounts this does not seem to work. Is Incus using bind mounts?

Effectively, yes. Whether you mount something or specifically ask for a bind-mount, the Linux kernel will reuse an existing superblock if it already exists.

So if all your containers are on the same filesystem, say a dir or btrfs pool, then they will just get a new mount but not a new superblock, which limits what kind of mount options the kernel will apply.
zfs is a bit different in how it works in that regard but I’m also not sure what its SELinux support is like.

One option that should work is using the lvm storage driver or zfs but using it in block mode.
In either of those cases, you’ll be able to set a volume.block.mount_options config key on your pool which can then specify the mount options to pass when the individual volumes get mounted.

As those will be fresh mounts with no existing mounts on the system, you will get to set superblock options.
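Concretely, that pool-level setup could look something like this (a sketch: it assumes a zfs pool named "default", and the keys only affect newly created volumes):

```shell
# Enable block mode for new volumes on the pool, then set the mount options
# that will be applied when each fresh (per-volume) superblock is created.
incus storage set default volume.zfs.block_mode=true
incus storage set default volume.block.mount_options="context=system_u:object_r:container_file_t:s0:c100"
```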

Hi Stéphane,

thanks for that, I will give it a try. I am wondering how, for example, Docker or Podman does it then.

BTW, zfs has SELinux support, so that should be fine.

Do you see any changes to Incus in the foreseeable future that might make this work in “non-block” (aka default) mode? Or is this purely a Linux limitation that could only be lifted by changing how Linux itself handles those mounts?

Thanks again for helping!
-Marc

Depending on the backend used in Docker/Podman, they may or may not run into the same issue.

The common overlay2 backend results in a dedicated mount per container, which, assuming overlayfs supports SELinux contexts as a mount option, would likely explain why this works per-container.

If using the zfs backend, they’d likely have the same setup as Incus.

And then if using the vfs backend, they’d also be dealing with bind-mounts so likely getting the exact same problem you have currently.

So yeah, at the end of the day, not being able to alter the default SELinux context on a bind-mount would be a Linux issue. I don’t know if this is intentional or not though.

On our end it sounds like we could add automatic handling of SELinux when the container is backed by a block device (lvm, ceph and zfs in block mode) and then for zfs datasets too using the various SELinux context related properties it offers.

Without changes to how Linux handles this, I don’t see a good way to improve things on the dir or btrfs drivers though.

Having support for context in datasets would be nice! There seems to be an issue in zfs with that though: Snapshots do not inherit SELinux Context · Issue #14784 · openzfs/zfs · GitHub - This is about snapshots, so I assume it would affect clones as well.

I just found out how to set default volume settings. Will now try with block_mode and report back how far I am getting with this.

Thanks again!

Using block_mode it works! To make it really usable, I think some new instance configuration keys might be worth adding. What do you think?

For containers to run confined, the processes inside the container must run in a dedicated domain, and all files of the rootfs need to be labeled correctly.

To separate one container from another, an MLS/MCS SELinux policy is used, and every container gets its own dedicated category, category range, and/or category list, so that selected containers can share data.

The process domain context for a container can be set like this:

raw.lxc: |
    lxc.selinux.context = system_u:system_r:spc_t:s0:c100

I think it would be good to have an easier possibility to set a default context for new containers (like system_u:system_r:spc_t:s0 or any other context); this then needs to be adjusted per container, separating it from the others by adding a category (like :c100 or :c10.c20 or :c1,c5).

How can this be done using a single incus profile ... command without using raw.lxc?

To give the rootfs the correct filesystem labels, one needs to set a proper context. Currently this is possible like this (using the zfs backend):

incus profile device set default root initial.zfs.block_mode=true
incus profile device set default root initial.block.mount_options=context=system_u:object_r:container_file_t:s0:c100

As for the process domains, these need to be settable as a default for new containers as well as per individual container, for separation by adding a category.

I think some setting/feature that automatically assigns a separate category to every new container would be very useful, too. Would it be worth creating a feature request issue for that?
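To illustrate what such automatic allocation could do, here is a sketch of the sVirt-style scheme (assumption: the usual MCS range of c0..c1023; this is how libvirt's sVirt picks labels, not an existing Incus feature):

```shell
# Pick two distinct random categories and order them (sVirt emits the lower
# one first), yielding a unique per-container label suffix like c12,c857.
a=$((RANDOM % 1024))
b=$((RANDOM % 1024))
while [ "$a" -eq "$b" ]; do b=$((RANDOM % 1024)); done
if [ "$a" -gt "$b" ]; then t=$a; a=$b; b=$t; fi
echo "system_u:system_r:spc_t:s0:c${a},c${b}"
```

A real implementation would also have to track categories already in use so two containers never collide.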

Hi Stéphane,

and now I also tried creating a VM on my SELinux enabled host on a ZFS pool. The name of the VM is “vm1”.

I am running into trouble here, because the VM dataset is mounted under /var/lib/incus/storage-pools/default/virtual-machines/vm1 without a SELinux context applied, resulting in unlabeled files which are not accessible:

~# incus launch images:ubuntu/22.04 vm1 --vm --profile default-vm
Launching vm1
Error: Failed instance creation: Failed creating instance from image: Failed to chmod mount directory "/var/lib/incus/storage-pools/default/virtual-machines/vm1" (0100): chmod /var/lib/incus/storage-pools/default/virtual-machines/vm1: permission denied

This is denied because incusd is not allowed to write to unlabeled dirs/files (unlabeled_t).

To fix this, the dataset needs to have the context property set when it is being created (before mounting).

For example:

zfs set context="system_u:object_r:container_file_t:s0:c100" zdata/incus/virtual-machines/vm1

The context being used should be configurable though.
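Setting the property at creation time (rather than with zfs set afterwards) would avoid the window where the dataset gets mounted unlabeled; the context property is documented in zfsprops(7), and the dataset path here matches the example above:

```shell
# Sketch (requires a zfs pool and an SELinux host): create the dataset with
# the context property already set, so its very first mount is labeled.
zfs create -o context="system_u:object_r:container_file_t:s0:c100" \
    zdata/incus/virtual-machines/vm1
```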

Any ideas? Maybe having a possibility to set the context on the dataset would also make it possible to have containers with block_mode=false, at least on zfs/btrfs/(bcachefs?)…

Here are some details about my test-config:

(sysadm_r)@host ~ # incus storage show default
config:
  source: zdata/incus
  volatile.initial_source: zdata/incus
  zfs.pool_name: zdata/incus
description: ""
name: default
driver: zfs

(sysadm_r)@host ~ # incus profile show default-vm
config:
  security.idmap.isolated: "yes"
description: Default Incus profile for VMs (no block_mode)
devices:
  eth0:
    name: eth0
    network: incusbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default-vm
used_by: []
project: default

I’m not the biggest fan of directly exposing those as config options, as that’s likely to confuse users, which is never a good thing for a security option :slight_smile:

Are there standard contexts used for this?
It sounds like container_file_t would be fine for container filesystems; is there an equivalent for virtual machines? What does libvirt use, for example?

I’d basically prefer us automatically setting the most likely correct context when we find ourselves on a SELinux system, and if there’s a strong case for making that customizable, we’d probably just allow it through env variables (SELINUX_CONTEXT_CONTAINER, SELINUX_CONTEXT_VM, …)

For virtual machines when using libvirt, roughly the following contexts are being used:

  • for VM runtime files: svirt_runtime_t,s0-mls_systemhigh
  • for VM content: virt_content_t
  • for VM config files: virt_etc_rw_t
  • default for libvirt files in /var/lib/libvirt is virt_var_lib_t (which will probably be incus_var_lib_t for /var/lib/incus in the future, with the dedicated incus-selinux module I am currently developing)
  • for VM images: virt_image_t
  • and maybe some more (please see below)

But I think that will not really cover it. To do it “right”:

Please note that SELinux has different policy types, and the administrator has to choose one of them. “Simple” policies like strict or targeted do not know anything about sensitivity levels or categories (MLS). But when a host runs several containers or VMs, sVirt comes into play, which uses the SELinux MLS policy with categories to segregate virtual guests from one another. This is very desirable, and for it to work each individual VM (or container) needs an individual category value assigned, which may be randomly generated the way sVirt does it.

Some details:

There is already code at github.com/opencontainers/selinux for handling SELinux contexts in a container engine, which Incus could maybe use?

The current file contexts in the virt module on Gentoo (which uses the reference policy) look like this (-d is for directories, -- is for regular files, or any file type if omitted):

HOME_DIR/\.libvirt(/.*)?        gen_context(system_u:object_r:virt_home_t,s0)
HOME_DIR/\.libvirt/qemu(/.*)?   gen_context(system_u:object_r:svirt_home_t,s0)
HOME_DIR/\.virtinst(/.*)?       gen_context(system_u:object_r:virt_home_t,s0)
HOME_DIR/VirtualMachines(/.*)?  gen_context(system_u:object_r:virt_home_t,s0)
HOME_DIR/VirtualMachines/isos(/.*)?     gen_context(system_u:object_r:virt_content_t,s0)

/etc/libvirt    -d      gen_context(system_u:object_r:virt_etc_t,s0)
/etc/libvirt/[^/]*      --      gen_context(system_u:object_r:virt_etc_t,s0)
/etc/libvirt/[^/]*      -d      gen_context(system_u:object_r:virt_etc_rw_t,s0)
/etc/libvirt/.*/.*      gen_context(system_u:object_r:virt_etc_rw_t,s0)

/etc/qemu(/.*)?         gen_context(system_u:object_r:virt_etc_t,s0)

/etc/rc\.d/init\.d/(libvirt-bin|libvirtd)       --      gen_context(system_u:object_r:virtd_initrc_exec_t,s0)

/etc/xen        -d      gen_context(system_u:object_r:virt_etc_t,s0)
/etc/xen/[^/]*  --      gen_context(system_u:object_r:virt_etc_t,s0)
/etc/xen/[^/]*  -d      gen_context(system_u:object_r:virt_etc_rw_t,s0)
/etc/xen/.*/.*  gen_context(system_u:object_r:virt_etc_rw_t,s0)

/usr/lib/libvirt/libvirt_lxc    --      gen_context(system_u:object_r:virtd_lxc_exec_t,s0)
/usr/lib/libvirt/libvirt_leaseshelper   --      gen_context(system_u:object_r:virt_leaseshelper_exec_t,s0)
/usr/lib/qemu/qemu-bridge-helper        --      gen_context(system_u:object_r:virt_bridgehelper_exec_t,s0)

/usr/libexec/libvirt_lxc        --      gen_context(system_u:object_r:virtd_lxc_exec_t,s0)
/usr/libexec/qemu-bridge-helper gen_context(system_u:object_r:virt_bridgehelper_exec_t,s0)
/usr/libexec/libvirt_leaseshelper       --      gen_context(system_u:object_r:virt_leaseshelper_exec_t,s0)

/usr/bin/condor_vm-gahp --      gen_context(system_u:object_r:virtd_exec_t,s0)
/usr/bin/fence_virtd    --      gen_context(system_u:object_r:virsh_exec_t,s0)
/usr/bin/libvirt-qmf    --      gen_context(system_u:object_r:virt_qmf_exec_t,s0)
/usr/bin/libvirtd       --      gen_context(system_u:object_r:virtd_exec_t,s0)
/usr/bin/virsh          --      gen_context(system_u:object_r:virsh_exec_t,s0)
/usr/bin/virtlockd      --      gen_context(system_u:object_r:virtlockd_exec_t,s0)
/usr/bin/virtlogd       --      gen_context(system_u:object_r:virtlogd_exec_t,s0)
/usr/bin/virt-sandbox-service.* --      gen_context(system_u:object_r:virsh_exec_t,s0)

/usr/sbin/condor_vm-gahp        --      gen_context(system_u:object_r:virtd_exec_t,s0)
/usr/sbin/fence_virtd   --      gen_context(system_u:object_r:virsh_exec_t,s0)
/usr/sbin/libvirt-qmf   --      gen_context(system_u:object_r:virt_qmf_exec_t,s0)
/usr/sbin/libvirtd      --      gen_context(system_u:object_r:virtd_exec_t,s0)
/usr/sbin/virtlockd     --      gen_context(system_u:object_r:virtlockd_exec_t,s0)
/usr/sbin/virtlogd      --      gen_context(system_u:object_r:virtlogd_exec_t,s0)

/var/cache/libvirt(/.*)?        gen_context(system_u:object_r:virt_cache_t,s0-mls_systemhigh)

/var/lib/libvirt(/.*)?  gen_context(system_u:object_r:virt_var_lib_t,s0)
/var/lib/libvirt/boot(/.*)?     gen_context(system_u:object_r:virt_content_t,s0)
/var/lib/libvirt/images(/.*)?   gen_context(system_u:object_r:virt_image_t,s0)
/var/lib/libvirt/isos(/.*)?     gen_context(system_u:object_r:virt_content_t,s0)
/var/lib/libvirt/qemu(/.*)?     gen_context(system_u:object_r:svirt_runtime_t,s0-mls_systemhigh)
/var/lib/libvirt/lockd(/.*)?    gen_context(system_u:object_r:virtlockd_var_lib_t,s0)

/var/log/log(/.*)?      gen_context(system_u:object_r:virt_log_t,s0)
/var/log/libvirt(/.*)?  gen_context(system_u:object_r:virt_log_t,s0)
/var/log/vdsm(/.*)?     gen_context(system_u:object_r:virt_log_t,s0)

/var/vdsm(/.*)?         gen_context(system_u:object_r:virt_runtime_t,s0)
/run/libguestfs(/.*)?   gen_context(system_u:object_r:virt_runtime_t,s0)
/run/libvirtd\.pid      --      gen_context(system_u:object_r:virt_runtime_t,s0)
/run/libvirt(/.*)?      gen_context(system_u:object_r:virt_runtime_t,s0)
/run/libvirt/common(/.*)?       gen_context(system_u:object_r:virt_common_runtime_t,s0)
/run/libvirt/lxc(/.*)?  gen_context(system_u:object_r:virtd_lxc_runtime_t,s0)
/run/libvirt-sandbox(/.*)?      gen_context(system_u:object_r:virtd_lxc_runtime_t,s0)
/run/libvirt/qemu(/.*)? gen_context(system_u:object_r:svirt_runtime_t,s0-mls_systemhigh)
/run/libvirt/virtlockd-sock     -s      gen_context(system_u:object_r:virtlockd_run_t,s0)
/run/user/[^/]*/libguestfs(/.*)?        gen_context(system_u:object_r:virt_home_t,s0)
/run/vdsm(/.*)? gen_context(system_u:object_r:virt_runtime_t,s0)
/run/virtlockd\.pid     --      gen_context(system_u:object_r:virtlockd_run_t,s0)

And please note that other distros like Red Hat might use slightly different type names.