Live configuration update for unix-char devices

Hello

I ran into an issue with permissions while adding a character device to a running container.

The device is configured in a profile. When I add the profile to a running container, accessing the device only works partially. Depending on how I invoke a command that uses the device file, either the command can access the device, or opening the device file results in error EPERM.

For example, lxc exec <container> -- sudo -u <user> <command> works, while ssh <user>@<container-ip> -- <command> results in EPERM.

If I add the profile before starting the container (or restart the container after adding it), accessing the device always works as expected.

Am I expected to add the device before starting the container, or did I miss something in the configuration? Any hints are highly appreciated.

Can you give a more complete reproducer as well as tell us what version of LXD you’re using?

Adding it or changing it in the profile should work fine, but we may have an issue with updating cgroup configuration causing the EPERM, more details would be appreciated.

We are on LXD version 3.0.1.
Container configuration:

architecture: x86_64
config:
  boot.autostart: "false"
  image.architecture: amd64
  image.description: customized Debian 8 jessie (20181001T1247Z)
  image.os: Debian
  raw.lxc: lxc.apparmor.profile=unconfined
  security.privileged: "true"
  # [...] # volatile...
devices:
  dev-EtherCAT0:
    gid: "997"
    mode: "0664"
    path: /dev/EtherCAT0
    type: unix-char
  # [...] # eth0, graphics, homedir, root, x11-unix
ephemeral: false
profiles:
- [...]
- dev-EtherCAT0_profile
stateful: false
description: ""

The device in question is /dev/EtherCAT0. It is created by this kernel module, so reproducibility is somewhat limited.

I tried to reproduce it with a cloned /dev/zero (mknod /dev/foo c 1 5) with the same access rights as the device in question, but that worked as expected with no issue.

Ok, so I’d say to check that:

  • Permissions on the device itself look correct
  • Permissions on the /var/lib/lxd/devices/... file also look correct
  • Device is listed in /sys/fs/cgroup/devices/devices.allow inside the container

If all 3 look good, then the error you’re getting is most likely coming from the kernel module itself.

I think the problem is that the devices cgroup only propagates “deny” changes down the hierarchy, not “allow” changes (and even then only to make sure a cgroup never has more permissions than its parent). Since the container is already running, {system,user}.slice already exist, and both lack the permission for the device.

I assume that lxc exec runs inside the container’s root cgroup, while ssh runs inside one of the cgroups that do not have this permission. In the restart case, the container’s root cgroup has the device permission set from the start, and all cgroups created later inherit the entries.
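
One way to check this guess is to look at which devices cgroup a process actually lands in. The following Python sketch assumes cgroup v1 and the usual /proc/<pid>/cgroup line format (hierarchy-ID:controller-list:path). The function name and the sample text are made up for illustration and are not taken from the system in question.

```python
def devices_cgroup_path(proc_cgroup_text):
    """Return the cgroup path of the devices controller, or None.

    Expects the contents of /proc/<pid>/cgroup (cgroup v1 layout), where
    each line looks like: hierarchy-ID:controller-list:cgroup-path
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and "devices" in parts[1].split(","):
            return parts[2]
    return None


# Illustrative sample only; on a real system read /proc/self/cgroup instead:
#     with open("/proc/self/cgroup") as f:
#         print(devices_cgroup_path(f.read()))
sample = "11:devices:/system.slice/ssh.service\n4:cpu,cpuacct:/\n"
print(devices_cgroup_path(sample))
```

Running this once through lxc exec and once through an ssh session should show two different paths if the guess above is right.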

Oh yeah, that’d totally explain it :)

Sorry, I didn’t see your message until now.

Permissions:

<host>$ ls -l /dev/EtherCAT0
crw-rw-r-- 1 root realtime 245, 0 Oct 31 07:58 /dev/EtherCAT0

<container>$ ls -l /dev/EtherCAT0
crw-rw-r-- 1 root realtime 245, 0 Oct 31 06:58 /dev/EtherCAT0

The realtime groups on the host and in the container are different.

<host>$ sudo ls -l /var/lib/lxd/devices/<container>/
crw-rw-r-- 1 root            997 245,   0 Oct 31 07:58 unix.dev-EtherCAT0.dev-EtherCAT0

GID 997 corresponds to group realtime in the container.

So far, everything looks as I would expect it, but I don’t know how to see if the device is listed in devices.allow:

<container>$ ls -l /sys/fs/cgroup/devices/
[...]
--w-------  1 root root 0 Oct 31 06:58 devices.allow
--w-------  1 root root 0 Oct 31 06:57 devices.deny
-r--r--r--  1 root root 0 Oct 31 07:06 devices.list
[...]

I would guess devices.allow is 0200 because of raw.lxc: lxc.apparmor.profile=unconfined?

I just checked devices.list in /sys/fs/cgroup/devices/ and /sys/fs/cgroup/devices/{system,user}.slice/. And indeed the device is listed in /sys/fs/cgroup/devices/devices.list but not in {system,user}.slice/devices.list. But it is listed after restarting the container.
Would that confirm @eraserix’s explanation?
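
The same comparison can be scripted. This is a minimal Python sketch that checks whether a device appears in the text of a devices.list file, assuming the cgroup v1 entry format "type major:minor access" with "a" and "*" as wildcards. The function name and the sample contents are hypothetical; only the 245:0 device numbers come from the ls output above.

```python
def device_allowed(devices_list_text, dev_type, major, minor, access="rwm"):
    """Return True if a single devices.list entry grants the requested access.

    dev_type is "c" or "b"; access is any combination of "r", "w", "m".
    """
    for line in devices_list_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        entry_type, numbers, entry_access = fields
        entry_major, _, entry_minor = numbers.partition(":")
        if entry_type not in ("a", dev_type):
            continue
        if entry_major not in ("*", str(major)):
            continue
        if entry_minor not in ("*", str(minor)):
            continue
        if all(flag in entry_access for flag in access):
            return True
    return False


# Hypothetical file contents, standing in for the two devices.list files:
root_list = "c 1:5 rwm\nc 245:0 rwm\n"
slice_list = "c 1:5 rwm\n"
print(device_allowed(root_list, "c", 245, 0))
print(device_allowed(slice_list, "c", 245, 0))
```

On a real container the two strings would be read from /sys/fs/cgroup/devices/devices.list and from the devices.list of the slice in question.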

Yeah, it would… this lack of propagation is a bit annoying…
It’s always kinda hard to know what the user actually expects in those scenarios; what you want here (propagation to all child cgroups) may well be considered a security issue by others…

@brauner anything we can do to make this suck less?

This is the cgroup inside the container? Not really, I’d say, from LXC’s perspective. If it doesn’t propagate to already created device cgroups, then that’s likely a kernel thing. But deny settings do propagate? Hm, that sounds intentional, such that you don’t accidentally propagate additional device permissions by default to unprivileged users.

Deny will only propagate in the following way (excerpt from cgroup-v1/devices.txt):

device cgroups maintain hierarchy by making sure a cgroup never has more
access permissions than its parent. Every time an entry is written to
a cgroup’s devices.deny file, all its children will have that entry removed
from their whitelist and all the locally set whitelist entries will be
re-evaluated.
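
To make the behaviour discussed in this thread concrete, here is a toy Python model of it. It is a deliberately simplified sketch, not the kernel's implementation: the class and its methods are invented for illustration, whitelist entries are plain strings, and the re-evaluation of locally set entries mentioned in the excerpt is left out.

```python
class DeviceCgroup:
    """Toy model of a cgroup v1 devices cgroup (illustration only)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        # A newly created cgroup starts with a copy of its parent's whitelist.
        self.whitelist = set(parent.whitelist) if parent else set()
        if parent is not None:
            parent.children.append(self)

    def allow(self, entry):
        # A cgroup may never have more access than its parent.
        if self.parent is not None and entry not in self.parent.whitelist:
            raise PermissionError(f"parent does not allow {entry!r}")
        # The entry is added locally only; existing children are untouched.
        self.whitelist.add(entry)

    def deny(self, entry):
        # A deny removes the entry here and in every descendant.
        self.whitelist.discard(entry)
        for child in self.children:
            child.deny(entry)


root = DeviceCgroup()
root.allow("c 1:5 rwm")
system_slice = DeviceCgroup(root)   # exists before the device is added
root.allow("c 245:0 rwm")           # live update on the running container
late_child = DeviceCgroup(root)     # created after the update

print("c 245:0 rwm" in system_slice.whitelist)
print("c 245:0 rwm" in late_child.whitelist)
```

In this model the cgroup that existed before the update lacks the new entry while the one created afterwards has it, which matches the devices.list observations reported earlier in the thread.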