On openSUSE Tumbleweed hosts running the openSUSE-packaged LXD 4.21 (not the snap), LXC containers do not mount the host cgroup hierarchy on /sys/fs/cgroup. What parts of the openSUSE host system might be causing this?
# a001: alpine, f001: fedora, s001: opensuse, u001: ubuntu
host $ lxc exec a001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
host $ lxc exec f001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
host $ lxc exec s001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
host $ lxc exec u001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
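One comparison that might help (an assumption on my part about where the difference lies) is what the container's unified mount actually exposes; an empty cgroup.controllers would mean no controllers were delegated into the container:

```shell
# Hedged check: list the controllers visible inside the container's
# cgroup2 mount; an empty file means nothing was delegated to it.
lxc exec s001 -- cat /sys/fs/cgroup/cgroup.controllers
```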
The initial observable effect is that no IPv4 DHCP address is obtained:
host $ lxc list
+------+---------+------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+---------+------+------+-----------+-----------+
| a001 | RUNNING | | | CONTAINER | 0 |
+------+---------+------+------+-----------+-----------+
| f001 | RUNNING | | | CONTAINER | 0 |
+------+---------+------+------+-----------+-----------+
| s001 | RUNNING | | | CONTAINER | 0 |
+------+---------+------+------+-----------+-----------+
| u001 | RUNNING | | | CONTAINER | 0 |
+------+---------+------+------+-----------+-----------+
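Since no IPv4 lease ever appears, one thing worth checking (the lease-file path is the one from the dnsmasq command line included later in this post) is whether dnsmasq recorded any lease at all:

```shell
# Hedged: if this file is empty, no DHCP handshake ever completed on
# lxdbr0; path taken from the dnsmasq arguments in the systemctl output.
sudo cat /var/lib/lxd/networks/lxdbr0/dnsmasq.leases
```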
openSUSE Tumbleweed's systemd runs with AppArmor and the unified cgroup hierarchy:
host $ sudo systemctl --version
systemd 249 (249.7+suse.57.g523f32df57)
+PAM +AUDIT +SELINUX +APPARMOR -IMA -SMACK +SECCOMP +GCRYPT +GNUTLS
+OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD
+LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2
+LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT
default-hierarchy=unified
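For what it's worth, the controllers the unified hierarchy exposes on the host, and those delegated to child cgroups, can be read from the standard cgroup2 interface files (a sketch; both files are part of the kernel's cgroup v2 API):

```shell
# Controllers present at the root of the unified hierarchy.
cat /sys/fs/cgroup/cgroup.controllers
# Controllers delegated to child cgroups (what containers can receive).
cat /sys/fs/cgroup/cgroup.subtree_control
```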
The containers exhibit the same behavior when restarted with and without the config options
raw.lxc "lxc.apparmor.profile=unconfined"
and security.privileged true.
Expected Behavior
In a QEMU VM on the same host running Ubuntu 21.04,
these images mount cgroup2 and operate as expected:
ubuntu-host-in-qemu $ lxc launch alpine/3.15 a001
ubuntu-host-in-qemu $ lxc launch fedora/35 f001
ubuntu-host-in-qemu $ lxc launch opensuse/tumbleweed s001
ubuntu-host-in-qemu $ lxc launch ubuntu/21.10 u001
ubuntu-host-in-qemu $ lxc list
+------+---------+----------------------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+------+---------+----------------------+------+-----------+-----------+
| a001 | RUNNING | 10.38.149.22 (eth0) | | CONTAINER | 0 |
+------+---------+----------------------+------+-----------+-----------+
| f001 | RUNNING | 10.38.149.219 (eth0) | | CONTAINER | 0 |
+------+---------+----------------------+------+-----------+-----------+
| s001 | RUNNING | 10.38.149.132 (eth0) | | CONTAINER | 0 |
+------+---------+----------------------+------+-----------+-----------+
| u001 | RUNNING | 10.38.149.242 (eth0) | | CONTAINER | 0 |
+------+---------+----------------------+------+-----------+-----------+
ubuntu-host-in-qemu $ lxc exec s001 -- mount |grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,uid=1000000,gid=1000000)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
ubuntu-host-in-qemu $ lxc exec f001 -- mount |grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,uid=1000000,gid=1000000)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
ubuntu-host-in-qemu $ lxc exec u001 -- mount |grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,uid=1000000,gid=1000000)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
ubuntu-host-in-qemu $ lxc exec a001 -- mount |grep cgroup
(empty)
openSUSE Tumbleweed LXD host information
The LXD version is 4.21, packaged (non-snap) for openSUSE Tumbleweed:
host $ lxc version
Client version: 4.21
Server version: 4.21
lxd init uses defaults, except that IPv6 networking is disabled for the moment:
host $ sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]:
Do you want to configure a new storage pool? (yes/no) [default=yes]:
Name of the new storage pool [default=default]:
Name of the storage backend to use (btrfs, dir, lvm) [default=btrfs]:
Would you like to create a new btrfs subvolume under /var/lib/lxd? (yes/no) [default=yes]:
Would you like to connect to a MAAS server? (yes/no) [default=no]:
Would you like to create a new local network bridge? (yes/no) [default=yes]:
What should the new bridge be called? [default=lxdbr0]:
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]:
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: none
Would you like the LXD server to be available over the network? (yes/no) [default=no]:
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]:
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes
The openSUSE Tumbleweed host runs a recent Linux kernel and systemd, and cgroup2 is mounted:
host $ uname -a
Linux asus 5.15.12-1-default #1 SMP Wed Dec 29 14:50:16 UTC 2021 (375fcb8) x86_64 x86_64 x86_64 GNU/Linux
host $ mount|grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
As shown at the top of the post,
containers do not mount cgroup2 as they should:
# a001: alpine, f001: fedora, s001: opensuse, u001: ubuntu
host $ lxc exec a001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
host $ lxc exec f001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
host $ lxc exec s001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
host $ lxc exec u001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
openSUSE Tumbleweed does have AppArmor installed.
The containers exhibit the same behavior when restarted with and without the
raw.lxc AppArmor config:
host $ lxc config set a001 raw.lxc "lxc.apparmor.profile=unconfined"
host $ lxc config set f001 raw.lxc "lxc.apparmor.profile=unconfined"
host $ lxc config set s001 raw.lxc "lxc.apparmor.profile=unconfined"
host $ lxc config set u001 raw.lxc "lxc.apparmor.profile=unconfined"
host $ lxc restart --all
host $ sudo dmesg | grep lxc
[ 5.225606] audit: type=1400 audit(1641763911.329:8):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="lxc-container-default" pid=746 comm="apparmor_parser"
[ 5.225610] audit: type=1400 audit(1641763911.329:9):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="lxc-container-default-cgns" pid=746 comm="apparmor_parser"
[ 5.225612] audit: type=1400 audit(1641763911.329:10):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="lxc-container-default-with-mounting" pid=746 comm="apparmor_parser"
[ 5.225615] audit: type=1400 audit(1641763911.329:11):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="lxc-container-default-with-nesting" pid=746 comm="apparmor_parser"
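The loaded profiles and their modes can also be listed directly (assuming aa-status from the AppArmor packages is installed):

```shell
# Hedged: show the AppArmor profiles currently loaded, filtered to the
# lxc ones; confirms whether they are in enforce or complain mode.
sudo aa-status 2>/dev/null | grep lxc
```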
Likewise, there is no change in behavior when running the containers with
host $ lxc config set s001 security.privileged true
LXD systemd unit starts normally:
host $ systemctl status lxd
lxd.service - LXD Container Hypervisor
Loaded: loaded (/usr/lib/systemd/system/lxd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-01-07 14:00:26 PST; 9h ago
Docs: man:lxd(1)
Process: 1571 ExecStartPost=/usr/bin/lxd waitready --timeout=600 (code=exited, status=0/SUCCESS)
Main PID: 1570 (lxd)
Tasks: 31
CPU: 37.964s
CGroup: /system.slice/lxd.service
├─1570 /usr/bin/lxd --group=lxd --logfile=/var/log/lxd/lxd.log
└─1682 dnsmasq
--keep-in-foreground
--strict-order
--bind-interfaces
--except-interface=lo
--pid-file=
--no-ping
--interface=lxdbr0
--dhcp-rapid-commit
--quiet-dhcp
--quiet-dhcp6
--quiet-ra
--listen-address=10.224.241.1
--dhcp-no-override
--dhcp-authoritative
--dhcp-leasefile=/var/lib/lxd/networks/lxdbr0/dnsmasq.leases
--dhcp-hostsfile=/var/lib/lxd/networks/lxdbr0/dnsmasq.hosts
--dhcp-range 10.224.241.2,10.224.241.254,1h -s lxd
--interface-name _gateway.lxd,lxdbr0 -S /lxd/
--conf-file=/var/lib/lxd/networks/lxdbr0/dnsmasq.raw -u nobody -g lxd
The LXD service log shows the following two cgroup2-related warnings on startup:
host $ sudo grep -i cgroup /var/log/lxd/lxd.log
t=2022-01-11T21:16:10-0800 lvl=info msg=" - cgroup layout: cgroup2"
t=2022-01-11T21:16:10-0800 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored"
t=2022-01-11T21:16:10-0800 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
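On those warnings: my guess (an assumption, not a confirmed cause) is that they are benign here. net_prio exists only as a cgroup v1 controller, so on a pure unified host that warning is expected either way; hugetlb depends on the kernel build. The build options can be checked against the running kernel:

```shell
# Hedged: check whether the running kernel was built with the controllers
# LXD reported as missing; /proc/config.gz is tried first, then the
# /boot/config file matching the running kernel.
{ zcat /proc/config.gz 2>/dev/null || cat "/boot/config-$(uname -r)" 2>/dev/null; } |
  grep -E 'CONFIG_CGROUP_HUGETLB|CONFIG_CGROUP_NET_PRIO'
```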
Network and default profile config:
host $ lxc network show lxdbr0
config:
ipv4.address: 10.224.241.1/24
ipv4.nat: "true"
ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/a001
- /1.0/instances/f001
- /1.0/instances/s001
- /1.0/instances/u001
- /1.0/profiles/default
managed: true
status: Created
locations:
- none
host $ lxc profile show default
config: {}
description: Default LXD profile
devices:
eth0:
name: eth0
network: lxdbr0
type: nic
root:
path: /
pool: default
type: disk
name: default
used_by:
- /1.0/instances/a001
- /1.0/instances/f001
- /1.0/instances/u001
- /1.0/instances/s001
LXD service startup log:
t=2022-01-11T21:16:10-0800 lvl=info msg="LXD is starting" mode=normal path=/var/lib/lxd version=4.21
t=2022-01-11T21:16:10-0800 lvl=info msg="Kernel uid/gid map:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - u 0 0 4294967295"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - g 0 0 4294967295"
t=2022-01-11T21:16:10-0800 lvl=info msg="Configured LXD uid/gid map:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - u 0 400000000 500000001"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - g 0 400000000 500000001"
t=2022-01-11T21:16:10-0800 lvl=info msg="Kernel features:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - closing multiple file descriptors efficiently: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - pidfds: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - core scheduling: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - uevent injection: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - seccomp listener: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - seccomp listener continue syscalls: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - seccomp listener add file descriptors: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - attach to namespaces via pidfds: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - safe native terminal allocation : yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - unprivileged file capabilities: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - cgroup layout: cgroup2"
t=2022-01-11T21:16:10-0800 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored"
t=2022-01-11T21:16:10-0800 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - shiftfs support: no"
t=2022-01-11T21:16:10-0800 lvl=info msg="- idmapped mounts kernel support: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing local database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Set client certificate to server certificate" fingerprint=(snip)
t=2022-01-11T21:16:10-0800 lvl=info msg="Starting database node" address=1 id=1 role=voter
t=2022-01-11T21:16:10-0800 lvl=info msg="Starting /dev/lxd handler:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock
t=2022-01-11T21:16:10-0800 lvl=info msg="REST API daemon:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - binding Unix socket" socket=/var/lib/lxd/unix.socket
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing global database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Connecting to global database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Connected to global database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Initialized global database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Firewall loaded driver" driver=nftables
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing storage pools"
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing daemon storage mounts"
t=2022-01-11T21:16:10-0800 lvl=info msg="Loading daemon configuration"
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing networks"
t=2022-01-11T21:16:11-0800 lvl=info msg="Pruning leftover image files"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done pruning leftover image files"
t=2022-01-11T21:16:11-0800 lvl=info msg="Starting device monitor"
t=2022-01-11T21:16:11-0800 lvl=info msg="Started seccomp handler" path=/var/lib/lxd/seccomp.socket
t=2022-01-11T21:16:11-0800 lvl=info msg="Pruning expired images"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done pruning expired images"
t=2022-01-11T21:16:11-0800 lvl=info msg="Pruning expired instance backups"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done pruning expired instance backups"
t=2022-01-11T21:16:11-0800 lvl=info msg="Pruning resolved warnings"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done pruning resolved warnings"
t=2022-01-11T21:16:11-0800 lvl=info msg="Expiring log files"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done expiring log files"
t=2022-01-11T21:16:11-0800 lvl=info msg="Updating instance types"
t=2022-01-11T21:16:11-0800 lvl=info msg="Updating images"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done updating instance types"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done updating images"
t=2022-01-11T21:16:11-0800 lvl=info msg="Daemon started"
t=2022-01-11T21:19:30-0800 lvl=warn msg="Detected poll(POLLNVAL) event."
t=2022-01-11T22:16:11-0800 lvl=info msg="Updating images"
t=2022-01-11T22:16:11-0800 lvl=info msg="Pruning expired instance backups"
t=2022-01-11T22:16:11-0800 lvl=info msg="Done updating images"
t=2022-01-11T22:16:11-0800 lvl=info msg="Done pruning expired instance backups"
Thanks for any suggestions.