On an openSUSE host, LXD containers do not mount the host cgroup on /sys/fs/cgroup

On an openSUSE Tumbleweed host running the openSUSE-packaged LXD 4.21 (not the snap), LXC containers do not mount the host cgroup on /sys/fs/cgroup. What parts of the openSUSE host system might be causing this?

# a001: alpine, f001: fedora, s001: opensuse, u001: ubuntu
host $ lxc exec a001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
host $ lxc exec f001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
host $ lxc exec s001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
host $ lxc exec u001 -- mount |grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)

The initial observable effect is that no IPv4 DHCP address is obtained:

host $ lxc list
+------+---------+------+------+-----------+-----------+
| NAME |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+------+---------+------+------+-----------+-----------+
| a001 | RUNNING |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+
| f001 | RUNNING |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+
| s001 | RUNNING |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+
| u001 | RUNNING |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+

openSUSE Tumbleweed's systemd runs with AppArmor and the unified cgroup hierarchy:

host $ sudo systemctl --version
systemd 249 (249.7+suse.57.g523f32df57)
+PAM +AUDIT +SELINUX +APPARMOR -IMA -SMACK +SECCOMP +GCRYPT +GNUTLS
+OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD
+LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2
+LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT
default-hierarchy=unified

The containers exhibit the same behavior when restarted both with and without the config options
raw.lxc "lxc.apparmor.profile=unconfined" and security.privileged true.
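
As an aside, one way to check which LXC-related AppArmor profiles the host actually has loaded (assuming the AppArmor userspace tools are installed) is:

host $ sudo aa-status | grep -i lxc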

Expected Behavior

In a QEMU VM running Ubuntu 21.04 on the same host,
these images mount cgroup2 and operate as expected:

ubuntu-host-in-qemu $ lxc launch alpine/3.15 a001
ubuntu-host-in-qemu $ lxc launch fedora/35 f001
ubuntu-host-in-qemu $ lxc launch opensuse/tumbleweed s001
ubuntu-host-in-qemu $ lxc launch ubuntu/21.10 u001
ubuntu-host-in-qemu $ lxc list
+------+---------+----------------------+------+-----------+-----------+
| NAME |  STATE  |         IPV4         | IPV6 |   TYPE    | SNAPSHOTS |
+------+---------+----------------------+------+-----------+-----------+
| a001 | RUNNING | 10.38.149.22 (eth0)  |      | CONTAINER | 0         |
+------+---------+----------------------+------+-----------+-----------+
| f001 | RUNNING | 10.38.149.219 (eth0) |      | CONTAINER | 0         |
+------+---------+----------------------+------+-----------+-----------+
| s001 | RUNNING | 10.38.149.132 (eth0) |      | CONTAINER | 0         |
+------+---------+----------------------+------+-----------+-----------+
| u001 | RUNNING | 10.38.149.242 (eth0) |      | CONTAINER | 0         |
+------+---------+----------------------+------+-----------+-----------+

ubuntu-host-in-qemu $ lxc exec s001 -- mount |grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,uid=1000000,gid=1000000)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)

ubuntu-host-in-qemu $ lxc exec f001 -- mount |grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,uid=1000000,gid=1000000)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)

ubuntu-host-in-qemu $ lxc exec u001 -- mount |grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,uid=1000000,gid=1000000)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)

ubuntu-host-in-qemu $ lxc exec a001 -- mount |grep cgroup
(empty)

openSUSE Tumbleweed LXD host information

The LXD version is 4.21, packaged (non-snap) for openSUSE Tumbleweed:

host $ lxc version
Client version: 4.21
Server version: 4.21

lxd init uses the defaults, except for disabling IPv6 networking for the moment:

host $ sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]:
Do you want to configure a new storage pool? (yes/no) [default=yes]:
Name of the new storage pool [default=default]:
Name of the storage backend to use (btrfs, dir, lvm) [default=btrfs]:
Would you like to create a new btrfs subvolume under /var/lib/lxd? (yes/no) [default=yes]:
Would you like to connect to a MAAS server? (yes/no) [default=no]:
Would you like to create a new local network bridge? (yes/no) [default=yes]:
What should the new bridge be called? [default=lxdbr0]:
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]:
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: none
Would you like the LXD server to be available over the network? (yes/no) [default=no]:
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]:
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes

The openSUSE Tumbleweed host runs a recent Linux kernel and systemd, and cgroup2 is mounted:

host $ uname -a
Linux asus 5.15.12-1-default #1 SMP Wed Dec 29 14:50:16 UTC 2021 (375fcb8) x86_64 x86_64 x86_64 GNU/Linux
host $ mount|grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
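
To double-check that the host is booted in cgroup2-only (unified) mode rather than hybrid, a quick sketch (assuming standard coreutils) is:

host $ stat -fc %T /sys/fs/cgroup
host $ grep -o 'systemd.unified_cgroup_hierarchy=[01]' /proc/cmdline

stat should print cgroup2fs on a unified host and tmpfs on a hybrid one; if the grep prints nothing, no hierarchy override is set on the kernel command line.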

As shown at the top of the post, the containers do not mount cgroup2 as they should (the mount output is identical to that above).

openSUSE Tumbleweed does have AppArmor installed.
The containers exhibit the same behavior when restarted with and without the raw.lxc AppArmor config:

host $ lxc config set a001 raw.lxc "lxc.apparmor.profile=unconfined"
host $ lxc config set f001 raw.lxc "lxc.apparmor.profile=unconfined"
host $ lxc config set s001 raw.lxc "lxc.apparmor.profile=unconfined"
host $ lxc config set u001 raw.lxc "lxc.apparmor.profile=unconfined"
host $ lxc restart --all
host $ sudo dmesg |grep lxc
[    5.225606] audit: type=1400 audit(1641763911.329:8): 
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="lxc-container-default" pid=746 comm="apparmor_parser"
[    5.225610] audit: type=1400 audit(1641763911.329:9):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="lxc-container-default-cgns" pid=746 comm="apparmor_parser"
[    5.225612] audit: type=1400 audit(1641763911.329:10):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="lxc-container-default-with-mounting" pid=746 comm="apparmor_parser"
[    5.225615] audit: type=1400 audit(1641763911.329:11):
apparmor="STATUS" operation="profile_load" profile="unconfined"
name="lxc-container-default-with-nesting" pid=746 comm="apparmor_parser"

Likewise, there is no change in behavior when running the containers with security.privileged set, e.g.
lxc config set s001 security.privileged true.

LXD systemd unit starts normally:

host $ systemctl status lxd
lxd.service - LXD Container Hypervisor
     Loaded: loaded (/usr/lib/systemd/system/lxd.service; enabled; vendor preset: disabled)
     Active: active (running) since Fri 2022-01-07 14:00:26 PST; 9h ago
       Docs: man:lxd(1)
    Process: 1571 ExecStartPost=/usr/bin/lxd waitready --timeout=600 (code=exited, status=0/SUCCESS)
   Main PID: 1570 (lxd)
      Tasks: 31
        CPU: 37.964s
     CGroup: /system.slice/lxd.service
             ├─1570 /usr/bin/lxd --group=lxd --logfile=/var/log/lxd/lxd.log
             └─1682 dnsmasq
                    --keep-in-foreground
                    --strict-order
                    --bind-interfaces
                    --except-interface=lo
                    --pid-file=
                    --no-ping
                    --interface=lxdbr0
                    --dhcp-rapid-commit
                    --quiet-dhcp
                    --quiet-dhcp6
                    --quiet-ra
                    --listen-address=10.224.241.1
                    --dhcp-no-override
                    --dhcp-authoritative
                    --dhcp-leasefile=/var/lib/lxd/networks/lxdbr0/dnsmasq.leases
                    --dhcp-hostsfile=/var/lib/lxd/networks/lxdbr0/dnsmasq.hosts
                    --dhcp-range 10.224.241.2,10.224.241.254,1h -s lxd
                    --interface-name _gateway.lxd,lxdbr0 -S /lxd/
                    --conf-file=/var/lib/lxd/networks/lxdbr0/dnsmasq.raw -u nobody -g lxd

The LXD service log shows the following two cgroup2-related warnings on startup:

host $ sudo grep -i cgroup /var/log/lxd/lxd.log
t=2022-01-11T21:16:10-0800 lvl=info msg=" - cgroup layout: cgroup2"
t=2022-01-11T21:16:10-0800 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored"
t=2022-01-11T21:16:10-0800 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
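
Those warnings can be cross-checked against the controllers the unified hierarchy actually exposes; a quick sketch:

host $ cat /sys/fs/cgroup/cgroup.controllers

net_prio exists only as a cgroup1 controller, so its absence is expected on a cgroup2-only host; hugetlb appears only if the kernel was built with hugetlb cgroup support.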

Network and default profile config:

host $ lxc network show lxdbr0
config:
  ipv4.address: 10.224.241.1/24
  ipv4.nat: "true"
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/a001
- /1.0/instances/f001
- /1.0/instances/s001
- /1.0/instances/u001
- /1.0/profiles/default
managed: true
status: Created
locations:
- none

host $ lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by:
- /1.0/instances/a001
- /1.0/instances/f001
- /1.0/instances/u001
- /1.0/instances/s001

LXD service startup log:

t=2022-01-11T21:16:10-0800 lvl=info msg="LXD is starting" mode=normal path=/var/lib/lxd version=4.21
t=2022-01-11T21:16:10-0800 lvl=info msg="Kernel uid/gid map:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - u 0 0 4294967295"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - g 0 0 4294967295"
t=2022-01-11T21:16:10-0800 lvl=info msg="Configured LXD uid/gid map:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - u 0 400000000 500000001"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - g 0 400000000 500000001"
t=2022-01-11T21:16:10-0800 lvl=info msg="Kernel features:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - closing multiple file descriptors efficiently: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - pidfds: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - core scheduling: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - uevent injection: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - seccomp listener: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - seccomp listener continue syscalls: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - seccomp listener add file descriptors: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - attach to namespaces via pidfds: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - safe native terminal allocation : yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - unprivileged file capabilities: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - cgroup layout: cgroup2"
t=2022-01-11T21:16:10-0800 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored"
t=2022-01-11T21:16:10-0800 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - shiftfs support: no"
t=2022-01-11T21:16:10-0800 lvl=info msg="- idmapped mounts kernel support: yes"
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing local database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Set client certificate to server certificate" fingerprint=(snip)
t=2022-01-11T21:16:10-0800 lvl=info msg="Starting database node" address=1 id=1 role=voter
t=2022-01-11T21:16:10-0800 lvl=info msg="Starting /dev/lxd handler:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock
t=2022-01-11T21:16:10-0800 lvl=info msg="REST API daemon:"
t=2022-01-11T21:16:10-0800 lvl=info msg=" - binding Unix socket" socket=/var/lib/lxd/unix.socket
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing global database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Connecting to global database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Connected to global database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Initialized global database"
t=2022-01-11T21:16:10-0800 lvl=info msg="Firewall loaded driver" driver=nftables
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing storage pools"
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing daemon storage mounts"
t=2022-01-11T21:16:10-0800 lvl=info msg="Loading daemon configuration"
t=2022-01-11T21:16:10-0800 lvl=info msg="Initializing networks"
t=2022-01-11T21:16:11-0800 lvl=info msg="Pruning leftover image files"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done pruning leftover image files"
t=2022-01-11T21:16:11-0800 lvl=info msg="Starting device monitor"
t=2022-01-11T21:16:11-0800 lvl=info msg="Started seccomp handler" path=/var/lib/lxd/seccomp.socket
t=2022-01-11T21:16:11-0800 lvl=info msg="Pruning expired images"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done pruning expired images"
t=2022-01-11T21:16:11-0800 lvl=info msg="Pruning expired instance backups"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done pruning expired instance backups"
t=2022-01-11T21:16:11-0800 lvl=info msg="Pruning resolved warnings"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done pruning resolved warnings"
t=2022-01-11T21:16:11-0800 lvl=info msg="Expiring log files"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done expiring log files"
t=2022-01-11T21:16:11-0800 lvl=info msg="Updating instance types"
t=2022-01-11T21:16:11-0800 lvl=info msg="Updating images"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done updating instance types"
t=2022-01-11T21:16:11-0800 lvl=info msg="Done updating images"
t=2022-01-11T21:16:11-0800 lvl=info msg="Daemon started"
t=2022-01-11T21:19:30-0800 lvl=warn msg="Detected poll(POLLNVAL) event."
t=2022-01-11T22:16:11-0800 lvl=info msg="Updating images"
t=2022-01-11T22:16:11-0800 lvl=info msg="Pruning expired instance backups"
t=2022-01-11T22:16:11-0800 lvl=info msg="Done updating images"
t=2022-01-11T22:16:11-0800 lvl=info msg="Done pruning expired instance backups"

Thanks for any suggestions.

Still encountering this problem. Can anyone suggest additional places where I might look for error information related to the host cgroup not mounting in containers?

Look at lxc console --show-log NAME for errors from the init system in the container.

Thanks, the console log is a helpful facility I wasn't aware of. There are visible errors in the Ubuntu and Fedora container consoles, but none from openSUSE:

[UNSUPP] Starting of Arbitrary Executable Fi…ystem Automount Point unsupported.
...
systemd-journald.service: Attaching egress BPF program to cgroup /sys/fs/cgroup/system.slice/systemd-journald.service failed: Invalid argument

Ubuntu container:

host $ lxc console --show-log u001

Console log:

systemd 248.3-1ubuntu8 running in system mode. (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS -OPENSSL +ACL +BLKID +CURL +ELFUTILS -FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP -LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Ubuntu 21.10!

Queued start job for default target Graphical Interface.
[  OK  ] Created slice system-modprobe.slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[UNSUPP] Starting of Arbitrary Executable Fi…tem Automount Point not supported.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Reached target Local Verity Integrity Protected Volumes.
[  OK  ] Listening on Syslog Socket.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on Network Service Netlink Socket.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Reached target Sockets.
systemd-journald.service: Attaching egress BPF program to cgroup /sys/fs/cgroup/system.slice/systemd-journald.service failed: Invalid argument
         Starting Journal Service...
         Starting Set the console keyboard layout...
         Starting Load Kernel Module configfs...
         Starting Load Kernel Module drm...
         Starting Load Kernel Module fuse...
         Starting Remount Root and Kernel File Systems...
         Starting Apply Kernel Variables...
modprobe@configfs.service: Deactivated successfully.
[  OK  ] Finished Load Kernel Module configfs.
modprobe@drm.service: Deactivated successfully.
[  OK  ] Finished Load Kernel Module drm.
modprobe@fuse.service: Deactivated successfully.
[  OK  ] Finished Load Kernel Module fuse.
[  OK  ] Finished Remount Root and Kernel File Systems.
         Starting Create System Users...
[  OK  ] Started Journal Service.
[  OK  ] Finished Apply Kernel Variables.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Finished Create System Users.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Finished Create Static Device Nodes in /dev.
         Starting Rule-based Manager for Device Events and Files...
[  OK  ] Finished Set the console keyboard layout.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Set console font and keymap...
[  OK  ] Finished Set console font and keymap.
[  OK  ] Finished Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Rule-based Manager for Device Events and Files.
         Starting Network Service...
[  OK  ] Finished Create Volatile Files and Directories.
[  OK  ] Reached target System Time Set.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Finished Update UTMP about System Boot/Shutdown.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Trigger to poll for Ubuntu …(Only enabled on GCP LTS non-pro).
[  OK  ] Started Daily apt download activities.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Started Periodic ext4 Online Metadata Check for All Filesystems.
[  OK  ] Started Daily rotation of log files.
[  OK  ] Started Message of the Day.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Started Ubuntu Advantage Timer for running repeated jobs.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Basic System.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Started Regular background program processing daemon.
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Save initial kernel messages after boot.
         Starting Remove Stale Online ext4 Metadata Check Snapshots...
         Starting Dispatcher daemon for systemd-networkd...
         Starting System Logging Service...
         Starting User Login Management...
[  OK  ] Started System Logging Service.
[  OK  ] Finished Remove Stale Online ext4 Metadata Check Snapshots.
[  OK  ] Started User Login Management.
[  OK  ] Started Network Service.
         Starting Network Name Resolution...
[  OK  ] Started Dispatcher daemon for systemd-networkd.
[  OK  ] Started Network Name Resolution.
[  OK  ] Reached target Network.
[  OK  ] Reached target Host and Network Name Lookups.
         Starting Permit User Sessions...
[  OK  ] Finished Permit User Sessions.
[  OK  ] Started Console Getty.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.

Ubuntu 21.10 u001 console

u001 login:

Fedora container:

host $ lxc console --show-log f001

Console log:

systemd v249.7-2.fc35 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Fedora Linux 35 (Container Image)!

Queued start job for default target Graphical Interface.
[  OK  ] Created slice Slice /system/getty.
[  OK  ] Created slice Slice /system/modprobe.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[UNSUPP] Starting of Arbitrary Executable Fi…ystem Automount Point unsupported.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Path Units.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slice Units.
[  OK  ] Reached target Swaps.
[  OK  ] Reached target Local Verity Protected Volumes.
[  OK  ] Listening on Process Core Dump Socket.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on Network Service Netlink Socket.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on User Database Manager Socket.
         Mounting Temporary Directory /tmp...
         Starting Load Kernel Module configfs...
         Starting Load Kernel Module drm...
         Starting Load Kernel Module fuse...
systemd-journald.service: Attaching egress BPF program to cgroup /sys/fs/cgroup/system.slice/systemd-journald.service failed: Invalid argument
         Starting Journal Service...
         Starting Remount Root and Kernel File Systems...
         Starting Apply Kernel Variables...
         Starting Coldplug All udev Devices...
[  OK  ] Mounted Temporary Directory /tmp.
modprobe@configfs.service: Deactivated successfully.
[  OK  ] Finished Load Kernel Module configfs.
modprobe@drm.service: Deactivated successfully.
[  OK  ] Finished Load Kernel Module drm.
modprobe@fuse.service: Deactivated successfully.
[  OK  ] Finished Load Kernel Module fuse.
[  OK  ] Finished Remount Root and Kernel File Systems.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Finished Apply Kernel Variables.
[  OK  ] Started Journal Service.
[  OK  ] Finished Create Static Device Nodes in /dev.
[  OK  ] Reached target Preparation for Local File Systems.
[  OK  ] Reached target Local File Systems.
         Starting Flush Journal to Persistent Storage...
         Starting Rule-based Manager for Device Events and Files...
[  OK  ] Started Rule-based Manager for Device Events and Files.
         Starting Network Configuration...
[  OK  ] Finished Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
         Starting User Database Manager...
[  OK  ] Finished Create Volatile Files and Directories.
         Starting Record System Boot/Shutdown in UTMP...
[  OK  ] Finished Record System Boot/Shutdown in UTMP.
[  OK  ] Started User Database Manager.
[  OK  ] Finished Coldplug All udev Devices.
[  OK  ] Reached target System Initialization.
[  OK  ] Started dnf makecache --timer.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timer Units.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Socket Units.
[  OK  ] Reached target Basic System.
         Starting Home Area Manager...
         Starting User Login Management...
         Starting D-Bus System Message Bus...
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Home Area Manager.
[  OK  ] Started User Login Management.
[  OK  ] Finished Home Area Activation.
[  OK  ] Started Network Configuration.
         Starting Wait for Network to be Configured...
         Starting Network Name Resolution...
[  OK  ] Started Network Name Resolution.
[  OK  ] Reached target Network.
[  OK  ] Reached target Host and Network Name Lookups.
         Starting Permit User Sessions...
[  OK  ] Finished Permit User Sessions.
[  OK  ] Started Console Getty.
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Record Runlevel Change in UTMP...
[  OK  ] Finished Record Runlevel Change in UTMP.

Fedora Linux 35 (Container Image)
Kernel 5.16.4-1-default on an x86_64 (console)

f001 login:

openSUSE Tumbleweed container:

host $ lxc console --show-log s001

Console log:

systemd 249.7+suse.57.g523f32df57 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR -IMA -SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to openSUSE Tumbleweed!

Queued start job for default target Graphical Interface.
[  OK  ] Created slice Slice /system/getty.
[  OK  ] Created slice Slice /system/modprobe.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slice Units.
[  OK  ] Reached target Swaps.
[  OK  ] Reached target Local Verity Protected Volumes.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
         Mounting Temporary Directory /tmp...
         Starting Load AppArmor profiles...
         Starting Journal Service...
         Starting Remount Root and Kernel File Systems...
         Starting Apply Kernel Variables...
[  OK  ] Mounted Temporary Directory /tmp.
[  OK  ] Finished Load AppArmor profiles.
[  OK  ] Finished Remount Root and Kernel File Systems.
[  OK  ] Finished Apply Kernel Variables.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Finished Create Static Device Nodes in /dev.
[  OK  ] Reached target Preparation for Local File Systems.
[  OK  ] Reached target Local File Systems.
[  OK  ] Finished Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Finished Create Volatile Files and Directories.
         Starting Record System Boot/Shutdown in UTMP...
[  OK  ] Finished Record System Boot/Shutdown in UTMP.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Watch for changes in CA certificates.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Path Units.
[  OK  ] Reached target Timer Units.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Socket Units.
[  OK  ] Reached target Basic System.
[  OK  ] Started D-Bus System Message Bus.
         Starting User Login Management...
         Starting wicked AutoIPv4 supplicant service...
         Starting wicked DHCPv4 supplicant service...
         Starting wicked DHCPv6 supplicant service...
[  OK  ] Started User Login Management.
[  OK  ] Started wicked DHCPv6 supplicant service.
[  OK  ] Started wicked AutoIPv4 supplicant service.
[  OK  ] Started wicked DHCPv4 supplicant service.
         Starting wicked network management service daemon...
[  OK  ] Started wicked network management service daemon.
         Starting wicked network nanny service...
[  OK  ] Started wicked network nanny service.
         Starting wicked managed network interfaces...
[  OK  ] Finished wicked managed network interfaces.
[  OK  ] Reached target Network.
         Starting Permit User Sessions...
[  OK  ] Finished Permit User Sessions.
[  OK  ] Started Console Getty.
[  OK  ] Reached target Login Prompts.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Record Runlevel Change in UTMP...
[  OK  ] Finished Record Runlevel Change in UTMP.

s001 login:

Alpine container:

host $ lxc console --show-log a001

Console log:

Welcome to Alpine Linux 3.15
Kernel 5.16.4-1-default on an x86_64 (/dev/console)

a001 login:

Ok, so reading through the whole thread again, things actually look correct.

If your system uses cgroup2 exclusively, as openSUSE appears to do and as Ubuntu 21.10 and higher do too, then containers cannot get a cgroup1 layout; only cgroup2 can work.

In cgroup2-only mode, the mount layout is cgroup2 mounted directly on /sys/fs/cgroup, which seems to match what you're getting.

The containers appear to boot correctly with no critical errors.

What remains is the question of why they're not getting networking, but maybe that's because in your test you hadn't yet configured a bridge?

In any case, the original question was around cgroups, and the behavior here looks correct. If the host is booted in hybrid mode (cgroup1+cgroup2), then the host will have /sys/fs/cgroup as a tmpfs, /sys/fs/cgroup/unified as cgroup2, and /sys/fs/cgroup/XYZ as cgroup1 for each controller. Containers can then either replicate that same layout, run in cgroup2-only mode, or run in cgroup1-only mode as they wish.

When the host is booted in cgroup2-only mode, /sys/fs/cgroup is a direct cgroup2 mount, there are no cgroup1 mounts, and containers are unable to mount anything other than a matching cgroup2 tree on /sys/fs/cgroup.
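
One way to verify this from inside a container is to inspect the cgroup2 control files, which only exist on a unified mount; for example:

host $ lxc exec s001 -- stat -fc %T /sys/fs/cgroup
host $ lxc exec s001 -- cat /sys/fs/cgroup/cgroup.controllers

The first command should report cgroup2fs, and the second lists the controllers delegated into the container.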

If you want to go back to the old behavior, most distros will let you boot with systemd.unified_cgroup_hierarchy=0 on the kernel command line, which will make systemd go back to the old hybrid cgroup layout.
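
On openSUSE with GRUB, that change would look roughly like the sketch below (verify the current contents of /etc/default/grub first, and adjust for your bootloader):

# append systemd.unified_cgroup_hierarchy=0 to GRUB_CMDLINE_LINUX_DEFAULT
host $ sudoedit /etc/default/grub
host $ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
host $ sudo reboot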

Thanks for the helpful info, Stéphane. I'll investigate further. To answer your question: yes, there was a bridge set up on the openSUSE host, using the lxd init defaults, before the containers were created.

The lxdbr0 bridge has an IP address that matches the running LXD config:

host $ ip addr show dev lxdbr0
4: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:2a:37:6c brd ff:ff:ff:ff:ff:ff
    inet 10.224.241.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe2a:376c/64 scope link
       valid_lft forever preferred_lft forever
host $ lxc network show lxdbr0
config:
  ipv4.address: 10.224.241.1/24
  ipv4.nat: "true"
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/a001
- /1.0/instances/f001
- /1.0/instances/s001
- /1.0/instances/u001
- /1.0/profiles/default
managed: true
status: Created
locations:
- none

Do you happen to have Docker running on this system?
We've seen some interference with Docker firewall rules in the past; same deal with firewalld, which can also get in the way in some configurations.

No Docker, Podman, or Kubernetes (k3s) is running on the openSUSE host. openSUSE uses firewalld, and this host is using the defaults. I'll investigate whether there are any references to lxdbr0 among the firewalld zones or rules, as well as temporarily disable the firewall and repeat the tests.
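
A minimal sketch of that inspection, assuming the stock firewall-cmd tooling:

host $ sudo firewall-cmd --get-default-zone
host $ sudo firewall-cmd --get-active-zones
host $ sudo firewall-cmd --get-zone-of-interface=lxdbr0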

For reference, here is the systemd status of the running openSUSE and Ubuntu containers (neither has acquired an IPv4 or IPv6 address):

host $ lxc exec s001 -- systemctl status
● s001
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Wed 2022-02-09 22:22:24 UTC; 33min ago
   CGroup: /
           ├─.lxc
           │ ├─331 systemctl status
           │ └─332 (pager)
           ├─init.scope
           │ └─1 /sbin/init
           └─system.slice
             ├─wickedd-auto4.service
             │ └─82 /usr/libexec/wicked/bin/wickedd-auto4 --systemd --foreground
             ├─systemd-journald.service
             │ └─67 /usr/lib/systemd/systemd-journald
             ├─wickedd-dhcp4.service
             │ └─83 /usr/libexec/wicked/bin/wickedd-dhcp4 --systemd --foreground
             ├─wickedd-dhcp6.service
             │ └─84 /usr/libexec/wicked/bin/wickedd-dhcp6 --systemd --foreground
             ├─console-getty.service
             │ └─322 /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
             ├─wickedd-nanny.service
             │ └─86 /usr/sbin/wickedd-nanny --systemd --foreground
             ├─dbus.service
             │ └─80 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
             ├─wickedd.service
             │ └─85 /usr/sbin/wickedd --systemd --foreground
             └─systemd-logind.service
               └─81 /usr/lib/systemd/systemd-logind
host $ lxc exec u001 -- systemctl status
● u001
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Wed 2022-02-09 22:22:24 UTC; 39min ago
   CGroup: /
           ├─.lxc
           │ ├─138 systemctl status
           │ └─139 less
           ├─init.scope
           │ └─1 /sbin/init
           └─system.slice
             ├─systemd-networkd.service
             │ └─106 /lib/systemd/systemd-networkd
             ├─systemd-udevd.service
             │ └─101 /lib/systemd/systemd-udevd
             ├─cron.service
             │ └─107 /usr/sbin/cron -f -P
             ├─networkd-dispatcher.service
             │ └─111 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
             ├─systemd-journald.service
             │ └─65 /lib/systemd/systemd-journald
             ├─rsyslog.service
             │ └─112 /usr/sbin/rsyslogd -n -iNONE
             ├─console-getty.service
             │ └─122 /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 linux
             ├─systemd-resolved.service
             │ └─119 /lib/systemd/systemd-resolved
             ├─dbus.service
             │ └─108 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
             └─systemd-logind.service
               └─113 /lib/systemd/systemd-logind

host $ lxc list
+------+---------+------+------+-----------+-----------+
| NAME |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+------+---------+------+------+-----------+-----------+
| a001 | RUNNING |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+
| f001 | RUNNING |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+
| s001 | RUNNING |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+
| u001 | RUNNING |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+

Thank you for that tip! I disabled firewalld temporarily, and all containers obtained IP addresses:

host $ sudo systemctl stop firewalld
host $ systemctl restart lxd
host $ lxc list
+------+---------+-----------------------+------+-----------+-----------+
| NAME |  STATE  |         IPV4          | IPV6 |   TYPE    | SNAPSHOTS |
+------+---------+-----------------------+------+-----------+-----------+
| a001 | RUNNING | 10.224.241.107 (eth0) |      | CONTAINER | 0         |
+------+---------+-----------------------+------+-----------+-----------+
| f001 | RUNNING | 10.224.241.99 (eth0)  |      | CONTAINER | 0         |
+------+---------+-----------------------+------+-----------+-----------+
| s001 | RUNNING | 10.224.241.55 (eth0)  |      | CONTAINER | 0         |
+------+---------+-----------------------+------+-----------+-----------+
| u001 | RUNNING | 10.224.241.159 (eth0) |      | CONTAINER | 0         |
+------+---------+-----------------------+------+-----------+-----------+

While Docker is not presently installed on the host system, there is an active docker zone present in the firewalld configuration:

host $ sudo firewall-cmd --list-all-zones
( ... )
docker (active)
  target: ACCEPT
  icmp-block-inversion: no
  interfaces: docker0
  sources:
  services:
  ports:
  protocols:
  forward: no
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

Thanks again for your help. Now I can focus on appropriate firewalld rules for the bridge and on resolving the potential for conflict with the Docker rules. I don't recall installing Docker proper on this machine, but it is possible that the firewalld rule was created as part of a previous Podman or k3s install (neither is installed presently).

https://linuxcontainers.org/lxd/docs/master/networks/#how-to-let-firewalld-control-the-lxd-s-iptables-rules
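
For reference, the approach described at that link amounts to putting lxdbr0 into firewalld's trusted zone so that DHCP/DNS from LXD's dnsmasq and forwarded bridge traffic are allowed, roughly:

host $ sudo firewall-cmd --zone=trusted --change-interface=lxdbr0 --permanent
host $ sudo firewall-cmd --reload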