Unable to start any container in the cluster after upgrading to Incus 7.0 and the Ubuntu RPi kernel

tregubovav · May 16, 2026, 9:46pm

My customer installed several updates at once and ended up with a cluster that can't start any containers. The instance startup log looks like this:

lxc [instance_name] 20260516213511.413 WARN     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgroup_tree_create:550 - File exists - Failed to create monitor cgroup 19(lxc.monitor.infra_dhcp-07-1)
lxc [instance_name] 20260516213511.616 ERROR    cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_enter:1590 - No space left on device - Failed to enter cgroup 33
lxc [instance_name] 20260516213511.617 ERROR    start - ../src/lxc/start.c:__lxc_start:2235 - Failed to enter monitor cgroup
lxc [instance_name] 20260516213511.619 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:837 - Received container state "ABORTING" instead of "RUNNING"
lxc [instance_name] 20260516213511.247 WARN     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_payload_destroy:422 - Uninitialized limit cgroup
lxc [instance_name] 20260516213511.266 WARN     cgfsng - ../src/lxc/cgroups/cgfsng.c:cgfsng_monitor_destroy:658 - No space left on device - Failed to move monitor 1792355 to "lxc.pivot"

The cluster is built on Raspberry Pi 4 SBCs running Ubuntu 24.04 as the host OS. It uses Ceph storage (a MicroCeph cluster running on the same nodes).
The last updates installed were:

MicroCeph updated to 19.2.3 (stable)
Incus 7.0
the latest Ubuntu updates

The host filesystems, the Ceph RBD images and CephFS all have plenty of free space.

I have no idea what is causing the container start failures and would appreciate any suggestions that help me resolve this issue.

stgraber · May 16, 2026, 9:47pm

The "No space left on device" error may be pointing towards a completely flooded /sys/fs/cgroup.
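One way to check for that kind of flooding, sketched in Python under the assumption that it shows up as a large number of leftover lxc.monitor.* and lxc.payload.* directories (the naming visible in the log above). The function name count_lxc_cgroups is hypothetical. It walks the whole tree so that it covers both the cgroup v2 layout, where these directories sit directly under /sys/fs/cgroup, and the v1 layout, where each controller has its own copy; it does not descend into the LXC cgroups themselves.

```python
import os
from collections import Counter


def count_lxc_cgroups(root="/sys/fs/cgroup"):
    """Count lxc.monitor.* and lxc.payload.* directories below root.

    Hypothetical helper: assumes LXC's usual cgroup naming. Unreadable
    directories are silently skipped (os.walk's default behaviour).
    """
    counts = Counter()
    for _dirpath, dirnames, _filenames in os.walk(root):
        keep = []
        for name in dirnames:
            if name.startswith("lxc.monitor."):
                counts["monitor"] += 1
            elif name.startswith("lxc.payload."):
                counts["payload"] += 1
            else:
                keep.append(name)
        # Prune in place so os.walk does not descend into LXC subtrees.
        dirnames[:] = keep
    return counts
```

Comparing count_lxc_cgroups()["monitor"] on a node with the number of instances on that node is one rough signal: far more monitor cgroups than instances would suggest leftovers from earlier failed starts.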

tregubovav · May 16, 2026, 9:50pm

Thank you, Stephane, for the quick reply.
How can I check whether it is flooded, and how can I identify what is flooding it?

stgraber · May 16, 2026, 9:53pm

I'd start by cleaning things up. LXC has logic to try the next cgroup name when it hits a name conflict, up to some limit. When it reaches that limit, you get the error you're seeing.
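The retry behaviour described above can be modelled roughly as follows. This is a hypothetical Python sketch, not LXC's actual C code in cgfsng.c: the function name create_with_suffix, the max_tries value and the exception raised when names run out are all assumptions. The -1 suffix in lxc.monitor.infra_dhcp-07-1 from the log appears to be this kind of suffixed retry name.

```python
import os


def create_with_suffix(parent, base, max_tries=1000):
    """Hypothetical model of LXC's cgroup name-conflict retry.

    Tries parent/base, then parent/base-1, parent/base-2, ... and returns
    the first path it manages to create. max_tries is an assumed limit,
    not LXC's real constant.
    """
    for idx in range(max_tries):
        name = base if idx == 0 else f"{base}-{idx}"
        path = os.path.join(parent, name)
        try:
            os.mkdir(path)
            return path
        except FileExistsError:
            # Name already taken, e.g. left over from an earlier start.
            continue
    # Assumed failure mode: every candidate name is already in use.
    raise RuntimeError(f"no free cgroup name left for {base!r}")
```

When every candidate name is already taken, for example because cgroups from earlier failed starts were never removed, the loop runs out of names. That is why removing the stale directories is the first step.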

tregubovav · May 16, 2026, 9:55pm

Just to be clear: the cluster worked without any issues, and nothing was installed or updated until the incident. The only things that changed were:

the Linux kernel
the MicroCeph snap
the Incus version bump

tregubovav · May 17, 2026, 1:19am

The issue was in the RPi kernel command line (cmdline.txt).
Since the cluster was built a few years ago, the nodes had been booting with cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 systemd.unified_cgroup_hierarchy=0. That means that all that time the cluster ran with cgroup v1 only. Everything worked fine until the last update.
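To confirm which cgroup layout a node actually booted with, one option is to look at how /sys/fs/cgroup is mounted. A minimal Python sketch, assuming the usual systemd layouts: cgroup2 mounted directly on /sys/fs/cgroup for the unified (v2) mode, and v1 "cgroup" controller mounts, optionally with a cgroup2 mount at /sys/fs/cgroup/unified, for the legacy and hybrid modes. The function cgroup_mode is hypothetical and takes the text of /proc/self/mounts so it can be tried without a real host.

```python
def cgroup_mode(mounts_text):
    """Classify the cgroup setup from the contents of /proc/self/mounts.

    Returns "unified" (v2 only), "hybrid", "legacy" or "unknown".
    """
    fstype_at = {}
    has_v1 = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        if fstype == "cgroup":  # a cgroup v1 controller hierarchy
            has_v1 = True
        # Later mounts on the same path hide earlier ones, so last one wins.
        fstype_at[mountpoint] = fstype
    if fstype_at.get("/sys/fs/cgroup") == "cgroup2":
        return "unified"
    if has_v1:
        if fstype_at.get("/sys/fs/cgroup/unified") == "cgroup2":
            return "hybrid"
        return "legacy"
    return "unknown"
```

On a node this could be run as `with open("/proc/self/mounts") as f: print(cgroup_mode(f.read()))`.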
The issue was fixed by changing the parameter to systemd.unified_cgroup_hierarchy=1 (switching to cgroup v2).
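The fix is a one-token change in cmdline.txt, which holds the whole kernel command line on a single line. A small Python sketch of that edit, assuming the file lives at /boot/firmware/cmdline.txt (the usual location on Ubuntu for Raspberry Pi; verify on your nodes) and that a backup is taken first. set_kernel_param and update_cmdline_file are hypothetical helpers. The other cgroup_enable=... tokens are left untouched, since only systemd.unified_cgroup_hierarchy was changed here.

```python
def set_kernel_param(cmdline, key, value):
    """Return cmdline with key=value set, replacing any existing key=..."""
    prefix = key + "="
    out, found = [], False
    for tok in cmdline.split():
        if tok == key or tok.startswith(prefix):
            if not found:
                out.append(f"{key}={value}")
                found = True
            # Further occurrences of the same key are dropped.
        else:
            out.append(tok)
    if not found:
        out.append(f"{key}={value}")
    return " ".join(out)


def update_cmdline_file(path="/boot/firmware/cmdline.txt"):
    """Switch the file to cgroup v2; the default path is an assumption."""
    with open(path) as f:
        current = f.read().strip()
    new = set_kernel_param(current, "systemd.unified_cgroup_hierarchy", "1")
    if new != current:
        with open(path, "w") as f:
            f.write(new + "\n")  # keep everything on one line
    return new
```

Splitting on whitespace and joining with single spaces keeps the result on one line, as cmdline.txt expects. A reboot is needed for the new command line to take effect.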