Python LXC: leaking cgroups prevent container start

Hi,

I ran into a problem where containers fail to start when I manage LXC 5.0.0 unprivileged containers from a Python application using the Python LXC library (the latest, 3.0.4) on Ubuntu 22.04 (cgroup v2).
I do not observe this behavior with the lxc-start/lxc-stop tools.

I can reproduce it quite easily by starting the Python interpreter as follows (from a Python venv with the lxc library installed):

# machinectl shell labuser@ /usr/bin/systemd-run --user --scope -p "Delegate=yes" /path/to/labuser/venv/bin/python3

import lxc
import time

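c = lxc.Container("media-runner7tef5")
c.set_config_path("/path/to/labuser/containers")
c.load_config()

# Start and stop the container in a loop; the same Container object
# (and the same interpreter process) is reused for every iteration.
while True:
    c.start()
    time.sleep(5)
    c.stop()
    time.sleep(5)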

After a few iterations, the created cgroups look like this (using the systemd-cgls tool):

# systemd-cgls -a

Control group /:
-.slice
├─sys-fs-fuse-connections.mount 
├─sys-kernel-config.mount 
├─sys-kernel-debug.mount 
├─dev-mqueue.mount 
├─user.slice 
│ ├─user-131.slice 
│ │ └─user@131.service …
│ │   ├─app.slice 
│ │   │ ├─run-r246103e60f794171b073fe036a3c1137.scope 
│ │   │ │ ├─lxc.pivot 
│ │   │ │ └─lxc.monitor.media-runner7tef5 
│ │   │ │   ├─lxc.pivot 
│ │   │ │   └─lxc.monitor.media-runner7tef5 
│ │   │ │     ├─lxc.pivot 
│ │   │ │     └─lxc.monitor.media-runner7tef5 
│ │   │ │       ├─lxc.pivot 
│ │   │ │       └─lxc.monitor.media-runner7tef5 
│ │   │ │         ├─lxc.pivot 
│ │   │ │         └─lxc.monitor.media-runner7tef5 
│ │   │ │           ├─lxc.pivot 
│ │   │ │           └─lxc.monitor.media-runner7tef5 
│ │   │ │             ├─lxc.pivot 
│ │   │ │             └─lxc.monitor.media-runner7tef5 
│ │   │ │               ├─lxc.pivot 

The number of nested entries keeps growing until it is no longer possible to start any container from this particular interpreter process.
At the same time, I can still start the same container with the lxc-start command.
After restarting the interpreter process, it works properly again (until the number of cgroups grows enough to hit some limit again).
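
To put a number on the growth, the nesting can be counted directly from the cgroup filesystem instead of reading the systemd-cgls tree by eye. This is only a rough sketch: the helper name monitor_nesting_depth is made up for illustration, and it assumes the cgroup v2 hierarchy is mounted at /sys/fs/cgroup and that the scope directory is passed in by hand.

```python
import os


def monitor_nesting_depth(root, container_name):
    """Count how many lxc.monitor.<name> directories are nested below root.

    root is the directory of the scope cgroup (hypothetical helper, not
    part of the lxc library).
    """
    target = f"lxc.monitor.{container_name}"
    depth = 0
    current = root
    while True:
        candidate = os.path.join(current, target)
        if not os.path.isdir(candidate):
            # No deeper monitor cgroup: depth is the length of the chain.
            return depth
        depth += 1
        current = candidate
```

Calling it once per loop iteration, with root set to the run-r….scope directory under /sys/fs/cgroup, should show the depth increasing by one for every start/stop cycle if the monitor cgroups are really being leaked.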

With logging enabled, I found:

Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_start:2188 - Doing lxc_start
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: apparmor - lsm/apparmor.c:lsm_apparmor_ops_init:1275 - Per-container AppArmor profiles are disabled because the mac_admin capability is missing
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: lsm - lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:781 - Initialized LSM
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:486 - Set container state to STARTING
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:489 - No state clients registered
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:787 - Set container state to "STARTING"
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:843 - Set environment variables
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:848 - Ran pre-start hooks
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:setup_signal_fd:373 - Created signal file descriptor 5
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:861 - Set up signal fd
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: cgfsng - cgroups/cgfsng.c:initialize_cgroups:3434 - Cannot allocate memory - Failed to initialize cgroups
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: cgroup - cgroups/cgroup.c:cgroup_init:33 - Bad file descriptor - Failed to initialize cgroup driver
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:865 - Failed to initialize cgroup driver
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:__lxc_start:2008 - Failed to initialize container "media-runner7tef5"
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:486 - Set container state to ABORTING
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:489 - No state clients registered
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:486 - Set container state to STOPPING
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:489 - No state clients registered
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_end:966 - Closed command socket
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_end:977 - Set container state to "STOPPED"
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: conf - conf.c:run_script_argv:337 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "media-runner7tef5", config section "lxc"
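
Most of these lines are routine state changes, so a small filter helps to pull out only the failures. The sketch below assumes the syslog layout shown above (lxc[pid]: name: category - file:function:line - message); the names LOG_LINE and failures are made up for illustration.

```python
import re

# Matches the part of a syslog line that LXC itself produced, e.g.
# "lxc[102656]: name: cgfsng - cgroups/cgfsng.c:initialize_cgroups:3434 - msg"
LOG_LINE = re.compile(
    r"lxc\[(?P<pid>\d+)\]: (?P<name>\S+): (?P<category>\S+) - "
    r"(?P<file>[^:]+):(?P<func>[^:]+):(?P<line>\d+) - (?P<message>.*)$"
)


def failures(lines):
    """Yield (file, function, line, message) for log lines reporting a failure."""
    for raw in lines:
        match = LOG_LINE.search(raw.rstrip("\n"))
        if match and "Failed" in match.group("message"):
            yield (
                match.group("file"),
                match.group("func"),
                int(match.group("line")),
                match.group("message"),
            )
```

Run over the log above, it leaves only the four "Failed to ..." lines, the first of which is the cgfsng "Cannot allocate memory" error.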

After a lot of research, I still have no clue what makes the Python library behave differently from the lxc-start command.

Thanks in advance for any help.

Do you see this with LXC 5.0.2? It included a fix for leaking cgroups when stopping/starting repeatedly, which may be what you’re seeing: