Python LXC and cgroups leaking prevents container start

Mariusz · January 25, 2023, 9:44am

Hi,

I ran into a problem where containers stop starting when I manage unprivileged LXC 5.0.0 containers from a Python application using the Python LXC library (the latest, 3.0.4) on Ubuntu 22.04 (cgroupv2).
I do not observe this behavior with the lxc-start/lxc-stop tools.

I can reproduce it quite easily by running the Python interpreter the following way (from within a Python venv that has the lxc library installed):

# machinectl shell labuser@ /usr/bin/systemd-run --user --scope -p "Delegate=yes" /path/to/labuser/venv/bin/python3

import lxc
import time

c=lxc.Container("media-runner7tef5")
c.set_config_path("/path/to/labuser/containers")
c.load_config()
# start and stop container in a loop
while True:
	c.start()
	time.sleep(5)
	c.stop()
	time.sleep(5)

Then, after a few iterations, I look at the created cgroups (with the systemd-cgls tool):

# systemd-cgls -a

Control group /:
-.slice
├─sys-fs-fuse-connections.mount 
├─sys-kernel-config.mount 
├─sys-kernel-debug.mount 
├─dev-mqueue.mount 
├─user.slice 
│ ├─user-131.slice 
│ │ └─user@131.service …
│ │   ├─app.slice 
│ │   │ ├─run-r246103e60f794171b073fe036a3c1137.scope 
│ │   │ │ ├─lxc.pivot 
│ │   │ │ └─lxc.monitor.media-runner7tef5 
│ │   │ │   ├─lxc.pivot 
│ │   │ │   └─lxc.monitor.media-runner7tef5 
│ │   │ │     ├─lxc.pivot 
│ │   │ │     └─lxc.monitor.media-runner7tef5 
│ │   │ │       ├─lxc.pivot 
│ │   │ │       └─lxc.monitor.media-runner7tef5 
│ │   │ │         ├─lxc.pivot 
│ │   │ │         └─lxc.monitor.media-runner7tef5 
│ │   │ │           ├─lxc.pivot 
│ │   │ │           └─lxc.monitor.media-runner7tef5 
│ │   │ │             ├─lxc.pivot 
│ │   │ │             └─lxc.monitor.media-runner7tef5 
│ │   │ │               ├─lxc.pivot

The number of entries keeps growing until it is no longer possible to start any container from this particular interpreter process.
Meanwhile, I can still start the same container with the lxc-start command.
After restarting the interpreter process, everything works properly again (until the number of cgroups grows and reaches some limit again).
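
One way to watch this growth from inside the interpreter is to read /proc/self/cgroup after each iteration. This is only a minimal sketch: it assumes the interpreter process itself ends up inside the nested cgroups (not confirmed here), and the helper names cgroup_v2_path, monitor_nesting_depth and current_depth are made up for illustration.

```python
from pathlib import PurePosixPath


def cgroup_v2_path(proc_cgroup_text: str) -> str:
    """Return the cgroup v2 path from the contents of /proc/<pid>/cgroup."""
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        # The unified (v2) hierarchy entry has the form "0::<path>".
        if hierarchy_id == "0" and controllers == "":
            return path
    raise ValueError("no cgroup v2 entry found")


def monitor_nesting_depth(cgroup_path: str, prefix: str = "lxc.monitor.") -> int:
    """Count how many components of the path start with the given prefix."""
    parts = PurePosixPath(cgroup_path).parts
    return sum(1 for part in parts if part.startswith(prefix))


def current_depth() -> int:
    """Nesting depth for the calling process (Linux with cgroup v2 only)."""
    with open("/proc/self/cgroup") as f:
        return monitor_nesting_depth(cgroup_v2_path(f.read()))
```

Printing current_depth() after every c.stop() in the loop above would show whether the interpreter's own cgroup gets one level deeper per iteration.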

After enabling logging, I found:

Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_start:2188 - Doing lxc_start
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: apparmor - lsm/apparmor.c:lsm_apparmor_ops_init:1275 - Per-container AppArmor profiles are disabled because the mac_admin capability is missing
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: lsm - lsm/lsm.c:lsm_init_static:38 - Initialized LSM security driver AppArmor
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:781 - Initialized LSM
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:486 - Set container state to STARTING
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:489 - No state clients registered
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:787 - Set container state to "STARTING"
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:843 - Set environment variables
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:848 - Ran pre-start hooks
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:setup_signal_fd:373 - Created signal file descriptor 5
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:861 - Set up signal fd
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: cgfsng - cgroups/cgfsng.c:initialize_cgroups:3434 - Cannot allocate memory - Failed to initialize cgroups
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: cgroup - cgroups/cgroup.c:cgroup_init:33 - Bad file descriptor - Failed to initialize cgroup driver
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_init:865 - Failed to initialize cgroup driver
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:__lxc_start:2008 - Failed to initialize container "media-runner7tef5"
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:486 - Set container state to ABORTING
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:489 - No state clients registered
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:486 - Set container state to STOPPING
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_serve_state_clients:489 - No state clients registered
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_end:966 - Closed command socket
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: start - start.c:lxc_end:977 - Set container state to "STOPPED"
Jan 24 21:42:54 ubuntu lxc[102656]: media-runner7tef5: conf - conf.c:run_script_argv:337 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "media-runner7tef5", config section "lxc"

After a long investigation, I still have no clue why the Python library behaves differently from the lxc-start command.

Thanks in advance for any help.

tomp · January 26, 2023, 10:00am

Do you see this with LXC 5.0.2? It had a fix for leaking cgroups when stopping/starting repeatedly, which may be what you’re seeing:

Mariusz · April 13, 2023, 7:07pm

I’m really sorry for the long delay; I had trouble running that check and tried again today.
Since 5.0.2 is not available in Ubuntu at all, I tried it on Debian Testing.
Unfortunately, I’m unable to start the container from Python (while the same container starts fine with lxc-start). I tried Python bindings 5.0.0 (both the version shipped with Debian and one compiled from source) and 3.0.4 (the latest available via pip). No luck.

Is there any way to clean up the leaked cgroups after stopping a container? I’m asking because I’m not very familiar with raw cgroup management. So far I have failed to develop such a workaround.
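
For reference, on cgroup v2 a cgroup is removed with a plain rmdir on its directory, and the kernel refuses (EBUSY) while the cgroup still has processes or child cgroups. A minimal sketch of a bottom-up cleanup along those lines follows. The function name remove_empty_cgroups is hypothetical, and it cannot remove a cgroup that the interpreter itself is still sitting in, so it may not help in this exact case.

```python
import os


def remove_empty_cgroups(root: str,
                         names=("lxc.pivot",),
                         prefixes=("lxc.monitor.",)) -> list[str]:
    """Try to rmdir LXC-looking cgroup directories below root, deepest first.

    Returns the list of directories that were actually removed.
    """
    removed = []
    # Walk bottom-up so that children are attempted before their parents.
    for dirpath, dirnames, _filenames in os.walk(root, topdown=False):
        for name in dirnames:
            if name in names or name.startswith(prefixes):
                path = os.path.join(dirpath, name)
                try:
                    os.rmdir(path)
                    removed.append(path)
                except OSError:
                    # Still busy (processes or children inside), or not
                    # permitted: leave it alone and carry on.
                    pass
    return removed
```

Here root would be the directory of the delegated scope under /sys/fs/cgroup; that exact path depends on the system and is an assumption on my part.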

amikhalitsyn · April 18, 2023, 1:25pm

I believe that you are failing here:

github.com

lxc/lxc/blob/master/src/lxc/cgroups/cgfsng.c#L4079

	const char *controllers_use;

	if (ops->dfd_mnt >= 0)
		return ret_errno(EBUSY);

	/*
	 * I don't see the need for allowing symlinks here. If users want to
	 * have their hierarchy available in different locations I strongly
	 * suggest bind-mounts.
	 */
	dfd = open_at(-EBADF, DEFAULT_CGROUP_MOUNTPOINT,
			PROTECT_OPATH_DIRECTORY, PROTECT_LOOKUP_ABSOLUTE_XDEV, 0);
	if (dfd < 0)
		return syserror("Failed to open " DEFAULT_CGROUP_MOUNTPOINT);

	controllers_use = lxc_global_config_value("lxc.cgroup.use");
	if (controllers_use) {
		__do_free char *dup = NULL;
		char *it;

		dup = strdup(controllers_use);

You said that you tried starting the same container with lxc-start and it works fine. But as I can see from your first post, you are running your Python application from a machinectl shell ... environment, right? Have you tried running lxc-start from the same machinectl shell ... environment, or did you just run it from the host (without any machinectl ... tricks)?

Mariusz · April 18, 2023, 6:08pm

I’m running all the tests in exactly the same way.
Both lxc-start and the Python interpreter are launched the same way, via the horrible systemd stupidity.

As far as I know, there is no way to run an unprivileged container on a cgroupv2 system without using machinectl/systemd-run. So I did not try that while writing my posts.

By the way, I already created a topic about this strong systemd dependency, but no one was interested in discussing it. Maybe I’m the only one who sees it as a problem.

In the end I worked around the problem by starting a one-shot subprocess whose only job is to start the container once and exit. Thanks to that, systemd-cgls no longer shows any leaked cgroups, and the problem no longer occurs in the main process.

amikhalitsyn · April 18, 2023, 8:23pm

You are not the only one who sees this as a problem, but unfortunately we can’t do anything about it, as it’s on the systemd side. systemd is responsible for cgroup management, and if you want to run a container as an unprivileged user, you need to ask systemd to allocate an empty cgroup for you.

Could you elaborate on the workaround? What exactly did you change in your Python code, and how are you running the whole thing now?

Mariusz · April 19, 2023, 6:27am

Fortunately no big invention was needed.
Simply instead of doing:

import lxc

lxc_container = lxc.Container(name)
lxc_container.load_config()

lxc_container.start()

I’m doing more or less:

import asyncio
import lxc
from concurrent.futures import ProcessPoolExecutor

def start_lxc_container(name: str) -> bool:
    # Runs in the temporary worker process.
    lxc_container = lxc.Container(name)
    lxc_container.load_config()

    return lxc_container.start()

async def start_worker_instance(name: str) -> bool:
    # Start the container from a temporary process in order to avoid
    # producing leaked cgroups accounted against the main process.
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=1) as pool:
        return await loop.run_in_executor(pool, start_lxc_container, name)

if __name__ == "__main__":
    asyncio.run(start_worker_instance(name="test1"))

It’s not the real code, but it should explain how this trivial workaround works.
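
For code that does not run under asyncio, the same idea can be sketched synchronously. This is an illustration, not the real code either: run_in_one_shot_process is a hypothetical helper name, and the "fork" start method is requested explicitly on the assumption that this only ever runs on Linux.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor


def run_in_one_shot_process(func, *args):
    """Run func(*args) in a short-lived child process and return its result."""
    # Assumption: Linux only, so the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    # The single worker process exits when the "with" block is left, so
    # whatever cgroup it moved itself into is not held by the caller.
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        return pool.submit(func, *args).result()
```

With the start_lxc_container function from above, the call would be run_in_one_shot_process(start_lxc_container, "test1"). The function and its arguments have to be picklable, which is why a module-level function is passed rather than a lambda.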