Lxc.cgroup.pattern is not being honored

Mrinal_Dhillon · February 27, 2020, 5:39am

Hi, I am upgrading from lxc 3.0.2 to the master branch (preparing for lxc 4.0). Please advise how to set system-wide cgroup limits using lxc.cgroup.pattern.

System Details:
yocto morty, linux v4.9, cgroup v1, lxc 3.2.1+master-bf04c8508dce0c68ea9a98ee6ac1fa01dc2f56a2

/etc/lxc/lxc.conf, based on lxc 3.0.2:
// global system-wide cgroup limits are applied on mylxc
lxc.cgroup.pattern = mylxc/%n
// l3mdev cgroup does not allow nesting, so it is excluded.
lxc.cgroup.use = blkio,cpu,cpuacct,debug,devices,freezer,net_cls,memory,name=systemd

Issue:
lxc-execute ub -- bash

On the host I see that the root lxc.cgroup.pattern is not being honored:
/sys/fs/cgroup/memory/lxc.payload.ub/
I expected it to be /sys/fs/cgroup/memory/mylxc/lxc.payload.ub
Also, cgroups are not visible inside the container:
root@ub:/# ls -l /sys/fs/cgroup/
total 0

What are lxc.payload, lxc.monitor and lxc.pivot?

Thanks
Mrinal

stgraber · February 27, 2020, 8:23pm

I believe it’s a per-container config key, not a system config key.

So you’d want to set it in default.conf or in individual container configuration files.

Mrinal_Dhillon · February 27, 2020, 8:31pm

default.conf is per container. Among the lxc man pages, the lxc.system.conf page explains lxc.cgroup.pattern. We want to enforce a system-wide limit on all containers, e.g. all containers combined must not consume more than 4 GB of memory, so we need a top-level cgroup to enforce this limit, e.g. /sys/fs/cgroup/memory/containers/lxc.payload.ub.
Here the global total limit on all containers will be set in containers/memory.limit_in_bytes. For this we need lxc.cgroup.pattern=containers/lxc/%n.
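To make the idea concrete, here is a minimal sketch of writing such a shared limit into the parent cgroup. It assumes the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory (as in the paths quoted above) and that the parent cgroup directory already exists. The function name and its parameters are hypothetical helpers for illustration, not part of LXC.

```python
from pathlib import Path

# Assumption: cgroup v1 memory controller mount point, as quoted in this thread.
MEMORY_ROOT = Path("/sys/fs/cgroup/memory")


def set_parent_memory_limit(parent: str, limit_bytes: int,
                            root: Path = MEMORY_ROOT) -> Path:
    """Write limit_bytes to <root>/<parent>/memory.limit_in_bytes.

    Hypothetical helper: the parent cgroup directory must already exist.
    Returns the path of the file that was written.
    """
    limit_file = root / parent / "memory.limit_in_bytes"
    limit_file.write_text(f"{limit_bytes}\n")
    return limit_file
```

On a real host this would be called as root, for example `set_parent_memory_limit("containers", 4 * 1024**3)` for a 4 GiB cap shared by every container cgroup created below `containers`.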

stgraber · February 27, 2020, 8:35pm

Oh right, confused pattern and dir (or whatever the new one is called).

@brauner any idea why pattern wouldn’t work anymore?

brauner · February 27, 2020, 9:36pm

I think I deliberately deprecated it since we have a per-container setting now. But I can re-introduce it no problem.

brauner · February 27, 2020, 10:04pm

Created this real quick:

Needs to survive test-suite and could use some manual testing too.

Mrinal_Dhillon · February 27, 2020, 10:27pm

Great! Will cherry-pick and test on my setup. Was this deprecated in the lxc 3.0.4 release, or just for lxc 4.0?

Mrinal_Dhillon · February 27, 2020, 10:40pm

One more issue:
Why are cgroups not visible inside the container?
lxc-execute ub -- bash
root@ub:/# ls -l /sys/fs/cgroup/
total 0

Are there any docs or commit messages that explain the changes to the cgroup infrastructure, i.e. lxc.payload, lxc.monitor and lxc.pivot?

stgraber · February 28, 2020, 11:01am

Did you configure lxc.mount.auto to set up a cgroup mount for you?

In most cases we prefer the containers to handle that through their init system, as that’s what a normal Linux system does, but there should still be options to force liblxc to set up mounts for you, I think.

brauner · February 28, 2020, 11:18am

You need these two prs:

brauner · February 28, 2020, 11:19am

As @stgraber said, we usually leave that for the init system. If your init system doesn’t mount them automatically but you want them to be mounted anyway you need to set:

lxc.mount.auto = cgroup:rw:force

Which will tell LXC to mount them for you.

brauner · February 28, 2020, 11:38am

Not yet, but once we release 4.0 we might add a section explaining the logic behind that. In short, this was made necessary by the new cgroup filesystem version, aka cgroup2. It has a completely different delegation/ownership model than the legacy cgroup filesystem. One of the most daunting - yet understandable - restrictions is that only leaf nodes of a cgroup tree can contain live processes. That means you can’t have a hierarchy where the [lxc monitor] process sits above the container’s cgroup and supervises it, because that would violate the no-live-processes-in-non-leaf-nodes restriction. It means the [lxc monitor] and the container need to be on the same level in the cgroup tree. This way they can both have live processes in them or even have subtrees. So lxc.monitor.<container-name> is the monitor’s cgroup and lxc.payload.<container-name> is the container’s cgroup.
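As a rough illustration of that layout, the sketch below builds the two cgroup names for a container and checks that they sit on the same level. The function names are hypothetical, and placing both cgroups under a common parent such as `mylxc` is an assumption based on the path expected earlier in this thread, not a statement of what liblxc guarantees.

```python
from pathlib import PurePosixPath


def lxc_cgroups(container: str, parent: str = "") -> dict:
    """Return monitor and payload cgroup paths relative to a hierarchy root.

    Hypothetical helper: `parent` stands in for an optional top-level
    cgroup (an assumption, e.g. "mylxc").
    """
    base = PurePosixPath(parent)
    return {
        "monitor": base / f"lxc.monitor.{container}",
        "payload": base / f"lxc.payload.{container}",
    }


def are_siblings(a: PurePosixPath, b: PurePosixPath) -> bool:
    """True if the two cgroups are distinct and share the same parent."""
    return a != b and a.parent == b.parent
```

For a container named `ub`, `lxc_cgroups("ub")` gives `lxc.monitor.ub` and `lxc.payload.ub`. Neither is an ancestor of the other, so both can hold live processes.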

In any case, while it is tempting and understandable to make assumptions about the on-disk cgroup layout of a container, I would strongly recommend against this. We can’t make any promises as to what the on-disk layout needs to look like. That is dictated more by the cgroup filesystem itself than by LXC. We really have no choice but to do it this way. The good news is that this flattens the cgroup tree quite a bit, which is great because scheduling, moving processes around, and creating new subcgroups all get more expensive the further you go down a cgroup hierarchy, since a global semaphore needs to be taken whenever something like this happens.

Mrinal_Dhillon · February 28, 2020, 10:54pm

The lxc.cgroup.pattern patch works as expected, and lxc.mount.auto = cgroup:rw:force mounts cgroups. But lxc.mount.auto = cgroup:mixed, which is inherited by default through /usr/share/lxc/config/common.conf, does not mount cgroups. I see the same issue even if I set it directly in the container’s config. Is this expected? AFAIK lxc.mount.auto = cgroup:mixed does mount cgroups in lxc 3.0.2.

brauner · February 29, 2020, 2:38am

When cgroup:mixed is set and LXC detects that cgroup namespaces are supported (which they are on a 4.9 kernel), it will set up cgroups for the container as requested by the user, but it will not mount them and leaves this up to the init system of the container. So if your init system doesn’t mount cgroups by default, then you won’t have them mounted. Setting the force option is a way to tell LXC to always mount them.
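To check from inside the container whether anything actually mounted cgroups, a small sketch like the following can help. It parses text in the /proc/mounts format and lists the mount points whose filesystem type is cgroup or cgroup2. The function name is a hypothetical helper, not part of LXC.

```python
def cgroup_mounts(mounts_text: str) -> list:
    """Return mount points whose filesystem type is cgroup or cgroup2.

    Expects text in the /proc/mounts format:
    device, mount point, fstype, options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("cgroup", "cgroup2"):
            found.append(fields[1])
    return found
```

Inside the container it could be used as `cgroup_mounts(open("/proc/self/mounts").read())`. An empty list would suggest that neither LXC nor the init system mounted any cgroup filesystem.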