LXD equivalent of docker --cgroup-parent

Hello,

I have a server running Ubuntu Server 20.04 with 8 cores. Cores 0-1 (system) are dedicated to OS processes and cores 2-7 (user) are reserved (shielded) for user processes via cset. I would like to provision LXC containers via LXD and have them execute only on the “user” cpuset. With Docker, this can be done by passing the option --cgroup-parent=/user (https://docs.docker.com/engine/reference/commandline/dockerd/#miscellaneous-options). What would be the LXD equivalent? I tried setting raw.lxc=lxc.cgroup.dir=/user in the configuration of my container, but that doesn’t work: when I run:

cat /proc/cpuinfo

inside the container, only CPUs 0 and 1 show up, which means the container is running under the system cgroup; and indeed, under /sys/fs/cgroup/cpuset/system/user I do see my container's monitor and payload cgroups. So I tried manually setting the container group's cpuset.cpus to 2-7 with raw.lxc=lxc.cgroup.cpuset.cpus=2-7, but that of course didn’t work either: the container simply fails to start. The error message in the log is:

> Name: test-container
> Location: none
> Remote: unix://
> Architecture: x86_64
> Created: 2020/12/14 16:26 UTC
> Status: Stopped
> Type: container
> Profiles: default
> 
> Log:
> 
> lxc test-container 20201214192010.525 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset/system/lxc.monitor.test-container"
> lxc test-container 20201214192010.526 WARN     cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1152 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset/system/lxc.payload.test-container"
> lxc test-container 20201214192010.581 ERROR    cgfsng - cgroups/cgfsng.c:cgfsng_setup_limits_legacy:2873 - Permission denied - Failed to set "cpuset.cpus" to "2-7"
> lxc test-container 20201214192010.581 ERROR    start - start.c:lxc_spawn:1741 - Failed to setup cgroup limits for container "test-container"
> lxc test-container 20201214192010.581 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:860 - Received container state "ABORTING" instead of "RUNNING"
> lxc test-container 20201214192010.583 ERROR    start - start.c:__lxc_start:1999 - Failed to spawn container "test-container"
> lxc test-container 20201214192010.583 WARN     start - start.c:lxc_abort:1013 - No such process - Failed to send SIGKILL via pidfd 31 for process 168790
> lxc 20201214192010.925 WARN     commands - commands.c:lxc_cmd_rsp_recv:126 - Connection reset by peer - Failed to receive response for command "get_state"

I suspect this has to do with the fact that the lxd daemon is running under system.
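
For reference, a quick way to check where the daemon itself sits (just a sketch, assuming a single lxd process):

# show the cpuset cgroup of the lxd daemon (oldest matching process)
cat /proc/$(pgrep -ox lxd)/cgroup | grep cpuset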

What is the best way to achieve this? The only relevant threads I found are “What is the best way to use numactl or taskset and chrt in lxd which cpus are isolated from the host” and “How to allocate cores which are in isolcpus list of host”, where the only solution seems to be manually assigning the CPUs to the containers.

OS: Ubuntu Server 20.04
LXD version: 4.8

I’m not sure if this is what you’re after, but take a look at the CPU limits options available:

https://linuxcontainers.org/lxd/docs/master/instances#cpu-limits, specifically the limits.cpu setting.
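
For example, to pin an instance to cores 2-7 (the container name here is just a placeholder):

lxc config set test-container limits.cpu 2-7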

That was the first thing that I tried. When I set:

limits.cpu="2-7"

in the configuration, only cores 0-1 are visible, which I do expect to happen since the containers are created under the system cgroup.

btw, this is my first post on this forum, so let me know if you need more detailed info.

@stgraber @brauner is this possible? Thanks

The limits.cpu setting won’t do what you want. What you want, I assume, is for the container to use the /user cgroup. Try the following example:

raw.lxc: |-
  lxc.cgroup.relative = 1
  lxc.cgroup.dir = /bla

This would cause separate cgroups to be created for the monitor process ([lxc monitor]) and for the container itself under /sys/fs/cgroup/<controller>/bla, e.g.:

  1. [lxc monitor]
12:blkio:/user.slice/bla/lxc.monitor.f8
11:perf_event:/bla/lxc.monitor.f8
10:hugetlb:/bla/lxc.monitor.f8
9:devices:/user.slice/bla/lxc.monitor.f8
8:cpu,cpuacct:/user.slice/bla/lxc.monitor.f8
7:freezer:/user/root/0/bla/lxc.monitor.f8
6:memory:/user/root/0/bla/lxc.monitor.f8
5:pids:/user.slice/user-1000.slice/session-1.scope/bla/lxc.monitor.f8
4:cpuset:/bla/lxc.monitor.f8
3:rdma:/bla/lxc.monitor.f8
2:net_cls,net_prio:/bla/lxc.monitor.f8
1:name=systemd:/user/root/0/bla/lxc.monitor.f8
0::/user.slice/user-1000.slice/session-1.scope/bla/lxc.monitor.f8
  2. container
12:blkio:/user.slice/bla/lxc.payload.f8
11:perf_event:/bla/lxc.payload.f8
10:hugetlb:/bla/lxc.payload.f8
9:devices:/user.slice/bla/lxc.payload.f8
8:cpu,cpuacct:/user.slice/bla/lxc.payload.f8
7:freezer:/user/root/0/bla/lxc.payload.f8
6:memory:/user/root/0/bla/lxc.payload.f8
5:pids:/user.slice/user-1000.slice/session-1.scope/bla/lxc.payload.f8
4:cpuset:/bla/lxc.payload.f8
3:rdma:/bla/lxc.payload.f8
2:net_cls,net_prio:/bla/lxc.payload.f8
1:name=systemd:/user/root/0/bla/lxc.payload.f8/init.scope
0::/user.slice/user-1000.slice/session-1.scope/bla/lxc.payload.f8/init.scope
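
For the record, the same can also be set from the command line instead of editing the config directly; a rough sketch (c1 is a placeholder name, and /user matches your cpuset):

lxc config set c1 raw.lxc "lxc.cgroup.relative = 1
lxc.cgroup.dir = /user"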

Thanks a lot and sorry for the late reply.

That works (sometimes), but not how I would like it to, and not quite reliably.

I added the options you specified to the profile used by my containers. Here is what the profile looks like:

config:
  raw.lxc: |-
    lxc.cgroup.relative=1
    lxc.cgroup.dir=/user
description: A profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: base
used_by: []
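
For anyone reproducing this, the profile can be created and then edited with something like the following (sketch; profile name matches the YAML above):

lxc profile create base
lxc profile edit base   # paste the YAML above into the editor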

At the beginning, the containers’ cgroups were created under /system/user, so I deleted the /system/user cgroup and all of its children, rebooted, and re-applied the cset configuration on startup. This is the output of cset set:

giu@server:~$ cset set
cset:
Name CPUs-X MEMs-X Tasks Subs Path


     root        0-7 y       0 y   337    2 /
     user        2-7 n       0 n     0    0 /user
   system        0-1 n       0 n    74    0 /system

Then I spawned a container with the new profile with:

lxc launch ubuntu:focal a-container --profile base
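
To double-check that the raw.lxc lines from the profile actually applied to the instance, something like this can be used:

lxc config show a-container --expanded | grep -A 2 raw.lxc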

Launching the container changed the cset configuration:

cset:
Name CPUs-X MEMs-X Tasks Subs Path


     root        0-7 y       0 y   359    2 /
     user        0-7 n       0 n     0    2 /user
   system        0-1 n       0 n   106    0 /system

And didn’t achieve the desired effect:

giu@server:~$ lxc shell a-container 
root@a-container:~# cat /proc/cpuinfo | grep "core id"
core id		: 0
core id		: 1
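
The same can be seen from the host side; a rough check (the exact payload path is my assumption based on where the cgroups ended up):

cat /sys/fs/cgroup/cpuset/user/lxc.payload.a-container/cpuset.cpus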

So I created another container, this time specifying the option limits.cpu: 2-7. This kinda worked:

root@another-container:~# cat /proc/cpuinfo | grep "core id"
core id : 0
core id : 1
core id : 2
core id : 3
core id : 4
core id : 5
core id : 6
core id : 7

But that’s still not quite what I want as cores 0-1 shouldn’t be used by the container (I specified limits.cpu = 2-7). So I deleted all the containers and tried to re-apply the cset settings with the following script:

cset set --set system --cpu=0-1
cset set --set user --cpu=0-7
cset proc -m -f root -t system -k

The lxc.pivot cgroup under /user would cause that to fail, so I deleted it with

sudo cgdelete cpuset:/user/lxc.pivot

and re-ran the previous script. The cset config was now correct, but all new containers’ cgroups were now created under /system/user again.

Any ideas? Thanks in advance for your help.

P.S.: somehow this worked as expected on a similar system (same hardware, same OS), but I have no clue what the difference was.

Can you show the output of:
/proc/<container-init-pid>/cgroup and /proc/<container-monitor-pid>/cgroup, please?
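
In case it helps, a rough way to grab those (container name as above; the pgrep pattern is a guess at the monitor's command line):

# the container's init PID is reported by lxc info
lxc info a-container | grep -i pid
cat /proc/<container-init-pid>/cgroup
# the monitor process can usually be found via its command line
pgrep -af "lxc monitor"
cat /proc/<container-monitor-pid>/cgroup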