Unprivileged LXD containers cannot use cgroup inside

Background: we are running SLURM (a cluster scheduler) and NVIDIA Docker containers inside an unprivileged LXD container.

Setup:
Host: Ubuntu server 20.04
LXD: 4.0.7
LXD container: ubuntu 20.04
Configuration:
security.nesting: true
security.privileged: false
nvidia.runtime: true

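For reference, this is roughly how the above keys can be applied with the lxc client; the instance name slurm-node is only a placeholder for our container:

  # on the host; "slurm-node" is a placeholder instance name
  lxc config set slurm-node security.nesting true
  lxc config set slurm-node security.privileged false
  lxc config set slurm-node nvidia.runtime true
  # restart so the NVIDIA runtime hook takes effect
  lxc restart slurm-node
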
Symptoms:

  1. SLURM inside the container cannot use cgroups to limit the resource consumption of a job; it reports errors like “cannot write /sys/fs/cgroup/freezer/xxxx”.
  2. GPUs cannot be passed through from the LXD container to NVIDIA Docker containers unless we configure nvidia-container-runtime inside the LXD container with “no-cgroups=true” (the default is “no-cgroups=false”). Setting “no-cgroups=true” makes GPU passthrough from LXD to Docker work, which suggests that cgroups are broken in some way inside the LXD container; see the config sketch after this list.
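
For clarity, the workaround in symptom 2 is an edit to the nvidia-container-runtime configuration inside the LXD container; a sketch of the relevant part (surrounding defaults omitted and possibly version-dependent):

  # /etc/nvidia-container-runtime/config.toml (inside the LXD container)
  [nvidia-container-cli]
  # the default is no-cgroups = false; only with true does GPU passthrough
  # from the LXD container to Docker work for us
  no-cgroups = true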

Questions:

  1. Is it possible for an unprivileged container to use cgroups inside it?
  2. If the answer is yes, then how?

Attempts to fix the problem:
We tried running the LXD container still as unprivileged, but mapping IDs 0-65535 in the container to IDs 0-65535 on the host. The cgroup problem remains. Incidentally, this idmap hack does allow root inside the container to change file ownership.
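
For completeness, the mapping we tried looks roughly like the sketch below; the instance name is again a placeholder, and we believe the host has to delegate that ID range to root via /etc/subuid and /etc/subgid for the map to be allowed:

  # /etc/subuid and /etc/subgid on the host (delegate IDs 0-65535 to root)
  root:0:65536

  # on the host: map container IDs 0-65535 straight onto host IDs 0-65535
  lxc config set slurm-node raw.idmap "both 0-65535 0-65535"
  lxc restart slurm-node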

Have you enabled security.nesting on the instance?

@tomp, yes, I have enabled security.nesting. We can run Docker inside the LXD container. However, when we try to pass GPUs through to Docker, we must set “no-cgroups=true”; the default “no-cgroups=false” does not work.
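
For context, this is the kind of command that only succeeds after that change (the CUDA image tag is just an example):

  # inside the LXD container, with no-cgroups = true set
  docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi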

The issue of SLURM being unable to control resources with cgroups still stands.
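
For reference, the SLURM side that fails is the standard cgroup-based setup; a sketch of the relevant configuration inside the LXD container (values are examples, our actual files may differ):

  # /etc/slurm/slurm.conf (the proctrack/cgroup plugin is what writes to
  # /sys/fs/cgroup/freezer and triggers the "cannot write" error)
  ProctrackType=proctrack/cgroup
  TaskPlugin=task/cgroup

  # /etc/slurm/cgroup.conf
  CgroupAutomount=yes
  ConstrainCores=yes
  ConstrainRAMSpace=yes
  ConstrainDevices=yes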