Containers do not launch on Linux 5.12

This is on Arch Linux, LXD from repositories (4.13-1).

I cannot get any containers to launch on ‘linux 5.12.1.arch1’.

I switched to 5.10 (linux-lts) and containers work again.

Also reported to Arch tracker: https://bugs.archlinux.org/task/70736

[1] % lxc info --show-log ansible
Name: ansible
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/02/04 12:12 UTC
Status: Stopped
Type: container
Profiles: default

Log:

lxc ansible 20210506122706.465 ERROR    cgfsng - cgroups/cgfsng.c:__cgroup_tree_create:771 - File exists - Creating the final cgroup 13(lxc.payload.ansible) failed
lxc ansible 20210506122706.465 ERROR    cgfsng - cgroups/cgfsng.c:cgroup_tree_create:831 - File exists - Failed to create payload cgroup 13(lxc.payload.ansible)
lxc ansible 20210506122706.465 ERROR    cgfsng - cgroups/cgfsng.c:__cgroup_tree_create:771 - File exists - Creating the final cgroup 13(lxc.payload.ansible-1) failed
lxc ansible 20210506122706.465 ERROR    cgfsng - cgroups/cgfsng.c:cgroup_tree_create:831 - File exists - Failed to create payload cgroup 13(lxc.payload.ansible-1)
lxc ansible 20210506122706.465 ERROR    cgfsng - cgroups/cgfsng.c:__cgroup_tree_create:771 - File exists - Creating the final cgroup 13(lxc.payload.ansible-2) failed
lxc ansible 20210506122706.465 ERROR    cgfsng - cgroups/cgfsng.c:cgroup_tree_create:831 - File exists - Failed to create payload cgroup 13(lxc.payload.ansible-2)
lxc ansible 20210506122706.478 ERROR    conf - conf.c:lxc_map_ids:3094 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Operation not permitted": newuidmap 10179 65536 0 1 0 100000 65536
lxc ansible 20210506122706.478 ERROR    conf - conf.c:userns_exec_1:4444 - Error setting up {g,u}id mappings for child process "10179"
lxc ansible 20210506122706.478 ERROR    cgfsng - cgroups/cgfsng.c:cgfsng_chown:1395 - No such file or directory - Error requesting cgroup chown in new user namespace
lxc ansible 20210506122706.478 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:868 - Received container state "ABORTING" instead of "RUNNING"
lxc ansible 20210506122706.479 ERROR    start - start.c:__lxc_start:2073 - Failed to spawn container "ansible"
lxc ansible 20210506122706.479 WARN     start - start.c:lxc_abort:1016 - No such process - Failed to send SIGKILL via pidfd 20 for process 10175
lxc ansible 20210506122706.505 ERROR    conf - conf.c:lxc_map_ids:3094 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Operation not permitted": newuidmap 10188 65536 0 1 0 100000 65536
lxc ansible 20210506122706.505 ERROR    conf - conf.c:userns_exec_1:4444 - Error setting up {g,u}id mappings for child process "10188"
lxc ansible 20210506122706.505 WARN     cgfsng - cgroups/cgfsng.c:cgfsng_payload_destroy:559 - No such file or directory - Failed to destroy cgroups
lxc 20210506122706.538 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:207 - Connection reset by peer - Failed to receive response

@brauner

The error is coming from newuidmap and newgidmap, which aren't LXC tools but come from the shadow project. It may be worth testing them in isolation to confirm whether the issue is with the kernel or with those tools.
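For example, something along these lines (a rough sketch; run it as root, since that's the caller in LXD's case, and it reuses the mapping from the log above):

  # keep a process alive in a fresh user namespace
  unshare --user sleep 60 &
  # try to write the same mapping LXC attempted (ns 65536 -> host 0, ns 0 -> host 100000)
  newuidmap $! 65536 0 1 0 100000 65536
  # "Operation not permitted" here would reproduce the failure outside of LXC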

Can you show me the container's config, please?

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Archlinux current amd64 (20181217_01:27)
  image.os: Archlinux
  image.release: current
  image.serial: "20181217_01:27"
  security.privileged: "false"
  volatile.base_image: b69318d5ed7f3748f2da516c4b04fd975fc5b6f2831a859d1428603c09be90c5
  volatile.eth0.host_name: vethdbbcadb6
  volatile.eth0.hwaddr: 00:16:3e:19:de:b0
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: 5ff6a31a-61bf-44d7-9278-734ec721065a
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""
Can you show:

  • cat /etc/subuid
  • cat /etc/subgid
  • cat /var/log/lxd/ansible/lxc.conf

root:100000:65536
root:100000:65536
lxc.log.file = /var/log/lxd/ansible/lxc.log
lxc.log.level = warn
lxc.console.buffer.size = auto
lxc.console.size = auto
lxc.console.logfile = /var/log/lxd/ansible/console.log
lxc.mount.auto = proc:rw sys:rw cgroup:rw:force
lxc.autodev = 1
lxc.pty.max = 1024
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional 0 0
lxc.mount.entry = /dev/net/tun dev/net/tun none bind,create=file,optional 0 0
lxc.mount.entry = /proc/sys/fs/binfmt_misc proc/sys/fs/binfmt_misc none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/firmware/efi/efivars sys/firmware/efi/efivars none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/fuse/connections sys/fs/fuse/connections none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/fs/pstore sys/fs/pstore none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/config sys/kernel/config none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/security sys/kernel/security none rbind,create=dir,optional 0 0
lxc.mount.entry = /sys/kernel/tracing sys/kernel/tracing none rbind,create=dir,optional 0 0
lxc.mount.entry = /dev/mqueue dev/mqueue none rbind,create=dir,optional 0 0
lxc.include = /usr/share/lxc/config/common.conf.d/
lxc.arch = linux64
lxc.hook.version = 1
lxc.hook.pre-start = /proc/1248/exe callhook /var/lib/lxd "default" "ansible" start
lxc.hook.stop = /usr/bin/lxd callhook /var/lib/lxd "default" "ansible" stopns
lxc.hook.post-stop = /usr/bin/lxd callhook /var/lib/lxd "default" "ansible" stop
lxc.tty.max = 0
lxc.uts.name = ansible
lxc.mount.entry = /var/lib/lxd/devlxd dev/lxd none bind,create=dir 0 0
lxc.seccomp.profile = /var/lib/lxd/security/seccomp/ansible
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
lxc.mount.auto = shmounts:/var/lib/lxd/shmounts/ansible:/dev/.lxd-mounts
lxc.net.0.type = phys
lxc.net.0.name = eth0
lxc.net.0.flags = up
lxc.net.0.link = veth29b0ba61
lxc.rootfs.path = dir:/var/lib/lxd/containers/ansible/rootfs

Can you show:

  • ls -lh $(which newuidmap)
  • ls -lh $(which newgidmap)

Ok, I see the issue. Due to a kernel security issue, we had to restrict mapping host uid 0 into a user namespace. To do this, we now require the caller to have CAP_SETFCAP. We can most likely fix this in LXC itself, but we should probably also mention on the shadow repo that newuidmap needs to have CAP_SETFCAP set in addition to CAP_SETUID.

LXC does this in a few places because it’s trying to be clever so I’ll get you a fix into LXC too.

Ah, so Arch doesn't ship newuidmap/newgidmap as setuid but instead uses fscaps, and is missing CAP_SETFCAP on them?

Yes.
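So on such systems a possible workaround would be something like this (an untested sketch; assumes the binaries live in /usr/bin and that libcap's getcap/setcap are installed):

  # show the current file capabilities
  getcap /usr/bin/newuidmap /usr/bin/newgidmap
  # add CAP_SETFCAP alongside the existing capabilities
  setcap cap_setuid,cap_setfcap=ep /usr/bin/newuidmap
  setcap cap_setgid,cap_setfcap=ep /usr/bin/newgidmap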

I’ll do a bug report downstream in Arch for shadow and test this.

Link to the security issue, for context? :)

Can someone switch LXD into debug and verbose mode and get me a full trace log for the container?
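For example (assuming Arch's non-snap lxd.service/lxd.socket units):

  # stop the daemon and run it in the foreground with full logging
  systemctl stop lxd.service lxd.socket
  lxd --debug --verbose --group lxd
  # then, from another terminal, reproduce the failure
  lxc start ansible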

I think this is what you wanted:
https://haste.rys.pw/raw/imulimizex

Thank you!
Can you please also give me

  • ls -lh $(which newuidmap)
  • ls -lh $(which newgidmap)

that @stgraber requested?

.rwxr-xr-x root root 36 KB Mon Sep  7 15:42:01 2020  /usr/bin/newuidmap
.rwxr-xr-x root root 40 KB Mon Sep  7 15:42:01 2020  /usr/bin/newgidmap

Ah, I thought it was not needed anymore.

No worries. I just don't understand the issue right now. I thought I did, but I managed to start containers on 5.12 just fine with newuidmap having only CAP_SETUID and CAP_SETGID set, and with your exact idmapping.

Ah, ok this is a pure unified cgroup layout. /me goes to check whether that makes a difference. It might.

Hm, no. This is odd.

Is there any interesting dmesg output?