Upgraded to 0.5.1, now container won't start

Hi,

I recently upgraded to Incus 0.5.1, then rebooted my machine (for other reasons). Upon restarting, one of my containers no longer starts. Instead, the attempted start reports this error:

Remapping container filesystem
Error: Failed to handle idmapped storage: invalid argument - Failed to change ACLs on /var/lib/incus/storage-pools/default/containers/foo/rootfs/var/lib/tpm2-tss/system/keystore
Try `incus info --show-log foo` for more info

Executing the suggested command results in:

Name: foo
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2023/03/24 16:10 GMT
Last Used: 2024/01/30 08:19 GMT

Log:

There doesn’t appear to be any log. I previously migrated from LXD to Incus (and have rebooted the machine a few times since) without encountering any errors, and the machine has 4 other containers running on it without issues.

No changes have been made to /etc/subgid or /etc/subuid.

Is there anything else I can try?

Thank you.

-=david=-

Hmm, that’s interesting, I wonder what’s special with that particular file, it’s talking about ACLs.
Can you show `stat /var/lib/incus/storage-pools/default/containers/foo/rootfs/var/lib/tpm2-tss/system/keystore` and `getfacl /var/lib/incus/storage-pools/default/containers/foo/rootfs/var/lib/tpm2-tss/system/keystore`?

Do you actually use tpm2-tss inside of that container? If not, then you can probably delete that file and try starting the instance again.

Hi,

Thank you for your reply. I removed the directory, but then it complained (with the same error message) about the /var/log/journal file. So I removed that too (`rm -rf` all the things! :slight_smile:). Now it’s giving a different error:

lxc foo 20240130161943.617 ERROR    conf - ../src/lxc/conf.c:lxc_map_ids:3701 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 29909 0 1000000 1000 1000 1000 1 1001 1001001 999998999 0 1001000000 1000 1001 1001001001 999998999
lxc foo 20240130161943.617 ERROR    start - ../src/lxc/start.c:lxc_spawn:1788 - Failed to set up id mapping.
lxc foo 20240130161943.617 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc foo 20240130161943.617 ERROR    start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "foo"
lxc foo 20240130161943.617 WARN     start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 17 for process 29909
lxc 20240130161943.635 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240130161943.635 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

Getting closer hopefully!

-=david=-

Yeah, so that one is now about ID maps indeed.

Can you show `incus config show --expanded foo` as well as `cat /etc/subuid` and `cat /etc/subgid`?

Hi,

I destroyed the container, but then created a new one:

incus launch images:archlinux foo

Then I started it:

❯ incus start foo
Error: Failed to run: /opt/incus/bin/incusd forkstart foo /var/lib/incus/containers /run/incus/foo/lxc.conf: exit status 1
Try `incus info --show-log foo` for more info

The log file contained the mapping errors:

Name: foo
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/01/30 17:09 GMT
Last Used: 2024/01/30 17:10 GMT

Log:

lxc foo 20240130171041.725 ERROR    conf - ../src/lxc/conf.c:lxc_map_ids:3701 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 38493 0 1000000 1000000000 0 1001000000 1000000000
lxc foo 20240130171041.725 ERROR    start - ../src/lxc/start.c:lxc_spawn:1788 - Failed to set up id mapping.
lxc foo 20240130171041.725 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc foo 20240130171041.725 ERROR    start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "foo"
lxc foo 20240130171041.725 WARN     start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 17 for process 38493
lxc 20240130171041.742 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240130171041.742 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

The container config is as follows:

❯ incus config show --expanded foo
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Archlinux current amd64 (20240130_04:18)
  image.os: Archlinux
  image.release: current
  image.requirements.secureboot: "false"
  image.serial: "20240130_04:18"
  image.type: squashfs
  image.variant: default
  security.nesting: "true"
  user.vendor-data: |
    #cloud-config
    locale: en_GB.UTF-8
    timezone: Europe/London
  volatile.base_image: 5515a8aa7bd0e66cde9589ce9b3f075e1b780fa18db80b33886f61de5e583fcb
  volatile.cloud-init.instance-id: caa02e06-9469-44d6-9392-c8c7833eb281
  volatile.eth0.host_name: vethc1d9fd60
  volatile.eth0.hwaddr: 00:16:3e:6e:ff:e2
  volatile.eth0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":true,"Isgid":false,"Hostid":1001000000,"Nsid":0,"Maprange":1000000000},{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":true,"Isgid":false,"Hostid":1001000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1001000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1001000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":true,"Isgid":false,"Hostid":1001000000,"Nsid":0,"Maprange":1000000000},{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":true,"Isgid":false,"Hostid":1001000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1001000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1001000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: STOPPED
  volatile.last_state.ready: "false"
  volatile.uuid: 945fc8f7-3377-46e3-a7d7-4d1928a8157d
  volatile.uuid.generation: 945fc8f7-3377-46e3-a7d7-4d1928a8157d
devices:
  eth0:
    nictype: bridged
    parent: bridge0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

subuid contains:

❯ cat subuid
root:1000000:1000000000
david:165536:65536
david:1000:1
root:1001000000:1000000000
root:1000:1001
root:1000000:1000000000
david:165536:65536
root:1001000000:1000000000

subgid contains:

❯ cat subgid
root:1000000:1000000000
david:165536:65536
david:1000:1
root:1001000000:1000000000
root:1000:1001
root:1000000:1000000000
david:165536:65536
root:1001000000:1000000000

Hopefully this helps.

Thank you.

-=david=-

Oh yeah, those /etc/subuid and /etc/subgid files are a mess and likely the issue here.
Incus 0.5 fixed a long-standing bug where only the first entry in those files would be considered for use by containers.

But on systems like yours that have multiple conflicting entries, that fix is going to cause some issues. I’d recommend that you wipe both files and then put only a single line in each of them:

root:1000000:1000000000

Once that’s done, restart Incus with `systemctl restart incus`, then confirm that new containers work properly. After that, try restarting your existing containers, as they will need to go through another automatic remap stage to get back to working order.
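For reference, the cleanup above boils down to the following. This sketch runs against scratch copies so nothing is clobbered by accident; on the real host the targets are /etc/subuid and /etc/subgid (back them up first), followed by the service restart:

```shell
# Sketch of the recommended cleanup, demonstrated on scratch copies.
# On a real system, back up and rewrite /etc/subuid and /etc/subgid instead,
# then run: systemctl restart incus
subuid=$(mktemp)
subgid=$(mktemp)

# Replace the contents with the single root mapping recommended above.
echo 'root:1000000:1000000000' > "$subuid"
echo 'root:1000000:1000000000' > "$subgid"

cat "$subuid"
```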


Hi,

I can confirm that with the single entry, the foo container now starts. I’ll remap the others later on, as they are still running and I’ll need to schedule time for that.

Thank you very much!

-=david=-

Just to add some more info - I got this issue too on a clean install of Incus after first installing LXD on Debian 12.

ERROR    conf - ../src/lxc/conf.c:lxc_map_ids:3701 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 6100 0 165536 10000001 0 10165537 1000000000

LXD install was 5.0.2 from the debian repos
Incus install was v0.5.1 from the Zabbly repos

My subuid and subgid files look like this:

# cat /etc/subuid
stewart:100000:65536
_lxd:165536:10000001
root:165536:10000001
root:10165537:1000000000
# cat /etc/subgid
stewart:100000:65536
_lxd:165536:10000001
root:165536:10000001
root:10165537:1000000000

I’m guessing the state of what’s here is from setting up LXD…?

Does Incus pick up a range from here on the basis of finding a range for the root user?

I think you should remove these two lines:

_lxd:165536:10000001
root:165536:10000001

1. They cover duplicate ranges.
2. root is already defined later, on the last line.
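A quick way to spot this kind of duplication before it bites is to count how many entries each file gives root, since Incus 0.5+ uses all of them. This is a hypothetical one-off check (not an Incus command), demonstrated on an inline sample matching the broken state above; on a real system, point it at /etc/subuid and /etc/subgid:

```shell
# Warn if root has more than one entry in a subuid-style file.
# The sample mirrors the broken file from this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
stewart:100000:65536
_lxd:165536:10000001
root:165536:10000001
root:10165537:1000000000
EOF

count=$(grep -c '^root:' "$sample")
if [ "$count" -gt 1 ]; then
  echo "root has $count entries - Incus 0.5+ will try to use all of them"
fi
rm -f "$sample"
```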

Removing just the root:165536:10000001 line and then restarting the Incus daemon appears to have done the trick.

I was able to launch a new Debian 12 container with the LXD install too, although I’m still not sure whether it actually makes use of the _lxd user/group, given that the daemon and LXC monitor run as root. The only thing I can see using the _lxd name is that the /var/lib/lxd directory has _lxd as its owning UID.

Anyway, I thought it was worth documenting my situation in the previous post just to highlight that this subuid file mess-up can apparently happen as part of a default installation of LXD(?). I’m not sure what actually touched those files during setup, though.

Yeah, ever since fixing the idmap parser issue in Incus 0.5, we’ve certainly uncovered many systems where a combination of past package installations and occasional manual changes has produced rather invalid configurations :slight_smile:

It’s certainly annoying to hit that as part of an upgrade, but the issue was technically already there and could have caused more subtle problems down the line, so overall it’s good that everyone gets to notice their bad config files and fix them.

OK so… I did some testing in a new blank Debian 12 VM (under Incus :wink: ) and, yeah, it’s the LXD preinst script that fiddles with the subuid and subgid files. See debian/lxd.preinst · 6835dadbb20ebe7672db889fc9365d06bc24a306 · Debian Go Packaging Team / packages / lxd · GitLab

After first installing LXD, we get this:

root@debian-idmap-test:~# cat /etc/subgid
_lxd:100000:10000001
root:100000:10000001
root@debian-idmap-test:~# cat /etc/subuid
_lxd:100000:10000001
root:100000:10000001

…then after installing Incus 0.5.1 from Zabbly, we get:

root@debian-idmap-test:~# cat /etc/subuid
_lxd:100000:10000001
root:100000:10000001
root:10100001:1000000000
root@debian-idmap-test:~# cat /etc/subgid
_lxd:100000:10000001
root:100000:10000001
root:10100001:1000000000

So, if you install both on Debian, you will end up in this broken state, unable to start containers in Incus:

root@debian-idmap-test:~# incus launch images:debian/12 debian-test --storage main
Launching debian-test
Error: Failed instance creation: Failed to run: /opt/incus/bin/incusd forkstart debian-test /var/lib/incus/containers /run/incus/debian-test/lxc.conf: exit status 1
root@debian-idmap-test:~# incus info --show-log debian-test
Name: debian-test
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2024/02/08 10:46 UTC
Last Used: 2024/02/08 10:46 UTC

Log:

lxc debian-test 20240208104619.676 ERROR    conf - ../src/lxc/conf.c:lxc_map_ids:3701 - newuidmap failed to write mapping "newuidmap: write to uid_map failed: Invalid argument": newuidmap 14874 0 100000 10000001 0 10100001 1000000000
lxc debian-test 20240208104619.676 ERROR    start - ../src/lxc/start.c:lxc_spawn:1788 - Failed to set up id mapping.
lxc debian-test 20240208104619.676 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc debian-test 20240208104619.677 ERROR    start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "debian-test"
lxc debian-test 20240208104619.677 WARN     start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 17 for process 14874
lxc 20240208104619.705 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20240208104619.705 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"

I can see Incus currently has basically the same subuid/subgid allocation code in a postinst script: incus/debian/incus-base.postinst at 141fb0736cc12083b086c389c68c434f86d5749e · zabbly/incus · GitHub. But it doesn’t check whether root already has a valid range? I assume the duplicate set of root entries is what causes the problem, but I’m not 100% sure.
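For what it’s worth, a guard like the following would avoid appending a second root range. This is just a sketch of the missing check, not the actual packaging code, demonstrated on a temp copy (the real postinst targets /etc/subuid):

```shell
# Sketch: only append a root range if the file has none yet.
# The sample reproduces the state left behind by the LXD preinst.
f=$(mktemp)
printf '_lxd:100000:10000001\nroot:100000:10000001\n' > "$f"

# grep -q exits 0 when a root entry already exists, so we skip the append.
if ! grep -q '^root:' "$f"; then
  echo 'root:10100001:1000000000' >> "$f"
fi

# With the guard, root still has exactly one entry.
grep -c '^root:' "$f"
```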

So… I guess this is a Debian packaging issue/bug? Can I help by reporting an issue Debian-side or is this something to look at Incus-side?