Need help: containers failed to start after snap refresh to 5.3

I also ran into the same issue. I'm running LXD 5.3 via snap on a Raspberry Pi. Any container fails to start if it tries to mount a path that is outside the root filesystem, it seems.

Here is my config snippet from the containers:

  homedir:
    path: /home/david/
    source: /home/david/container-home/
    type: disk
  storage:
    path: /storage/incoming/
    source: /storage/incoming/
    type: disk
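
For reference, the same two devices can be attached from the CLI instead of editing the config directly (the container name c1 below is a placeholder):

```shell
# Equivalent one-off commands; replace c1 with your container's name.
lxc config device add c1 homedir disk source=/home/david/container-home/ path=/home/david/
lxc config device add c1 storage disk source=/storage/incoming/ path=/storage/incoming/
```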

The homedir device mounts fine; however, storage fails to start in the exact same way as the OP's log. My /storage mount is simply:

/dev/sda1 on /storage type ext4 (rw,noexec,relatime,stripe=8191)

I am wondering if it is because of the noexec flag? I can probably test that later, but at the moment I cannot take down my containers again. I've since reverted to 5.2 and locked to the 5.2-stable channel.
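If anyone wants to rule noexec in or out without touching their containers, the mount options of the backing filesystem can be read straight from /proc. A generic Linux sketch (the path here is a placeholder; substitute /storage on the affected host):

```shell
#!/bin/sh
# Print the mount options for the filesystem mounted at a given path.
# Reads /proc/self/mounts directly, so no extra tools are needed.
path=/    # substitute /storage on the affected host
awk -v p="$path" '$2 == p { print $4 }' /proc/self/mounts
```

If noexec shows up in the output, remounting with exec would be a cheap way to test the theory.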

I don't think noexec is relevant, since all my ext4 disks use the default mount options:


/dev/sda1 on /mnt/st1000dm010 type ext4 (rw,nosuid,nodev,relatime)

I'd like to note that for me, headless containers mount disk devices fine; the issue seems to occur only with GUI-enabled containers: Cannot start lxc containers with gui profile - #2 by isolin

Hmm, tried with ext4 now:

root@v1:~# truncate -s 10G foo.img
root@v1:~# mkfs.ext4 foo.img 
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done                            
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: 89a220a4-b614-405d-b7be-ff774b40a508
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done 

root@v1:~# mkdir /mnt/foo
root@v1:~# mount foo.img /mnt/foo
root@v1:~# lxc init images:alpine/edge a1
Creating a1
root@v1:~# lxc config device add a1 foo disk source=/mnt/foo path=/mnt/foo
Device foo added to a1
root@v1:~# lxc start a1

That should have been pretty much as close as it gets to your setup, you’d think…

To be clear, we can see that there is a problem and we'd love to fix it, but the log doesn't provide enough detail to figure out what's going on, and so far we've been unable to reproduce it on any of our own systems.

So we either need step-by-step instructions to reproduce this on a clean system, or access to an affected system so we can hopefully see what's going on and then reproduce it ourselves.

Just wondering whether this would help: snap connect lxd:removable-media?

I already rolled back to 5.2 and I need to keep working with my lxd, so I can’t test that now.

LXD doesn’t use the removable-media snap interface.

Same here. I can't mount my rclone mounts from the host; my ZFS mount works fine. My rclone mounts have worked fine for years, then overnight they failed.

Edit: I can add the mounts while the container is running, but if I restart/stop/start it, it fails. I've also tried adding them while the container is down and then starting it; that also fails.

lxc test 20220630202354.768 WARN conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc test 20220630202354.768 WARN conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc test 20220630202354.769 WARN conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc test 20220630202354.769 WARN conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc test 20220630202354.770 WARN cgfsng - ../src/src/lxc/cgroups/cgfsng.c:fchowmodat:1252 - No such file or directory - Failed to fchownat(42, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc test 20220630202354.888 ERROR conf - ../src/src/lxc/conf.c:mount_entry:2459 - Operation not permitted - Failed to mount "/var/snap/lxd/common/lxd/devices/test/disk.CBR.home-david-rclone" on "/var/snap/lxd/common/lxc//home/david/rclone"
lxc test 20220630202354.888 ERROR conf - ../src/src/lxc/conf.c:lxc_setup:4375 - Failed to setup mount entries
lxc test 20220630202354.888 ERROR start - ../src/src/lxc/start.c:do_start:1275 - Failed to setup container "test"
lxc test 20220630202354.888 ERROR sync - ../src/src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 3)
lxc test 20220630202354.895 WARN network - ../src/src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "veth0d2b4b0a"
lxc test 20220630202354.895 ERROR lxccontainer - ../src/src/lxc/lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc test 20220630202354.895 ERROR start - ../src/src/lxc/start.c:__lxc_start:2074 - Failed to spawn container "test"
lxc test 20220630202354.895 WARN start - ../src/src/lxc/start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 43 for process 1086490
lxc test 20220630202400.128 WARN conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc test 20220630202400.129 WARN conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc 20220630202400.676 ERROR af_unix - ../src/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220630202400.676 ERROR commands - ../src/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

I did exactly the same as you, except I mounted the image at /DATA instead of /mnt/DATA.

$ lxc info --show-log lemp
Name: lemp
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/07/01 09:29 WIB
Last Used: 2022/07/01 09:38 WIB

Log:

lxc lemp 20220701023825.325 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc lemp 20220701023825.326 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc lemp 20220701023825.326 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc lemp 20220701023825.326 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc lemp 20220701023825.419 ERROR    conf - ../src/src/lxc/conf.c:mount_entry:2459 - Operation not permitted - Failed to mount "/var/snap/lxd/common/lxd/devices/lemp/disk.nginx.etc-nginx" on "/var/snap/lxd/common/lxc//etc/nginx"
lxc lemp 20220701023825.419 ERROR    conf - ../src/src/lxc/conf.c:lxc_setup:4375 - Failed to setup mount entries
lxc lemp 20220701023825.419 ERROR    start - ../src/src/lxc/start.c:do_start:1275 - Failed to setup container "lemp"
lxc lemp 20220701023825.419 ERROR    sync - ../src/src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 3)
lxc lemp 20220701023825.426 WARN     network - ../src/src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "vethe55eebe3"
lxc lemp 20220701023825.426 ERROR    lxccontainer - ../src/src/lxc/lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc lemp 20220701023825.426 ERROR    start - ../src/src/lxc/start.c:__lxc_start:2074 - Failed to spawn container "lemp"
lxc lemp 20220701023825.426 WARN     start - ../src/src/lxc/start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 17 for process 5199
lxc lemp 20220701023830.483 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc lemp 20220701023830.483 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc 20220701023830.502 ERROR    af_unix - ../src/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220701023830.502 ERROR    commands - ../src/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

I can also confirm that ext4 loop device works ok.

When I rolled back LXD to 5.1, everything came back!

What filesystem is this on?

I think we are hitting the same issue.

Ubuntu 20.04 host, Ubuntu 20.04 container, LXD 5.3, currently booted on 5.4.0-113-generic kernel, /var/snap/lxd/common/lxd on btrfs.

This has started happening on a container that worked fine previously.

We can work around the issue by setting the container as privileged.
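
In case it helps anyone else, the workaround looks like this (assuming a container named ourcontainer; note that privileged containers weaken isolation, so treat this as a stopgap, not a fix):

```shell
# Stopgap only: privileged containers bypass the uid/gid mapping entirely.
lxc config set ourcontainer security.privileged true
lxc restart ourcontainer
```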

lxc ourcontainer 20220702094110.198 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc ourcontainer 20220702094110.198 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc ourcontainer 20220702094110.200 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc ourcontainer 20220702094110.200 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc ourcontainer 20220702094110.201 WARN     cgfsng - ../src/src/lxc/cgroups/cgfsng.c:fchowmodat:1252 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc ourcontainer 20220702094110.343 ERROR    conf - ../src/src/lxc/conf.c:mount_entry:2459 - Operation not permitted - Failed to mount "/var/snap/lxd/common/lxd/devices/ourcontainer/disk.aadisable.sys-module-apparmor-parameters-enabled" on "/var/snap/lxd/common/lxc//sys/module/apparmor/parameters/enabled"
lxc ourcontainer 20220702094110.344 ERROR    conf - ../src/src/lxc/conf.c:lxc_setup:4375 - Failed to setup mount entries
lxc ourcontainer 20220702094110.344 ERROR    start - ../src/src/lxc/start.c:do_start:1275 - Failed to setup container "ourcontainer"
lxc ourcontainer 20220702094110.344 ERROR    sync - ../src/src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 3)
lxc ourcontainer 20220702094110.357 WARN     network - ../src/src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "veth981c4d9c"
lxc ourcontainer 20220702094110.357 ERROR    lxccontainer - ../src/src/lxc/lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc ourcontainer 20220702094110.357 ERROR    start - ../src/src/lxc/start.c:__lxc_start:2074 - Failed to spawn container "ourcontainer"
lxc ourcontainer 20220702094110.357 WARN     start - ../src/src/lxc/start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 41 for process 2428210
lxc ourcontainer 20220702094115.677 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc ourcontainer 20220702094115.677 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc 20220702094115.738 ERROR    af_unix - ../src/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220702094115.738 ERROR    commands - ../src/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

I'm hitting this error as well, but only with NFS shares (mounted on the host) added as disk devices to the containers. Other containers that have regular folders shared from ext4 drives start fine.

This is on Jammy, kernel 5.15.0-40-generic. As others have done, reverting the LXD snap to 5.2 allows starting the containers with the same nfs shares without issues.

Edit: the containers' storage is btrfs

I ran into this out of the blue today as well. One observation is that config device add does not exhibit the failure while the container is running, even for the exact same disks that cause the container to fail during startup.

This works fine:
lxc start tester
lxc config device add tester test1 disk source=/data/prizm/nasfs/dvr path=/mnt/dvr

But then lxc exec tester reboot… and the instance fails to come back up. Remove the disk and the instance starts. Run device add again and it works, until you restart the container.
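
To put the whole reproduction in one place, this is roughly the sequence (paths are from my setup above; it needs a host running the affected LXD 5.3 snap):

```shell
# Hot-plugging the disk into a running container works...
lxc start tester
lxc config device add tester test1 disk source=/data/prizm/nasfs/dvr path=/mnt/dvr

# ...but the container will not survive a restart with the device attached
# ("Failed to setup mount entries" in the log).
lxc restart tester

# Detach the device and the container starts again.
lxc config device remove tester test1
lxc start tester
```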

Thanks, that suggests it is perhaps a regression in liblxc itself rather than in LXD, or some difference in how the disk devices are passed at start time vs. while running. This is a useful data point and I'll try to recreate it. Thanks!

Looks like the fix for liblxc has been found here by @stgraber Cannot start lxc containers with gui profile - #11 by stgraber

I can confirm that it fixed the problems for me 🙂

Fixed here as well. NFS mounts are shared properly with containers again (after switching to the latest/candidate channel and updating to 5.3-924be6a).
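
For anyone else who wants the fixed build before it reaches stable, the channel switch is just (run as root; the revision string is the one from above):

```shell
snap refresh lxd --channel=latest/candidate
snap list lxd    # should now show something like 5.3-924be6a
```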

Fixed my problem too after updating to latest.
