Need help: containers failed to start after snap refresh to 5.3

Hi, my container fails to start after a snap refresh.

I created a new container. It works at first and I can finish apt update/upgrade, but after I add a path mapping the container won’t start.

# Map each directory listed in lxcdirs.txt into the container at the same path
while IFS= read -r line
do
    lxc config device add "$CONTAINER" "$(basename "$line")" disk source="$line" path="$line"
done < ~/bin/lxc/lxcdirs.txt

Contents of lxcdirs.txt:

alvin@alvin-WS-E500:~/bin/lxc$ cat lxcdirs.txt 
/home/alvin/repo
/home/alvin/repo2
/home/alvin/repo3
/home/alvin/OneDrive

logs:

lxc dev16 20220629081703.808 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc dev16 20220629081703.810 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc dev16 20220629081703.825 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc dev16 20220629081703.825 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc dev16 20220629081703.833 WARN     cgfsng - ../src/src/lxc/cgroups/cgfsng.c:fchowmodat:1252 - No such file or directory - Failed to fchownat(42, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc dev16 20220629081703.202 ERROR    conf - ../src/src/lxc/conf.c:mount_entry:2459 - Operation not permitted - Failed to mount "/var/snap/lxd/common/lxd/devices/dev16/disk.OneDrive.home-alvin-OneDrive" on "/var/snap/lxd/common/lxc//home/alvin/OneDrive"
lxc dev16 20220629081703.203 ERROR    conf - ../src/src/lxc/conf.c:lxc_setup:4375 - Failed to setup mount entries
lxc dev16 20220629081703.203 ERROR    start - ../src/src/lxc/start.c:do_start:1275 - Failed to setup container "dev16"
lxc dev16 20220629081703.203 ERROR    sync - ../src/src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 3)
lxc dev16 20220629081703.218 WARN     network - ../src/src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from "eth0" to its initial name "veth57b2e665"
lxc dev16 20220629081703.218 ERROR    lxccontainer - ../src/src/lxc/lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc dev16 20220629081703.218 ERROR    start - ../src/src/lxc/start.c:__lxc_start:2074 - Failed to spawn container "dev16"
lxc dev16 20220629081703.218 WARN     start - ../src/src/lxc/start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 43 for process 39685
lxc dev16 20220629081708.312 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc dev16 20220629081708.312 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc 20220629081708.345 ERROR    af_unix - ../src/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220629081708.345 ERROR    commands - ../src/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"
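
Most of the log above is WARN noise; the line that matters is the mount_entry error. As a rough sketch (the function names `errors_only` and `failed_mounts` are made up for illustration, not part of LXD), the log can be narrowed down like this:

```shell
# Read an LXC log on stdin and keep only the ERROR lines.
errors_only() {
    grep ' ERROR ' || true
}

# Read an LXC log on stdin and print "source -> target" for each mount
# that liblxc reported as failed.
failed_mounts() {
    sed -n 's/.*Failed to mount "\([^"]*\)" on "\([^"]*\)".*/\1 -> \2/p'
}

# Usage on an affected host (container name taken from the post above):
#   lxc info --show-log dev16 | errors_only
#   lxc info --show-log dev16 | failed_mounts
```

This only matches straight double quotes, as they appear in the raw log. Text copied from a web page may have had its quotes converted.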

I don’t know why Ubuntu updated to 5.3 even though the snap channel is latest/stable.
I reverted to 5.2 stable and the problem is gone.

LXD 5.3 is being rolled out to everyone. Because it’s a phased rollout, snap info will keep showing 5.2 until the rollout reaches 100%.
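
Since snap info can lag behind during a phased rollout, the locally installed version is the one to check. A small sketch, assuming the two-line `snap list lxd` output format shown elsewhere in this thread (the helper name `installed_version` is hypothetical):

```shell
# Print the Version column from `snap list lxd` output read on stdin.
# Line 1 is the column header, line 2 describes the installed snap.
installed_version() {
    awk 'NR == 2 { print $2 }'
}

# Usage:
#   snap list lxd | installed_version
```

If the installed version turns out to be the broken one, `snap revert lxd` goes back to the previously installed revision.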

I’ve reported this issue to @brauner for a look at liblxc; most likely the switch from LXC 4.0.12 to LXC 5.0 is what’s causing it.

Can you show the output of uname -a?

alvin@alvin-WS-E500:~$ uname -a
Linux alvin-WS-E500 5.15.0-40-generic #43-Ubuntu SMP Wed Jun 15 12:54:21 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

alvin@alvin-WS-E500:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04 LTS
Release:	22.04
Codename:	jammy

I’ve also tried snap remove lxd followed by snap install lxd, but the same symptom comes back.

What storage backend are you using?

My attempt at reproducing this (same kernel, same LXD) is:

root@v1:~# uname -a
Linux v1 5.15.0-40-generic #43-Ubuntu SMP Wed Jun 15 12:54:21 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@v1:~# lxd init --auto
root@v1:~# lxc launch images:alpine/edge a1
Creating a1
Starting a1                                 
root@v1:~# lxc stop a1
root@v1:~# lxc config device add a1 home disk source=/home path=/srv/home
Device home added to a1
root@v1:~# lxc start a1
root@v1:~# snap list lxd
Name  Version      Rev    Tracking       Publisher   Notes
lxd   5.3-924be6a  23243  latest/stable  canonical✓  -
root@v1:~# 

I’m using ZFS.

Can you try mount points that are not on the main filesystem?

Successful tries (paths on the main filesystem):
lxc config device add dev16 etc disk source=/etc path=/home/alvin/etc
lxc config device add dev16 home disk source=/home path=/home/alvin/home

Failed tries (mount points for other disks):
lxc config device add dev16 repo disk source=/mnt/intel660p path=/home/alvin/repo
lxc config device add dev16 repo2 disk source=/mnt/ct1000mx500 path=/home/alvin/repo2
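
The pattern in these tries is root filesystem versus separate mount. A quick way to sort a list of source paths along that line is to compare device numbers. This is only a sketch: the function names are invented here, it relies on `stat -c` from GNU coreutils, and datasets or subvolumes on the root pool will also show up as "other".

```shell
# Succeed if the given path lives on the same device as the root filesystem.
same_fs_as_root() {
    [ "$(stat -L -c %d "$1")" = "$(stat -L -c %d /)" ]
}

# Read one path per line on stdin and label each as rootfs, other or missing.
classify_paths() {
    while IFS= read -r p; do
        [ -n "$p" ] || continue
        if [ ! -e "$p" ]; then
            echo "missing  $p"
        elif same_fs_as_root "$p"; then
            echo "rootfs   $p"
        else
            echo "other    $p"
        fi
    done
}

# Usage with the list from the first post:
#   classify_paths < ~/bin/lxc/lxcdirs.txt
```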

root@v1:~# mkdir /mnt/blah
root@v1:~# mount -t tmpfs tmpfs /mnt/blah
root@v1:~# 
root@v1:~# lxc config device add a1 blah disk source=/mnt/blah path=/srv/blah
Device blah added to a1
root@v1:~# lxc restart a1
root@v1:~# 
  1. It’s so weird: I can mount tmpfs too, but it fails on a physical ext4 SSD.
root@alvin-WS-E500:~# mkdir /mnt/blah
root@alvin-WS-E500:~# mount -t tmpfs tmpfs /mnt/blah
root@alvin-WS-E500:~# lxc config device add dev16 blah disk source=/mnt/blah path=/srv/blah
Device blah added to dev16
root@alvin-WS-E500:~# lxc start dev16
root@alvin-WS-E500:~# lxc stop dev16
root@alvin-WS-E500:~# lxc config device add dev16 repo2 disk source=/mnt/ct1000mx500 path=/home/alvin/repo2
Device repo2 added to dev16
root@alvin-WS-E500:~# lxc start dev16
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart dev16 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/dev16/lxc.conf:
Try `lxc info --show-log dev16` for more info
  2. Adding the path on the fly works, but the container fails to start after a restart.
root@alvin-WS-E500:~# lxc config device remove dev16 repo2
Device repo2 removed from dev16
root@alvin-WS-E500:~# lxc start dev16
root@alvin-WS-E500:~# lxc config device add dev16 repo2 disk source=/mnt/ct1000mx500 path=/srv/repo2
Device repo2 added to dev16
root@alvin-WS-E500:~# lxc stop dev16
root@alvin-WS-E500:~# lxc start dev16
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart dev16 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/dev16/lxc.conf:
Try `lxc info --show-log dev16` for more info
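
The sequence in the second case (add the device while the container runs, then stop and start) can be wrapped in a small helper so it is easy to repeat for several source paths. This is a sketch only: `repro_restart` and the `LXC` variable are names invented here, and the container is assumed to be running when the function is called.

```shell
# Command used to talk to LXD. Set LXC=echo for a dry run that only
# prints the commands instead of executing them.
LXC="${LXC:-lxc}"

# repro_restart CONTAINER SOURCE PATH
# Adds a disk device, then stops and starts the container to see whether
# it still starts with the mount configured.
# Returns 0 if the start succeeds, 1 if it fails, 2 if an earlier step fails.
repro_restart() {
    c=$1 src=$2 dst=$3
    dev=$(basename "$src")
    $LXC config device add "$c" "$dev" disk source="$src" path="$dst" || return 2
    $LXC stop "$c" || return 2
    if $LXC start "$c"; then
        echo "start OK with $src mounted at $dst"
    else
        echo "start FAILED with $src mounted at $dst"
        $LXC info --show-log "$c"
        return 1
    fi
}

# Usage, with the names from the session above:
#   repro_restart dev16 /mnt/ct1000mx500 /srv/repo2
```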

I also ran into the same issue, running LXD 5.3 via snapd on an RPi. It seems that any container fails to start if I try to mount a path that is outside of the root filesystem.

Here is my config snippet from the containers:

  homedir:
    path: /home/david/
    source: /home/david/container-home/
    type: disk
  storage:
    path: /storage/incoming/
    source: /storage/incoming/
    type: disk

The homedir device mounts fine, but with the storage device the container fails to start in exactly the same way as in the OP’s log. My /storage path is simply:

/dev/sda1 on /storage type ext4 (rw,noexec,relatime,stripe=8191)

I wonder if it is because of the noexec flag. I can probably test this later, but at the moment I cannot take down my containers again. I’ve since reverted to 5.2 and locked to the 5.2-stable channel.

I think noexec is not relevant since all my ext4 disks are using default mount options.


/dev/sda1 on /mnt/st1000dm010 type ext4 (rw,nosuid,nodev,relatime)
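
To compare mount options across affected hosts without eyeballing mount output, a comma-separated option string can be tested directly. A minimal sketch, with `has_mount_option` as a made-up helper name:

```shell
# has_mount_option OPTIONS NAME
# OPTIONS is a comma-separated list such as "rw,noexec,relatime".
# Succeeds only on an exact option match, so "exec" does not match "noexec".
has_mount_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage with findmnt from util-linux:
#   has_mount_option "$(findmnt -no OPTIONS -T /storage)" noexec && echo "noexec is set"
```

With the two option strings quoted in this thread, the first one has noexec and the second does not, yet both hosts fail, which fits the guess that noexec is not the cause.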

I’d like to note that, for me, headless containers mount disk devices fine. The issue seems to affect only GUI-enabled containers: Cannot start lxc containers with gui profile - #2 by isolin

Hmm, tried with ext4 now:

root@v1:~# truncate -s 10G foo.img
root@v1:~# mkfs.ext4 foo.img 
mke2fs 1.46.5 (30-Dec-2021)
Discarding device blocks: done                            
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: 89a220a4-b614-405d-b7be-ff774b40a508
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done 

root@v1:~# mkdir /mnt/foo
root@v1:~# mount foo.img /mnt/foo
root@v1:~# lxc init images:alpine/edge a1
Creating a1
root@v1:~# lxc config device add a1 foo disk source=/mnt/foo path=/mnt/foo
Device foo added to a1
root@v1:~# lxc start a1

That should have been pretty much as close as it gets to your setup, you’d think…

To be clear, we can see that there is a problem and we’d love to fix it, but the log doesn’t provide enough detail to figure out what’s going on, and so far we’ve been unable to reproduce it on one of our own systems.

So we either need step by step instructions to reproduce this on a clean system or need access to an affected system so we can hopefully see what’s going on and then reproduce this ourselves.
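
For anyone on an affected system who wants to post comparable details, something along these lines could gather them in one go. It is only a sketch: `section` and `report` are hypothetical helper names, and the selection of commands is a guess at what is useful, not an official checklist.

```shell
# section TITLE COMMAND [ARGS...]
# Prints a header followed by the command output. A failing command is
# noted in the report instead of aborting it.
section() {
    title=$1
    shift
    echo "== $title =="
    "$@" 2>&1 || echo "(command failed: $*)"
}

# report CONTAINER SOURCE_PATH
# Collects kernel, snap, container and mount details for one failing device.
report() {
    section "kernel" uname -a
    section "lxd snap" snap list lxd
    section "container config" lxc config show "$1" --expanded
    section "source mount" findmnt -T "$2"
    section "container log" lxc info --show-log "$1"
}

# Usage:
#   report dev16 /mnt/ct1000mx500 > lxd-report.txt
```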

Just contemplating if this would help: snap connect lxd:removable-media?

I already rolled back to 5.2 and I need to keep working with my lxd, so I can’t test that now.

LXD doesn’t use the removable-media snap interface.

Same here. I can’t mount my rclone mounts from my host. My ZFS mount works fine.
My rclone mounts have worked fine for years. Now, overnight, they fail.

Edit: I can add the mounts while the container is running, but if I restart or stop/start it, it fails. I’ve also tried adding them while the container is stopped and then starting it. That also fails.

lxc test 20220630202354.768 WARN conf - …/src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc test 20220630202354.768 WARN conf - …/src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc test 20220630202354.769 WARN conf - …/src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc test 20220630202354.769 WARN conf - …/src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc test 20220630202354.770 WARN cgfsng - …/src/src/lxc/cgroups/cgfsng.c:fchowmodat:1252 - No such file or directory - Failed to fchownat(42, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc test 20220630202354.888 ERROR conf - …/src/src/lxc/conf.c:mount_entry:2459 - Operation not permitted - Failed to mount “/var/snap/lxd/common/lxd/devices/test/disk.CBR.home-david-rclone” on “/var/snap/lxd/common/lxc//home/david/rclone”
lxc test 20220630202354.888 ERROR conf - …/src/src/lxc/conf.c:lxc_setup:4375 - Failed to setup mount entries
lxc test 20220630202354.888 ERROR start - …/src/src/lxc/start.c:do_start:1275 - Failed to setup container “test”
lxc test 20220630202354.888 ERROR sync - …/src/src/lxc/sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 3)
lxc test 20220630202354.895 WARN network - …/src/src/lxc/network.c:lxc_delete_network_priv:3631 - Failed to rename interface with index 0 from “eth0” to its initial name “veth0d2b4b0a”
lxc test 20220630202354.895 ERROR lxccontainer - …/src/src/lxc/lxccontainer.c:wait_on_daemonized_start:877 - Received container state “ABORTING” instead of “RUNNING”
lxc test 20220630202354.895 ERROR start - …/src/src/lxc/start.c:__lxc_start:2074 - Failed to spawn container “test”
lxc test 20220630202354.895 WARN start - …/src/src/lxc/start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 43 for process 1086490
lxc test 20220630202400.128 WARN conf - …/src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc test 20220630202400.129 WARN conf - …/src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc 20220630202400.676 ERROR af_unix - …/src/src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220630202400.676 ERROR commands - …/src/src/lxc/commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command “get_state”