Hi,
I have a problem starting LXD containers on Ubuntu 20.04, on fresh installations as well as on systems upgraded from 18.04. The problem occurs with these LXD versions:
lxd 4.0.1 14804 latest/stable/… canonical✓ -
and
lxd 4.3 15913 latest/stable/… canonical✓ -
I’ve stripped it down to a minimal case where the problem reproduces. I now have 20 small containers (Debian stretch and buster, but I think it does not matter), each on a dedicated LV formatted as ext4:
root@u2004-1:~# mount|grep lxd/common
/dev/mapper/vg0-lxd_common on /var/snap/lxd/common type ext4 (rw,relatime)
/dev/mapper/vg0-c10 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c10 type ext4 (rw,relatime)
/dev/mapper/vg0-c9 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c9 type ext4 (rw,relatime)
/dev/mapper/vg0-c3 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c3 type ext4 (rw,relatime)
/dev/mapper/vg0-c1 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c1 type ext4 (rw,relatime)
/dev/mapper/vg0-c2 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c2 type ext4 (rw,relatime)
/dev/mapper/vg0-c8 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c8 type ext4 (rw,relatime)
/dev/mapper/vg0-c11 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c11 type ext4 (rw,relatime)
/dev/mapper/vg0-c7 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c7 type ext4 (rw,relatime)
/dev/mapper/vg0-c14 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c14 type ext4 (rw,relatime)
/dev/mapper/vg0-c13 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c13 type ext4 (rw,relatime)
/dev/mapper/vg0-c12 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c12 type ext4 (rw,relatime)
/dev/mapper/vg0-c4 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c4 type ext4 (rw,relatime)
/dev/mapper/vg0-c5 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c5 type ext4 (rw,relatime)
/dev/mapper/vg0-c6 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c6 type ext4 (rw,relatime)
/dev/mapper/vg0-c16 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c16 type ext4 (rw,relatime)
/dev/mapper/vg0-c18 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c18 type ext4 (rw,relatime)
/dev/mapper/vg0-c17 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c17 type ext4 (rw,relatime)
/dev/mapper/vg0-c15 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c15 type ext4 (rw,relatime)
/dev/mapper/vg0-c20 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c20 type ext4 (rw,relatime)
/dev/mapper/vg0-c19 on /var/snap/lxd/common/lxd/storage-pools/default/containers/c19 type ext4 (rw,relatime)
tmpfs on /var/snap/lxd/common/ns type tmpfs (rw,relatime,size=1024k,mode=700)
nsfs on /var/snap/lxd/common/ns/shmounts type nsfs (rw)
nsfs on /var/snap/lxd/common/ns/mntns type nsfs (rw)
After launching them everything is fine, but after a reboot some of them, always a random set, do not start:
root@u2004-1:~# lxc start --all
c2: error: Failed to run: /snap/lxd/current/bin/lxd forkstart c2 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c2/lxc.conf:
c2: Try `lxc info --show-log c2` for more info
c13: error: Failed to run: /snap/lxd/current/bin/lxd forkstart c13 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c13/lxc.conf:
c13: Try `lxc info --show-log c13` for more info
c14: error: Failed to run: /snap/lxd/current/bin/lxd forkstart c14 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c14/lxc.conf:
c14: Try `lxc info --show-log c14` for more info
c18: error: Failed to run: /snap/lxd/current/bin/lxd forkstart c18 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c18/lxc.conf:
c18: Try `lxc info --show-log c18` for more info
c20: error: Failed to run: /snap/lxd/current/bin/lxd forkstart c20 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c20/lxc.conf:
c20: Try `lxc info --show-log c20` for more info
c16: error: Failed to run: /snap/lxd/current/bin/lxd forkstart c16 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c16/lxc.conf:
c16: Try `lxc info --show-log c16` for more info
c19: error: Failed to run: /snap/lxd/current/bin/lxd forkstart c19 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c19/lxc.conf:
c19: Try `lxc info --show-log c19` for more info
c15: error: Failed to run: /snap/lxd/current/bin/lxd forkstart c15 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c15/lxc.conf:
c15: Try `lxc info --show-log c15` for more info
c17: error: Failed to run: /snap/lxd/current/bin/lxd forkstart c17 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c17/lxc.conf:
c17: Try `lxc info --show-log c17` for more info
Error: Some instances failed to start
root@u2004-1:~# lxc ls --format csv|grep -i running|wc -l
11
root@u2004-1:~# lxc info --show-log c17
Name: c17
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/07/04 18:18 UTC
Status: Stopped
Type: container
Profiles: default
Log:
lxc c17 20200704221929.441 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1143 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.monitor.c17"
lxc c17 20200704221929.812 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1143 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.payload.c17"
lxc c17 20200704221929.829 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1455 - No such file or directory - Failed to fchownat(17, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc c17 20200704221930.585 ERROR dir - storage/dir.c:dir_mount:152 - No such file or directory - Failed to mount "/var/snap/lxd/common/lxd/containers/c17/rootfs" on "/var/snap/lxd/common/lxc/"
lxc c17 20200704221930.585 ERROR conf - conf.c:lxc_mount_rootfs:1256 - Failed to mount rootfs "/var/snap/lxd/common/lxd/containers/c17/rootfs" onto "/var/snap/lxd/common/lxc/" with options "(null)"
lxc c17 20200704221930.585 ERROR conf - conf.c:lxc_setup_rootfs_prepare_root:3178 - Failed to setup rootfs for
lxc c17 20200704221930.585 ERROR conf - conf.c:lxc_setup:3277 - Failed to setup rootfs
lxc c17 20200704221930.586 ERROR start - start.c:do_start:1231 - Failed to setup container "c17"
lxc c17 20200704221930.605 ERROR sync - sync.c:__sync_wait:41 - An error occurred in another process (expected sequence number 5)
lxc c17 20200704221930.251 WARN network - network.c:lxc_delete_network_priv:3213 - Failed to rename interface with index 0 from "eth0" to its initial name "vethbc59a0f9"
lxc c17 20200704221930.251 ERROR start - start.c:__lxc_start:1952 - Failed to spawn container "c17"
lxc c17 20200704221930.251 WARN start - start.c:lxc_abort:1025 - No such process - Failed to send SIGKILL via pidfd 30 for process 3522
lxc c17 20200704221930.252 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:852 - Received container state "ABORTING" instead of "RUNNING"
lxc 20200704221931.283 WARN commands - commands.c:lxc_cmd_rsp_recv:122 - Connection reset by peer - Failed to receive response for command "get_state"
It does not matter whether I run lxc start --all or start them one by one with some delay between starts. Once this happens, the failed container cannot be started again.
I’ve discovered that a backup.yaml file gets created under the mount point, i.e. in the underlying directory rather than on the mounted LV, and that this causes the problem:
root@u2004-1:~# lxc stop --all
root@u2004-1:~# cd /var/snap/lxd/common/lxd/storage-pools/default/containers/
root@u2004-1:/var/snap/lxd/common/lxd/storage-pools/default/containers# umount *
root@u2004-1:/var/snap/lxd/common/lxd/storage-pools/default/containers# ls */backup.yaml -1
c13/backup.yaml
c14/backup.yaml
c15/backup.yaml
c16/backup.yaml
c17/backup.yaml
c18/backup.yaml
c19/backup.yaml
c2/backup.yaml
c20/backup.yaml
While the file is present, starting the container fails as shown above.
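The check can be sketched as a small shell function. This is only an illustration, not part of LXD: the name `find_stray_backups` is hypothetical, and it assumes the containers are stopped and their LVs unmounted, as in the transcript. Directories that are still mount points are skipped, on the assumption that a backup.yaml on a mounted LV is the regular one.

```shell
#!/bin/sh
# Hypothetical helper: print every backup.yaml that sits in a container
# directory which is NOT currently a mount point, i.e. a file that would
# be hidden once the container's LV is mounted on top of the directory.
find_stray_backups() {
    base=$1
    for dir in "$base"/*/; do
        dir=${dir%/}
        # An unmatched glob stays literal; skip anything that is not a directory
        [ -d "$dir" ] || continue
        # mountpoint (util-linux) returns 0 when something is mounted here
        if mountpoint -q "$dir"; then
            continue
        fi
        if [ -f "$dir/backup.yaml" ]; then
            printf '%s\n' "$dir/backup.yaml"
        fi
    done
}
```

Usage, assuming the path from the transcript: `find_stray_backups /var/snap/lxd/common/lxd/storage-pools/default/containers`.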
After manually removing or renaming these files, all the containers start again:
root@u2004-1:/var/snap/lxd/common/lxd/storage-pools/default/containers# for file in */backup.yaml ; do s=$(stat -c %y $file); rename "s/yaml/yaml-${s// /-}/" $file; done
root@u2004-1:/var/snap/lxd/common/lxd/storage-pools/default/containers# for x in c*; do mount $x;done
root@u2004-1:/var/snap/lxd/common/lxd/storage-pools/default/containers# lxc start --all
root@u2004-1:/var/snap/lxd/common/lxd/storage-pools/default/containers# lxc ls --format csv|grep -i running|wc -l
20
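The rename step of this workaround could also be wrapped in a function so it can be rerun after each reboot. The sketch below is hedged: `quarantine_stray_backups` is a made-up name, it uses `mv` and GNU `date -r` instead of the `rename` and `stat` combination from the transcript, and it assumes the container LVs are already unmounted. It only sets the stray files aside; remounting and `lxc start --all` remain separate steps.

```shell
#!/bin/sh
# Hypothetical helper: rename each stray backup.yaml to
# backup.yaml-<mtime>, leaving files on mounted volumes untouched.
quarantine_stray_backups() {
    base=$1
    for file in "$base"/*/backup.yaml; do
        # An unmatched glob stays literal; skip it
        [ -f "$file" ] || continue
        dir=${file%/backup.yaml}
        # Never touch a backup.yaml that lives on a mounted volume
        if mountpoint -q "$dir"; then
            continue
        fi
        # Assumption: GNU date, where -r FILE prints the file's modification time
        stamp=$(date -r "$file" +%Y%m%d-%H%M%S)
        mv -- "$file" "$file-$stamp"
    done
}
```

Keeping the renamed copies rather than deleting them preserves the timestamps, which may help narrow down when the files get written during boot.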
Until the next reboot, stopping and starting works well; the problem appears again after the reboot.
The same applies even if all containers are explicitly stopped prior to the reboot and not autostarted on boot.
I do not observe this problem with a smaller number of containers.
It is possible to live with this workaround once one knows it, but I believe it is a bug that should be fixed…
bye,
tomas