Common start logic: Expand symlinks: lstat /var/lib/lxd: no such file or directory

I’ve just upgraded my LXD server from a 4.0 binary release to the latest (4.4) Snap release.

I created a bind mount in my fstab (/var/lib/lxd /var/snap/lxd/common/lxd bind defaults,bind), thinking that would get rid of any issues with the folders, but I think I thought wrong.

Every container I try to start throws this error:

t=2020-09-17T15:37:26+0200 lvl=eror msg="Failed to start instance 'X': Common start logic: Expand symlinks: lstat /var/lib/lxd: no such file or directory"

However, that folder does exist:

root@procyon:/# ls -la /var/lib/lxd
total 76
drwx--x--x 17 root root 4096 Sep 17 15:23 .
drwxr-xr-x 39 root root 4096 Sep 17 15:44 ..
drwx------  2 root root 4096 Apr 15 14:57 backups
drwx------  3 root root 4096 Sep 17 14:55 cache
lrwxrwxrwx  1 root root   10 Apr 15 15:14 cluster.crt -> server.crt
lrwxrwxrwx  1 root root   10 Apr 15 15:14 cluster.key -> server.key
drwx--x--x  2 root root 4096 Sep 14 08:51 containers
drwx------  3 root root 4096 Sep 17 15:46 database
drwx--x--x 45 root root 4096 Sep 17 15:23 devices
drwxr-xr-x  2 root root 4096 Apr 15 14:57 devlxd
drwx------  2 root root 4096 Apr 15 14:57 disks
drwx------  2 root root 4096 Sep 17 05:02 images
drwx------ 42 root root 4096 Sep 17 15:23 logs
drwx--x--x  3 root root 4096 Apr 15 14:59 networks
srwx------  1 root root    0 Sep 17 15:23 seccomp.socket
drwx------  4 root root 4096 Apr 15 15:18 security
-rw-r--r--  1 root root  765 Apr 15 14:57 server.crt
-rw-------  1 root root  288 Apr 15 14:57 server.key
lrwxrwxrwx  1 root root   39 Sep 17 15:23 shmounts -> /var/snap/lxd/common/shmounts/instances
drwx------  2 root root 4096 Apr 15 14:57 snapshots
drwx--x--x  4 root root 4096 Apr 15 15:30 storage-pools
srw-rw----  1 root lxd     0 Sep 17 15:23 unix.socket
drwx--x--x  2 root root 4096 Apr 15 14:57 virtual-machines
drwx------  2 root root 4096 Apr 15 14:57 virtual-machines-snapshots

The snap path shows the same content:

root@procyon:/# ls -la /var/snap/lxd/common/lxd
total 76
drwx--x--x 17 root root 4096 Sep 17 15:23 .
drwxr-xr-x  7 root root 4096 Sep 17 15:23 ..
drwx------  2 root root 4096 Apr 15 14:57 backups
drwx------  3 root root 4096 Sep 17 14:55 cache
lrwxrwxrwx  1 root root   10 Apr 15 15:14 cluster.crt -> server.crt
lrwxrwxrwx  1 root root   10 Apr 15 15:14 cluster.key -> server.key
drwx--x--x  2 root root 4096 Sep 14 08:51 containers
drwx------  3 root root 4096 Sep 17 15:46 database
drwx--x--x 45 root root 4096 Sep 17 15:23 devices
drwxr-xr-x  2 root root 4096 Apr 15 14:57 devlxd
drwx------  2 root root 4096 Apr 15 14:57 disks
drwx------  2 root root 4096 Sep 17 05:02 images
drwx------ 42 root root 4096 Sep 17 15:23 logs
drwx--x--x  3 root root 4096 Apr 15 14:59 networks
srwx------  1 root root    0 Sep 17 15:23 seccomp.socket
drwx------  4 root root 4096 Apr 15 15:18 security
-rw-r--r--  1 root root  765 Apr 15 14:57 server.crt
-rw-------  1 root root  288 Apr 15 14:57 server.key
lrwxrwxrwx  1 root root   39 Sep 17 15:23 shmounts -> /var/snap/lxd/common/shmounts/instances
drwx------  2 root root 4096 Apr 15 14:57 snapshots
drwx--x--x  4 root root 4096 Apr 15 15:30 storage-pools
srw-rw----  1 root lxd     0 Sep 17 15:23 unix.socket
drwx--x--x  2 root root 4096 Apr 15 14:57 virtual-machines
drwx------  2 root root 4096 Apr 15 14:57 virtual-machines-snapshots

Anybody got a clue what’s wrong?

What do the symlinks in containers and virtual-machines look like?

For containers

lrwxrwxrwx  1 root root   55 Aug 30 13:41 zoom -> /var/lib/lxd/storage-pools/ceph-erasure/containers/zoom/
lrwxrwxrwx  1 root root   55 Jul 14 09:22 zuul -> /var/lib/lxd/storage-pools/ceph-erasure/containers/zuul/

The targets for these symlinks exist.

I have no VMs yet on this node.

Right, so you need to edit all those symlinks to point to /var/snap/lxd/common/lxd/storage-pools/ceph-erasure/..., effectively replacing /var/lib/lxd with /var/snap/lxd/common/lxd in all symlink targets.
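For anyone landing here later, the symlink rewrite could be scripted roughly as follows. This is only a sketch, not an official LXD tool: the function names `retarget` and `fix_symlinks` and the `.retarget-tmp` suffix are made up for illustration, and it assumes the links sit directly inside the containers and virtual-machines directories as shown in this thread. It defaults to a dry run that only prints what it would change.

```python
import os
from pathlib import Path

OLD_PREFIX = "/var/lib/lxd"
NEW_PREFIX = "/var/snap/lxd/common/lxd"


def retarget(target, old=OLD_PREFIX, new=NEW_PREFIX):
    """Return `target` with a leading `old` path prefix replaced by `new`."""
    if target == old or target.startswith(old + "/"):
        return new + target[len(old):]
    return target


def fix_symlinks(directory, old=OLD_PREFIX, new=NEW_PREFIX, dry_run=True):
    """Retarget symlinks directly inside `directory`.

    Returns a list of (link, old_target, new_target) tuples.
    Nothing is modified on disk unless dry_run is False.
    """
    changes = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_symlink():
            continue
        current = os.readlink(entry)
        updated = retarget(current, old, new)
        if updated == current:
            continue
        changes.append((str(entry), current, updated))
        if not dry_run:
            # Create the new link under a temporary (hypothetical) name,
            # then rename it over the old link so the swap is atomic.
            tmp = entry.with_name(entry.name + ".retarget-tmp")
            os.symlink(updated, tmp)
            os.replace(tmp, entry)
    return changes


if __name__ == "__main__":
    for sub in ("containers", "virtual-machines"):
        path = os.path.join(NEW_PREFIX, sub)
        if not os.path.isdir(path):
            continue
        for link, was, now in fix_symlinks(path):  # dry run only
            print(f"{link}: {was} -> {now}")
```

Once the printed list looks right, calling `fix_symlinks(path, dry_run=False)` applies the changes. Stopping LXD and taking a backup of the directory first seems prudent.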

If you also have ZFS-backed containers, you’ll also need to manually edit all the mountpoints listed by zfs list -t all -o name,mountpoint.
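For the ZFS mountpoints, one cautious approach is to generate the `zfs set mountpoint=...` commands and review them before running anything. The sketch below only prints commands and never changes a dataset. The helper names `plan_mountpoint_changes` and `list_mountpoints` are hypothetical, and it adds the `-H` flag to the `zfs list` call above so the output is tab-separated without a header.

```python
import subprocess
import sys

OLD_PREFIX = "/var/lib/lxd"
NEW_PREFIX = "/var/snap/lxd/common/lxd"


def plan_mountpoint_changes(zfs_output, old=OLD_PREFIX, new=NEW_PREFIX):
    """Turn `zfs list -H -o name,mountpoint` output into (dataset, new_mountpoint) pairs.

    Entries whose mountpoint is not under `old` (including "-", "none"
    and "legacy") are skipped.
    """
    plan = []
    for line in zfs_output.splitlines():
        if not line.strip():
            continue
        name, _, mountpoint = line.partition("\t")
        if mountpoint == old or mountpoint.startswith(old + "/"):
            plan.append((name, new + mountpoint[len(old):]))
    return plan


def list_mountpoints():
    """Run zfs list and return its raw tab-separated output."""
    result = subprocess.run(
        ["zfs", "list", "-H", "-t", "all", "-o", "name,mountpoint"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


if __name__ == "__main__":
    try:
        output = list_mountpoints()
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not run zfs list: {exc}", file=sys.stderr)
    else:
        # Print the commands for review instead of executing them.
        for dataset, mountpoint in plan_mountpoint_changes(output):
            print(f"zfs set mountpoint={mountpoint} {dataset}")
```

Printing rather than executing is deliberate here: changing a mountpoint on a mounted dataset can remount it, so it is worth checking each line by hand first.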

Mister Graber to the rescue yet again! I also had to create a rootfs folder in each symlink target, but after that most containers started working again! (I wanted to share the script I wrote, but the script ate itself as I ran it because of a silly rm line.)

My dev container is still throwing cgroup errors, but seeing as I do all kinds of black magic with that container, it’s not that crazy. I will dive deeper once I’m out of panic mode.

Thanks again!