Container recovery after snap channel switch

When the LXD snap updated to 4.5, none of my X11-enabled containers was able to start. This issue is already being discussed in the 4.5 announcement thread.

I tried to revert to the previous version, first using sudo snap revert lxd (which only changed the revision but stayed at 4.5) and then calling snap refresh --channel=4.4/stable.

The channel change somehow disrupted the container database: lxc list showed nothing. Switching back with snap refresh --channel=4.5/stable didn’t help either, so I started mounting the containers’ zfs datasets following Simos’ blog post.

I was able to recover only one of them - the one that has no X11 profile enabled. The remaining 17 containers return an error:

sudo lxd import xyz
Error: The instance's directory "/var/snap/lxd/common/lxd/storage-pools/default/containers/xyz" appears to be empty. Please ensure that the instance's storage volume is mounted

I checked the path and no, it is not empty. Everything is there: backup.yaml, metadata.yaml, rootfs and templates.

Please help me to figure out how I can recover all the containers. I use them for my work and this is a very serious problem for me.

Ok, let's try to get things back online in a more reasonable way first.
Can you show the output of ls -lh /var/snap/lxd/common/lxd/database?

With a bit of luck, your backups are usable and we can just get you back on 4.5, then upgrade to 4.6 and get you the fix I just submitted for the X11 issue.

Thank you very much for taking care!

drwxr-x--- 2 root root 12K Sep 18 20:46 global
drwxr-x--- 2 root root 12K Sep 18 20:46 global.bak
-rw-r--r-- 1 root root 40K Sep 18 13:24 local.db
-rw-r--r-- 1 root root 40K Jul 10 18:59 local.db.bak

Hmm, this isn’t very promising given the very similar times on the two global databases…

Anyway, let’s try it.

  • cp -R /var/snap/lxd/common/lxd/database /root/lxd.db
  • snap refresh lxd --candidate
  • systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
  • rm -rf /var/snap/lxd/common/lxd/database/global
  • cp -r /root/lxd.db/global.bak /var/snap/lxd/common/lxd/database/global
  • systemctl start snap.lxd.daemon.unix.socket
  • lxc list

(LXD automatically makes a backup before performing schema changes, which makes it bulletproof should a version upgrade fail. However, attempts at downgrading or reinstalling could have triggered another backup, overwriting the previous one in the process…)
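
For future channel switches, a hedged precaution is to copy the database directory manually first, mirroring the cp step above (the /root/lxd-db-backup destination is just an example name, not anything LXD expects):

# Copy the database while LXD is stopped so the copy is consistent
sudo systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
sudo cp -R /var/snap/lxd/common/lxd/database /root/lxd-db-backup
sudo systemctl start snap.lxd.daemon.unix.socket snap.lxd.daemon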

This was certainly not the time of the first 4.4 -> 4.5 upgrade. It was probably when coming back to 4.5 from the reverted 4.4.

lxc list is empty, and now I need sudo to run lxc list :frowning:

So the backup is from the second channel change.

Ok, unless you have a backup of /var/snap/lxd/common/lxd/database somewhere, there’s no way to recover your DB.

In that case, your best bet is probably the steps below (an import loop covering all containers is sketched after the list):

  • snap refresh lxd --channel=4.4/stable
  • systemctl stop snap.lxd.daemon.unix.socket snap.lxd.daemon
  • rm -rf /var/snap/lxd/common/lxd/database
  • umount -l /run/snapd/ns/lxd.mnt
  • rm /run/snapd/ns/lxd.mnt
  • MOUNT ALL YOUR DATASETS AND CONFIRM THEY’RE ACCESSIBLE
  • systemctl start snap.lxd.daemon.unix.socket snap.lxd.daemon
  • lxd import NAME
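
If there are many containers to bring back, a hedged sketch for importing everything in one go, assuming all the datasets are mounted at the default pool path used above:

# Import every container directory found under the default storage pool
for dir in /var/snap/lxd/common/lxd/storage-pools/default/containers/*/; do
    sudo lxd import "$(basename "$dir")"
done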

No, I have no other backup. I will try the imports in 4.4.

Like this? sudo zfs mount lxd/containers/mycontainer

So zfs was already mounted for all of them. I had to recreate some profiles (like X11). Around half of the containers recovered fine, but some didn’t, and I see no pattern there. The error is the same (the instance’s directory appears to be empty), but I can confirm that all the files are there as expected in /var/snap/lxd/common/lxd/storage-pools/default/containers/mycontainer

Look through /var/snap/lxd/common/lxd/mntns/var/snap/lxd/common/lxd/storage-pools; chances are they’re not actually mounted where it matters.

There is no /var/snap/lxd/common/lxd/mntns

sudo ls /var/snap/lxd/common/lxd/
backups     devices  logs	 server.key	unix.socket
cache	    devlxd   networks	 shmounts	virtual-machines
containers  disks    security	 snapshots	virtual-machines-snapshots
database    images   server.crt  storage-pools

Over here it looks fine:

sudo ls /var/snap/lxd/common/lxd/storage-pools/default/containers/mycontainer
backup.yaml  metadata.yaml  rootfs  templates

Oops, sorry, I meant /var/snap/lxd/common/mntns

Interesting, here I can see folders for all the containers, but some of them are indeed empty.

Yeah, most likely those weren’t mounted before LXD was started; anything mounted after that would not propagate into the mntns.
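
In that case, a hedged way to get them mounted in time, reusing the earlier steps and assuming zfs mount -a brings up all the container datasets:

# Stop LXD, drop its stale mount namespace, mount everything, then start LXD again
sudo systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
sudo umount -l /run/snapd/ns/lxd.mnt
sudo rm /run/snapd/ns/lxd.mnt
sudo zfs mount -a
sudo systemctl start snap.lxd.daemon.unix.socket snap.lxd.daemon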

That worked … kind of. I imported all containers and I can start them. X11 apps work as well.

But it looks like none of the containers has access to the network bridge of the host. They have no internet connection.

So the network bridge created during lxd init is gone. netplan shows only the host’s physical network interface.

  • How can I set up the bridge again so that the old containers, as well as any new ones, automatically get it? Is it safe to run lxd init again?
  • Should I follow these instructions to create the bridge manually?
  • Did the database backup/replace reset any other settings I haven’t noticed yet?

First of all, I updated the top of the blog post to guide users to upgrade to LXD 4.6/candidate in order to keep using GUI containers, https://blog.simos.info/running-x11-software-in-lxd-containers/

lxd init is a wizard that guides you through creating a baseline configuration so that you can launch containers straight away. If something is missing, like lxdbr0, you can just create it using lxc network. There is no need to run lxd init again.

Here is how to re-create lxdbr0, if none exists.

ubuntu@mycontainer:~$ sudo lxc network list
+------+------+---------+-------------+---------+
| NAME | TYPE | MANAGED | DESCRIPTION | USED BY |
+------+------+---------+-------------+---------+
ubuntu@mycontainer:~$ sudo lxc network create lxdbr0
Network lxdbr0 created
ubuntu@mycontainer:~$ sudo lxc network list
+--------+--------+---------+-------------+---------+
|  NAME  |  TYPE  | MANAGED | DESCRIPTION | USED BY |
+--------+--------+---------+-------------+---------+
| lxdbr0 | bridge | YES     |             | 0       |
+--------+--------+---------+-------------+---------+
ubuntu@mycontainer:~$ 
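
For the old containers and any new ones to pick the bridge up automatically, it also needs to be referenced by the default profile. A hedged example, assuming the NIC device should be called eth0 (that name is not from the thread):

sudo lxc network attach-profile lxdbr0 default eth0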

Unfortunately that failed for me:

Error: Failed to automatically find an unused IPv6 subnet, manual configuration required

Well, I fear that more than just the networking may be missing. That’s why I thought of running the full init.
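
As a hedged workaround for the IPv6 error, the subnets can be given explicitly instead of auto-detected; the addresses below are placeholders, not values from the thread:

sudo lxc network create lxdbr0 ipv4.address=10.212.73.1/24 ipv4.nat=true ipv6.address=none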