Container recovery after snap channel switch

When the LXD snap updated to 4.5, none of my X11-enabled containers was able to start. This issue is already being discussed in the 4.5 announcement thread.

I tried to revert to the previous version, first using sudo snap revert lxd (which only changed the revision but stayed at 4.5) and then calling snap refresh --channel=4.4/stable.

The channel change somehow disrupted the container database: lxc list showed nothing. Switching back with snap refresh --channel=4.5/stable didn’t help either, so I started mounting the containers’ zfs datasets following Simos’ blog post.

I was able to recover only one of them - the one that has no X11 profile enabled. The remaining 17 containers return an error:

sudo lxd import xyz
Error: The instance's directory "/var/snap/lxd/common/lxd/storage-pools/default/containers/xyz" appears to be empty. Please ensure that the instance's storage volume is mounted

I checked the path and no, it is not empty. Everything is there: backup.yaml, metadata.yaml, rootfs and templates.

Please help me to figure out how I can recover all the containers. I use them for my work and this is a very serious problem for me.

Ok, let's try to get things back online in a more reasonable way first.
Can you show the output of ls -lh /var/snap/lxd/common/lxd/database?

With a bit of luck, your backups are usable and we can just get you back on 4.5, then upgrade to 4.6 and get you the fix I just submitted for the X11 issue.

Thank you very much for taking care!

drwxr-x--- 2 root root 12K Sep 18 20:46 global
drwxr-x--- 2 root root 12K Sep 18 20:46 global.bak
-rw-r--r-- 1 root root 40K Sep 18 13:24 local.db
-rw-r--r-- 1 root root 40K Jul 10 18:59 local.db.bak

Hmm, this isn’t very promising given the very similar times on the two global databases…

Anyway, let’s try it.

  • cp -R /var/snap/lxd/common/lxd/database /root/lxd.db
  • snap refresh lxd --candidate
  • systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
  • rm -rf /var/snap/lxd/common/lxd/database/global
  • cp -r /root/lxd.db/global.bak /var/snap/lxd/common/lxd/database/global
  • systemctl start snap.lxd.daemon.unix.socket
  • lxc list

(LXD automatically makes a backup before performing schema changes, which makes it bulletproof should a version upgrade fail. However, attempts at downgrading or reinstalling could have triggered another backup, overwriting the previous one in the process…)
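
For future channel switches, a hedged precaution is to copy the database directory manually first, mirroring the cp step above (the /root/lxd-db-backup destination is just an example name, not anything LXD expects):

# Copy the database while LXD is stopped so the copy is consistent
sudo systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
sudo cp -R /var/snap/lxd/common/lxd/database /root/lxd-db-backup
sudo systemctl start snap.lxd.daemon.unix.socket snap.lxd.daemon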

This was certainly not the time of the first 4.4 -> 4.5 upgrade. It was probably when coming back to 4.5 from the reverted 4.4.

lxc list is empty, and now I need sudo to run lxc list :frowning:

So the backup is from the second channel change.

Ok, unless you have a backup of /var/snap/lxd/common/lxd/database somewhere, there’s no way to recover your DB.

In that case, your best bet is probably the steps below (an import loop covering all containers is sketched after the list):

  • snap refresh lxd --channel=4.4/stable
  • systemctl stop snap.lxd.daemon.unix.socket snap.lxd.daemon
  • rm -rf /var/snap/lxd/common/lxd/database
  • umount -l /run/snapd/ns/lxd.mnt
  • rm /run/snapd/ns/lxd.mnt
  • MOUNT ALL YOUR DATASETS AND CONFIRM THEY’RE ACCESSIBLE
  • systemctl start snap.lxd.daemon.unix.socket snap.lxd.daemon
  • lxd import NAME
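
If there are many containers to bring back, a hedged sketch for importing everything in one go, assuming all the datasets are mounted at the default pool path used above:

# Import every container directory found under the default storage pool
for dir in /var/snap/lxd/common/lxd/storage-pools/default/containers/*/; do
    sudo lxd import "$(basename "$dir")"
done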

No, I have no other backup. I will try the imports in 4.4.

Like this? sudo zfs mount lxd/containers/mycontainer

So zfs was already mounted for all of them. I had to recreate some profiles (like X11). Around half of the containers recovered fine, but some didn’t, and I see no pattern there. The error is the same (the instance’s directory appears to be empty), but I can confirm that all the files are there as expected in /var/snap/lxd/common/lxd/storage-pools/default/containers/mycontainer

Look through /var/snap/lxd/common/lxd/mntns/var/snap/lxd/common/lxd/storage-pools; chances are they’re not actually mounted where it matters.

There is no /var/snap/lxd/common/lxd/mntns

sudo ls /var/snap/lxd/common/lxd/
backups     devices  logs	 server.key	unix.socket
cache	    devlxd   networks	 shmounts	virtual-machines
containers  disks    security	 snapshots	virtual-machines-snapshots
database    images   server.crt  storage-pools

Over here it looks fine:

sudo ls /var/snap/lxd/common/lxd/storage-pools/default/containers/mycontainer
backup.yaml  metadata.yaml  rootfs  templates

Oops, sorry, I meant /var/snap/lxd/common/mntns

Interesting, here I can see folders for all the containers, but some of them are indeed empty.

Yeah, most likely those weren’t mounted before LXD was started; anything mounted after that would not propagate into the mntns.
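
In that case, a hedged way to get them mounted in time, reusing the earlier steps and assuming zfs mount -a brings up all the container datasets:

# Stop LXD, drop its stale mount namespace, mount everything, then start LXD again
sudo systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
sudo umount -l /run/snapd/ns/lxd.mnt
sudo rm /run/snapd/ns/lxd.mnt
sudo zfs mount -a
sudo systemctl start snap.lxd.daemon.unix.socket snap.lxd.daemon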

That worked … kind of. I imported all containers and I can start them. X11 apps work as well.

But it looks like none of the containers has access to the network bridge of the host. They have no internet connection.

So the network bridge created during lxd init is gone. netplan shows only the host’s physical network interface.

  • How can I set up the bridge again so that the old containers, as well as any new ones, automatically get it? Is it safe to run lxd init again?
  • Should I follow these instructions to create the bridge manually?
  • Did the database backup/replace reset any other settings I haven’t noticed yet?

First of all, I updated the top of the blog post to guide users to upgrade to LXD 4.6/candidate in order to keep using GUI containers, https://blog.simos.info/running-x11-software-in-lxd-containers/

lxd init is a wizard that guides you through creating a baseline configuration so that you can launch containers straight away. If something is missing, like lxdbr0, you can just create it using lxc network. There is no need to run lxd init again.

Here is how to re-create lxdbr0, if none exists.

ubuntu@mycontainer:~$ sudo lxc network list
+------+------+---------+-------------+---------+
| NAME | TYPE | MANAGED | DESCRIPTION | USED BY |
+------+------+---------+-------------+---------+
ubuntu@mycontainer:~$ sudo lxc network create lxdbr0
Network lxdbr0 created
ubuntu@mycontainer:~$ sudo lxc network list
+--------+--------+---------+-------------+---------+
|  NAME  |  TYPE  | MANAGED | DESCRIPTION | USED BY |
+--------+--------+---------+-------------+---------+
| lxdbr0 | bridge | YES     |             | 0       |
+--------+--------+---------+-------------+---------+
ubuntu@mycontainer:~$ 
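
For the old containers and any new ones to pick the bridge up automatically, it also needs to be referenced by the default profile. A hedged example, assuming the NIC device should be called eth0 (that name is not from the thread):

sudo lxc network attach-profile lxdbr0 default eth0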

Unfortunately that failed for me:

Error: Failed to automatically find an unused IPv6 subnet, manual configuration required

Well, I fear that more than just the networking may be missing. That’s why I thought of running the full init.
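
As a hedged workaround for the IPv6 error, the subnets can be given explicitly instead of auto-detected; the addresses below are placeholders, not values from the thread:

sudo lxc network create lxdbr0 ipv4.address=10.212.73.1/24 ipv4.nat=true ipv6.address=none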