Containers recovery after snap channel switch

Ok, unless you have a backup of /var/snap/lxd/common/lxd/database somewhere, there’s no way to recover your DB.

In that case, your best bet is probably:

  • snap refresh lxd --channel=4.4/stable
  • systemctl stop snap.lxd.daemon.unix.socket snap.lxd.daemon
  • rm -rf /var/snap/lxd/common/lxd/database
  • umount -l /run/snapd/ns/lxd.mnt
  • rm /run/snapd/ns/lxd.mnt
  • MOUNT ALL YOUR DATASETS AND CONFIRM THEY’RE ACCESSIBLE
  • systemctl start snap.lxd.daemon.unix.socket snap.lxd.daemon
  • lxd import NAME
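
If there are many containers, that last step can be looped over whatever sits in the pool’s containers directory; a rough sketch, assuming the default storage pool path (adjust it to your setup):

for c in /var/snap/lxd/common/lxd/storage-pools/default/containers/*/; do
    # each directory name is the container name that lxd import expects
    lxd import "$(basename "$c")"
done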

No, I have no other backup. I will try the imports in 4.4.

Like this? sudo zfs mount lxd/containers/mycontainer

So zfs was already mounted for all of them. I had to recreate some profiles (like X11). Around half of the containers recovered fine, but some didn’t, and I see no pattern there. The error is the same (the instance’s dir appears to be empty), but I can confirm that all the files are there as expected in /var/snap/lxd/common/lxd/storage-pools/default/containers/mycontainer

Look through /var/snap/lxd/common/lxd/mntns/var/snap/lxd/common/lxd/storage-pools; chances are they’re not actually mounted where it matters.

There is no /var/snap/lxd/common/lxd/mntns

sudo ls /var/snap/lxd/common/lxd/
backups     devices  logs	 server.key	unix.socket
cache	    devlxd   networks	 shmounts	virtual-machines
containers  disks    security	 snapshots	virtual-machines-snapshots
database    images   server.crt  storage-pools

Over here it looks fine:

sudo ls /var/snap/lxd/common/lxd/storage-pools/default/containers/mycontainer
backup.yaml  metadata.yaml  rootfs  templates

Oops, sorry /var/snap/lxd/common/mntns

Interesting, here I can see folders for all the containers, but some of them are indeed empty.

Yeah, most likely those weren’t mounted before LXD was started; anything mounted after that would not propagate into the mntns.
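
If that’s what happened, one way to get the mounts to propagate (a sketch, reusing the steps from the list above and assuming ZFS datasets named like lxd/containers/NAME) is to stop LXD, drop the stale mount namespace, mount everything, and start LXD again:

systemctl stop snap.lxd.daemon.unix.socket snap.lxd.daemon
umount -l /run/snapd/ns/lxd.mnt && rm /run/snapd/ns/lxd.mnt   # discard the stale mount namespace
zfs mount -a                                                   # or: zfs mount lxd/containers/NAME, per container
systemctl start snap.lxd.daemon.unix.socket snap.lxd.daemon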

That worked … kind of. I imported all containers and I can start them. X11 apps work as well.

But it looks like none of the containers has access to the network bridge of the host. They have no internet connection.

So the network bridge created at lxd init is gone. netplan shows only the host’s physical network interface.

  • So how can I set up the bridge again so that the old containers, as well as any new ones, automatically get it? Is it safe to run lxd init again?
  • Should I follow these instructions to create the bridge manually?
  • Did the database backup/replace reset any other settings I haven’t noticed yet?

First of all, I updated the top of the blog post to guide users to upgrade to LXD 4.6/candidate in order to keep using GUI containers, https://blog.simos.info/running-x11-software-in-lxd-containers/

lxd init is a wizard that guides you to create a baseline configuration so that you can create containers straight away. If something is missing, like lxdbr0, then you can just create it using lxc network. No need to run lxd init again.

Here is how to re-create lxdbr0, if none exists.

ubuntu@mycontainer:~$ sudo lxc network list
+------+------+---------+-------------+---------+
| NAME | TYPE | MANAGED | DESCRIPTION | USED BY |
+------+------+---------+-------------+---------+
ubuntu@mycontainer:~$ sudo lxc network create lxdbr0
Network lxdbr0 created
ubuntu@mycontainer:~$ sudo lxc network list
+--------+--------+---------+-------------+---------+
|  NAME  |  TYPE  | MANAGED | DESCRIPTION | USED BY |
+--------+--------+---------+-------------+---------+
| lxdbr0 | bridge | YES     |             | 0       |
+--------+--------+---------+-------------+---------+
ubuntu@mycontainer:~$ 

Unfortunately that failed for me:

Error: Failed to automatically find an unused IPv6 subnet, manual configuration required

Well, I fear there may be more than just the networking missing. That’s why I thought of running the full init.

Indeed there are other things not working.

$ lxc launch ubuntu-minimal:bionic test
Creating test
Error: Failed instance creation: Create instance: Create instance: Invalid devices: Failed detecting root disk device: No root device could be found

I fear there is something seriously wrong. I will do a backup of all the containers using `lxc export`. Would it then be possible to remove all of lxd and zfs and reinstall them, so that the containers keep working (incl. services like networking)? If that would work, it would perhaps be a good point to get rid of the blocked deleted images in zfs that are eating up my storage.
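
Something like this should export them all (a sketch; each container ends up as a tarball in the current directory):

for c in $(lxc list --format csv -c n); do
    # export each container to its own backup tarball
    lxc export "$c" "./$c.tar.gz"
done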

The first error on the networking says that LXD could not find an unused IPv6 subnet, which is somewhat weird as the IPv6 address space is rather big. If you do not use IPv6 anyway, you can simply create lxdbr0 without IPv6.
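
If the automatic subnet detection keeps failing, you can also pin the addresses yourself when creating the bridge (the IPv4 subnet below is just a placeholder, pick any unused private range):

lxc network create lxdbr0 ipv6.address=none ipv4.address=10.10.10.1/24 ipv4.nat=true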

The second error says that your default profile does not mention a storage pool.
Show us the output of

lxc profile show default

Then, run the following to show what storage is there in LXD.

lxc storage list

$ lxc profile show default
config: {}
description: Default LXD profile
devices: {}
name: default
used_by:
- /instances/mycontainer
... (17 more)
$ lxc storage list
+---------+-------------+--------+--------------------------------------------+---------+
|  NAME   | DESCRIPTION | DRIVER |                   SOURCE                   | USED BY |
+---------+-------------+--------+--------------------------------------------+---------+
| default |             | zfs    | /var/snap/lxd/common/lxd/disks/default.img | 18      |
+---------+-------------+--------+--------------------------------------------+---------+

Okay, your profile is missing the two necessary devices as follows.

devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk

First, you need to create the lxdbr0 network device.
Try with the following,

lxc network create lxdbr0 ipv6.address=none

Then, the storage. The following should show the details of the ZFS pool.

lxc storage show default

Once you have both the lxdbr0 network device and the default storage pool in place, you can complete the configuration of the default profile, and all should be working.
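
One way to do that last part is to open the profile in an editor and paste in the devices section shown above (the per-device commands further down work just as well):

lxc profile edit default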

So the bridge is created now, but the containers still have no internet access.

$ lxc network list
+--------+----------+---------+-----------------+------+-------------+---------+
|  NAME  |   TYPE   | MANAGED |      IPV4       | IPV6 | DESCRIPTION | USED BY |
+--------+----------+---------+-----------------+------+-------------+---------+
| enp5s0 | physical | NO      |                 |      |             | 0       |
+--------+----------+---------+-----------------+------+-------------+---------+
| lxdbr0 | bridge   | YES     | 10.122.224.1/24 | none |             | 0       |
+--------+----------+---------+-----------------+------+-------------+---------+

When I call lxc list, it shows the containers as running, but the IPv4 and IPv6 columns are empty for all of them.

The storage pool is:

$ lxc storage show default
config:
  size: 64GB
  source: /var/snap/lxd/common/lxd/disks/default.img
  zfs.pool_name: default
description: ""
name: default
driver: zfs
used_by:
- /1.0/instances/mycontainer
... (17 more)
status: Created
locations:
- none

You need to add both the network and the storage info to the default profile.
When you run lxc profile show default, you should get output similar to what I showed above.

Here are the commands to add them to your default LXD profile.

lxc profile device add default eth0 nic nictype=bridged parent=lxdbr0
lxc profile device add default root disk pool=default path="/"
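
After that, a quick check (using mycontainer from above as an example) could look like:

lxc profile show default     # eth0 and root should now appear under devices
lxc restart mycontainer      # already-running containers may need a restart to pick up the new NIC
lxc list                     # the IPv4 column should fill in once the containers get a DHCP lease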

Thank you so much! So far it works again :slight_smile: