Need help: Recovering instances from a failed system

Ok, you’ll need to run sudo zpool import -d /var/snap/lxd/common/lxd/disks -a, after which sudo zfs list -t all should show the datasets.

This is weird, no luck.

root@cmp4rpp-h1:/var/snap/lxd/common# zpool import -d /var/snap/lxd/common/lxd/disks -a
no pools available to import

root@cmp4rpp-h1:/var/snap/lxd/common/lxd/disks# ls -lsa
total 32180508
       4 drwx------  2 root root           4096 Feb 29 17:56 .
       4 drwx--x--x 18 lxd  nogroup        4096 Jul  2 10:14 ..
32180500 -rw-------  1 root root    80000000000 Jul  1 20:14 local.img

A bit weird indeed. Can you show sudo zdb -l /var/snap/lxd/common/lxd/disks/local.img?

root@cmp4rpp-h1:/var/snap/lxd/common/lxd/disks# sudo zdb -l /var/snap/lxd/common/lxd/disks/local.img
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3

That doesn’t look like zfs… Can you show sudo file /var/snap/lxd/common/lxd/disks/local.img?

Sorry for the delay. Looks like BTRFS

root@cmp4rpp-h1:/var/snap/lxd/common/lxd/disks# sudo file /var/snap/lxd/common/lxd/disks/local.img
/var/snap/lxd/common/lxd/disks/local.img: BTRFS Filesystem label "local", sectorsize 4096, nodesize 16384, leafsize 16384, UUID=53b02392-73bd-467d-b9db-5d97fcbed85f, 25118007296/80000000000 bytes used, 1 devices
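For what it’s worth, file recognises the image by its superblock magic, and the same check can be scripted by hand. A minimal sketch, under the assumption that btrfs stores the 8-byte magic "_BHRfS_M" at byte offset 65600 (superblock at 64 KiB, magic field 0x40 bytes into it):

```shell
# check_btrfs IMG: print "btrfs" if IMG carries the btrfs superblock magic.
# Assumption: the magic "_BHRfS_M" sits at byte offset 65600
# (superblock at 64 KiB, magic field 0x40 bytes into it).
check_btrfs() {
    magic=$(dd if="$1" bs=1 skip=65600 count=8 2>/dev/null)
    if [ "$magic" = "_BHRfS_M" ]; then
        echo "btrfs"
    else
        echo "not btrfs"
    fi
}

# Usage: check_btrfs /var/snap/lxd/common/lxd/disks/local.img
```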

Ok, so you can try:

  • sudo systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
  • sudo mv /var/snap/lxd/common/lxd/database /var/snap/lxd/common/lxd/database.broken
  • sudo nsenter --mount=/run/snapd/ns/lxc.mnt mount -o loop /var/snap/lxd/common/lxd/disks/local.img /var/snap/lxd/common/lxd/storage-pools/local/
  • sudo lxc list
  • sudo lxd import NAME-OF-CONTAINER

No dice on the nsenter, any ideas?

root@cmp4rpp-h1:/var/snap/lxd/common/lxd/disks# sudo nsenter --mount=/run/snapd/ns/lxc.mnt mount -o loop /var/snap/lxd/common/lxd/disks/local.img /var/snap/lxd/common/lxd/storage-pools/local/
nsenter: cannot open /run/snapd/ns/lxc.mnt: No such file or directory

There is no lxc.mnt, but there is an lxd.mnt. Trying that.

--mount=/run/snapd/ns/lxd.mnt

That didn’t work. :frowning:

Oh yeah, that was a typo, what happened with sudo nsenter --mount=/run/snapd/ns/lxd.mnt mount -o loop /var/snap/lxd/common/lxd/disks/local.img /var/snap/lxd/common/lxd/storage-pools/local/ ?

The command executed without error, but nothing is visible in “/var/snap/lxd/common/lxd/storage-pools/local/” and the import command isn’t working.

Attempting to rerun the command returns that it’s already mounted.

Ok, so that’s a good sign.

What does sudo lxc list show? It should show an empty list.
And what does the lxd import NAME get you?

list is empty, import does not pick anything up.

root@cmp4rpp-h1:/var/snap/lxd/common/lxd/disks# lxc import WebServices
Error: open /var/snap/lxd/common/lxd/disks/WebServices: no such file or directory

I said lxd import not lxc import :slight_smile:

DERP! That worked.

Okay, now we’re back on track! I’ll get these containers back into the system, properly export them, then reload the host. Is lxc import/export still the way to go?

Yeah, lxd import will perform the disaster recovery to re-generate the DB entries; your containers should then work as normal.
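With the pool mounted, each container found under the pool can be handed to lxd import in turn. A hedged sketch; the pool path and the containers/ layout are assumptions based on the default snap paths:

```shell
# list_containers POOL: print the name of each container directory in POOL.
# Assumption: recovered containers live under POOL/containers/.
list_containers() {
    for dir in "$1"/containers/*/; do
        if [ -d "$dir" ]; then
            basename "$dir"
        fi
    done
}

# Each name can then be fed to the disaster-recovery import, e.g.:
#   list_containers /var/snap/lxd/common/lxd/storage-pools/local |
#       xargs -n1 sudo lxd import
```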

If you have multiple systems, doing a lxc copy or lxc move over the network is usually a fair bit faster than export+import but if you need an offline storage solution, then lxc export is the way to go.
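For reference, the two approaches look like this; the tarball name and the "newhost" remote are placeholders, not from the thread:

```shell
# Offline: export to a tarball, carry it over, import on the new host.
lxc export WebServices WebServices.tar.gz
lxc import WebServices.tar.gz          # run on the destination host

# Over the network (usually faster; assumes a configured "newhost" remote):
lxc copy WebServices newhost:
```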

Thank you for the help Stéphane!

For what it’s worth: I had exactly this issue. I installed LXD (via snap) on Ubuntu 20.04 (after upgrading it from 18.04). It worked at first, but when I wanted to work with some containers it gave me this error (“Database has address 0”) and processes got stuck.

I compared it against the database.pre-migration database, and it looks like the raft_nodes were not properly migrated, i.e. changed from the actual IP address and port to “0”. If I put that value back for the raft_nodes I get 404 errors.

I have ZFS; I tried doing an import of my containers, but they suddenly appeared empty.

Rather problematic; was it the upgrade and migration that happened in May?

Yet the fix was to stop lxd (via snap), reset the raft_nodes value back to the IP address and port it previously had, and start again. On a “live system” that gave an error, but stop/start worked.
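The raft_nodes fix described above can be sketched with sqlite3 against the node-local database. The path and the schema here are assumptions; stop LXD and back up the file first:

```shell
# fix_raft_address DB ADDR: restore the address column in raft_nodes.
# Assumptions: the table is raft_nodes(id, address) and the broken
# migration left address = '0'.
fix_raft_address() {
    sqlite3 "$1" "UPDATE raft_nodes SET address = '$2' WHERE address = '0';"
}

# Assumed usage (run as root, with LXD stopped and the DB backed up):
#   snap stop lxd
#   cp /var/snap/lxd/common/lxd/database/local.db /root/local.db.bak
#   fix_raft_address /var/snap/lxd/common/lxd/database/local.db 192.0.2.10:8443
#   snap start lxd
```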

I tried lxd import, but that didn’t work well for me with ZFS. I probably had to configure lxd to use ZFS first, given the database was cleared out, before doing an import? (I assumed I didn’t need nsenter.)

Glad I was able to recover, but I hope to understand what went wrong and find ways to prevent this. I kind of fear the automatic upgrades that snap is doing at the moment.