Recovery from Errant lxd init

We have an LXD host that finally got upgraded from 16.04 to 18.04. During the upgrade we realized that the old apt version of LXD was still present. I was able to uninstall the apt packages and refresh the snap package, and now the client tools and such seem to be fine.

However, I unfortunately did not discover that this was the issue until after I had re-run lxd init. Now I have an intact zpool but an LXD application that knows nothing of its previous self.
lxd recover fails with: Error: Failed validation request: Failed checking volumes on pool "default": Instance "containername" in project "default" has a different instance type in its backup file (""). The zfs mountpoint for that container shows legacy. I can temporarily set the container dataset's mountpoint to a known location, zfs mount it, and view backup.yaml.
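For anyone following along, the temporary mount looked roughly like this (dataset names are examples based on my pool; adjust to yours):

```shell
# Give the container dataset a temporary mountpoint so backup.yaml
# can be read, then put it back to legacy afterwards.
sudo zfs set mountpoint=/mnt/inspect tank/lxd/containers/containername
sudo zfs mount tank/lxd/containers/containername
sudo cat /mnt/inspect/backup.yaml
sudo zfs umount tank/lxd/containers/containername
sudo zfs set mountpoint=legacy tank/lxd/containers/containername
```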

I am pretty sure re-running lxd init was a bad idea, but I would like to know how I can recover to a working state. Thanks.

The problem is that you should have run lxd.migrate while you had both the deb and the snap; that would have seamlessly transferred the data over.
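For reference, with both packages still installed it would have been just:

```shell
# Interactive tool shipped with the snap: moves containers, images and
# configuration from the deb's /var/lib/lxd into the snap's data directory.
sudo lxd.migrate
```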

The fact that you ran lxd init on the new install doesn't really change anything; it's just a clean, empty LXD.

I think we could make LXD a bit smarter in lxd recover so that if no type is set, we assume it is a container. @tomp

In the meantime, you could edit backup.yaml by hand to have type: container inside the container section. That may be enough to make lxd recover happy.
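Roughly like this (keep the existing keys as they are; just add the type line):

```yaml
container:
  # ...existing fields stay untouched...
  type: container
```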

Thank you, Stéphane. That seems to get me past the first pass, but now lxd recover fails with: snapshot inconsistency: Snapshot count in backup config and storage device are different: Backup snapshots mismatch.

I have removed the snapshot data from backup.yaml and can see the snapshots on the filesystem. I do not care about recovering the snapshots on this or any other container. Can I just delete the snapshots with zfs?

Yeah, that should be fine, it just wants the two to line up.
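For example (hypothetical dataset and snapshot names; list first so you only destroy what you mean to):

```shell
# Show the snapshots belonging to one container dataset...
zfs list -t snapshot -o name -r tank/lxd/containers/containername
# ...then remove each one so the on-disk state matches the
# (now snapshot-free) backup.yaml.
sudo zfs destroy tank/lxd/containers/containername@snapshotname
```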

With snapshots cleaned up I am seeing a new error.

$ sudo lxd recover
This LXD server currently has the following storage pools:
Would you like to recover another storage pool? (yes/no) [default=no]: yes
Name of the storage pool: default
Name of the storage backend (cephfs, dir, lvm, zfs, ceph, btrfs): zfs
Source of the storage pool (block device, volume group, dataset, path, ... as applicable): tank/lxd
Additional storage pool configuration property (KEY=VALUE, empty when done): zfs.pool_name=tank/lxd
Additional storage pool configuration property (KEY=VALUE, empty when done):
Would you like to recover another storage pool? (yes/no) [default=no]:
The recovery process will be scanning the following storage pools:
 - NEW: "default" (backend="zfs", source="tank/lxd")
Would you like to continue with scanning for lost volumes? (yes/no) [default=yes]:
Scanning for unknown volumes...
Error: Failed validation request: Post "http://unix.socket/internal/recover/validate": EOF

Hmm, that suggests LXD crashed…

Can you check journalctl -u snap.lxd.daemon -n 300 for some kind of stack trace or error?

Not a single syslog entry for that unit when running recover, and I get the same result. I did snap restart lxd beforehand just to be sure. There are a few entries from the zed daemon during the recover run, but that's it (other than the sudo entries, of course).

To close the loop on this: I was never able to get recover to work with the existing storage pool. I ended up removing, reinstalling, and initializing LXD with a new dataset. A zfs send/receive of the individual container datasets from the old pool to the new one then allowed lxd recover to complete successfully.
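For completeness, the per-container copy was along these lines (pool and dataset names are examples; "newpool" stands in for whatever backs the freshly initialized LXD):

```shell
# Snapshot the old container dataset, then replicate it into the
# dataset hierarchy backing the new storage pool.
sudo zfs snapshot tank/lxd/containers/containername@move
sudo zfs send tank/lxd/containers/containername@move | \
  sudo zfs receive newpool/lxd/containers/containername
```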