lxd.migrate results in no containers present after "successful" migration

I attempted to migrate from the LXD apt package (version 3.0.3) to the LXD 4.0.9 snap. After I had both installed, I ran lxd.migrate and it appeared to work, but after snap.lxd.daemon was restarted (with the migrated database), it failed to start due to this error. I was eventually able to get it working, but I had to downgrade the snap to version 3.0.4 in order to successfully run the SQL query that deletes the “bad” snapshot. It appears that lxd.migrate saw this 3.0.4 version and concluded the migration was complete:

 $ sudo lxd.migrate
=> Connecting to source server
=> Connecting to destination server
=> Running sanity checks

=== Source server
LXD version: 3.0.3
LXD PID: 21635
  Containers: 69
  Images: 3
  Networks: 0
  Storage pools: 1

=== Destination server
LXD version: 4.0.9
LXD PID: 9283
  Containers: 0
  Images: 0
  Networks: 0
  Storage pools: 0

The migration process will shut down all your containers then move your data to the destination LXD.
Once the data is moved, the destination LXD will start and apply any needed updates.
And finally your containers will be brought back to their previous state, completing the migration.

Are you ready to proceed (yes/no) [default=no]? yes
=> Shutting down the source LXD
=> Stopping the source LXD units
=> Stopping the destination LXD unit
=> Unmounting source LXD paths
=> Unmounting destination LXD paths
=> Wiping destination LXD clean
=> Backing up the database
=> Moving the data
=> Updating the storage backends
=> Starting the destination LXD
=> Waiting for LXD to come online

=== Destination server
LXD version: 3.0.4
LXD PID: 17975
  Containers: 0
  Images: 0
  Networks: 0
  Storage pools: 0

The migration is now complete and your containers should be back online.
Do you want to uninstall the old LXD (yes/no) [default=yes]? yes

All done. You may need to close your current shell and open a new one to have the "lxc" command work.
To migrate your existing client configuration, move ~/.config/lxc to ~/snap/lxd/common/config

However, now after the migration, /snap/bin/lxc list shows no containers! Help! How can I restore all of my containers? They are stored on a zpool, so all of the data is still there; it’s just the global database that seems to be empty. The storage pool used to look something like this:

 # lxc storage list
| mypool | zfs    | storage |             | 179     | CREATED |

It’s worth noting that the above process appears to have emptied out /var/lib/lxd (the apt package is no longer present), so I can’t recover a copy of the database from there. There do appear to be some larger files here; I’m not sure whether one of them is an opaquely-named database backup:

 # ls -l /var/snap/lxd/common/lxd/database/global
total 25744
-rw------- 1 root root      64 Apr  6 21:20 0000000000000001-0000000000000001
-rw------- 1 root root  550952 Apr  6 21:21 0000000000000002-0000000000000015
-rw------- 1 root root  123992 Apr  6 21:38 0000000000000016-0000000000000027
-rw------- 1 root root  123992 Apr  6 21:39 0000000000000028-0000000000000039
-rw------- 1 root root  380928 Apr  6 21:39 db.bin
-rw------- 1 root root      32 Apr  6 21:20 metadata1
-rw------- 1 root root 8388608 Apr  6 22:03 open-1
-rw------- 1 root root 8388608 Apr  6 22:03 open-2
-rw------- 1 root root 8388608 Apr  6 22:03 open-3

Can you show lxc project list and lxc storage show mypool?

There’s something weird with your migration output too.
It starts by showing a 3.0.3 to 4.0.9 migration but then shows the destination as 3.0.4.

# /snap/bin/lxc project list
| default (current) | YES    | YES      | YES             | 1       |
 # /snap/bin/lxc storage show mypool
Error: Not Found

I had the 4.0.9 snap installed when I started running lxd.migrate, but it hung for a while at the “Waiting for LXD to come online” step. I ran systemctl status snap.lxd.migrate in another window to see what was going on and saw that it had failed to start. I then ran journalctl -u snap.lxd.daemon and saw that it had failed with an error similar to the one seen here. I tried following the guidance in that topic to fix it:

Write a patch.global.sql file in /var/snap/lxd/common/lxd/database containing:

DELETE FROM containers WHERE name="mytest/snap0";
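For reference, a minimal sketch of how that patch file can be written and applied (paths as above; as I understand it, LXD executes patch.global.sql once on the next daemon startup):

```shell
# Sketch: write the one-off SQL patch into the snap's database directory,
# then restart the daemon so LXD applies it at startup.
cat <<'EOF' | sudo tee /var/snap/lxd/common/lxd/database/patch.global.sql
DELETE FROM containers WHERE name="mytest/snap0";
EOF
sudo systemctl restart snap.lxd.daemon
```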

However, the above didn’t work; the “bad” snapshot was still present even after running systemctl restart snap.lxd.daemon several times. Something I saw in another post made me think it was failing to remove the snapshot from the database because the LXD version was too new, so I ran snap refresh lxd --channel=3.0/stable and attempted the same steps again. This time it worked, and I was then able to run snap refresh lxd --channel=4.0/stable to get back to the desired version and successfully start snap.lxd.daemon. However, at that point, when I went back to the lxd.migrate window, it had finished with the output you see above.

Hmm, so I’m afraid I have absolutely no idea what state your system may be in then.
LXD never supports downgrades and you effectively downgraded your LXD in the middle of a database and data migration…

LXD 3.0 and LXD 4.0 use a different database format, so depending on the exact timing of things, some data may have gotten converted or you may have gotten a blank state.

If you have a backup of your /var/lib/lxd prior to the switch to the snap, I’d recommend you remove the snap, restore that backup, get the LXD 3.0.3 deb back online and then attempt the migration to the snap again, maybe this time starting with the 3.0.4 snap, then upgrading from there to 4.0.9.

If you don’t, you can make a tarball of /var/snap/lxd/common/lxd/database and send it to me at stgraber at ubuntu dot com, and I’ll see if any data made it over at all. If not, you’ll probably need to wipe the snap, install a fresh copy of 4.0.9 and then use lxd recover to import your instances from your ZFS pool, though importing 3.0.x instances can be a bit tricky (there’s another forum topic going over some of those issues).
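For example, one way to capture that directory (the output filename is just a suggestion; the database files are root-only, so this needs sudo):

```shell
# Create a compressed tarball of the snap's database directory so its
# contents can be inspected elsewhere.
sudo tar -czf /root/lxd-database.tar.gz \
    -C /var/snap/lxd/common/lxd database
```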

Unfortunately, I don’t have a backup of /var/lib/lxd, so I’ll need to use lxd recover. I’m working on it now, but I seem to need to manually set the mountpoint for each dataset or I get the following error:

Error: Failed validation request: Failed checking volumes on pool "mypool": Instance "container1" in project "default" has a different instance type in its backup file ("")

Thank you for the help and guidance in getting this resolved. For reference, these are the steps I followed to recover these LXD 3.0.x containers onto a fresh LXD 5.0.x system (from the pre-existing zpool):

  • mount the container’s dataset and edit backup.yaml to include type: container in the container section, near the existing status_code key (this key was missing from the LXD 3.0.x container config)
  • unmount the dataset and ensure it has no mountpoint set, using zfs inherit mountpoint mypool/containers/$container
  • run lxc profile show default and ensure it matches another existing system (particularly the networking settings)
  • run lxc network list and, if necessary, run lxc network create lxdbr0 to ensure lxdbr0 exists
  • run lxd recover and follow the prompts to import the containers; for the KEY=VALUE pairs, you will likely need to specify zfs.pool_name=mypool
  • you should now see the containers in lxc list and be able to start them
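The steps above can be sketched for a single container like this (the pool and container names are the placeholders from this thread, /mnt/recover is an arbitrary scratch mountpoint, and the sed expression assumes the two-space indentation LXD uses under the container: section of backup.yaml):

```shell
# Temporarily mount the dataset so backup.yaml can be edited.
zfs set mountpoint=/mnt/recover mypool/containers/container1
zfs mount mypool/containers/container1

# Add the missing "type: container" key right after status_code,
# keeping the same two-space indentation.
sed -i 's/^\(  status_code:.*\)/\1\n  type: container/' /mnt/recover/backup.yaml

# Unmount and clear the mountpoint so lxd recover can manage it.
zfs umount mypool/containers/container1
zfs inherit mountpoint mypool/containers/container1

# Ensure the expected bridge exists, then run the interactive recovery,
# supplying zfs.pool_name=mypool when prompted for KEY=VALUE pairs.
lxc network list | grep -q lxdbr0 || lxc network create lxdbr0
lxd recover
```

The sed edit is the only non-interactive part; lxd recover itself walks through the pool discovery prompts.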