Thanks for the write-up!
So, going back to your original issue with lxd.migrate: you said it hung after failing to update the ZFS storage pool mount point. This was with LXD 4.0.9, and at that version LXD still sets a mount point on the datasets.
The error suggests that when LXD ran zfs set mountpoint=/var/lib/snapd/hostfs/pogo1 pogo1, ZFS under the hood tried to unmount the old mount point /var/lib/snapd/hostfs/pogo1, but that failed because the mount was still in use. It then went on to fail to create all of the other mount points below it:
=> Updating the storage backends
error: Failed to update the storage pools: Failed to run: nsenter --mount=/run/snapd/ns/lxd.mnt zfs set mountpoint=/var/lib/snapd/hostfs/pogo1 pogo1: umount: /var/lib/snapd/hostfs/pogo1: target is busy.
cannot unmount '/var/lib/snapd/hostfs/pogo1': umount failed
cannot mount '/pogo1/backup': failed to create mountpoint
cannot mount '/pogo1/containers': failed to create mountpoint
cannot mount '/pogo1/deleted': failed to create mountpoint
cannot mount '/pogo1/deleted/images': failed to create mountpoint
cannot mount '/pogo1/gateway2': failed to create mountpoint
cannot mount '/pogo1/gateway3': failed to create mountpoint
cannot mount '/pogo1/home': failed to create mountpoint
cannot mount '/pogo1/images': failed to create mountpoint
cannot mount '/pogo1/kfse_backups': failed to create mountpoint
cannot mount '/pogo1/kms_backup': failed to create mountpoint
cannot mount '/pogo1/oldpooh2': failed to create mountpoint
cannot mount '/pogo1/psql-attempt1': failed to create mountpoint
cannot mount '/pogo1/snapshots': failed to create mountpoint
cannot mount '/pogo1/snapshots/dc3': failed to create mountpoint
cannot mount '/pogo1/snapshots/fs3': failed to create mountpoint
cannot mount '/pogo1/snapshots/samba': failed to create mountpoint
property may be set but unable to remount filesystem
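As a side note, if this happens again you can usually see what is still holding the mount open before retrying. This is only a rough sketch run on the host: I'm assuming the pool was mounted somewhere like /pogo1 there (adjust the path to whatever the first command reports), and that fuser (from psmisc) and findmnt are installed:

# See where the pool thinks it should be mounted:
zfs get mountpoint pogo1
# List anything still mounted under that path:
findmnt -R /pogo1
# Show which processes still have files open on that filesystem:
fuser -vm /pogo1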
Now, lxd.migrate shouldn't hang like that in that situation. But the root cause of the problem appears to be that ZFS and snap have got their mount tables confused again and something was still holding the mount open. All of the instances should have been stopped by then, so potentially it was something else holding it open.
So my recommendation, especially when using ZFS, would be to stop all your instances and reboot your machine before running lxd.migrate, to ensure there is a clean mount table and no processes holding it open. I suspect that if that had been done, rather than removing LXD via apt, then lxd.migrate could have been re-run successfully.
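Something along these lines, purely as a sketch (pogo1 is your pool name from the error above; instance names are placeholders):

# Stop every instance on the old (deb) LXD; 'lxc stop --all' also works if your client supports it:
lxc stop <instance>
# Reboot so the mount table starts clean:
reboot
# After the reboot, confirm nothing is mounted or busy under the pool, then migrate:
grep pogo1 /proc/mounts
lxd.migrate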
I just tried this in a fresh Ubuntu Bionic VM, going from the LXD 3.0.3 deb to the LXD 4.0.9 snap, with a single running container on the ZFS pool.
Before:
zfs list
NAME USED AVAIL REFER MOUNTPOINT
zfs 226M 13.2G 24K none
zfs/containers 2.88M 13.2G 24K none
zfs/containers/c1 2.85M 13.2G 222M /var/lib/lxd/storage-pools/zfs/containers/c1
zfs/custom 24K 13.2G 24K none
zfs/deleted 24K 13.2G 24K none
zfs/images 222M 13.2G 24K none
zfs/images/afba58aa16219124c4da851b91bd59f012ea955b982961bad7218afdabf6e89e 222M 13.2G 222M none
zfs/snapshots 24K 13.2G 24K none
root@v1:/# lxd.migrate
=> Connecting to source server
=> Connecting to destination server
=> Running sanity checks
=== Source server
LXD version: 3.0.3
LXD PID: 2171
Resources:
Containers: 1
Images: 1
Networks: 1
Storage pools: 2
=== Destination server
LXD version: 4.0.9
LXD PID: 12832
Resources:
Containers: 0
Images: 0
Networks: 0
Storage pools: 0
The migration process will shut down all your containers then move your data to the destination LXD.
Once the data is moved, the destination LXD will start and apply any needed updates.
And finally your containers will be brought back to their previous state, completing the migration.
Are you ready to proceed (yes/no) [default=no]? y
=> Shutting down the source LXD
=> Stopping the source LXD units
=> Stopping the destination LXD unit
=> Unmounting source LXD paths
=> Unmounting destination LXD paths
=> Wiping destination LXD clean
=> Backing up the database
=> Moving the data
=> Updating the storage backends
=> Starting the destination LXD
=> Waiting for LXD to come online
=== Destination server
LXD version: 4.0.9
LXD PID: 13262
Resources:
Containers: 1
Images: 1
Networks: 1
Storage pools: 2
The migration is now complete and your containers should be back online.
Do you want to uninstall the old LXD (yes/no) [default=yes]? yes
All done. You may need to close your current shell and open a new one to have the "lxc" command work.
To migrate your existing client configuration, move ~/.config/lxc to ~/snap/lxd/common/config
After:
root@v1:/# zfs list
NAME USED AVAIL REFER MOUNTPOINT
zfs 226M 13.2G 24K none
zfs/containers 3.03M 13.2G 24K none
zfs/containers/c1 3.00M 13.2G 223M none
zfs/custom 24K 13.2G 24K none
zfs/deleted 120K 13.2G 24K none
zfs/deleted/containers 24K 13.2G 24K none
zfs/deleted/custom 24K 13.2G 24K none
zfs/deleted/images 24K 13.2G 24K none
zfs/deleted/virtual-machines 24K 13.2G 24K none
zfs/images 222M 13.2G 24K none
zfs/images/afba58aa16219124c4da851b91bd59f012ea955b982961bad7218afdabf6e89e 222M 13.2G 222M none
zfs/snapshots 24K 13.2G 24K none
zfs/virtual-machines 24K 13.2G 24K none
I then went from the LXD 4.0.9 snap to the LXD 5.6 snap:
snap refresh lxd --channel=latest/stable
After:
root@v1:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
zfs 226M 13.2G 24K legacy
zfs/buckets 24K 13.2G 24K legacy
zfs/containers 3.03M 13.2G 24K legacy
zfs/containers/c1 3.00M 13.2G 223M none
zfs/custom 24K 13.2G 24K legacy
zfs/deleted 144K 13.2G 24K legacy
zfs/deleted/buckets 24K 13.2G 24K legacy
zfs/deleted/containers 24K 13.2G 24K legacy
zfs/deleted/custom 24K 13.2G 24K legacy
zfs/deleted/images 24K 13.2G 24K legacy
zfs/deleted/virtual-machines 24K 13.2G 24K legacy
zfs/images 222M 13.2G 24K legacy
zfs/images/afba58aa16219124c4da851b91bd59f012ea955b982961bad7218afdabf6e89e 222M 13.2G 222M none
zfs/snapshots 24K 13.2G 24K none
zfs/virtual-machines 24K 13.2G 24K legacy
And there we can see the legacy mountpoint has been applied (to stop ZFS from controlling the mount points).
Running instances don't have their mountpoint set to legacy until their next restart.
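For what it's worth, "legacy" just means ZFS itself no longer auto-mounts the dataset; LXD mounts and unmounts it explicitly when it needs it. A quick way to see that by hand (purely illustrative, using the 'zfs' pool from the test above and /mnt as a scratch directory):

# The running container's dataset still shows 'none' until it is restarted:
zfs get mountpoint zfs/containers/c1
# Legacy datasets are mounted explicitly with mount(8) rather than by ZFS:
mount -t zfs zfs/containers /mnt
umount /mnt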