Stateful snapshot not working for Focal VMs

Hi,

I am opening this topic to continue the discussion started in Launchpad bug #1975736, “stateful snapshot not working” (lxd package, Ubuntu).

Stateful snapshots do not work for Focal VMs:

$ lxc launch ubuntu:focal f-vm --vm
$ lxc snapshot --stateful f-vm
Error: Migration is disabled when VirtFS export path '/var/lib/lxd/devices/f-vm/config.mount' is mounted in the guest using mount_tag 'config'

In the LP bug, Stefan explained that this is due to an outdated LXD agent setup that keeps the 9p mount active, which prevents the stateful snapshot. He suggested manually installing lxd-agent and rebooting.
It worked.

However, this is not a one-time thing; it is reproducible every time.
In addition, we have users who run 100+ VMs, and manually installing and rebooting each one is not ideal.
This also affects Groovy, Hirsute and Impish VMs.
Jammy works fine.
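
Until there is a proper fix, the manual workaround could be looped over several VMs. The following is only a rough sketch: it assumes the install script is reachable by mounting the 9p share with mount tag `config` (the tag shown in the error message above) and running `./install.sh` from it, which may differ from the exact steps given in the LP bug. The function name `reinstall_agent` and the `LXC` override variable are made up for illustration.

```shell
# Re-run the agent install script inside each named VM, then restart the VM.
# Set LXC=echo to print the lxc invocations instead of running them (dry run).
reinstall_agent() {
    for vm in "$@"; do
        # Assumption: install.sh lives on the 9p share with mount tag "config".
        "${LXC:-lxc}" exec "$vm" -- sh -c \
            'mount -t 9p config /mnt && cd /mnt && ./install.sh && cd / && umount /mnt' \
            || { echo "agent install failed in $vm" >&2; continue; }
        # Restart so the VM comes up with the refreshed agent setup.
        "${LXC:-lxc}" restart "$vm"
    done
}
```

Calling `reinstall_agent f-vm other-vm` would process the two VMs in turn; a VM where the install step fails is reported and left running rather than restarted.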

I am not sure this is an LXD bug, or at least not only an LXD bug; I suspect systemd plays a role here.

While debugging this I noticed that a Focal VM reboots right after being launched; this does not happen with Jammy.
Is this expected?

Looking at the logs of the Focal VM: the VM starts, and systemd reports starting the LXD agent and the 9p mount:

Jun 28 16:32:15 ubuntu systemd[1]: Starting LXD - agent - 9p mount...
...
Jun 28 16:32:15 ubuntu systemd[1]: Finished LXD - agent - 9p mount.
Jun 28 16:32:15 ubuntu systemd[1]: Started LXD - agent.

and then

Jun 28 16:32:15 ubuntu systemd[1]: Started Network Time Synchronization.
Jun 28 16:32:15 ubuntu systemd[1]: Reached target System Time Set.
Jun 28 16:32:15 ubuntu systemd[1]: Reached target System Time Synchronized.
Jun 28 16:32:15 f-vm systemd[1]: Requested transaction contradicts existing jobs: Transaction for systemd-networkd.service/start is destructive (reboot.target has 'start' job queued, but 'stop' is included in transaction).
Jun 28 16:32:15 f-vm systemd[1]: systemd-networkd.socket: Failed to queue service startup job (Maybe the service file is missing or not a non-template unit?): Transaction for systemd-networkd.service/start is destructive (reboot.target has 'start' job queued, but 'stop' is included in transaction).>
Jun 28 16:32:15 f-vm systemd[1]: systemd-networkd.socket: Failed with result 'resources'.
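
To check whether another VM hits the same conflict, the relevant lines can be filtered out of the journal. This is a small sketch; the function name `reboot_conflicts` is made up, and it simply matches the message text seen in the log above.

```shell
# Read journal text on stdin and print only the lines where a start job was
# rejected because reboot.target was already queued.
reboot_conflicts() {
    grep -F "reboot.target has 'start' job queued"
}
```

Inside the guest this could be fed with `journalctl -b -1 --no-pager | reboot_conflicts`, assuming the journal from the first boot was persisted.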

After these messages, systemd stops the services and reboots.
While the services are stopping I can see:

Jun 28 16:32:17 f-vm umount[779]: umount: /run/lxd_config/9p: target is busy.
Jun 28 16:32:17 f-vm systemd[1]: Unmounting Mount unit for lxd, revision 22753...
Jun 28 16:32:17 f-vm systemd[1]: Unmounting Mount unit for snapd, revision 16010...
Jun 28 16:32:17 f-vm systemd[1]: boot-efi.mount: Succeeded.
Jun 28 16:32:17 f-vm systemd[1]: Unmounted /boot/efi.
Jun 28 16:32:17 f-vm systemd[1]: run-lxd_config-9p.mount: Mount process exited, code=exited, status=32/n/a
Jun 28 16:32:17 f-vm systemd[1]: Failed unmounting /run/lxd_config/9p.

I guess this is why the 9p mount is kept alive.
IIUC, manually executing the install.sh script clears the mount, removes lxd-agent-9p.service and leaves lxd-agent.service to run the lxd-agent.
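
A quick way to confirm whether a 9p mount is still active in a guest is to look for filesystems of type `9p` in the mounts table. A minimal sketch follows; the function name `list_9p_mounts` is made up for illustration.

```shell
# Print the mount point of every 9p filesystem listed in a mounts table.
# $1: path to the table (defaults to /proc/mounts, i.e. run inside the guest).
list_9p_mounts() {
    # Fields in /proc/mounts: device, mount point, filesystem type, options...
    awk '$3 == "9p" { print $2 }' "${1:-/proc/mounts}"
}
```

No output means no 9p filesystem is mounted. On an affected Focal VM I would expect it to list /run/lxd_config/9p, the path from the umount error above.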

What happens in the normal case? Who should clear the 9p mount?

In the Jammy logs I don’t see much information about the lxd-agent related services, only this line:

Jun 28 16:39:32 ubuntu systemd[1]: Condition check resulted in LXD - agent being skipped.

Any help is much appreciated!

Have you tried using the LXD Focal image, “images:ubuntu/focal”?

I’ve tried launching a VM from “images:ubuntu/focal” and it works fine!
What’s the difference between the images:ubuntu/focal and ubuntu:focal images?

The LXD team maintains the images: ones, so we can ensure the lxd-agent loading logic is up to date.