Minor issue moving VMs between hosts (non-clustered)

candlerb · June 20, 2024, 9:04pm

Moving a stopped VM (not a container) from one standalone host to another, where the source host runs 6.0-202404270902-ubuntu20.04 and the destination runs 6.0-202405282244-ubuntu22.04:

% incus stop nuc1:jool && incus move nuc1:jool nuc3: && incus start nuc3:jool
Error: Failed to delete original instance after copying it: Failed deleting instance "jool" in project "default": Error deleting storage volume: Failed to remove '/var/lib/incus/storage-pools/default/virtual-machines/jool': remove /var/lib/incus/storage-pools/default/virtual-machines/jool: directory not empty
%

Checking on the source host:

root@nuc1:~# ls /var/lib/incus/storage-pools/default/virtual-machines/jool
agent-client.crt  agent-client.key  agent.crt  agent.key

The VM was able to start on the target machine, and it recreated those files by itself:

root@nuc3:~# ls /var/lib/incus/storage-pools/default/virtual-machines/jool/
agent-client.crt  agent-client.key  agent.crt  agent.key  backup.yaml  config  metadata.yaml  OVMF_VARS.4MB.ms.fd  qemu.nvram

Furthermore, incus delete jool on the source machine cleaned it up. So it’s not really a major problem, but it was a bit annoying to get an error message and a non-zero exit status. An opportunity to tidy up a bit better, perhaps? Or has this already been fixed in a more recent 6.0?

stgraber · June 21, 2024, 3:57am

This most likely points to a bug that was triggered on your system at some point in the past, possibly a long time ago.

Basically VMs have two volumes, a block volume (the disk) and a config volume that stores metadata files. That second volume is what you’re seeing containing those agent files and extra metadata.

Now when the volume isn’t mounted, the path should be empty. The error you’re getting indicates that even once unmounted, the path still contained files. That would indicate that at some point Incus (or possibly LXD, if this system was converted) had the config volume unmounted and then created files directly on the system’s root disk instead, which is what causes this issue.
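
As a rough illustration of that check, here is a minimal shell sketch that reports whether a config-volume path holds files while nothing is mounted on it. The function name check_config_path is hypothetical, and the check only makes sense on the assumption that the pool's storage driver mounts the config volume at that path; it relies on the util-linux mountpoint command.

```shell
# Hypothetical helper, not part of Incus: classify a config-volume path.
check_config_path() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        # The directory does not exist at all.
        echo "missing"
    elif mountpoint -q "$dir" 2>/dev/null; then
        # Something is mounted here, so any files belong to that volume.
        echo "mounted"
    elif [ -n "$(ls -A "$dir")" ]; then
        # Nothing mounted, yet files are present on the underlying disk.
        echo "stale-files"
    else
        # Nothing mounted and nothing left behind.
        echo "empty"
    fi
}
```

For the instance in this thread, it could be called on the source host with the VM stopped, for example: check_config_path /var/lib/incus/storage-pools/default/virtual-machines/jool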

Now if you can reproduce this by creating a new VM on your source system and then moving it to the target system, that’d indicate that such a bug is still present, and we should be able to track it down and fix it. My bet, though, would be on something more complex, as the common case should be covered in our tests.
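
A reproduction attempt along those lines could look roughly like the sketch below. The function name repro_move, the instance name movetest and the image images:debian/12 are assumptions chosen for illustration; the commands are wrapped in a function so that nothing runs until it is called.

```shell
# Hypothetical reproduction helper; instance name and image are assumptions.
repro_move() {
    src="$1"                # source remote, e.g. nuc1
    dst="$2"                # target remote, e.g. nuc3
    name="${3:-movetest}"   # throwaway instance name

    # Create and boot a fresh VM on the source host, then stop it again.
    incus launch images:debian/12 "$src:$name" --vm || return 1
    incus stop "$src:$name" || return 1

    # Move it to the target; a failure here would match the reported error.
    incus move "$src:$name" "$dst:" || return 1
    incus start "$dst:$name"
}
```

Usage would be something like: repro_move nuc1 nuc3. If the move step exits non-zero with the same "directory not empty" message, the problem is reproducible with a freshly created VM.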

candlerb · June 21, 2024, 6:01am

Yes, that sounds reasonable. The source machine was converted from LXD, although the VM itself wasn’t created that long ago, and I’m not sure whether the host had already been converted at the time.