I wrote a simple script for backing up containers on a remote LXD instance. What it does is simple: it stops containers, snapshots them, copies them to the remote and starts them again. However, I have a problem with a single VM instance. If it doesn’t exist on the remote, it gets copied just fine. But the next time the script runs, I get:
Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: More recent snapshots must be deleted: [snapshot-2023-06-13_2
It references the snapshot that was created right before the copy operation. I’m using copy --refresh. What’s going on here? What’s weird is that this issue affects only a single VM instance, but it’s also the only VM on my list; the rest of the instances are containers. Why would this happen only to the VM? I would expect the same copy behavior for all instances. Do you guys have any idea what’s going on here?
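For reference, the script boils down to roughly this (a minimal sketch, not my exact script; the remote name backup and the instance selection are illustrative):

#!/bin/sh
# Stop, snapshot, refresh-copy and restart each instance.
for inst in $(lxc list -c n -f csv); do
    lxc stop "$inst"
    lxc snapshot "$inst" "snapshot-$(date +%Y-%m-%d_%H-%M-%S)"
    lxc copy "$inst" "backup:$inst" --refresh
    lxc start "$inst"
done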
Do you experience this with lxd 5.14?
After upgrading to 5.14 and deleting the instance on the backup server, on the first copy I got:
Error: Invalid config: Unknown configuration key: volatile.idmap.next
But the instance does seem to get copied. However, after taking another snapshot and copying again, I’m getting:
Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: Snapshot "snapshot-2023-06-14_04-00-45" cannot be restored due to subsequent snapshot(s). Set zfs.remove_snapshots to override
And after setting volume.zfs.remove_snapshots: "true" on the remote’s pool, I’m back to:
Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: More recent snapshots must be deleted: [snap0]
I got this "Set zfs.remove_snapshots to override" error before; I just forgot that I had set this value, and yesterday I unset it as an experiment.
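Concretely, by setting and unsetting I mean commands like these (the pool name default is illustrative):

$ lxc storage set backup:default volume.zfs.remove_snapshots true
$ lxc storage unset backup:default volume.zfs.remove_snapshots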
BTW, there seems to be something wrong with the snap channels:
$ snap info lxd
refresh-date: today at 11:06 CEST
channels:
  latest/stable:    5.14-7072c7b  2023-06-01 (24918) 178MB -
  latest/candidate: 5.14-7072c7b  2023-05-29 (24918) 178MB -
  latest/edge:      git-bdbac88   2023-06-11 (24984) 181MB -
  5.14/candidate:   5.14-7072c7b  2023-05-31 (24918) 178MB -
  5.13/stable:      5.13-8e2d7eb  2023-05-31 (24846) 174MB -
  5.0/stable:       5.0.2-838e1b2 2023-01-25 (24322) 117MB -
  5.0/candidate:    5.0.2-838e1b2 2023-01-18 (24322) 117MB -
  5.0/edge:         git-2a04cf3   2023-04-15 (24732) 118MB -
  4.0/stable:       4.0.9-a29c6f1 2022-12-04 (24061) 96MB -
  4.0/candidate:    4.0.9-a29c6f1 2022-12-02 (24061) 96MB -
  4.0/edge:         git-407205d   2022-11-22 (23988) 96MB -
  3.0/stable:       3.0.4         2019-10-10 (11348) 55MB -
  3.0/candidate:    3.0.4         2019-10-10 (11348) 55MB -
  3.0/edge:         git-81b81b9   2019-10-10 (11362) 55MB -
installed:          5.14-7072c7b             (24918) 178MB -
5.14/stable is empty, but latest/stable points to 5.14. I had to use the candidate channel. That’s why I hadn’t upgraded to 5.14 earlier: I didn’t realize it was available, because 5.14/stable was empty.
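I.e. I ended up refreshing with something like:

$ sudo snap refresh lxd --channel=latest/candidate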
The volatile issue will be fixed in LXD 5.15 with
I’ll try and get a reproducer going.
Please can you show sudo zfs list -t all on both the source and target machines, so we can see the state of the problem VM? What is its name, btw?
I created a new instance with a minimal setup and a minimal number of snapshots, just to test whether it would still happen, and it does.
snapshot-2023-06-15_04-01-33 was created by my backup script. snap0 I have just created to test lxc copy --refresh. The problem persists.
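So the minimal reproducer is essentially this (the remote name backup is illustrative):

$ lxc snapshot mailcow-luken snap0
$ lxc copy mailcow-luken backup:mailcow-luken --refresh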
Source:
$ sudo zfs list -t all | grep mailcow-luken
lxdpool/virtual-machines/mailcow-luken 14.8M 85.2M 13.7M legacy
lxdpool/virtual-machines/mailcow-luken@snapshot-snapshot-2023-06-15_04-01-33 552K - 13.7M -
lxdpool/virtual-machines/mailcow-luken@snapshot-snap0 556K - 13.7M -
lxdpool/virtual-machines/mailcow-luken.block 10.5G 1.47T 13.6G -
lxdpool/virtual-machines/mailcow-luken.block@snapshot-snapshot-2023-06-15_04-01-33 334M - 13.6G -
lxdpool/virtual-machines/mailcow-luken.block@snapshot-snap0 50.6M - 13.6G -
Target:
$ sudo zfs list -t all | grep mailcow-luken
rpool/virtual-machines/hypervisor_mailcow-luken 14.3M 85.7M 13.7M legacy
rpool/virtual-machines/hypervisor_mailcow-luken@snapshot-snapshot-2023-06-15_04-01-33 552K - 13.7M -
rpool/virtual-machines/hypervisor_mailcow-luken@snapshot-snap0 25.5K - 13.7M -
rpool/virtual-machines/hypervisor_mailcow-luken.block 13.6G 1.48T 13.6G -
rpool/virtual-machines/hypervisor_mailcow-luken.block@snapshot-snapshot-2023-06-15_04-01-33 0B - 13.6G -
Is that enough? Note that on the target, the .block volume is missing the snap0 snapshot.
Thanks, I’ve assigned this to myself to investigate once the edge snap VMs are working again.
Are you able to log an issue over at https://github.com/lxc/lxd/issues with the reproducer steps you’ve identified here, so we don’t lose track of it?