Problem with copy --refresh of a VM instance to a remote LXD instance

I wrote a simple script for backing up containers to a remote LXD server. What it does is simple: it stops the instances, snapshots them, copies them to the remote and starts them again. However, I have a problem with a single VM instance. If it doesn’t exist on the remote, it gets copied just fine. But the next time the script runs, I get:

Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: More recent snapshots must be deleted: [snapshot-2023-06-13_22-19-59]

It references the snapshot that was created right before the copy operation. I’m using copy --refresh. What’s going on here? What’s weird is that this issue only affects a single VM instance, but it’s also the only VM on my list; the rest of the instances are containers. Why would this happen only to the VM? I would expect the same copy behavior for all instances. Does anyone have an idea what’s going on here?
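For context, the per-instance loop in the script boils down to roughly this (the remote name backup: and the instance name are placeholders, not the script verbatim):

$ lxc stop <instance>
$ lxc snapshot <instance> "snapshot-$(date +%Y-%m-%d_%H-%M-%S)"
$ lxc copy <instance> backup:<instance> --refresh
$ lxc start <instance>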

Additional information:
LXD 5.13
ZFS storage.

Do you experience this with LXD 5.14?

After upgrading to 5.14 and deleting the instance on the backup server, on the first copy I got:

Error: Invalid config: Unknown configuration key: volatile.idmap.next

But the instance seems to get copied anyway. However, after taking the snapshot and copying again, I’m getting:

Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: Snapshot "snapshot-2023-06-14_04-00-45" cannot be restored due to subsequent snapshot(s). Set zfs.remove_snapshots to override

And after setting volume.zfs.remove_snapshots: "true" on the remote’s pool, I’m back to:

Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: More recent snapshots must be deleted: [snap0]

I had seen this "Set zfs.remove_snapshots to override" error before; I had just forgotten that I set this value, and yesterday I unset it as an experiment.
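For reference, the setting mentioned above is applied on the backup server like this (the pool name default here is a placeholder):

$ lxc storage set default volume.zfs.remove_snapshots true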

BTW, there seems to be something wrong with the snap channels:

refresh-date: today at 11:06 CEST
channels:
  latest/stable:    5.14-7072c7b  2023-06-01 (24918) 178MB -
  latest/candidate: 5.14-7072c7b  2023-05-29 (24918) 178MB -
  latest/beta:      ↑                                      
  latest/edge:      git-bdbac88   2023-06-11 (24984) 181MB -
  5.14/stable:      –                                      
  5.14/candidate:   5.14-7072c7b  2023-05-31 (24918) 178MB -
  5.14/beta:        ↑                                      
  5.14/edge:        ↑                                      
  5.13/stable:      5.13-8e2d7eb  2023-05-31 (24846) 174MB -
  5.13/candidate:   ↑                                      
  5.13/beta:        ↑                                      
  5.13/edge:        ↑                                      
  5.0/stable:       5.0.2-838e1b2 2023-01-25 (24322) 117MB -
  5.0/candidate:    5.0.2-838e1b2 2023-01-18 (24322) 117MB -
  5.0/beta:         ↑                                      
  5.0/edge:         git-2a04cf3   2023-04-15 (24732) 118MB -
  4.0/stable:       4.0.9-a29c6f1 2022-12-04 (24061)  96MB -
  4.0/candidate:    4.0.9-a29c6f1 2022-12-02 (24061)  96MB -
  4.0/beta:         ↑                                      
  4.0/edge:         git-407205d   2022-11-22 (23988)  96MB -
  3.0/stable:       3.0.4         2019-10-10 (11348)  55MB -
  3.0/candidate:    3.0.4         2019-10-10 (11348)  55MB -
  3.0/beta:         ↑                                      
  3.0/edge:         git-81b81b9   2019-10-10 (11362)  55MB -
installed:          5.14-7072c7b             (24918) 178MB -

Notice that 5.14/stable is empty, but latest/stable points to 5.14. I had to use the candidate channel. That’s why I hadn’t upgraded to 5.14: I didn’t realize it was available, because 5.14/stable was empty.
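For the record, switching the snap to the candidate channel was just a matter of (assuming the snap is named lxd):

$ sudo snap refresh lxd --channel=latest/candidate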

The volatile issue will be fixed in LXD 5.15 with

I’ll try and get a reproducer going.


Please can you show sudo zfs list -t all on both the source and target machines so we can see the state of the problem VM? What is its name, btw?

I created a new instance with a minimal setup and a minimal number of snapshots, just to test whether it would still happen, and it does.

snapshot-2023-06-15_04-01-33 was taken by my backup script.
snap0 I have just taken to test lxc copy --refresh. The problem persists.
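The reproducer on my side is essentially just this (the remote name backup: is a placeholder for my backup server):

$ lxc snapshot mailcow-luken snap0
$ lxc copy mailcow-luken backup:mailcow-luken --refresh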

Main server:

$ sudo zfs list -t all | grep mailcow-luken
lxdpool/virtual-machines/mailcow-luken                                                                  14.8M  85.2M     13.7M  legacy
lxdpool/virtual-machines/mailcow-luken@snapshot-snapshot-2023-06-15_04-01-33                             552K      -     13.7M  -
lxdpool/virtual-machines/mailcow-luken@snapshot-snap0                                                    556K      -     13.7M  -
lxdpool/virtual-machines/mailcow-luken.block                                                            10.5G  1.47T     13.6G  -
lxdpool/virtual-machines/mailcow-luken.block@snapshot-snapshot-2023-06-15_04-01-33                       334M      -     13.6G  -
lxdpool/virtual-machines/mailcow-luken.block@snapshot-snap0                                             50.6M      -     13.6G  -

Backup server:

$ sudo zfs list -t all | grep mailcow-luken
rpool/virtual-machines/hypervisor_mailcow-luken                                              14.3M  85.7M     13.7M  legacy
rpool/virtual-machines/hypervisor_mailcow-luken@snapshot-snapshot-2023-06-15_04-01-33         552K      -     13.7M  -
rpool/virtual-machines/hypervisor_mailcow-luken@snapshot-snap0                               25.5K      -     13.7M  -
rpool/virtual-machines/hypervisor_mailcow-luken.block                                        13.6G  1.48T     13.6G  -
rpool/virtual-machines/hypervisor_mailcow-luken.block@snapshot-snapshot-2023-06-15_04-01-33     0B      -     13.6G  -

Is that enough?

Thanks, I’ve assigned this to myself to investigate once the edge snap VMs are working again.


Hi,

Are you able to log an issue over at https://github.com/lxc/lxd/issues with the reproducer steps you’ve identified here, so we don’t lose track of it?

Thanks