Incus copy --refresh fails for migrated Ubuntu VMs

incus 0.5.1
storage: zfs

I do backups of (container) instances to another server by running copy --refresh.

Recently I also added some Ubuntu VMs that were migrated from VMware using incus-migrate. These VMs work fine so far, including the snapshots created with snapshots.schedule. However, a subsequent copy --refresh of these VMs now fails:

Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: Snapshot "snapshot-27" cannot be restored due to subsequent snapshot(s). Set zfs.remove_snapshots to override

The latest snapshot on the source is snapshot-29 and the latest snapshot on the target is snapshot-27.
Interestingly, on the target the two ZFS datasets of such a VM have different last snapshots (27 vs. 28):

zfs list -t snapshot zpool1/virtual-machines/dc1 zpool1/virtual-machines/dc1.block

zpool1/virtual-machines/dc1@snapshot-snapshot-26         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-27         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-28         104K      -     7.25M  -
zpool1/virtual-machines/dc1.block@snapshot-snapshot-26  4.48M      -     8.52G  -
zpool1/virtual-machines/dc1.block@snapshot-snapshot-27     0B      -     8.52G  -
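The mismatch between the two datasets can be spotted mechanically. The following is a minimal sketch, assuming the plain `zfs list -t snapshot` output is pasted in as text; the helper names `snapshots_by_dataset` and `mismatched` are hypothetical and not part of Incus or ZFS.

```python
from collections import defaultdict


def snapshots_by_dataset(zfs_output):
    """Group snapshot names by dataset from `zfs list -t snapshot` output."""
    result = defaultdict(list)
    for line in zfs_output.splitlines():
        fields = line.split()
        # Skip blank lines, a header row, and anything that is not a snapshot.
        if not fields or "@" not in fields[0]:
            continue
        dataset, snapshot = fields[0].split("@", 1)
        result[dataset].append(snapshot)
    return dict(result)


def mismatched(root, block):
    """Return (names only on the root dataset, names only on the block dataset)."""
    return sorted(set(root) - set(block)), sorted(set(block) - set(root))


# Sample input: the target listing quoted in this post.
listing = """\
zpool1/virtual-machines/dc1@snapshot-snapshot-26         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-27         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-28         104K      -     7.25M  -
zpool1/virtual-machines/dc1.block@snapshot-snapshot-26  4.48M      -     8.52G  -
zpool1/virtual-machines/dc1.block@snapshot-snapshot-27     0B      -     8.52G  -
"""

snaps = snapshots_by_dataset(listing)
only_root, only_block = mismatched(
    snaps["zpool1/virtual-machines/dc1"],
    snaps["zpool1/virtual-machines/dc1.block"],
)
print("only on dc1:", only_root)
print("only on dc1.block:", only_block)
```

On this listing the sketch reports snapshot-snapshot-28 as present only on dc1, which matches the 27 vs. 28 discrepancy described above.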

Is this a known/expected behavior?
Are VM snapshots not supported for copy --refresh or is this a special feature of a migrated VM?

Maybe I need to install/configure incus-agent in these VMs to get this working?

When I create a new VM from images I don’t see these issues.

incus launch images:ubuntu/22.04/cloud test-vm

I found others having similar issues with LXD 5.1, such as “Problem with copy --refresh of a vm instance to remote LXD instance”, but without a solution.

Having two ZFS volumes is normal for VMs, but their snapshots being out of sync isn’t. Can you check what incus snapshot list dc1 shows on the target as well as incus storage volume list zpool1 al on that target?

Basically trying to figure out what Incus thinks is there, then we can make sure that reality lines up and see if that sorts it out.
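The comparison between what Incus records and what ZFS actually holds could be sketched as follows. This is only an illustration working on pasted command output: the helper names are hypothetical, and the `snapshot-` prefix on ZFS snapshot names is inferred from the listings in this thread rather than from Incus documentation. The sample values mirror the target output posted in this thread.

```python
def incus_snapshot_names(table):
    """Extract snapshot names from `incus snapshot list` table output."""
    names = []
    for line in table.splitlines():
        if not line.startswith("|"):
            continue  # border rows start with "+"
        first_cell = line.strip("|").split("|")[0].strip()
        if first_cell and first_cell.upper() != "NAME":
            names.append(first_cell)
    return names


def zfs_snapshot_names(listing, dataset):
    """Extract Incus-level snapshot names for one dataset from `zfs list -t snapshot` output."""
    prefix = dataset + "@snapshot-"  # assumption: naming seen in this thread
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            names.append(fields[0][len(prefix):])
    return names


incus_table = """\
+-------------+----------------------+----------------------+----------+
| snapshot-26 | 2024/01/29 06:59 UTC | 2024/02/28 06:59 UTC | NO       |
+-------------+----------------------+----------------------+----------+
| snapshot-27 | 2024/01/29 10:59 UTC | 2024/02/28 10:59 UTC | NO       |
+-------------+----------------------+----------------------+----------+
"""

zfs_listing = """\
zpool1/virtual-machines/dc1@snapshot-snapshot-26         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-27         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-28         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-29         108K      -     7.25M  -
zpool1/virtual-machines/dc1.block@snapshot-snapshot-26  4.48M      -     8.52G  -
zpool1/virtual-machines/dc1.block@snapshot-snapshot-27     0B      -     8.52G  -
"""

known = incus_snapshot_names(incus_table)
report = {}
for dataset in ("zpool1/virtual-machines/dc1", "zpool1/virtual-machines/dc1.block"):
    on_disk = zfs_snapshot_names(zfs_listing, dataset)
    extra = [name for name in on_disk if name not in known]    # on disk, unknown to Incus
    missing = [name for name in known if name not in on_disk]  # known to Incus, not on disk
    report[dataset] = (extra, missing)
    print(dataset, "extra:", extra, "missing:", missing)
```

With these inputs, dc1 carries two snapshots (28 and 29) that Incus does not list, while dc1.block matches the Incus view.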

Thanks for taking a look into this.

One additional piece of information: the first copy attempt failed because the source path for a disk device was missing on the target. This was fixed before the second copy attempt.

On source incus:

[snip of full list]
+-------------+----------------------+----------------------+----------+
| snapshot-26 | 2024/01/29 07:59 CET | 2024/02/28 07:59 CET | NO       |
+-------------+----------------------+----------------------+----------+
| snapshot-27 | 2024/01/29 11:59 CET | 2024/02/28 11:59 CET | NO       |
+-------------+----------------------+----------------------+----------+
| snapshot-28 | 2024/01/29 15:59 CET | 2024/02/28 15:59 CET | NO       |
+-------------+----------------------+----------------------+----------+
| snapshot-29 | 2024/01/29 19:59 CET | 2024/02/28 19:59 CET | NO       |
+-------------+----------------------+----------------------+----------+
| snapshot-30 | 2024/01/29 23:59 CET | 2024/02/28 23:59 CET | NO       |
+-------------+----------------------+----------------------+----------+

zfs list -t snapshot ssd1/virtual-machines/dc1 ssd1/virtual-machines/dc1.block

ssd1/virtual-machines/dc1@snapshot-snapshot-26         104K      -     7.25M  -
ssd1/virtual-machines/dc1@snapshot-snapshot-27         104K      -     7.25M  -
ssd1/virtual-machines/dc1@snapshot-snapshot-28         104K      -     7.25M  -
ssd1/virtual-machines/dc1@snapshot-snapshot-29         108K      -     7.25M  -
ssd1/virtual-machines/dc1@snapshot-snapshot-30         108K      -     7.25M  -

On target incus:

incus snapshot list dc1

[snip]
+-------------+----------------------+----------------------+----------+
| snapshot-26 | 2024/01/29 06:59 UTC | 2024/02/28 06:59 UTC | NO       |
+-------------+----------------------+----------------------+----------+
| snapshot-27 | 2024/01/29 10:59 UTC | 2024/02/28 10:59 UTC | NO       |
+-------------+----------------------+----------------------+----------+

zfs list -t snapshot zpool1/virtual-machines/dc1 zpool1/virtual-machines/dc1.block

zpool1/virtual-machines/dc1@snapshot-snapshot-26         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-27         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-28         104K      -     7.25M  -
zpool1/virtual-machines/dc1@snapshot-snapshot-29         108K      -     7.25M  -

zpool1/virtual-machines/dc1.block@snapshot-snapshot-26  4.48M      -     8.52G  -
zpool1/virtual-machines/dc1.block@snapshot-snapshot-27     0B      -     8.52G  -

I will try deleting one of these VMs on the target and copying it again from scratch, with the required paths for the disk devices in place.

By the way, it would be nice to have a force option on copy that does not fail in this situation, so that config and devices can be fixed after the copy.

incus storage volume list zpool1|grep dc
| virtual-machine            | dc1                     |             | block        | 1       |
[snip]
| virtual-machine (snapshot) | dc1/snapshot-26         |             | block        | 0       |
| virtual-machine (snapshot) | dc1/snapshot-27         |             | block        | 0       |

Okay, you didn’t show the incus storage volume list piece, but assuming it lines up, you could try:

zfs destroy zpool1/virtual-machines/dc1@snapshot-snapshot-29
zfs destroy zpool1/virtual-machines/dc1@snapshot-snapshot-28

That should bring the target back to something consistent and so hopefully another refresh will then properly figure out what needs to be transferred.
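For more than a couple of stray snapshots, the destroy commands can be generated rather than typed. This is a minimal dry-run sketch: it only prints command lines and runs nothing. The helper `destroy_commands` is hypothetical, and the `snapshot-` prefix in the ZFS name follows the naming seen in this thread.

```python
import re


def destroy_commands(dataset, stray_names):
    """Build `zfs destroy` command lines, newest snapshot first (dry run only)."""

    def number(name):
        # Sort by the trailing number of names like "snapshot-29".
        match = re.search(r"(\d+)$", name)
        return int(match.group(1)) if match else -1

    ordered = sorted(stray_names, key=number, reverse=True)
    return ["zfs destroy {}@snapshot-{}".format(dataset, name) for name in ordered]


commands = destroy_commands(
    "zpool1/virtual-machines/dc1",
    ["snapshot-28", "snapshot-29"],
)
for command in commands:
    print(command)
```

Review the printed lines before running any of them, since `zfs destroy` is not reversible.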

Very strange: even after removing these two snapshots, the issue persists:

incus copy lxc04:dc1 dc1 --refresh
Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed creating instance on target: Snapshot "snapshot-27" cannot be restored due to subsequent snapshot(s). Set zfs.remove_snapshots to override

zfs list -t snapshot zpool1/virtual-machines/dc1.block

[snip]
zpool1/virtual-machines/dc1.block@snapshot-snapshot-25  7.21M      -     8.31G  -
zpool1/virtual-machines/dc1.block@snapshot-snapshot-26  4.48M      -     8.52G  -
zpool1/virtual-machines/dc1.block@snapshot-snapshot-27     0B      -     8.52G  -

incus snapshot list dc1

[snip]
+-------------+----------------------+----------------------+----------+
| snapshot-25 | 2024/01/29 03:59 CET | 2024/02/28 03:59 CET | NO       |
+-------------+----------------------+----------------------+----------+
| snapshot-26 | 2024/01/29 07:59 CET | 2024/02/28 07:59 CET | NO       |
+-------------+----------------------+----------------------+----------+
| snapshot-27 | 2024/01/29 11:59 CET | 2024/02/28 11:59 CET | NO       |
+-------------+----------------------+----------------------+----------+

incus storage volume list zpool1|grep dc

[snip]
| virtual-machine (snapshot) | dc1/snapshot-25         |             | block        | 0       |
| virtual-machine (snapshot) | dc1/snapshot-26         |             | block        | 0       |
| virtual-machine (snapshot) | dc1/snapshot-27         |             | block        | 0       |

In debug I see:

metadata:
  context:
    args: '{IndexHeaderVersion:1 Name:dc1 Description: Config:map[] Snapshots:[snapshot-28
      snapshot-29 snapshot-30 snapshot-31 snapshot-32] MigrationType:{FSType:ZFS Features:[migration_header
      compress]} TrackProgress:true Refresh:true Live:false VolumeSize:17301504000
      ContentType: VolumeOnly:false ClusterMoveSourceName:}'
    driver: zfs
    instance: dc1
    pool: zpool1
    project: default
  level: debug
  message: CreateInstanceFromMigration finished

metadata:
  context:
    action: create
    err: <nil>
    instance: dc1/snapshot-32
    project: default
    reusable: "false"
  level: debug
  message: Instance operation lock finished

metadata:
  context:
    action: create
    err: <nil>
    instance: dc1/snapshot-31
    project: default
    reusable: "false"
  level: debug
  message: Instance operation lock finished

metadata:
  context:
    action: create
    err: <nil>
    instance: dc1/snapshot-28
    project: default
    reusable: "false"
  level: debug
  message: Instance operation lock finished

metadata:
  context:
    instance: dc1
    instanceType: virtual-machine
    project: default
  level: debug
  message: Migrate receive transfer finished

metadata:
  context:
    instance: dc1
    instanceType: virtual-machine
    project: default
  level: debug
  message: Migrate receive control monitor finished

metadata:
  context:
    err: 'Failed creating instance on target: Snapshot "snapshot-27" cannot be restored
      due to subsequent snapshot(s). Set zfs.remove_snapshots to override'
    instance: dc1
    instanceType: virtual-machine
    project: default
  level: debug
  message: Sending migration failure response to source
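The `Snapshots:[...]` field in the first debug entry appears to list the snapshots the source intends to send. A small sketch for pulling that list out of the args string and checking it against what the target already has is shown below. The helper `snapshots_in_args` is hypothetical, the args string is the one from the log above with its line wrapping joined, and the interpretation of the field is an assumption rather than documented behavior.

```python
import re


def snapshots_in_args(args):
    """Return the names listed in the `Snapshots:[...]` field of the args string."""
    match = re.search(r"Snapshots:\[([^\]]*)\]", args)
    return match.group(1).split() if match else []


# The args value from the debug log, with the wrapped lines joined.
args = (
    "{IndexHeaderVersion:1 Name:dc1 Description: Config:map[] "
    "Snapshots:[snapshot-28 snapshot-29 snapshot-30 snapshot-31 snapshot-32] "
    "MigrationType:{FSType:ZFS Features:[migration_header compress]} "
    "TrackProgress:true Refresh:true Live:false VolumeSize:17301504000 "
    "ContentType: VolumeOnly:false ClusterMoveSourceName:}"
)

to_transfer = snapshots_in_args(args)
target_has = ["snapshot-25", "snapshot-26", "snapshot-27"]  # from the target listing
overlap = sorted(set(to_transfer) & set(target_has))

print("to transfer:", to_transfer)
print("already on target:", overlap)
```

Here the list starts at snapshot-28 and does not overlap with the target's snapshots, which suggests the refresh computed the set of missing snapshots as expected and the failure happens at a later step.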

I deleted another VM from the target and copied it again. The first copy attempt succeeds, but the second fails with the same “cannot be restored due to subsequent snapshot(s)” error, although the snapshots and the Incus database seem to be in sync.

Okay, that’s pretty weird. Can you file a bug at Issues · lxc/incus · GitHub?
I’ll try to reproduce it here and get to the bottom of it.

Sure. It took me some time to create a reproducible test case. In the end, it is the missing incus-agent that provokes the error.

By the way, since incus-migrate does not seem to install or configure the incus-agent, what are the “official” steps to install it? I found “Ship multiple VM agent binaries #263” but not how to do the bind-mount. For my tests I downloaded the deb package from Debian sid and installed it inside the Ubuntu VM. When creating a VM from a cloud image I see incus-agent running from /run/incus_agent/incus-agent, but I don’t yet understand how it is being mounted and started.

→ I created a new question for this: How to install incus-agent on migrated systems?

I created a new issue on GitHub: copy --refresh for vm from another server fails with: Snapshot cannot be restored due to subsequent snapshot(s) #457