Lxc copy --refresh from/to zfs backed storage pool leaves @migration-xxxx zfs snapshots behind

I was experimenting with lxc copy --refresh as a poor man’s HA cluster and realized that every lxc copy --refresh leaves its @migration snapshot behind on the ZFS filesystem by default. That is probably intentional, since LXD cannot assume there is only one destination being "copy --refresh"ed to. Still, it is quite dangerous for anyone who isn’t aware of it, as I’ve learned the hard way by running out of disk space :wink:

However, that does not happen if one copies within the same host (or maybe the same storage pool?). I’m not sure what the thinking was here: is a full copy considered cheap enough to do every time, or is it a bug?

That being said, maybe it would be nice to add a feature here, actually two features:

  1. add an --id option (or construct one automatically from, e.g., the destination remote or node name), use it as part of the snapshot name, and clean up all but the last snapshot after a successful lxc copy --refresh
  2. add a --zfs-bookmark option that creates a bookmark after the copy --refresh and cleans up all snapshots with the same ID on the source, so that they don’t waste disk space.
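For what it’s worth, here is a rough sketch of what feature 1’s cleanup (plus feature 2’s bookmark) could look like if done by hand today. The dataset and snapshot names are made up, and the zfs commands are only echoed as a dry run, not executed:

```shell
# Hypothetical manual cleanup: keep only the newest @migration-* snapshot.
# In real use the list would come from something like:
#   zfs list -H -o name -s creation -t snapshot "$ds" | grep '@migration-'
ds=lxd12/lxd/containers/migtest
snaps="$ds@migration-aaaa
$ds@migration-bbbb
$ds@migration-cccc"

# The list is sorted oldest-first, so the newest snapshot is the last line
newest=$(printf '%s\n' "$snaps" | tail -n 1)

# Dry run: destroy everything except the newest snapshot
printf '%s\n' "$snaps" | sed '$d' | while read -r s; do
  echo zfs destroy "$s"
done

# Feature 2 in a nutshell: replace the kept snapshot with a bookmark,
# which can still serve as the base of an incremental send but pins no data
bmark=$(printf '%s' "$newest" | tr '@' '#')
echo zfs bookmark "$newest" "$bmark"
```

Remove the echo prefixes to actually apply it; obviously only safe once the refresh has completed successfully.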

The default remote is a cluster; alexsv is my local single instance:

~$ lxc launch u20 migtest --target lxd12
Creating migtest
Starting migtest
~$ ssh lxd12 zfs list lxd12/lxd/containers/migtest -r -t all
NAME                           USED  AVAIL     REFER  MOUNTPOINT
lxd12/lxd/containers/migtest  10.3M  14.4G      645M  legacy
~$ lxc copy --refresh migtest alexsv:
~$ ssh lxd12 zfs list lxd12/lxd/containers/migtest -r -t all
NAME                           USED  AVAIL     REFER  MOUNTPOINT
lxd12/lxd/containers/migtest  10.3M  14.4G      645M  legacy
~$ lxc copy --refresh migtest alexsv:
~$ ssh lxd12 zfs list lxd12/lxd/containers/migtest -r -t all
NAME                                                                          USED  AVAIL     REFER  MOUNTPOINT
lxd12/lxd/containers/migtest                                                 10.6M  14.4G      645M  legacy
lxd12/lxd/containers/migtest@migration-b893e86b-66e6-407f-85d7-d4dab43ce20c   304K      -      645M  -

however:

~$ lxc copy --refresh migtest migtest-copy --target lxd11
~$ ssh lxd12 zfs list lxd12/lxd/containers/migtest -r -t all
NAME                                                                          USED  AVAIL     REFER  MOUNTPOINT
lxd12/lxd/containers/migtest                                                 10.6M  14.4G      645M  legacy
lxd12/lxd/containers/migtest@migration-b893e86b-66e6-407f-85d7-d4dab43ce20c   304K      -      645M  -
lxd12/lxd/containers/migtest@migration-1372dc3c-9d2f-4a73-b0a6-88d5507754b9     0B      -      645M  -
~$ lxc copy --refresh migtest migtest-copy2 --target lxd12
~$ ssh lxd12 zfs list lxd12/lxd/containers/migtest -r -t all
NAME                                                                          USED  AVAIL     REFER  MOUNTPOINT
lxd12/lxd/containers/migtest                                                 10.6M  14.4G      645M  legacy
lxd12/lxd/containers/migtest@migration-b893e86b-66e6-407f-85d7-d4dab43ce20c   304K      -      645M  -
lxd12/lxd/containers/migtest@migration-1372dc3c-9d2f-4a73-b0a6-88d5507754b9     0B      -      645M  -
lxd12/lxd/containers/migtest@copy-6e776fcc-8a89-410d-a360-143c10e2cba2          0B      -      645M  -
~$ lxc copy --refresh migtest migtest-copy2 --target lxd12
~$ ssh lxd12 zfs list lxd12/lxd/containers/migtest -r -t all
NAME                                                                          USED  AVAIL     REFER  MOUNTPOINT
lxd12/lxd/containers/migtest                                                 10.7M  14.4G      645M  legacy
lxd12/lxd/containers/migtest@migration-b893e86b-66e6-407f-85d7-d4dab43ce20c   304K      -      645M  -
lxd12/lxd/containers/migtest@migration-1372dc3c-9d2f-4a73-b0a6-88d5507754b9    68K      -      645M  -
lxd12/lxd/containers/migtest@copy-d84426b1-61d2-4a23-b88c-b4d66631afa3          0B      -      645M  -
~$ lxc copy --refresh migtest migtest-copy --target lxd11
~$ ssh lxd12 zfs list lxd12/lxd/containers/migtest -r -t all
NAME                                                                          USED  AVAIL     REFER  MOUNTPOINT
lxd12/lxd/containers/migtest                                                 10.7M  14.4G      645M  legacy
lxd12/lxd/containers/migtest@migration-b893e86b-66e6-407f-85d7-d4dab43ce20c   304K      -      645M  -
lxd12/lxd/containers/migtest@migration-1372dc3c-9d2f-4a73-b0a6-88d5507754b9    68K      -      645M  -
lxd12/lxd/containers/migtest@copy-d84426b1-61d2-4a23-b88c-b4d66631afa3          0B      -      645M  -
lxd12/lxd/containers/migtest@migration-47a33898-8170-43bb-85bf-445bdf0d656b     0B      -      645M  -

It is either a bug or a misconfiguration; LXD is not supposed to leave those behind. I don’t use ZFS, but with Btrfs on all of my systems it only creates the migration subvolume temporarily and cleans it up once the copy is finished.

Yeah, but OTOH if the snapshot is not kept around, what would the incremental sync (--refresh) be made from?
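The mechanics behind that question, roughly: ZFS can only send a delta relative to a base that exists on both sides, which is exactly what the retained @migration snapshot provides. A sketch with invented names, where the command is only assembled and echoed rather than run:

```shell
# Illustration only: dataset and snapshot names are invented.
src=lxd12/lxd/containers/migtest
base="$src@migration-old"   # left behind by the previous refresh
new="$src@migration-new"    # taken for the current refresh

# -i sends only the blocks changed since $base; a bookmark
# ("#migration-old") would also work as the base on the source side,
# without pinning any data there
send_cmd="zfs send -i $base $new"
echo "$send_cmd"
```

That is why a plain snapshot on the source is the simplest base, and why a bookmark (feature 2 above) would be the space-free alternative.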

@monstermunchkin this sounds like a bug in the optimized transfer implementation, are you able to take a look please?

@Aleks it seems you are trying to do a live migration, which is experimental and can lead to failures. Instead of doing that, try stopping the instance before copying. That should work fine.

If a failure occurs (which, as you say, is likely with live migration), shouldn’t the temporary snapshots be cleaned up though?

Looking at the code, those snapshots should be cleaned up. I haven’t been able to get live migration working. In case of an error, however, they don’t get cleaned up, which is not ideal.

OK thanks, yes we should fix that then, as that would be considered a bug.

hmm, if I stop the source container how will that be “poor man’s HA”? :wink:

Why is live migration even attempted? I have criu.enable set to false on both the source and destination server, and if I attempt an lxc move I correctly get “Error: Unable to perform container live migration. CRIU isn’t installed on the source server”.

It would be nice if lxc copy --refresh (--stateless?) officially worked even with live migration turned off; that is enough for my needs, and I guess for quite a few other people too. I could theoretically take a snapshot and then copy the snapshot, but in that case I can’t use --refresh (I get “Error: --refresh can only be used with instances”), so it is a waste: every time I have to copy the complete data.

It is supposed to work. I, for example, can perfectly well copy both running and stopped containers without any issues, and I don’t have live migration enabled. I’ve actually been using this for several years to take backups of my containers and virtual machines.

You only need CRIU for live migration (move), not for a refresh copy. CRIU doesn’t work with most (maybe all) LXD containers now, because in most cases it fails to capture process state on modern kernels and OSes.

I’m not sure why @monstermunchkin thought you were using criu.

But in any case, LXD shouldn’t be leaving temporary snapshots behind (unless it’s being killed by an external process), so we can call this a bug.

Are you getting any errors during the copy process btw?

If you have a reproducer it would be great if you could log this as an issue at Issues · lxc/lxd · GitHub

I saw lxc launch before lxc copy, and therefore assumed a live migration.

Anyway, I can reproduce this when using lxc copy --refresh.


ok, shall I open a ticket nevertheless?

Yes please.

BTW, how do you reproduce these leftover snapshots, @monstermunchkin? I was just trying to reproduce it and couldn’t.

opened Lxc copy --refresh from/to zfs backed storage pool leaves @migration-xxxx zfs snapshots behind · Issue #11194 · lxc/lxd · GitHub


Create a cluster with zfs backend. Then, simply run lxc copy c1 c2 --target=node2 --refresh. On node1, you then run zfs list -t all which should list the leftover snapshots.
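If it helps anyone checking their own nodes, here is a quick way to count the leftovers. The listing is hard-coded here with names mirroring the transcript above; in real use it would come from zfs list -H -o name -t snapshot on node1:

```shell
# Sample stands in for real `zfs list -H -o name -t snapshot` output
listing='lxd12/lxd/containers/migtest@migration-b893e86b-66e6-407f-85d7-d4dab43ce20c
lxd12/lxd/containers/migtest@migration-1372dc3c-9d2f-4a73-b0a6-88d5507754b9'

# Count snapshots matching the migration naming pattern
count=$(printf '%s\n' "$listing" | grep -c '@migration-')
echo "leftover migration snapshots: $count"
```

Anything greater than zero after all copies have completed is the leftover this thread is about.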

Interesting, so it must be clustered? I was using copy --refresh between remote targets and that didn’t seem to be affected.