@migration snapshots left over

Hi,

on my LXD host there are some snapshots in the ZFS pool with an @migration suffix. What are these snapshots for? They exist for some containers but not for all of them, e.g.

VIRT-MASTER:~ root @ zfs list -t all | grep migration
dpool/lxd/containers/archive@migration-abcf98c1-d1bb-4144-9cad-66989bd4db88                          312K      -  2.41G  -                                                                                                                 
dpool/lxd/containers/archive@migration-350b94a5-d795-4302-82e8-0fa66eeb5485                          313K      -  2.41G  -                                                                                                                 
dpool/lxd/containers/archive@migration-2ac8ae01-1d83-4412-b065-d3a0933a75ab                          431K      -  2.32G  -                                                                                                                 
dpool/lxd/containers/archive@migration-60358e05-239a-4582-be8b-356e6fd6bcc3                          431K      -  2.32G  -                                                                                                                 
dpool/lxd/containers/darwin@migration-bdfd8690-798f-4da6-910a-0a8e4f7a44ec                          1.25M      -  3.41G  -                                                                                                                 
dpool/lxd/containers/darwin@migration-fc1dc522-b6e3-475f-8868-a23d23f1776c                          1.25M      -  3.41G  -                                                                                                                 
dpool/lxd/containers/darwin@migration-517ec125-fce8-4519-9f14-7f93a32172d0                          1.26M      -  3.40G  -                                                                                                                 
dpool/lxd/containers/darwin@migration-610bbfba-6411-489b-87b8-3eca6f0cafa2                          1.26M      -  3.40G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-c1819d42-5656-4e2a-98f0-0f177bc1271c                         752M      -  9.06G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-e00aeec0-c2af-41ab-a4d5-e028001cd48d                         752M      -  9.06G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-01396c8b-4c19-4d76-9e03-237a1efe0d48                         789M      -  9.21G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-74a77423-0c5b-4e79-82e2-219fa1fcd749                         789M      -  9.21G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-9748f7ab-726b-4533-be2b-a51638c3932f                        11.5M      -  9.41G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-6e825bdf-cae7-48d4-b9e6-a6cf650dc36a                        11.5M      -  9.41G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-c4e2e3c5-6267-4a09-a1ba-6cedf581cab7                        3.00M      -  9.43G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-0c1941dd-5777-4acb-85dd-92af49e2ba58                        3.00M      -  9.43G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-90b36250-90af-4256-9fd9-0f5d0e86b1a5                        47.1M      -  9.50G  -                                                                                                                 
dpool/lxd/containers/librenms@migration-196b6d55-7572-4ffc-850f-fb8ad8c115ee                        47.5M      -  9.50G  -                                                                                                                 
dpool/lxd/containers/mx1@migration-97db9fa0-d986-4483-a6e0-a03be54ccefb                              800K      -  2.48G  -                                                                                                                 
dpool/lxd/containers/mx1@migration-a12f8281-c1b8-40b4-b8db-dd9c46e8e24e                              796K      -  2.48G  -                                                                                                                 
dpool/lxd/containers/mx1@migration-460a8cfd-cfff-4150-8a23-a53f1242a635                              103M      -  2.49G  -                                                                                                                 
dpool/lxd/containers/mx1@migration-5e15b618-dbc9-4e74-b148-434e9990b4f6                              103M      -  2.49G  

I also sometimes find a process that never seems to finish:

8974 ? S 0:00 zfs send -c -L -i dpool/lxd/containers/librenms@migration-90b36250-90af-4256-9fd9-0f5d0e86b1a5 dpool/lxd/containers/librenms@migration-196b6d55-7572-4ffc-850f-fb8ad8c115ee

I sync every night with lxc delete and lxc copy to another host.

They’re safe to delete; they should only exist temporarily during a send/receive operation. In your case they may have been left over because of a crash or another issue during the migration.

The same may be true of that zfs send process which may hint at a network issue or something else causing the migration to fail partway through.
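To spot a zfs send that has been running far longer than a migration should take, something like the sketch below could help. The function name stale_zfs_send and the one-hour default threshold are made up for illustration, and the expected input format (PID, elapsed seconds, command line) is an assumption based on procps-style ps output.

```shell
#!/bin/sh
# Print every `zfs send` process that is at least MIN_AGE seconds old.
# Input on stdin: one process per line, "PID ELAPSED_SECONDS COMMAND...".
stale_zfs_send() {
    min_age="${1:-3600}"   # hypothetical default threshold: one hour
    awk -v min="$min_age" '
        # Field 2 is the elapsed time; the command starts at field 3,
        # either as "zfs" or as a full path ending in "/zfs".
        $3 ~ /(^|\/)zfs$/ && $4 == "send" && $2 + 0 >= min + 0 { print }
    '
}
```

On a Linux host with procps, it could be fed with: ps -eo pid=,etimes=,args= | stale_zfs_send 3600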

Thanks for the reply. I’ve deleted the snapshots without problems. I’ll have to investigate this, but a network problem seems unlikely because the two machines are connected directly over a 100Gbit LAN.

I know it’s an old thread, but I have an interesting finding: leftover migration snapshots might be a “feature”. I was experimenting with lxc copy --refresh as a poor man’s HA cluster and realized that every lxc copy --refresh leaves the @migration snapshots behind by default. That is probably right, as it cannot assume that there is only one destination being "copy --refresh"ed to. This is quite dangerous for someone who isn’t aware of it, as I learned the hard way by running out of disk space :wink:

However, that does not happen if one copies within the same host (or maybe storage pool?). Not sure what the thinking was here: that it is cheap enough to do a full copy every time?

That being said, maybe it would be a nice feature to add, actually two features:

  1. add a --id option (or construct one automatically from e.g. the destination remote or nodename) and use that as part of the snapshot name and then clean up all but the last snapshot after successful lxc copy --refresh
  2. add a --zfs-bookmark option that would create a bookmark after the copy --refresh and clean up all the snapshots with the same id on the source, so that they don’t waste the disk space.
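The "clean up all but the last snapshot" idea from point 1 can be sketched outside LXD. This is only an illustration of the proposal, not an existing LXD feature; the function name older_migration_snapshots is hypothetical, and whether LXD actually needs the newest @migration snapshot for the next incremental refresh is an assumption the thread does not confirm.

```shell
#!/bin/sh
# Read snapshot names on stdin, oldest first. Print every @migration-
# snapshot except the newest one of each dataset. Nothing is deleted.
older_migration_snapshots() {
    awk -F'@' '
        $2 ~ /^migration-/ {
            # A newer snapshot of the same dataset has arrived, so the one
            # remembered so far is no longer the newest: emit it.
            if ($1 in newest) print newest[$1]
            newest[$1] = $0
        }
    '
}
```

The input could come from zfs list -H -t snapshot -o name -s creation, which lists snapshots sorted by creation time.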

I’m running into the same problem, for the same reasons. I have a ton of @migration snapshots that ballooned over time and ate all my disk space. What’s worse, I can’t delete them: even zfs destroy -f fails with “cannot destroy snapshots: permission denied” every time.

I’ve been looking all over the interwebs for an answer on this one and haven’t found one. I just torched a bunch of data on one of my containers to buy me some time. There’s got to be a way to torch all of these @migration snapshots sitting here eating 800GB of space.

To add: I’ve rebooted the host, I’ve tried destroying them as root, no joy. It would be great if there was an error message that told me why I can’t do this. When I’m root and I get “permission denied” that’s quite a head scratcher.
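When a snapshot refuses to be destroyed, two generic ZFS causes are cheap to rule out: user holds and dependent clones. The sketch below checks both. The function name snapshot_blockers is made up, and neither check is known to explain the “permission denied” message reported here; it only narrows things down.

```shell
#!/bin/sh
# Print the user holds and dependent clones of one snapshot, if any.
snapshot_blockers() {
    snap="$1"
    tab="$(printf '\t')"
    # `zfs holds -H` prints one tab-separated line per hold:
    # snapshot name, hold tag, timestamp.
    zfs holds -H "$snap" | while IFS="$tab" read -r _name tag _time; do
        echo "hold: $tag"
    done
    # The `clones` property lists datasets cloned from this snapshot.
    clones="$(zfs get -H -o value clones "$snap")"
    case "$clones" in
        ''|-) ;;                      # no clones reported
        *) echo "clones: $clones" ;;
    esac
}
```

Example call, using a snapshot name from earlier in the thread: snapshot_blockers dpool/lxd/containers/librenms@migration-196b6d55-7572-4ffc-850f-fb8ad8c115ee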

Well I managed to get this fixed to some extent, but I hated it. I deleted a container entirely, then did an lxc copy --refresh from my backup host to my primary. All of those @migration snapshots disappeared once I did that, but obviously only for that container. At the very least I’ve bought myself a lot of time to work on this now, as that was the container with 800GB of @migration snapshots. There’s got to be a clean way of removing the others without deleting the containers!

What lxd version?

Sorry for the late response, 5.11

I think perhaps these are left over from failed migrations on the sender side.
I’ve been working on the migration subsystem recently and have observed that the zfs send command can get stuck if the far side goes away/fails for some reason.

I am working on improving the cleanup-on-failure handling in this PR: https://github.com/lxc/lxd/pull/11459

Ah okay, that’s good. What do I do about these @migration snapshots for the time being? It seems the only way I can delete them is by deleting the containers. Is there any other way?

When your machine gets LXD 5.12, it includes a packaging fix that should help with the long-standing issue that causes mount reference leaks with ZFS. This may be what is holding the reference to the snapshots open.

Ah, okay. I do see that we’re on 5.12 now, but the @migration snapshots are still in the system. Should I try a zfs destroy on them again or something else?

Yeah, you’ll want to blow them away manually. The fixes from @tomp will hopefully make it so they don’t come back.
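For the manual cleanup, a dry run that only prints the destroy commands is a safer first step. This is a minimal sketch: the function name migration_destroy_commands is hypothetical, and the pattern assumes the snapshots are always named @migration-<UUID>, as in the listing at the top of the thread.

```shell
#!/bin/sh
# Read snapshot names on stdin and print a `zfs destroy` command for each
# name that ends in @migration-<UUID>. Nothing is destroyed here.
migration_destroy_commands() {
    grep -E '@migration-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$' |
    while IFS= read -r snap; do
        printf 'zfs destroy %s\n' "$snap"
    done
}
```

Usage could look like zfs list -H -t snapshot -o name | migration_destroy_commands, with the printed commands reviewed before running them as root. On hosts where zfs destroy itself fails, as reported above, this will not get any further.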

Unfortunately, the only way I’ve been able to make them go away is by deleting the containers completely. Even copying and renaming them doesn’t work.

So what I do is run an lxc copy --refresh to my backup host, then I delete the container, and then I copy it back, and then the @migration snapshots disappear. This is not ideal but is do-able. I did that with my containers a month ago and everything seemed okay.

Today I was going to create some more containers and I see I’m out of disk space again. I run:

sudo zfs list -H -t snapshot -o name -s creation | grep @migration

And sure enough, tons of entries again. I must be doing something wrong with our nightly backup process? I really need this to get back to a stable point where it just does nightly backups and quits creating these, as I can’t even delete them without shutting down, deleting, and restoring containers, and that’s got me biting my nails at midnight every time I’m doing this.

Here’s my nightly backup script, where “backuplxdhost” is the name of the backup lxd host, and the echo is there because I send the output to a log file that gets emailed to me nightly:

for x in $(/snap/bin/lxc ls -c n --format csv)
do
    # Log which container is being refreshed (output goes to the nightly log).
    echo "Refreshing $x"
    /snap/bin/lxc copy --refresh "$x" "backuplxdhost:$x"
done
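To see from the nightly log whether the snapshots are piling up again, the backup run could end with a per-container count. The sketch below is one way to do that; the function name migration_snapshot_counts is invented for this example.

```shell
#!/bin/sh
# Read snapshot names on stdin and print "COUNT DATASET" for every dataset
# that has @migration- snapshots, largest count first.
migration_snapshot_counts() {
    awk -F'@' '
        $2 ~ /^migration-/ { count[$1]++ }
        END { for (ds in count) printf "%d %s\n", count[ds], ds }
    ' | sort -k1,1nr -k2
}
```

It could be fed with zfs list -H -t snapshot -o name | migration_snapshot_counts. Comparing the counts from one night to the next shows whether new snapshots are still being left behind.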

Which LXD version are you running now? I think LXD 5.13 had a fix for left over migration snapshots.

I’m on 5.13. I have LXD installed via snap.

I just checked this morning and there are still tons of them, and some of the containers I “fixed” by deleting and restoring them have created new @migration snapshots.

Have they been created since being on LXD 5.13 though?

That’s a good question, as I can’t really say for sure when my LXD installation updated to 5.13. Only way to know is to clean one up today and then watch, so I’ll do that!
