Lxc copy --refresh workaround: efficient incremental ZFS snapshot sync with send/receive

I need to sync cold and hot standby containers on separate machines efficiently. lxc copy --refresh doesn’t help here, because the refresh case uses rsync, which can be very slow (slower than an initial full copy).

References: this requirement was already mentioned in the threads “Lxc copy --refresh error” and “Future incremental copy - (lxc refresh) - ZFS backend”.

So I wrote the following scripts for an incremental ZFS snapshot send/receive to a backup server. In my tests the snapshot sync even worked with a running target container, which is restored to the newly synchronized snapshot at the end of the sync. The synced container intentionally does not include any further snapshots that are created on the source machine.

Now I’d like to discuss the completeness of this approach and its relative stability across LXD updates, as well as future simplifications, e.g. removing the direct execution of zfs send on the remote source machine and switching over to lxc functionality instead.

Update on this topic: the ZFS snapshot sync implemented with the scripts I posted in February has been running stably in test setups for 2 months. After some minor changes I will use it in production for hot and cold standby containers on ZFS storage volumes.

I do a similar thing but use Syncoid instead.

Hi Florian

I have a couple of containers running under LXD on a ZFS storage pool, so I too would like to see LXD integrate something like this. That way we could take advantage of zfs send and receive to sync containers or VMs running on ZFS-based LXD hosts.

How well does your script integrate with LXD’s (ZFS) snapshot support? I’m not very clear on how they link up yet. When I copy my container from one LXD server to another using this script, will the number of snapshots as output by lxc list match?

How would you recommend I do the initial copying of my containers from one LXD host to another, if I wanted to preserve the ZFS snapshots? These wouldn’t be preserved by lxc copy by the sounds of things but maybe combining it with zfs send will do the trick somehow?

Hi Dan,

In the first run the script copies the state of an initial snapshot of the remote source container to the local target container.
On subsequent syncs it uses zfs send to transfer only the delta between a new snapshot and the last synced snapshot of the source container. So this sync is relatively fast compared to the rsync-based copy --refresh approach of LXD. On one container I run this sync every 30 minutes to stay up to date for a hot-standby failover.
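To make the difference between the first run and later runs concrete, here is a minimal sketch that only builds the shell pipeline as a string and does not execute anything. It is not taken from the actual script: the function name, the host name, the dataset paths and the use of ssh plus `zfs receive -F` are assumptions for illustration.

```python
import shlex
from typing import Optional


def build_sync_pipeline(source_host: str, source_dataset: str,
                        target_dataset: str, new_snap: str,
                        last_snap: Optional[str] = None) -> str:
    """Return a shell pipeline that pulls a snapshot from a remote host.

    Hypothetical helper: without last_snap a full stream is sent (first run),
    with last_snap only the delta between the two snapshots is sent.
    """
    new_ref = f"{source_dataset}@{new_snap}"
    if last_snap is None:
        send = ["zfs", "send", new_ref]
    else:
        # -i sends an incremental stream from last_snap to new_snap
        send = ["zfs", "send", "-i", f"{source_dataset}@{last_snap}", new_ref]
    # The send command runs on the source machine, so it is passed to ssh
    # as one quoted argument.
    remote_cmd = " ".join(shlex.quote(part) for part in send)
    receive = " ".join(shlex.quote(part)
                       for part in ["zfs", "receive", "-F", target_dataset])
    return f"ssh {shlex.quote(source_host)} {shlex.quote(remote_cmd)} | {receive}"
```

For example, `build_sync_pipeline("src.example", "tank/containers/c1", "tank/containers/c1", "new", "old")` yields a pipeline whose remote part is `zfs send -i tank/containers/c1@old tank/containers/c1@new`. Note that `zfs receive -F` rolls the target back to its most recent snapshot before applying the stream, so local changes on the target are discarded.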

So you won’t see all the snapshots that you created on the source container, i.e. lxc list won’t match; you will just have the latest snapshot, named ‘bsync’, that was created by the script.

Take a closer look at the script to see what needed to be hacked to make this zfs send approach usable. I don’t see how matching the snapshots of the source container would be possible this way.

BTW: I just updated https://gist.github.com/usrflo/f4f62e886490b2efed2a1503aed3e228 so that a sync operation is executed on a single container only. This enables:

  • different sync source servers
  • different sync intervals (by specific cron jobs)

Regards,
Florian

Hi Florian

Thanks for clarifying your script! On further consideration, I think I’d prefer to back up to a machine with a ZFS pool that isn’t running LXD, as I don’t need a hot standby. Might you be able to answer my latest questions regarding LXD and ZFS in this thread?

Thanks!

Hi again Florian

It looks like I’ll be better off basing my LXD backups around lxd recovery but it appears that hasn’t made it into the stable LXD branch yet.

Your script makes use of something called bsync and bsync-last. I presume these are some custom scripts you wrote to fetch the latest snapshot name? Could you share those too please?

4.0.8 has it and it’s being rolled out now (phased rollout should take around 48h to hit everyone).

OK great! I don’t need to test it yet, that can wait a few days at least. I’d rather not update to the beta snap.

Hi Dan,

“bsync” and “bsync-last” are just constants; they are used as snapshot names to distinguish the last synced state from the new state (ZFS snapshot diff).
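As an illustration of how two fixed snapshot names can track the sync state, the sketch below lists the zfs commands for one possible rotation after a successful sync. This is an assumption about the bookkeeping, not a copy of the actual script; the function name and dataset path are hypothetical.

```python
from typing import List

SNAP_NEW = "bsync"        # state that was just synchronized
SNAP_LAST = "bsync-last"  # base for the next incremental send


def rotation_commands(dataset: str, have_last: bool) -> List[List[str]]:
    """Return zfs commands that turn the new snapshot into the next base.

    Hypothetical rotation: drop the old base (if any), then rename the
    freshly synced snapshot so the name 'bsync' is free for the next run.
    """
    commands: List[List[str]] = []
    if have_last:
        commands.append(["zfs", "destroy", f"{dataset}@{SNAP_LAST}"])
    commands.append(
        ["zfs", "rename", f"{dataset}@{SNAP_NEW}", f"{dataset}@{SNAP_LAST}"]
    )
    return commands
```

The same rotation would have to happen on both the source and the target, since an incremental stream can only be received if the base snapshot exists on both sides.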

@usrflo thank you so much for sharing this!

My two cents:

This assumes that the “backup” container was originally exported with lxc export --optimized-storage at the origin (with a subsequent lxc import ... at the destination), and that the incremental deltas are sent via zfs send | zfs recv. The script goes through the list of new snapshots at the destination and, if they are missing, adds them to the “global” database via lxd sql.

PS. I definitely wouldn’t call it production quality, more like a (working) proof of concept thing.

One thing this version is missing: it uses the present date as a stub, when it should really read the snapshot’s creation date from the ZFS properties and convert it to an SQLite timestamp.
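The missing conversion could look roughly like the sketch below. It assumes the creation time is read with `zfs get -Hp -o value creation <dataset>@<snapshot>`, which prints seconds since the epoch, and that a UTC `YYYY-MM-DD HH:MM:SS` string is acceptable; the exact timestamp format stored in the LXD database is not confirmed here, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def creation_to_sqlite(creation_epoch: str) -> str:
    """Convert the output of `zfs get -Hp -o value creation` to a timestamp.

    The input is the epoch value as printed by zfs (a string, possibly with
    a trailing newline). The output format is an assumption: a plain UTC
    date-time string as commonly used in SQLite text columns.
    """
    seconds = int(creation_epoch.strip())
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")
```

For instance, `creation_to_sqlite("1614556800\n")` gives `2021-03-01 00:00:00`.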

Not too hard to add, but at the moment I’m more concerned about the fragility of the whole lxd sql approach, as I mentioned here.