I am able to take a snapshot of container1 and then copy the snapshot to the remote server without problems.
I am also able to use lxc list remote: to list the containers on the remote server without problems.
LXD snap 3.18 on both sides
ZFS storage on both sides
As a final question, is there documentation showing how to use --refresh properly? Should I copy a snapshot first and then refresh, or should I use --refresh on a running container from the very first copy?
I’m also interested in knowing more about this --refresh copy option.
In my opinion, this feature seems very promising for speeding up synchronization between containers, especially remote ones.
I guess you’ve probably noticed that you cannot copy towards a running container (without CRIU). Usually the error message for that case is very helpful, and it’s not the message you posted, so that is probably not the explanation in your case.
I use this feature for backups, and I’m wondering whether that’s a good idea. Every day, a daily cron script runs copy --refresh for all of my production containers (local) towards stopped containers (remote), and it has worked so far. The first time, I had to take a snapshot and copy that snapshot to the remote LXD machine. After that first initialization, copy --refresh seems to synchronize live containers properly, and it runs fast (something like rsync, maybe?).
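The routine described above could be sketched roughly like this. It is only an illustration, not a tested backup tool: the remote name `backup`, the snapshot name `snap0`, the container names, and the helper function names are all hypothetical. The functions only build the `lxc` command lines and do not run them, so they can be inspected before being wired into a cron job.

```python
from typing import List


def initial_copy_commands(container: str, remote: str,
                          snapshot: str = "snap0") -> List[List[str]]:
    """One-time setup: snapshot the container, then copy that snapshot over."""
    return [
        ["lxc", "snapshot", container, snapshot],
        ["lxc", "copy", f"{container}/{snapshot}", f"{remote}:{container}"],
    ]


def refresh_command(container: str, remote: str) -> List[str]:
    """Daily run: update the existing (stopped) remote copy in place."""
    return ["lxc", "copy", container, f"{remote}:{container}", "--refresh"]


if __name__ == "__main__":
    # Example container names; replace with your own.
    for name in ["ns1", "ns2"]:
        print(" ".join(refresh_command(name, "backup")))
```

To actually execute a command, each list could be passed to `subprocess.run(cmd, check=True)`, which raises an exception if `lxc` exits with an error, so a failed refresh does not go unnoticed.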
Actually I just found out that the refresh command is updating the remote container. However I am still receiving the following error message after the copy process:
The copy process runs very slowly, between 10 and 500 KB/s, while a copy of a snapshot to the same destination server runs at around 34 MB/s. Is this normal?
Should I trust that the remote container is being correctly updated, even with the above error message? I started a copy of the remote container as a test, and it ran fine, with the latest data.
Please help me figure this out. I really need to back up my containers remotely using the refresh option to save bandwidth and time. Is it ready for production, or should I use zfs send/receive for now?
Exit code 23 is “Partial transfer due to error” according to the rsync manual, so something didn’t transfer too well.
You’ll want to look at lxd.log on the source and target server; one of the two should have a more detailed error with the rsync output to let you know which file things blew up on.
The speed difference is possibly due to only transferring the difference, so spending more time going through files, comparing their file info and hash before transferring just a tiny bit of data.
Another difference is that your initial copy was likely done using zfs send/receive, whereas refresh updates can only be done through rsync, so the difference in protocol could also explain why things behave quite differently.
Why can’t refresh use ZFS send? When I use syncoid (zfs send wrapper) to replicate containers that have already been sent with lxc copy it finishes in an instant as ZFS is much faster at sending the diff than rsync is (or seems to be).
Just a bit confused as to why it reverts to using rsync over ZFS?
It’s not impossible to do but it’s just not done at this point.
Our migration protocol is fairly simple and doesn’t allow much back and forth between source and target. More back and forth would be needed to first determine exactly which snapshots need to be transferred: the source sends its full list, the target filters that list based on what it already has and sends it back, and the source then figures out the nearest snapshot for each and sends those. After that, a new temporary snapshot would need to be made on the source, sent, restored on the target, and deleted on both sides.
We would also need to add some filesystem details to the migration protocol, so that part of the negotiation ensures that both sides actually share the same base dataset; otherwise send/receive just can’t work.
Today, it’d actually be perfectly fine to do:
1. Copy a container from a remote server with zfs on both sides (uses send/receive)
2. Move the target container to a btrfs pool (converts everything to subvolumes)
3. Move back to the zfs pool (converts everything back to datasets and snapshots)
4. Do a refresh from the source
In this case, even though it’s still the same container and the same snapshots, the dataset itself isn’t the same on source and target, so send/receive cannot work at all. Since we use rsync, that’s fine, but if we were to support zfs, we’d need the extra data in the migration protocol so that we can detect it and switch to rsync for such cases.
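To make the negotiation described above a bit more concrete, here is a rough sketch of the decision logic. It is not LXD's migration code (which, as noted, does not do this today). The `Snapshot` type, the function names, and the idea of comparing a per-snapshot `guid` that stays identical across send/receive are assumptions made for illustration. It also simplifies "nearest snapshot for each" to a single newest common base.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    name: str
    guid: int  # assumption: identifier that is preserved by send/receive


def missing_on_target(source: List[Snapshot], target: List[Snapshot]) -> List[Snapshot]:
    """Target-side step: filter the source's full list down to what it lacks."""
    have = {s.guid for s in target}
    return [s for s in source if s.guid not in have]


def nearest_common_base(source: List[Snapshot], target: List[Snapshot]) -> Optional[Snapshot]:
    """Source-side step: newest snapshot present on both sides, if any.

    `source` is assumed to be ordered from oldest to newest.
    """
    have = {s.guid for s in target}
    for snap in reversed(source):
        if snap.guid in have:
            return snap
    return None


def choose_transport(source: List[Snapshot], target: List[Snapshot]) -> str:
    """Incremental send/receive needs a shared base; otherwise fall back to rsync."""
    return "zfs-send" if nearest_common_base(source, target) is not None else "rsync"


source = [Snapshot("snap0", 101), Snapshot("snap1", 102), Snapshot("snap2", 103)]
same_lineage = [Snapshot("snap0", 101), Snapshot("snap1", 102)]
# Same snapshot names, but recreated datasets (e.g. after a btrfs round trip).
recreated = [Snapshot("snap0", 901), Snapshot("snap1", 902)]

if __name__ == "__main__":
    print(choose_transport(source, same_lineage))
    print(choose_transport(source, recreated))
```

The second target models the btrfs round trip from the list above: the names still match, but nothing on the target shares an origin with the source, so the sketch falls back to rsync.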
Sounds like you have the zfs snapdir enabled on your volumes. This isn’t something that LXD would ever do itself, and it may be causing the behavior you’re seeing.
Thanks for the help. I use snapdir so I can access my snapshot files using the directory .zfs and then copy any file I need from a particular snapshot. Do I have to disable it for the lxd dataset?
NAME                                PROPERTY  VALUE    SOURCE
dados                               snapdir   visible  local
dados/lxd                           snapdir   visible  inherited from dados
dados/lxd/containers                snapdir   visible  inherited from dados
dados/lxd/containers/lucsim-zimbra  snapdir   visible  inherited from dados
dados/lxd/containers/ns1            snapdir   visible  inherited from dados
dados/lxd/containers/ns2            snapdir   visible  inherited from dados
dados/lxd/custom                    snapdir   visible  inherited from dados
dados/lxd/custom-snapshots          snapdir   visible  inherited from dados
dados/lxd/deleted                   snapdir   visible  inherited from dados
dados/lxd/images                    snapdir   visible  inherited from dados
dados/lxd/snapshots                 snapdir   visible  inherited from dados
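For anyone wanting to check their own pool, a small sketch like the one below lists the datasets under the LXD dataset that only have snapdir visible because they inherit it. It assumes four-column output like the listing above (presumably from something like `zfs get -r snapdir dados`, which is a guess on my part). The function names are made up for this example, and the sample is a shortened copy of that listing.

```python
from typing import List, NamedTuple


class PropRow(NamedTuple):
    dataset: str
    prop: str
    value: str
    source: str


def parse_zfs_get(text: str) -> List[PropRow]:
    """Parse four-column `zfs get` output into rows, skipping the header."""
    rows = []
    for line in text.splitlines():
        # The SOURCE column may contain spaces ("inherited from dados"),
        # so split at most three times.
        parts = line.strip().split(None, 3)
        if len(parts) == 4 and parts[0] != "NAME":
            rows.append(PropRow(*parts))
    return rows


def inherited_visible(rows: List[PropRow], prefix: str) -> List[str]:
    """Datasets at or below `prefix` whose snapdir is visible only by inheritance."""
    return [
        r.dataset
        for r in rows
        if (r.dataset == prefix or r.dataset.startswith(prefix + "/"))
        and r.prop == "snapdir"
        and r.value == "visible"
        and r.source.startswith("inherited")
    ]


SAMPLE = """\
NAME PROPERTY VALUE SOURCE
dados snapdir visible local
dados/lxd snapdir visible inherited from dados
dados/lxd/containers snapdir visible inherited from dados
dados/lxd/containers/ns1 snapdir visible inherited from dados
"""

if __name__ == "__main__":
    for name in inherited_visible(parse_zfs_get(SAMPLE), "dados/lxd"):
        print(name)
```

Since everything under dados/lxd inherits the value from dados, setting the property locally on dados/lxd (for example `zfs set snapdir=hidden dados/lxd`) would change it for the LXD datasets while leaving the rest of the pool as it is. Whether that makes the rsync error go away is not confirmed in this thread.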
The only problem I have right now is the very low transfer speed when using --refresh. Compared with copying a snapshot, the difference is unbelievable: around 30 to 100 KB/s for the refresh versus around 30 MB/s for the snapshot copy. Is there something wrong here?