LXD copy --stateless performance, zfs backed, can it be improved/tweaked?

I’m copying between two servers, both ZFS backed, and I’m struggling to get above 20-21 MByte/s, which is ~160 Mbit/s.

The servers are remote to each other on the internet, in Hetzner and OVH datacenters. iperf3 between the two runs at around 350 Mbit/s over ZeroTier and around 500 Mbit/s over WireGuard.
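For anyone wanting to repeat the baseline measurement, the iperf3 runs were along these lines (the VPN address is a placeholder, not from my setup):

```shell
# On the destination server, start a listener:
iperf3 -s
# On the source server, test over the VPN interface (placeholder IP),
# running for 30 seconds to get a stable average:
iperf3 -c 10.147.17.2 -t 30
```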

I’ve tested lxc copy over both VPNs and get the same 160 Mbit/s cap with each.

I then decided to rule out the overhead of any VPN by running over port 8443 directly, not through the tunnel. The performance was exactly the same, which suggests something else is slowing down the copy.
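For reference, ruling out the VPN can be done by adding the destination as a remote over its public address; this is a sketch, and the remote name and IP are placeholders, not from my setup:

```shell
# Add the destination over its public address (port 8443 is the LXD default;
# "dest" and the IP are placeholders):
lxc remote add dest 203.0.113.10:8443
# Then copy directly, bypassing any VPN tunnel:
lxc copy mycontainer dest:mycontainer --stateless
```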

Is there any way to change the encryption cipher LXD uses by default for communication over port 8443, possibly to something more performant, or even to switch encryption off when running through a VPN? I’m wondering if this is the bottleneck; I’d expect to be able to push 400-500 Mbit/s between servers through WireGuard. Or are there any other parameters that can be tuned?

Both of these servers are bare-metal Xeons of reasonable performance, with 64GB of RAM each.

Many thanks,


It could also be the ZFS processing on either side slowing you down in this case.
Here, even with a smaller Xeon E3 on the source side, I can get close to gigabit line rate on other LXD API transfers.

One way to test it is with:

  • lxc exec remote:some-container -- truncate -s 1G /tmp/test.img
  • lxc file pull remote:some-container/tmp/test.img out.img
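To turn the pull into an actual throughput number, time it and divide size by elapsed time. A minimal local rehearsal of the same arithmetic (scratch paths are placeholders, and a local cp stands in for the API transfer so the numbers are reproducible):

```shell
# Create a scratch file, time the copy, and report elapsed milliseconds;
# throughput is then size / elapsed time:
truncate -s 100M /tmp/test.img
start=$(date +%s%N)
cp /tmp/test.img /tmp/out.img
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
echo "copied 100 MB in ${elapsed_ms} ms"
rm -f /tmp/test.img /tmp/out.img
```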

This will take a little while, as LXD first needs to fetch the data from the container and store it on the host; only then will it perform the transfer over the API and give you the transfer speed.

I was getting 115 MB/s here, which is about as close to gigabit line rate as one will ever get.

Cheers, I will give it a test and see what I come back with.

OK, I did the tests and got similar speeds, between 20 and 26 MByte/s, so I’m thinking it’s something to do with the storage backend…
I wonder if it’s slower because the ZFS mirror uses disk partitions instead of entire block devices.
I’m pretty sure that when I use syncoid to copy the ZFS datasets over manually it’s faster.
Will do some experiments.

Did you find anything related to the slow speeds?
I have the same or an even worse issue; it’s currently copying like this:

Transferring container: nv1: 3.45GB (4.34MB/s)
…and this is “only” a 44GB container; I have others around 150GB :expressionless:

Source storage is a ZFS mirror pool of 2 partitions, each on its own SAS disk.
Destination storage is local-dir on an mdraid mirror consisting of 2 SAS disks.
The transfer is being done over the local network.
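At that rate the bigger containers take a very long time; rough shell arithmetic for a 150GB container, rounding the observed ~4.34 MB/s down to 4 MB/s and using decimal units:

```shell
# 150 GB = 150 * 1000 MB; at 4 MB/s that is:
seconds=$(( 150 * 1000 / 4 ))
echo "$(( seconds / 3600 )) hours $(( seconds % 3600 / 60 )) minutes"
# prints "10 hours 25 minutes"
```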

Mine was ZFS to ZFS; yours is probably something else. I’m guessing it’s dropping down to using rsync instead of zfs send.

I find my own ZFS to ZFS copy is fastest if I use a third-party utility such as syncoid to send the datasets directly.
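For reference, the manual path looks roughly like this; the pool/dataset names and the destination host are assumptions for illustration, not from this thread:

```shell
# Plain zfs send/recv over SSH (all names are placeholders):
zfs snapshot tank/lxd/containers/web@migrate
zfs send tank/lxd/containers/web@migrate | ssh root@dest zfs recv -F backup/web

# Or let syncoid manage snapshots, incrementals and resume for you:
syncoid tank/lxd/containers/web root@dest:backup/web
```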