Cannot copy/move from LXD 2.0 to LXD 4.4

We have one more LXD server that we want to migrate to a new cluster (snap LXD 4.4). The old server is on Ubuntu 16.04 with LXD 2.0.11.

I’ve set up a “remote” relationship, and the two servers can see each other (lxc list shows the containers of the remote).

This is the copy command, run from the “new” server:

$ sudo lxc copy angstel:survey survey
Error: Failed instance creation:
 - Error transferring instance data: Unable to connect to:
 - Error transferring instance data: exit status 22
 - Error transferring instance data: websocket: bad handshake

On the “old” server I see this line in /var/log/lxd/lxd.log

lvl=eror msg="Rsync send failed: /var/lib/lxd/containers/survey/: exit status 22: ERROR: buffer overflow in recv_rules [Receiver]\nrsync error: error allocating core memory buffers (code 22) at util2.c(112) [Receiver=3.1.2]\n" t=2020-08-18T21:13:13+0200

If I do the copy command on the “old” server, it also fails.

The “old” server has rsync 3.1.1, and the “new” one runs the LXD 4.4 snap. I don’t know how to check whether rsync is bundled in the snap, but I think the snap uses version 3.1.2, because that is the version in the error message logged on the “old” server. The Ubuntu rsync package is 3.1.3.
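One way to check, as a rough sketch: snapd mounts the active LXD revision under /snap/lxd/current, so searching that tree for an rsync binary and asking it for its version should answer the question. The helper name below is hypothetical, and the sketch assumes the bundled binary can run outside the snap's own environment.

```shell
#!/bin/sh
# List every file named "rsync" below a directory tree, with its version line.
# list_rsync_versions is a hypothetical helper, not part of LXD or snapd.
list_rsync_versions() {
    root="$1"
    find "$root" -type f -name rsync 2>/dev/null | while read -r bin; do
        # The first line of "rsync --version" carries the version number.
        printf '%s: %s\n' "$bin" "$("$bin" --version | head -n 1)"
    done
}
```

On the 4.4 host this would be called as list_rsync_versions /snap/lxd/current; no output means no rsync binary was found in the snap.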

BTW, LXD 2.0 has no export, so I can’t use that as a workaround.

The only option I can think of is to upgrade LXD to 3.0.3. However, there are some important containers for which we want to avoid downtime. And of course, it would be trouble if the upgrade somehow fails.

Can you initiate the copy from the target server, so you have a more modern CLI, and use lxc copy old:CONTAINER new: --mode relay?

That should leave us with just one error to deal with :)
It’s indeed quite likely to be an rsync feature mismatch, though the target should be assuming the fewest features in that case, so maybe some other flags have changed since then and are confusing rsync.

You may want to run forkstat | grep rsync on both source and target, so we can easily compare the final set of arguments on both sides.

If we’re just missing a few arguments on the source, one workaround is to put in place a /usr/local/bin/rsync wrapper which executes the real rsync with the few extra arguments needed.

$ sudo lxc copy angstel:survey survey --mode relay
Error: The source server is missing the required "container_push" API extension

I’ll do the forkstat dance in a moment.

Oh, so much for that idea then ;)

Yeah, forkstat should help figure out any mismatch then.

On the 2.0 source

15:51:30 exec  30380                 rsync -arvP --devices --numeric-ids --partial --sparse /var/lib/lxd/containers/survey/ localhost:/tmp/foo -e sh -c "/usr/bin/lxd netcat @lxd/fc00c6d5-85de-4b2a-b7b9-7ecbf10ece77 survey"
15:51:30 fork  30380 parent          rsync -arvP --devices --numeric-ids --partial --sparse /var/lib/lxd/containers/survey/ localhost:/tmp/foo -e sh -c "/usr/bin/lxd netcat @lxd/fc00c6d5-85de-4b2a-b7b9-7ecbf10ece77 survey"
15:51:30 fork  30381 child           rsync -arvP --devices --numeric-ids --partial --sparse /var/lib/lxd/containers/survey/ localhost:/tmp/foo -e sh -c "/usr/bin/lxd netcat @lxd/fc00c6d5-85de-4b2a-b7b9-7ecbf10ece77 survey"
15:51:30 exec  30381                 sh -c /usr/bin/lxd netcat @lxd/fc00c6d5-85de-4b2a-b7b9-7ecbf10ece77 survey localhost rsync --server -vlogDtprSe.iLsfx --partial --numeric-ids . /tmp/foo
15:51:30 fork  30381 parent          sh -c /usr/bin/lxd netcat @lxd/fc00c6d5-85de-4b2a-b7b9-7ecbf10ece77 survey localhost rsync --server -vlogDtprSe.iLsfx --partial --numeric-ids . /tmp/foo
15:51:30 fork  30382 child           sh -c /usr/bin/lxd netcat @lxd/fc00c6d5-85de-4b2a-b7b9-7ecbf10ece77 survey localhost rsync --server -vlogDtprSe.iLsfx --partial --numeric-ids . /tmp/foo
15:51:35 exit  30381     10    5.627 sh -c /usr/bin/lxd netcat @lxd/fc00c6d5-85de-4b2a-b7b9-7ecbf10ece77 survey localhost rsync --server -vlogDtprSe.iLsfx --partial --numeric-ids . /tmp/foo
15:51:35 exit  30380   5632    5.629 rsync -arvP --devices --numeric-ids --partial --sparse /var/lib/lxd/containers/survey/ localhost:/tmp/foo -e sh -c "/usr/bin/lxd netcat @lxd/fc00c6d5-85de-4b2a-b7b9-7ecbf10ece77 survey"

On the 4.4 target

15:51:35 exec  1886288                 rsync --version
15:51:35 exit  1886288      0   0.001s rsync --version
15:51:35 exec  1886289                 rsync --server -vlogDtpre.iLsfx --numeric-ids --devices --partial --sparse --xattrs --delete --compress --compress-level=2 . /var/snap/lxd/common/lxd/storage-pools/local/containers/survey/
15:51:35 exit  1886289   5632   0.105s rsync --server -vlogDtpre.iLsfx --numeric-ids --devices --partial --sparse --xattrs --delete --compress --compress-level=2 . /var/snap/lxd/common/lxd/storage-pools/local/containers/survey/

Hmm, right, so there’s something wrong on the target as it’s mistakenly assuming all features are supported…
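For anyone comparing the two captures above, the mismatch can be read off mechanically. A minimal sketch: the two option lists are copied by hand from the forkstat output (long options of the rsync client on the 2.0 source, and of the rsync server on the 4.4 target), and the function name is made up for illustration.

```shell
#!/bin/sh
# Long options from the forkstat captures above.
source_args="--devices --numeric-ids --partial --sparse"
target_args="--numeric-ids --devices --partial --sparse --xattrs --delete --compress --compress-level=2"

# Print each option the target uses that the source does not pass.
missing_on_source() {
    for opt in $target_args; do
        case " $source_args " in
            *" $opt "*) ;;                    # present on both sides
            *) printf '%s\n' "$opt" ;;        # only on the target
        esac
    done
}

missing_on_source
```

This prints --xattrs, --delete, --compress and --compress-level=2, one per line, which are the options the source would need to add for both sides to agree.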

@tomp is that something you could take a look at? Ideally we’d want rsync copies to work between 2.0.11 and higher, 3.0.3 and higher, 4.0.0 and higher and whatever is latest. I’m not sure it’s possible; it will depend on exactly how 3.0.3 behaves, and if we have to choose between 2.0 and 3.0, we’ll obviously pick 3.0 as the one that should work.

@keesbghs so in your case, you can create a file at /usr/local/bin/rsync containing:

#!/bin/sh
exec /usr/bin/rsync --xattrs --delete --compress --compress-level=2 "$@"

Then make that file executable and attempt another copy. This should cause LXD to use the wrapper script, which will inject the missing args on the source.
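Put together, the workaround could look something like the sketch below. The function name is hypothetical; the wrapper body is the one given above, with a shebang line so the file can be executed directly.

```shell
#!/bin/sh
# Write the rsync wrapper to the given path and make it executable.
# install_rsync_wrapper is a hypothetical helper name used for illustration.
install_rsync_wrapper() {
    dest="$1"
    # Quoted heredoc delimiter: "$@" is written literally, not expanded here.
    cat > "$dest" <<'EOF'
#!/bin/sh
exec /usr/bin/rsync --xattrs --delete --compress --compress-level=2 "$@"
EOF
    chmod 755 "$dest"
}

# On the LXD 2.0 source, as root:
#   install_rsync_wrapper /usr/local/bin/rsync
# Once the migration is finished, remove the wrapper again:
#   rm /usr/local/bin/rsync
```

Removing the wrapper afterwards matters because, while it is in place, anything else on the source that picks up rsync from /usr/local/bin gets the extra flags too.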

The wrapper on the LXD 2.0 source… Yes! That works.

Thanks, Stéphane, for the workaround. This is perfect for us.

Glad it worked!

Hopefully we can tweak the code in latest LXD to better handle importing from 2.0 without such hacks. I just fear that our current detection behavior is there to accommodate 3.0 and that, as 2.0 doesn’t have interactive feature negotiation, it may not be possible to fix this for everyone…

I’m happy. I’m going to move the containers, one by one, to our cluster (LXD 4.4). After that we will wipe the old system, do a fresh install of Ubuntu 20.04, and then add it to the cluster.

I will take a look into this.