Lxc copy: Error transferring instance data: exit status 22

I am trying to copy containers from an old 3.0.2 deb based install to a newer clustered 4.0.1 snap based install. I keep getting the following error:

lxc copy c1 ceph:c1
Error: Failed container creation: Error transferring instance data: exit status 22

The lxd.log shows the following:

lvl=eror msg="Rsync send failed: /var/lib/lxd/containers/c1/: exit status 22: ERROR: buffer overflow in recv_rules [Receiver]\nrsync error: error allocating core memory buffers (code 22) at util2.c(112) [Receiver=3.1.2]\n" t=2020-05-05T12:22:19-0700

Any idea what is happening?

I just upgraded to 3.0.3 on the source server since it was available through apt. Now I get the following errors:

Error: Failed container creation: Error transferring instance data: exit status 2

t=2020-05-05T12:48:46-0700 lvl=eror msg="Rsync send failed: /var/lib/lxd/containers/c1/: exit status 2: [Receiver] Invalid dir index: -1 (-101 - -101)\nrsync error: protocol incompatibility (code 2) at flist.c(2634) [Receiver=3.1.2]\n"

The source server is using a ZFS storage pool and the target server is using a Ceph RBD storage pool.

Hmm, it looks like something is going wrong with the migration negotiation, that or quite a different rsync version.

Depending on whether your source server can take the downtime, upgrading it to the 4.0 snap would likely work around this issue.

snap install lxd
lxd.migrate

Should take care of moving the data and cleaning things up, then you’ll be dealing with the same version on source and target.

Source server:

$ lsb_release -d
Description: Ubuntu 18.04.1 LTS
$ rsync --version | head -n1
rsync version 3.1.2 protocol version 31
$ lxc version
Client version: 3.0.3
Server version: 3.0.3

Destination servers (3 in an LXD cluster):

$ lsb_release -d
Description: Ubuntu 18.04.4 LTS
$ rsync --version | head -n1
rsync version 3.1.2 protocol version 31
$ lxc version
Client version: 4.0.1
Server version: 4.0.1

Although, in the case of the destination servers, that might not be the correct way to check the rsync version: isn’t the rsync that LXD actually uses bundled with the LXD snap (via its Snap core dependency)?
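
If the snap does ship its own rsync (an assumption on my part; the path below is a guess and the binary may need the snap’s bundled libraries to run directly), something like this on a destination node would show the version LXD actually uses:

$ # Check for and query the rsync bundled inside the LXD snap
$ ls -l /snap/lxd/current/bin/rsync
$ /snap/lxd/current/bin/rsync --version | head -n1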

I was able to move the container using a publish command on the source. In this case the ‘ceph’ remote is one of the 3 servers in the destination cluster:

lxc publish c1 ceph: --alias c1

Followed by a launch command on the destination:

lxc launch c1 c1
lxc image delete c1

However, this has a few ugly side effects:

  1. The ‘volatile.base_image’ and ‘image.*’ configuration values are all lost because the published image is given its own fingerprint. I don’t want to lose this information because it is quite useful to have for reference. I was able to manually restore these values on the destination container, but I am not sure whether doing so is safe.
  2. Using publish to move the container and then deleting the image after launching the container at the destination leaves a zombie image in the RBD pool. I know this is expected behavior given how LXD clones containers from the base image, but it isn’t ideal:

    $ sudo rbd ls lxd/
    container_c1

    zombie_image_aeeacfe6b70321d45e1f0f7560cc8e513e75b8dec539840c6a7a5ef5e4e953d4_ext4

  3. The publish approach also causes the container to lose its efficient snapshot copy of its true base image ‘9879a79ac2b208c05af769089f0a6c3cbea8529571e056c82e96f1468cd1f610’ as published on https://cloud-images.ubuntu.com/releases (the parent check sketched right after this list is how I verified that).
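
For reference, this is how I checked whether a container’s RBD image is still an efficient clone of a base image (the ‘lxd’ pool name matches the rbd ls output above; a cloned image shows a parent line, a full copy does not):

$ # Show the clone parent (if any) of the container's RBD image
$ sudo rbd info lxd/container_c1 | grep parent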

I have 40-50 containers that I need to migrate, and running into the above issues that many times isn’t workable.

The source is a single server storing the containers in a ZFS pool. The destination is a 3 node cluster storing the containers in a CEPH RBD pool.

Is there any way to move the containers without losing their efficient snapshot copies? Any suggestions on how to best do this migration?
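
For scale, once a plain ‘lxc copy’ works again, I was hoping to script the bulk move with something like this (just a sketch; it assumes the containers can be copied as-is or are stopped first, and that ‘ceph:’ is the remote pointing at the destination cluster):

$ # Copy every container on the source to the destination cluster
$ for c in $(lxc list -c n --format csv); do lxc copy "$c" "ceph:$c"; done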

@stgraber

I decided to try copying a container using the push mode:

$ lxc copy test-copy ceph: --mode push --verbose
Transferring container: test-copy: 142B (19B/s)

It just hangs in the above state indefinitely, even though the following error shows up in ‘lxc monitor’ output on the target cluster:

metadata:
  class: websocket
  created_at: "2020-05-08T11:18:54.187185969-07:00"
  description: Creating container
  err: 'Error transferring instance data: exit status 2'
  id: 11809c5b-cae4-40a6-9c1e-95adfd5eb2ee
  location: node03
  may_cancel: false
  metadata:
    create_instance_from_image_unpack_progress: 'Unpack: 100% (3.90GB/s)'
    progress:
      percent: "100"
      speed: "3900497512"
      stage: create_instance_from_image_unpack
  resources:
    containers:
    - /1.0/containers/test-copy
    instances:
    - /1.0/instances/test-copy
  status: Failure
  status_code: 400
  updated_at: "2020-05-08T11:18:57.590873186-07:00"

Both push and pull modes encounter the same ‘exit status 2’ error. The hang above is a bug in its own right, because the failure should have propagated out and terminated the stuck command. Beyond that, the underlying failure also looks like a bug: rsyncing from LXD 3.0.3 on Ubuntu 18.04 LTS to the LXD 4.0.1 snap on Ubuntu 18.04 LTS should involve close enough rsync versions for this to work, right?

The source LXD daemon reports the following in its log:

lvl=eror msg="Rsync send failed: /var/lib/lxd/containers/test-copy/: exit status 2: [Receiver] Invalid dir index: -1 (-101 - -101)\nrsync error: protocol incompatibility (code 2) at flist.c(2634) [Receiver=3.1.2]\n"
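
To narrow down whether the problem is the distro rsync or the one bundled in the snap, I may try reproducing the transfer outside of LXD with something along these lines (a rough sketch, not the exact options LXD uses; the target path is made up, it assumes root can ssh to node03, and invoking the snap’s bundled rsync directly may not work without its bundled libraries):

$ # Distro rsync on both ends (both report 3.1.2)
$ sudo rsync -aHAX --dry-run /var/lib/lxd/containers/test-copy/ node03:/tmp/rsync-test/
$ # Same transfer, but pointing the receiver at the rsync shipped in the LXD snap
$ sudo rsync -aHAX --dry-run --rsync-path=/snap/lxd/current/bin/rsync /var/lib/lxd/containers/test-copy/ node03:/tmp/rsync-test/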

I tried searching for “[Receiver] Invalid dir index: -1 (-101 - -101)”. Maybe it is related to this?: