Lxc copy on btrfs fails: connection timed out

Hi all,
Setup:

  • Debian bullseye
  • snap LXD (5.0.2)
  • btrfs-progs v5.10.1

I have a problem with lxc copy on btrfs.
I copy my container to another host with:
lxc snapshot host:container snap_name
lxc copy host:container/snap_name snap_name_local

I managed to make one successful copy, but every other attempt (at least 20, probably more) ends with the following error:
Error: Failed instance creation: Error transferring instance data: Got error reading migration source: read tcp 1.2.3.4:32970->1.2.3.5:8443: read: connection timed out

Using: lxc monitor
I get:
Failed sending volume ideaprod/ideaprod-20230206-143040:/: Btrfs send failed: [signal: killed write tcp 1.2.3.5:8443->1.2.3.4:32978: write: broken pipe] (At subvol /var/snap/lxd/common/lxd/storage-pools/default/containers/migration.3863231056/.migration-send

I cannot find any other errors: nothing in dmesg, and btrfs reports no errors either.
On both hosts:
btrfs scrub start -B /var/snap/lxd → No errors

I have the same setup on another host and lxc copy is done every night and that never fails (the containers are extremely small though).
The size of this container is: 2.65MiB

I did a test with just btrfs: I created an lxc snapshot and used btrfs send/receive.
I tried that twice and it worked fine both times, with no timeout.
btrfs send /var/snap/lxd/common/lxd/storage-pools/default/containers/my_snap | ssh host btrfs receive /var/snap/lxd/common/lxd/storage-pools/default/containers/

Is there some type of timeout that happens in LXD?
Since I cannot find any other errors, I am really struggling to figure this one out.

Many thanks in advance!

I did some more testing using lxc monitor, and it seems to stop at the exact same place every time. The same message always appears just before the first "connection reset by peer":
message: MigrateInstance finished

Here is the last part of the lxc monitor output:
location: none
metadata:
  context:
    args: '&{IndexHeaderVersion:1 Name:ideaprod/ideaprod-20230207-134402 Snapshots:[]
      MigrationType:{FSType:BTRFS Features:[migration_header header_subvolumes header_subvolume_uuids]}
      TrackProgress:true MultiSync:false FinalSync:false Data: ContentType: AllowInconsistent:false
      Refresh:false Info:0xc000014a40 VolumeOnly:true}'
    instance: ideaprod/ideaprod-20230207-134402
    project: default
  level: debug
  message: MigrateInstance finished
timestamp: "2023-02-07T13:44:51.124827787+01:00"
type: logging

location: none
metadata:
  context:
    err: |-
      Failed sending volume ideaprod/ideaprod-20230207-134402:/: Btrfs send failed: [signal: killed write tcp 1.2.3.4:8443->1.2.3.5:35860: write: connection reset by peer] (At subvol /var/snap/lxd/common/lxd/storage-pools/default/containers/migration.1812165211/.migration-send
      )
    instance: ideaprod/ideaprod-20230207-134402
    project: default
  level: error
  message: Migration failed on source
timestamp: "2023-02-07T13:44:51.125105799+01:00"
type: logging

location: none
metadata:
  context:
    instance: ideaprod/ideaprod-20230207-134402
    project: default
  level: info
  message: Migration channels disconnected on source
timestamp: "2023-02-07T13:44:51.125307033+01:00"
type: logging

location: none
metadata:
  context:
    class: websocket
    description: Transferring snapshot
    err: |-
      Failed sending volume ideaprod/ideaprod-20230207-134402:/: Btrfs send failed: [signal: killed write tcp 1.2.3.4:8443->1.2.3.5:35860: write: connection reset by peer] (At subvol /var/snap/lxd/common/lxd/storage-pools/default/containers/migration.1812165211/.migration-send
      )
    operation: fdf4df78-d7cd-4f13-85d0-e69a82fc2373
    project: default
  level: debug
  message: Failure for operation
timestamp: "2023-02-07T13:44:51.125386762+01:00"
type: logging

location: none
metadata:
  class: websocket
  created_at: "2023-02-07T13:44:04.168983296+01:00"
  description: Transferring snapshot
  err: |-
    Failed sending volume ideaprod/ideaprod-20230207-134402:/: Btrfs send failed: [signal: killed write tcp 1.2.3.4:8443->1.2.3.5:35860: write: connection reset by peer] (At subvol /var/snap/lxd/common/lxd/storage-pools/default/containers/migration.1812165211/.migration-send
    )
  id: fdf4df78-d7cd-4f13-85d0-e69a82fc2373
  location: none
  may_cancel: false
  metadata:
    control: f3008a4d5464c1310bc14f7e5adcc5c52202a2ca3ea5c18072252166ba057d6a
    fs: 05ae55ee9a051f9b54ae488de91c36b5d0bea8593544a4423088152aacb11843
    fs_progress: 'ideaprod/ideaprod-20230207-134402: 5.16GB (114.51MB/s)'
  resources:
    containers:
    - /1.0/containers/ideaprod
    instances:
    - /1.0/instances/ideaprod
  status: Failure
  status_code: 400
  updated_at: "2023-02-07T13:44:50.46370078+01:00"
project: default
timestamp: "2023-02-07T13:44:51.125490093+01:00"
type: operation

Hi, did some more digging and I wonder if it can be related to this bug?

After lxc copy fails, there are still btrfs write processes running.

What LXD version is the target server?

Yes, I am sorry!
It is the exact same setup: debian bullseye, lxd 5.0.2 and btrfs 5.10.1

I have been testing a bit more and found the following strange behaviour.
The errors above occur between two servers that are supposed to be replicas of each other, so everything is set up exactly the same; let's call them 1 and 2. There is also another server with LXD installed and set up similarly to the other two; let's call it 3.
All three have the same LXD version (5.0.2), the same Debian and the same btrfs.
I have been copying between the three servers with: lxc copy …
This is the result:

  • 1 → 2: Fails with the above error
  • 2 → 1: Fails with the above error
  • 1 → 3: Works great
  • 3 → 1: Works great
  • 2 → 3: Works great
  • 3 → 2: Works great
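
A test matrix like the one above can be scripted so that every direction is tried the same way. This is only a rough sketch: the remote names and the instance name passed to run_matrix are placeholders, the "-copytest" suffix is made up for illustration, and by default the script only prints the lxc copy commands (set DRY_RUN=0 to actually run them).

```shell
#!/bin/sh
# Print every ordered (source, target) pair from a list of remote names.
ordered_pairs() {
    for src in "$@"; do
        for dst in "$@"; do
            [ "$src" = "$dst" ] || printf '%s %s\n' "$src" "$dst"
        done
    done
}

# run_matrix INSTANCE REMOTE...
# Copies INSTANCE between every pair of remotes and reports ok/FAILED.
# DRY_RUN defaults to 1, which only prints the commands.
# The "-copytest" target suffix is a hypothetical naming choice.
run_matrix() {
    instance=$1; shift
    ordered_pairs "$@" | while read -r src dst; do
        cmd="lxc copy $src:$instance $dst:$instance-copytest"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        elif $cmd </dev/null; then
            echo "$src -> $dst: ok"
        else
            echo "$src -> $dst: FAILED"
        fi
    done
}
```

For example, `run_matrix mycontainer srv1 srv2 srv3` prints the six copy commands for three remotes. The test copies on the targets would need to be deleted between runs.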

I have been using lxc monitor to watch the copies; all servers transfer data at a similar speed (100+ MB/s).
I cannot find any more errors than the ones described above.
I cannot get my head around this and I am totally lost by now!

Where should I continue to look?
How can I resolve this problem?

Thanks!!

Hello,

I have a similar problem. When I try to copy a container from a btrfs host to another machine with zfs, the whole process just hangs forever without any progress. Even if I try a remote export I get the same result: it hangs.

Both hosts run snap LXD 5.10.

Any ideas where the problem is?

Thanks,

~yuri

Hi all,
After loads of testing and investigation I realize that LXD is totally perfect!!!
LXD told me that there was a problem with the network, and it was so right!!
Both servers were connected to a bad switch; after testing with another switch everything worked fine!!
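
For anyone chasing a similar problem: one quick thing to look at is the kernel's per-interface error counters under /sys/class/net. This is only a sketch, and a bad switch will not necessarily show up in these counters; the interface name is an assumption, and the second argument exists only so the function can be pointed at a different directory for testing.

```shell
#!/bin/sh
# nic_error_counters IFACE [SYSFS_BASE]
# Print the non-zero error/drop counters for one network interface.
# IFACE (e.g. eth0) is an assumption; SYSFS_BASE defaults to /sys/class/net.
nic_error_counters() {
    iface=$1
    base=${2:-/sys/class/net}
    for name in rx_errors tx_errors rx_dropped tx_dropped rx_crc_errors; do
        file="$base/$iface/statistics/$name"
        [ -r "$file" ] || continue          # skip counters this system lacks
        value=$(cat "$file")
        [ "$value" -eq 0 ] || echo "$iface $name=$value"
    done
}
```

Running `nic_error_counters eth0` on both hosts before and after a failed copy shows whether any counter is increasing.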


Hi,
To me this seems like an odd setup. As I understand it, in the background there is a btrfs send/receive going on and this does not go well with zfs, right?
Is this really a supported solution, lxc copy … btrfs → zfs?

LXD will negotiate the protocol to use for the transfer, and in the case of btrfs → zfs it will use rsync (for filesystem volumes) or dd (for block volumes).
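
As a rough mental model of that negotiation (a simplified sketch, not LXD's actual code): when both sides use the same driver, the driver's native send/receive can be used, as the "Btrfs send" errors earlier in this thread show for btrfs to btrfs; otherwise the transfer falls back to rsync or dd. The function name and its arguments are made up for illustration, and the zfs to zfs case is an assumption.

```shell
#!/bin/sh
# pick_transfer SRC_DRIVER DST_DRIVER VOLUME_TYPE
# Simplified, hypothetical model of the transfer choice described above.
# VOLUME_TYPE is "filesystem" or "block".
pick_transfer() {
    src=$1; dst=$2; voltype=$3
    if [ "$src" = "$dst" ] && { [ "$src" = btrfs ] || [ "$src" = zfs ]; }; then
        echo "$src send/receive"   # same driver on both ends: native stream
    elif [ "$voltype" = block ]; then
        echo "dd"                  # mixed drivers, block volume
    else
        echo "rsync"               # mixed drivers, filesystem volume
    fi
}
```

So `pick_transfer btrfs zfs filesystem` prints rsync, which is why a btrfs → zfs copy does not depend on btrfs send/receive.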