Copy from host fails on big containers

Hello there,

I’m running LXD 3.5 (from the snap package) on two hosts, both running the same up-to-date Ubuntu 18.04 release.
Each time I try to copy a snapshot of a big container (~50GB) from host A to host B, I get the following error:

lxc copy xenon:g-mastodon/g-mstdn g-mastodon
Error: Failed container creation:
 - https://REDACTED:8443: Error transferring container data: Unable to connect to: REDACTED:8443
 - https://REDACTED:8443: Error transferring container data: exit status 11
 - https://REDACTED:8443: Error transferring container data: Unable to connect to: REDACTED:8443
 - https://REDACTED:8443: Error transferring container data: Unable to connect to: REDACTED:8443

With a small container (e.g. ~10GB) I don’t have this problem, and the command completes without errors.
Thanks in advance for your help.

P.S. Here is some diagnostic info; let me know if you need more.

Host B

lxc info
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    REDACTED
    -----END CERTIFICATE-----
  certificate_fingerprint: 8a3f123bcfeb1d3aec2840f5a58eaf294db5afdb73944a408853d4007174ef2b
  driver: lxc
  driver_version: 3.0.2
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.15.0-34-generic
  server: lxd
  server_pid: 2429
  server_version: "3.5"
  storage: lvm
  storage_version: 2.02.133(2) (2015-10-30) / 1.02.110 (2015-10-30) / 4.37.0
  server_clustered: false
  server_name: B

Host A

lxc info
config:
  core.https_address: '[::]:8443'
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 138.201.32.141:8443
  - '[2a01:4f8:171:340c::2]:8443'
  - 10.217.155.1:8443
  - '[fd42:6b4:84b0:4d34::1]:8443'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    REDACTED
    -----END CERTIFICATE-----
  certificate_fingerprint: 78effb8ba14c228e9303c26d5577be376201981f512e56b62a7af1fda85e33b5
  driver: lxc
  driver_version: 3.0.2
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.15.0-34-generic
  server: lxd
  server_pid: 28040
  server_version: "3.5"
  storage: lvm
  storage_version: 2.02.133(2) (2015-10-30) / 1.02.110 (2015-10-30) / 4.37.0
  server_clustered: false
  server_name: A

Rsync exit status 11 means an error in file I/O, which suggests a filesystem or disk error on the source or target during the transfer.
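Exit status 11 is one of the codes documented in the EXIT VALUES section of the rsync(1) man page. When LXD only surfaces a bare "exit status N", a small lookup like the one below can help decode it. This is a sketch: the function name `rsync_exit_meaning` is hypothetical, and only a subset of the documented codes is included.

```shell
#!/bin/sh
# Hypothetical helper: map an rsync exit status to its description.
# Only a subset of the EXIT VALUES listed in the rsync(1) man page is covered.
rsync_exit_meaning() {
  case "$1" in
    0)  echo "Success" ;;
    10) echo "Error in socket I/O" ;;
    11) echo "Error in file I/O" ;;
    12) echo "Error in rsync protocol data stream" ;;
    23) echo "Partial transfer due to error" ;;
    24) echo "Partial transfer due to vanished source files" ;;
    30) echo "Timeout in data send/receive" ;;
    *)  echo "Unknown or unlisted rsync exit status: $1" ;;
  esac
}

# The status reported by LXD in this thread:
rsync_exit_meaning 11   # prints: Error in file I/O
```

Keep in mind that "file I/O" covers any failed read or write, including a full destination filesystem, not only hardware faults; the rsync log excerpt later in this thread reports code 11 together with "No space left on device".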

Hello Stéphane,

Thanks for answering. Everything runs on a mirrored RAID of two Crucial NVMe disks, and the whole setup looks healthy (I checked both /proc/mdstat and smartctl).

What should I do to debug this further?

You may want to look at /var/snap/lxd/common/lxd/logs/lxd.log on both systems for more detailed errors.
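To pull only the relevant lines out of those logs, a filter such as the following may help. It is a minimal sketch: the function name `lxd_log_errors` is hypothetical, and it simply assumes that error entries are tagged `lvl=eror`, as they are in the log excerpts posted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print error-level entries and rsync messages from
# an LXD log. Defaults to the snap log path mentioned in this thread.
lxd_log_errors() {
  log="${1:-/var/snap/lxd/common/lxd/logs/lxd.log}"
  # Error entries carry lvl=eror; rsync failures mention "Rsync" or "rsync".
  grep -E 'lvl=eror|[Rr]sync' "$log"
}
```

Run it on both hosts: in this thread the sending side appears to have logged the rsync failure, while the receiving side logged the "Error during migration sink" entry.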

Thanks, here are the relevant entries from hosts A and B:

lvl=eror msg="Rsync send failed: /var/snap/lxd/common/lxd/snapshots/g-mastodon/g-mstdn/: exit status 11: rsync: write failed on \"/var/snap/lxd/common/lxd/containers/g-mastodon/rootfs/home/mastodon/live/public/system/media_attachments/files/000/013/745/original/08e9dd98b7e5497e.png\": No space left on device (28)\nrsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.1]\n" t=2018-09-17T14:35:42+0200
ephemeral=false lvl=info msg="Creating container" name=g-mastodon t=2018-09-17T14:35:59+0200
ephemeral=false lvl=info msg="Created container" name=g-mastodon t=2018-09-17T14:35:59+0200
lvl=warn msg="Unable to update backup.yaml at this time" name=g-mastodon t=2018-09-17T14:36:00+0200
err="Unable to connect to: [fd42:6b4:84b0:4d34::1]:8443" lvl=eror msg="Error during migration sink" t=2018-09-17T14:36:00+0200
created=2018-09-17T14:35:59+0200 ephemeral=false lvl=info msg="Deleting container" name=g-mastodon t=2018-09-17T14:36:00+0200 used=1970-01-01T01:00:00+0100
created=2018-09-17T14:35:59+0200 ephemeral=false lvl=info msg="Deleted container" name=g-mastodon t=2018-09-17T14:36:01+0200 used=1970-01-01T01:00:00+0100

I understand this is related to storage availability, but still… My LVM pool is well sized, with almost 100GB free, much more than my g-mastodon container needs.

I suspect the issue is the default volume size on the target server: it probably defaults to 10GB, which is too small for this container.

Try doing lxc storage set POOL-NAME volume.size 100GB on the target server and then try the copy again.
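Assuming that diagnosis is right, the pool's free space is not the limiting factor: each new container volume is created at the pool's `volume.size`, so a ~50GB root filesystem cannot fit into a 10GB volume even with ~100GB free in the pool. The fix can be wrapped in a small shell function, sketched below under the assumption that it runs on the target server (host B here) and that POOL-NAME is that server's LVM pool. The function name and its arguments are hypothetical; only `lxc storage get`, `lxc storage set` and `lxc copy` are real lxc subcommands.

```shell
#!/bin/sh
# Hypothetical wrapper around the suggested fix. Run it on the target
# server, where "lxc copy" pulls from the remote. All arguments are
# placeholders: pool name, new default size, source and new container name.
copy_with_larger_volumes() {
  pool="$1"; size="$2"; source="$3"; name="$4"

  # Show the pool's current volume.size setting, if any.
  lxc storage get "$pool" volume.size

  # Raise the default size used for newly created volumes in this pool.
  lxc storage set "$pool" volume.size "$size" || return 1

  # Retry the copy that previously failed.
  lxc copy "$source" "$name"
}

# Example, matching the command from the original post:
# copy_with_larger_volumes POOL-NAME 100GB xenon:g-mastodon/g-mstdn g-mastodon
```

Because `volume.size` is a pool-wide default for newly created volumes, it does not resize existing containers; pick a value large enough for the biggest container you expect to copy.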
