trouncer
(Richard Trouncer)
September 19, 2018, 2:29pm
1
We’re trying to do live migration with very simple newly created containers, but the move never completes.
Looking at /var/snap/lxd/common/lxd/logs/lxd.log, we get rsync errors on the sending host:
ephemeral=false lvl=info msg="Creating container" name=rjtTestServer t=2018-09-19T14:45:10+0100
ephemeral=false lvl=info msg="Created container" name=rjtTestServer t=2018-09-19T14:45:10+0100
lvl=warn msg="Unable to update backup.yaml at this time" name=rjtTestServer t=2018-09-19T14:45:10+0100
lvl=warn msg="Unable to update backup.yaml at this time" name=rjtTestServer t=2018-09-19T14:45:10+0100
ephemeral=false lvl=info msg="Creating container" name=rjtTestServer/deleteMe t=2018-09-19T14:45:10+0100
ephemeral=false lvl=info msg="Created container" name=rjtTestServer/deleteMe t=2018-09-19T14:45:10+0100
action=start created=2018-09-19T14:45:10+0100 ephemeral=false lvl=info msg="Starting container" name=rjtTestServer stateful=false t=2018-09-19T14:45:34+0100 used=1970-01-01T01:00:00+0100
action=start created=2018-09-19T14:45:10+0100 ephemeral=false lvl=info msg="Started container" name=rjtTestServer stateful=false t=2018-09-19T14:45:34+0100 used=1970-01-01T01:00:00+0100
actionscript=false created=2018-09-19T14:45:10+0100 ephemeral=false features=1 lvl=info msg="Migrating container" name=rjtTestServer predumpdir= statedir= stop=false t=2018-09-19T14:45:53+0100 used=2018-09-19T14:45:34+0100
actionscript=false created=2018-09-19T14:45:10+0100 ephemeral=false features=0 lvl=info msg="Migrating container" name=rjtTestServer predumpdir= statedir=/tmp/lxd_checkpoint_685803994 stop=false t=2018-09-19T14:46:04+0100 used=2018-09-19T14:45:34+0100
actionscript=false created=2018-09-19T14:45:10+0100 ephemeral=false features=0 lvl=info msg="Migrated container" name=rjtTestServer predumpdir= statedir=/tmp/lxd_checkpoint_685803994 stop=false t=2018-09-19T14:46:04+0100 used=2018-09-19T14:45:34+0100
lvl=eror msg="Rsync send failed: /tmp/lxd_checkpoint_685803994/: exit status 2: [Receiver] Invalid dir index: -1 (-101 - -101)\nrsync error: protocol incompatibility (code 2) at flist.c(2630) [Receiver=3.1.1]\n" t=2018-09-19T14:46:04+0100
We don’t get errors on the receiving host.
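In case it helps anyone sifting through a long lxd.log for the same failure, here is a minimal Python sketch that pulls out only the error-level messages. It assumes the key=value layout seen in the lines quoted here; the function names `parse_logfmt` and `error_messages` are made up for illustration, and the sample error line is shortened from the real one.

```python
import shlex


def parse_logfmt(line):
    """Split one key=value log line into a dict; values may be double-quoted."""
    # Forum software sometimes turns straight quotes into typographic ones.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not key=value
            fields[key] = value
    return fields


def error_messages(lines):
    """Return the msg field of every line logged at level 'eror'."""
    parsed = (parse_logfmt(line) for line in lines)
    return [f.get("msg", "") for f in parsed if f.get("lvl") == "eror"]


# Sample lines; the error message is shortened for the example.
sample = [
    'ephemeral=false lvl=info msg="Creating container" name=rjtTestServer t=2018-09-19T14:45:10+0100',
    'lvl=eror msg="Rsync send failed: /tmp/lxd_checkpoint_685803994/: exit status 2" t=2018-09-19T14:46:04+0100',
]

for message in error_messages(sample):
    print(message)
```

Note that `shlex` will raise on messages containing an unbalanced apostrophe, so treat this as a quick filter rather than a robust parser.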
Elsewhere in the logs it repeatedly complains about:
lvl=warn msg="Unable to update backup.yaml at this time" name=testchimg t=2018-09-19T14:38:01+0100
but that may be unrelated.
Both hosts are 18.04.1 new builds, as is the container.
First time posting here, apologies for any breaches of protocol.
Richard
stgraber
(Stéphane Graber)
September 19, 2018, 2:48pm
2
Same LXD version on source and destination?
edmcdonagh
(Ed McDonagh)
September 19, 2018, 3:26pm
3
Both are snap installed 3.0.2.
stgraber
(Stéphane Graber)
September 20, 2018, 9:34am
4
Hmm, odd, we’ll need to try to reproduce that.
What storage backend is that, does it consistently happen and did you try with a very simple container image like Alpine?
The error matches what I’d expect if one server was 3.0.1 and the other 3.0.2, or if one was 3.0.2 with our cherry-picks and the other without. That is what used to happen before we added logic to detect rsync feature mismatches.
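To illustrate the general idea of detecting feature mismatches (this is not LXD's actual code), here is a rough Python sketch: each end advertises the optional rsync features it supports, and only the common subset gets used. The function name `negotiate` and the feature names are hypothetical.

```python
def negotiate(local_features, remote_features):
    """Return the features both ends advertised, in a stable order."""
    if remote_features is None:
        # Assumption: a peer that advertises nothing predates feature
        # advertisement, so no optional feature is safe to use.
        return []
    return sorted(set(local_features) & set(remote_features))


# Hypothetical feature names, for illustration only.
new_server = ["compress", "delete", "xattrs"]

print(negotiate(new_server, ["delete", "xattrs"]))  # ['delete', 'xattrs']
print(negotiate(new_server, None))                  # []
```

If one side passes rsync options the other does not expect, the two rsync processes can disagree about the protocol, which is the kind of failure the log shows.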
trouncer
(Richard Trouncer)
September 20, 2018, 1:04pm
5
Alpine gives roughly the same error.
The storage backend is ZFS. We’ve yet to manage a successful live migration. We’re using the stable snap branch, 3.0.
actionscript=false created=2018-09-20T13:56:02+0100 ephemeral=false features=1 lvl=info msg="Migrating container" name=testAlpine predumpdir= statedir= stop=false t=2018-09-20T13:59:51+0100 used=2018-09-20T13:56:03+0100
actionscript=false created=2018-09-20T13:56:02+0100 ephemeral=false features=0 lvl=info msg="Migrating container" name=testAlpine predumpdir= statedir=/tmp/lxd_checkpoint_696747950 stop=false t=2018-09-20T13:59:52+0100 used=2018-09-20T13:56:03+0100
actionscript=false created=2018-09-20T13:56:02+0100 ephemeral=false features=0 lvl=info msg="Migrated container" name=testAlpine predumpdir= statedir=/tmp/lxd_checkpoint_696747950 stop=false t=2018-09-20T13:59:52+0100 used=2018-09-20T13:56:03+0100
lvl=eror msg="Rsync send failed: /tmp/lxd_checkpoint_696747950/: exit status 2: [Receiver] Invalid dir index: -1 (-101 - -101)\nrsync error: protocol incompatibility (code 2) at flist.c(2630) [Receiver=3.1.1]\n" t=2018-09-20T13:59:52+0100
stgraber
(Stéphane Graber)
September 20, 2018, 8:14pm
6
Can you show lxc info for the source and destination server?
edmcdonagh
(Ed McDonagh)
September 21, 2018, 8:40am
7
Looks like I can’t attach, so I’ve gone with long post, sorry:
Host 3:
config:
  core.https_address: '[::]:8443'
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- candid_authentication
- candid_config
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 10.163.254.33:8443
  - 192.168.248.110:8443
  - 192.168.122.1:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIFVTCCAz2gAwIBAgIQExnufo8Jn65F11gQP/JPBTANBgkqhkiG9w0BAQsFADA5
    <snip>
    3BezUmyqzVdMJnMSri8UovKkTTM/4kriHZw9SNAS77tedIveyY7C1dZzuNLrgTnT
    UIPWOtJiEOcdpEYLPDHDDVjJmwhzXWPfIA==
    -----END CERTIFICATE-----
  certificate_fingerprint: <snip>
  driver: lxc
  driver_version: 3.0.2
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.15.0-34-generic
  server: lxd
  server_pid: 27824
  server_version: 3.0.2
  storage: zfs
  storage_version: 0.7.5-1ubuntu16.3
  server_clustered: false
  server_name: frp-vmhost3
And host 4:
config:
  core.https_address: '[::]:8443'
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- candid_authentication
- candid_config
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 10.163.254.34:8443
  - 192.168.248.149:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIFUDCCAzigAwIBAgIRAPc9dUjEEp/Ld7S7/1XIgfwwDQYJKoZIhvcNAQELBQAw
    <snip>
    8KorvAlvSmsxS4sqf6YjSNTur3jZMuSyjGivbMwJjxKjyHS9SbKABYZolV4wQzZo
    X1ehaKAo5cbbthGtY1Pbrp7CrFU=
    -----END CERTIFICATE-----
  certificate_fingerprint: <snip>
  driver: lxc
  driver_version: 3.0.2
  kernel: Linux
  kernel_architecture: x86_64
  kernel_version: 4.15.0-34-generic
  server: lxd
  server_pid: 2518
  server_version: 3.0.2
  storage: zfs
  storage_version: 0.7.5-1ubuntu16.3
  server_clustered: false
  server_name: frp-vmhost4
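For anyone comparing two servers the same way, a small Python sketch of the check: it takes the version-related fields from the two dumps above (typed in by hand here, not parsed from YAML) and reports any that differ. The helper name `differing_fields` is made up for illustration.

```python
def differing_fields(a, b, keys):
    """Map each key whose value differs to the (a, b) pair of values."""
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


KEYS = ["driver_version", "server_version", "kernel_version",
        "storage", "storage_version"]

# Values copied from the environment sections of the two dumps.
host3 = {
    "driver_version": "3.0.2",
    "server_version": "3.0.2",
    "kernel_version": "4.15.0-34-generic",
    "storage": "zfs",
    "storage_version": "0.7.5-1ubuntu16.3",
}
host4 = dict(host3)  # host 4 reports the same values for these fields

print(differing_fields(host3, host4, KEYS))  # {}
```

An empty result means the two hosts report the same versions for these fields, so a plain version mismatch between them looks unlikely.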
trouncer
(Richard Trouncer)
September 27, 2018, 10:26am
8
Does anyone have any ideas?
Just got the same error with latest lxc, lxd from git on rhel7 with rsync 3.1.2. Will look into what is necessary to fix this.
adrianr
October 11, 2018, 2:25pm
10
@stgraber If I revert the following commit it works again:
commit 7dfc2939ac278d8436ff4b892599c795c5482007
Author: Stéphane Graber <stgraber@ubuntu.com>
Date:   Thu Aug 23 19:27:31 2018 -0400
    global: Advertise rsync features
    Closes #4962
Somehow the pre-dump loop seems to have problems with the additional information on the channel.
edmcdonagh
(Ed McDonagh)
November 23, 2018, 6:18am
11
@stgraber - does release 3.0.3 change this at all?
stgraber
(Stéphane Graber)
November 23, 2018, 4:13pm
12
The CRIU migration issue was resolved in LXD 3.7 and the same fix is present in 3.0.3.
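As a quick way to check a given server_version against that statement, here is a minimal Python sketch. It assumes plain dotted numeric version strings and that releases after 3.7 keep the fix; the function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.0.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_migration_fix(version):
    """True if the version is 3.0.3 or later in the 3.0 series, or 3.7+."""
    parts = parse_version(version)
    if parts[:2] == (3, 0):
        # 3.0.x series: the fix is present from 3.0.3 onwards.
        return parts >= (3, 0, 3)
    # Assumption: every release from 3.7 onwards carries the fix.
    return parts >= (3, 7)


for v in ["3.0.2", "3.0.3", "3.6", "3.7"]:
    print(v, has_migration_fix(v))
```

With the 3.0.2 reported by both hosts in this thread, the check returns False, so upgrading to 3.0.3 is the thing to try.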
edmcdonagh
(Ed McDonagh)
November 23, 2018, 5:23pm
13
Thanks - we’ll check it out.