Lxc copy to internal IP instead of public IP

Hi,

Since yesterday, I have had a problem copying containers from lxd-01 to lxd-02.

lxc remote list

+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| NAME            | URL                                      | PROTOCOL      | AUTH TYPE   | PUBLIC | STATIC | GLOBAL |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| lxd-02          | https://192.168.1.2:8443                 | lxd           | tls         | NO     | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| images          | https://images.linuxcontainers.org       | simplestreams | none        | YES    | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| local (current) | unix://                                  | lxd           | file access | NO     | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu          | https://cloud-images.ubuntu.com/releases | simplestreams | none        | YES    | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-daily    | https://cloud-images.ubuntu.com/daily    | simplestreams | none        | YES    | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+

It is trying to use the public IP of lxd-01:

Error: Failed instance creation: Error transferring instance data: Unable to connect to: 81.x.x.x:8443

Command:

lxc copy CT lxd-02:CT-backup

This has worked well for 1.5 years. Out of nowhere, LXD tries to copy the container via the local public address (lxd-01) instead of the internal address from lxd-02.

Network on lxd-01 (interface 3 = migration IP, interface 2 = public IP):

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether x:x:x:x brd ff:ff:ff:ff:ff:ff
    inet 81.x.x.x/32 scope global enp0s31f6
       valid_lft forever preferred_lft forever
3: vlan4@enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default qlen 1000
    link/ether x:x:x:x brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 brd 192.168.1.255 scope global vlan4
       valid_lft forever preferred_lft forever

I checked the firewall and ICMP; all OK. I added 192.168.1.2 a second time with lxc remote add and everything seems to be fine. I’m sure it’s not the port or the network. Please check this.

Nothing special in the debug log; it just shows the job being cancelled.

status: Cancelled
status_code: 401
updated_at: "2021-06-24T15:01:09.323267072+02:00"
timestamp: "2021-06-24T15:01:19.484086159+02:00"
type: operation

The way lxc copy works is that it instructs the target server (lxd-02) to connect to the source (local) to fetch the instance.

To do that, LXD fetches the addresses of the source (visible in lxc info local:) and feeds that to the remote server. When multiple addresses are present, we iterate through them.
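As a rough illustration of that pull-mode flow (this is not LXD’s actual code, which is written in Go), the target effectively walks the source’s address list until one of them connects. The name `fetch_from_source` and the `connect` callable are hypothetical stand-ins for this sketch.

```python
from typing import Callable, Iterable


def fetch_from_source(addresses: Iterable[str], connect: Callable[[str], object]):
    """Return (address, connection) for the first source address that connects.

    `connect` is a hypothetical stand-in for whatever opens the connection
    from the target to the source; it is assumed to raise OSError on failure.
    """
    errors = []
    for address in addresses:
        try:
            return address, connect(address)
        except OSError as exc:
            # Unreachable address: remember the error and try the next one.
            errors.append(f"{address}: {exc}")
    raise ConnectionError("Unable to connect to any address: " + "; ".join(errors))
```

With the addresses from this thread, a working loop like this would fail on 81.x.x.x:8443 and then succeed on 192.168.1.1:8443, so an error that names only the first address suggests the loop stopped early.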

I know that @tomp made a recent change to avoid needless retries in some cases, but this shouldn’t apply to cases where an address isn’t reachable.

So I’d recommend running lxc info local: and then looking at what’s listed under addresses in the environment section, to see whether those are correct and at least one is properly reachable from the target server.

If not, then that’s the issue, but if you see both the public address (the one that’s failing) and a private address which should have worked, then it may be a regression in @tomp’s change.
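To check reachability from the target side, something like the following could be run on the target server against each address listed by lxc info local: on the source. It is only a sketch: the helper names are made up for this example, and a successful TCP connect says nothing about TLS or authentication.

```python
import socket
from typing import Tuple


def split_host_port(address: str) -> Tuple[str, int]:
    """Split 'host:port' or '[ipv6]:port' into (host, port)."""
    host, _, port = address.rpartition(":")
    return host.strip("[]"), int(port)


def reachable(address: str, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to the address succeeds in time."""
    host, port = split_host_port(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        return False
```

Calling `reachable("192.168.1.1:8443")` on lxd-02, for example, would show whether the internal address is usable for a pull-mode copy.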

A temporary workaround may be to flip the direction using --mode=push or --mode=relay, which will always work, at the potential cost of some added CPU/bandwidth usage.
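For anyone scripting their backups, here is a small sketch of how the workaround could be wired in. The helper names are hypothetical; the only things taken from lxc itself are the `copy` subcommand and its `--mode` flag with the values pull, push and relay.

```python
import subprocess
from typing import List

VALID_MODES = ("pull", "push", "relay")


def copy_command(source: str, target: str, mode: str = "relay") -> List[str]:
    """Build the argv for `lxc copy` with an explicit transfer mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["lxc", "copy", source, target, f"--mode={mode}"]


def run_copy(source: str, target: str, mode: str = "relay") -> None:
    """Run the copy; raises CalledProcessError if lxc exits non-zero."""
    subprocess.run(copy_command(source, target, mode), check=True)
```

`run_copy("CT", "lxd-02:CT-backup")` would then run the same copy as in the first post, but relayed through the client instead of pulled by the target.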

Thanks for the explanation.

@tomp here are the lxd-01 addresses:

environment:
  addresses:
  - 81.x.x.x:8443
  - '[ipv6]:8443'
  - 192.168.1.1:8443
  - 10.21.121.1:8443
  - '[ipv6]:8443'

I want to use 192.168.1.1 to 192.168.1.2 to copy containers and VMs, not the public IPv6 or IPv4 addresses/network.

After lxc copy CT lxd-02:CT-Backup:

Error: Failed instance creation: Error transferring instance data: Unable to connect to: 81.x.x.x:8443

Ok, that sounds like a regression in @tomp’s change.

The updated logic does not try the other addresses if an address results in a “late” error. In this case it definitely would be expected to keep on trying, but the fact that the error only indicates a single address shows that it’s not doing that.

I’m off today but I can look into this, possibly reverting the change we made.

Until this is done, using --mode=relay should make things behave.

Oh did that get released already?

The change attempts to create a remote operation and, if that fails, moves on to the next IP; but if it succeeds, it then starts the actual transfer using that same address.

But it sounds like I don’t understand the intricacies of the remote operation’s various flavours.

Sounds like that needs to be reverted, and then we should expand our test suite to cover the normal retry scenario as well as the scenario the PR fixed, where the connection itself succeeds but the operation fails and shouldn’t be retried.
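The two scenarios could be sketched roughly like this. It is an illustration of the intended behaviour only, not the code in the PR; `OperationError`, `transfer` and `attempt` are hypothetical names, and connection failures are assumed to surface as OSError.

```python
from typing import Callable, Iterable


class OperationError(Exception):
    """Hypothetical: the source was reached, but the operation itself failed."""


def transfer(addresses: Iterable[str], attempt: Callable[[str], object]):
    """Try each address; continue after a connection failure, stop on an operation failure."""
    last_error = None
    for address in addresses:
        try:
            return attempt(address)
        except OSError as exc:
            # Could not reach this address: the next one may still work.
            last_error = exc
        # OperationError is deliberately not caught here: the connection
        # worked, so trying another address would only repeat the failure.
    raise ConnectionError(f"Unable to connect to any address: {last_error}")
```

A test suite along these lines would need one case where the first address is unreachable and the second succeeds, and one where the first address connects but the operation fails and no further address is tried.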

There’s a revert PR and I’ll add it to my list to look into:

@stgraber if the original fix can wait until I get back, feel free to assign this one to me. I’m tracking it in Trello too. Thanks

Thank you Thomas!

There is a new PR merged now that reattempts the original intention:

Thanks! I can’t copy containers yet. When will the fix be available for snap users?

The fix for allowing you to copy containers was the revert 8 days ago:

But perhaps @stgraber hasn’t included that in the snap yet.

The latest fix is to reimplement the fix for the original problem without introducing a regression again.

Error: Failed instance creation: Error transferring instance data: Unable to connect to:

@stgraber any update about this?

I’ve now pushed cherry-picks to the candidate channel, which include the reworked client logic.

Yay, it works! Thanks
