My working theory at the moment is that LXD is trying to connect to each of the available IPs in turn. It does this in some places; I’m not certain this is one of them, but it’s a strong possibility given what we are seeing. This can take time if some IPs are not listening (perhaps a firewall is dropping the request, causing a timeout rather than a connection refused).
Previously it would have kept trying until it found the right one, but with the listener timeouts added in 4.20, the timeout is hit before the correct IP is found, and by the time the destination member does try to connect on it, the websocket on the source has been closed.
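To make the timing argument concrete, here is a minimal Python sketch of the theory. It is a model only, not LXD's actual code: the per-address dial timeout of 30s and the cost of a refused connection are assumptions for illustration, the IPv6 address is a placeholder from the documentation range, and the only figure taken from this thread is the 10s listener timeout.

```python
from dataclasses import dataclass

LISTENER_TIMEOUT = 10.0  # seconds; the 10s figure mentioned in this thread
CONNECT_TIMEOUT = 30.0   # seconds; ASSUMED per-address dial timeout (illustrative)
REFUSED_COST = 0.01      # seconds; ASSUMED cost of an immediate "connection refused"


@dataclass
class Candidate:
    address: str
    outcome: str  # "open", "refused" or "dropped"


def time_to_connect(candidates):
    """Try candidates in order; return (address, seconds spent before it was reached)."""
    elapsed = 0.0
    for candidate in candidates:
        if candidate.outcome == "open":
            return candidate.address, elapsed
        if candidate.outcome == "dropped":
            # Packets are silently discarded, so the dial waits for its full timeout.
            elapsed += CONNECT_TIMEOUT
        else:
            # A reject or refusal comes back almost immediately.
            elapsed += REFUSED_COST
    return None, elapsed


def source_still_listening(elapsed):
    """True if the source websocket would still be open after `elapsed` seconds."""
    return elapsed < LISTENER_TIMEOUT


if __name__ == "__main__":
    for ipv6_outcome in ("dropped", "refused"):
        candidates = [
            Candidate("[2001:db8::2]:8443", ipv6_outcome),  # hypothetical IPv6 address
            Candidate("192.168.1.2:8443", "open"),
        ]
        address, elapsed = time_to_connect(candidates)
        print(ipv6_outcome, address, elapsed, source_still_listening(elapsed))
```

In this model a dropped IPv6 attempt uses up more than the listener timeout before the IPv4 address is even tried, whereas a rejected attempt costs almost nothing.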
One thing you could do to test this theory is to make sure that inbound requests to that IPv6 address are rejected rather than dropped (if they are being dropped now), so that the destination LXD will immediately move on to the next IP. If that works, it’ll prove my theory.
You’ve not shown me the output of ss that I asked for, but based on the config you have shown me I’m assuming that LXD isn’t listening on the IPv6 address.
So, based on my comments here and here, I’m thinking that perhaps you have a firewall that is blocking inbound connections to port 8443 on the IPv6 address (you’ve not confirmed this). One way we could check whether my theory is correct is for you to add a rule to your firewall (if you have one on that host) so that connections to the IPv6 address on port 8443 are rejected rather than dropped, allowing LXD to fail over quickly to the IPv4 address.
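One way to see from the other host whether connections are being dropped or rejected is to time a plain TCP connect to each address. The following is a rough Python sketch using only the standard library; the `probe` helper and the IPv6 address are hypothetical, so substitute the real addresses bound on the host.

```python
import socket
import time


def probe(host, port, timeout=10.0):
    """Attempt one TCP connect and return (outcome, seconds taken)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "open"
    except ConnectionRefusedError:
        # Nothing is listening, or the firewall rejects the connection.
        outcome = "refused"
    except socket.timeout:
        # No answer at all, which is what a dropping firewall looks like.
        outcome = "timeout"
    except OSError as err:
        # Any other failure (for example, no route to host).
        outcome = f"error: {err}"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    # 192.168.1.2 is the address from this thread; the IPv6 one is a placeholder.
    for host in ("192.168.1.2", "2001:db8::2"):
        outcome, seconds = probe(host, 8443)
        print(f"{host}: {outcome} after {seconds:.2f}s")
```

A result of "timeout" on the IPv6 address would fit the theory, while "refused" or a fast error would suggest the delay is coming from somewhere else.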
+---------+--------------------------+----------+-----------+--------+--------+--------+
| NAME    | URL                      | PROTOCOL | AUTH TYPE | PUBLIC | STATIC | GLOBAL |
+---------+--------------------------+----------+-----------+--------+--------+--------+
| host-02 | https://192.168.1.2:8443 | lxd      | tls       | NO     | NO     | NO     |
For example: lxc copy container host-02:container
I reinstalled host-02 today and it made no difference. But I’m sure nothing has changed on my side.
Ah, so that somewhat changes your earlier statement of “Firewall ports are open between the servers.”, which you made when we were discussing whether the IPv6 address was allowed and, if not, whether you could make your firewall reject rather than drop.
I do not use or allow public IPv6 communication between the servers. The servers are standalone, with an internal switch in between that has only IPv4 configured.
Public IPv4 and IPv6 addresses are configured on the nodes. I do not use them for copying containers, and communication should not go through the public addresses. That is not allowed by default.
The VLAN (internal switch) is dedicated to container backups with lxc copy.
@stgraber Do you have any idea? I have no idea what has been changed in the past few weeks.
I’ve already explained my working theory as to the problem and the change here and here.
If you have IPv6 addresses bound to the host, LXD will try to use them. If the firewall is then dropping rather than rejecting the request, this could take longer than 10s and cause the websocket listeners to time out.
I was asking if you could set up some IPv6 reject rules to test that theory before we start thinking about a fix in LXD. Does that make sense?