kriszos
(Kriszos)
November 28, 2021, 3:08am
1
So, I have a 5-node cluster with ZFS storage. After updating to 4.20, I can’t move any container from any node to one specific node. Let’s call it “atl1”. Containers that already exist on this node, and newly created ones, can be moved out without any issues.
Container “test1” is stopped. When I issue “lxc move test1 --target atl1” I get the error:
“Error: Copy instance operation failed: Failed instance creation: Error transferring instance data: websocket: bad handshake”
It doesn’t matter whether I issue this from a remote or from a node that is participating in the transfer.
tomp
(Thomas Parrott)
November 29, 2021, 9:05am
2
Can you move containers to other members of the cluster OK?
kriszos
(Kriszos)
November 30, 2021, 10:58pm
3
When I wrote my first post I was able to move containers between the other nodes, but now I have problems there too. Between some nodes I can move containers both ways, between others only one way. I can’t spot any pattern. Additionally, “lxc cluster list” reports that all nodes are fully operational.
But I can still create new containers on all nodes without problems.
tomp
(Thomas Parrott)
December 1, 2021, 8:53am
4
Can you show the output of lxc cluster ls?
kriszos
(Kriszos)
December 1, 2021, 1:30pm
5
+-----------+------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| NAME      | URL                          | ROLES            | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  | MESSAGE           |
+-----------+------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| atl1      | https://172.31.255.10:8443   | database         | x86_64       | default        |             | ONLINE | Fully operational |
+-----------+------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| kriszos11 | https://192.168.111.181:8443 | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+-----------+------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| naslxd    | https://10.0.1.12:8443       | database         | x86_64       | default        |             | ONLINE | Fully operational |
+-----------+------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| nazwa1    | https://172.31.255.6:8443    | database         | x86_64       | default        |             | ONLINE | Fully operational |
+-----------+------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| pc        | https://10.0.1.104:8443      | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+-----------+------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
tomp
(Thomas Parrott)
December 2, 2021, 8:51am
6
Right, so your cluster members are on different subnets, possibly with router(s) and firewall(s) between them?
Have you ensured that all members can communicate bi-directionally with each other (rather than just the leader sending heartbeats to each member)?
Also check that there are no firewalls that could be altering the TLS negotiation for these connections.
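One way to check bi-directional reachability is a plain TCP connect to each member's API port, run from every member in turn. The sketch below is only an illustration: the helper names (url_host, url_port, can_connect, check_members) are made up for this example, it assumes bash and coreutils timeout are installed, and it handles IPv4 URLs only. A successful connect says nothing about TLS or the websocket upgrade; it only rules out basic routing and filtering problems.

```shell
# Print the host part of a member URL, e.g. https://10.0.1.12:8443 -> 10.0.1.12
url_host() {
    rest="${1#https://}"          # drop the scheme
    printf '%s\n' "${rest%:*}"    # drop the trailing :port
}

# Print the port part of a member URL, e.g. https://10.0.1.12:8443 -> 8443
url_port() {
    printf '%s\n' "${1##*:}"
}

# Attempt a TCP connect with a 3 second timeout; returns 0 on success.
# Uses bash's /dev/tcp pseudo-device, hence the explicit bash -c.
can_connect() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Check every URL passed as an argument and print OK or FAIL per member.
check_members() {
    for url in "$@"; do
        host=$(url_host "$url")
        port=$(url_port "$url")
        if can_connect "$host" "$port"; then
            echo "OK   $host:$port"
        else
            echo "FAIL $host:$port"
        fi
    done
}

# Example, to be repeated on every member (URLs as listed by lxc cluster ls):
#   check_members https://172.31.255.10:8443 https://10.0.1.12:8443
```

Any FAIL line that only shows up from some members would point at an asymmetric routing or firewall rule.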
kriszos
(Kriszos)
December 2, 2021, 5:53pm
7
It was never a problem that my nodes are on different subnets. I checked, and I’m able to establish a TCP connection to port 8443 from every node to every other node. The firewalls between them are not sophisticated enough to do TLS inspection.
To check further, I added all nodes as remotes on all nodes, like this:
lxc remote add --accept-certificate atl1 172.31.255.10 --password <trust_password>
lxc remote add --accept-certificate nazwa1 172.31.255.6 --password <trust_password>
lxc remote add --accept-certificate naslxd 10.0.1.12 --password <trust_password>
lxc remote add --accept-certificate pc 10.0.1.104 --password <trust_password>
lxc remote add --accept-certificate kriszos11 192.168.111.181 --password <trust_password>
Interestingly, only when adding the “atl1” remote do I get the notification
Client certificate now trusted by server: atl1
“atl1” is the only node that I joined to the cluster in August or September this year, using a token from the command “lxc cluster add”. I don’t know if it is relevant.
Regardless, I was able to successfully execute lxc commands against every node, as below:
lxc cluster list atl1:
lxc cluster list nazwa1:
lxc cluster list naslxd:
lxc cluster list pc:
lxc cluster list kriszos11:
lxc config show atl1:
lxc config show nazwa1:
lxc config show naslxd:
lxc config show pc:
lxc config show kriszos11:
What is also interesting is that I am now in the same situation as in my first post: I can’t move any container from any node to one specific node, “atl1”. Containers that already exist on this node, and newly created ones, can be moved out without any issues.
Maybe I should move all my containers off this node, remove it from the cluster, reinstall the LXD snap and rejoin the cluster?
tomp
(Thomas Parrott)
December 3, 2021, 10:25am
8
Can you run lxc config trust ls?
TomvB
(Tom)
December 4, 2021, 5:22pm
9
Same issue here with lxc copy, without any changes on my side (only snap updates).
Re-adding the remote host did not help. No cluster, just standalone nodes with remote trust.
The command:
lxc copy CT host-2:CT-COPY
Source host IP: 192.168.1.1
Destination host IP: 192.168.1.2
The error:
Error: Failed instance creation:
https://publicip:8443 : Error transferring instance data: Unable to connect to: publicip:8443
https://[ipv6]:8443: Error transferring instance data: Unable to connect to: [ipv6]:8443
https://192.168.1.1:8443 : Error transferring instance data: websocket: bad handshake
snap list
Name    Version   Rev    Tracking       Publisher   Notes
core18  20211028  2253   latest/stable  canonical✓  base
core20  20211115  1242   latest/stable  canonical✓  base
lxd     4.20      21902  latest/stable  canonical✓  -
snapd   2.53.2    14066  latest/stable  canonical✓  snapd
lxc remote list
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| NAME            | URL                                      | PROTOCOL      | AUTH TYPE   | PUBLIC | STATIC | GLOBAL |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| host-02         | https://192.168.1.2:8443                 | lxd           | tls         | NO     | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| images          | https://images.linuxcontainers.org       | simplestreams | none        | YES    | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| local (current) | unix://                                  | lxd           | file access | NO     | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu          | Ubuntu Cloud Images                      | simplestreams | none        | YES    | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-daily    | Ubuntu Cloud Images                      | simplestreams | none        | YES    | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
Nothing special in lxc config trust ls: it shows the old and the new key, with expiry dates in 2030 and 2031.
Please check the changelog for anything related to this issue. I can’t move or copy my containers now.
lxc remote remove host-2
lxc remote add host-2 192.168.1.2
Certificate fingerprint: fingerprint
ok (y/n/[fingerprint])? y
kriszos
(Kriszos)
December 5, 2021, 7:10pm
10
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| TYPE | NAME | COMMON NAME | FINGERPRINT | ISSUE DATE | EXPIRY DATE |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | 10.0.1.12 | root@naslxd | ba590c32faf3 | Dec 2, 2021 at 4:58pm (UTC) | Nov 30, 2031 at 4:58pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | 10.0.1.104 | kriszos@pc | 83a22c493584 | Nov 26, 2020 at 7:21pm (UTC) | Nov 24, 2030 at 7:21pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | 172.20.20.34 | root@nazwa1 | 4fed755b499b | Dec 2, 2021 at 4:53pm (UTC) | Nov 30, 2031 at 4:53pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | 172.20.20.81 | root@nazwa1 | 72c4e9d6b232 | Nov 12, 2020 at 1:35pm (UTC) | Nov 10, 2030 at 1:35pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | 172.20.20.81 | root@nazwa1.kriszos.pl | bf514843bf20 | Aug 2, 2020 at 11:59pm (UTC) | Jul 31, 2030 at 11:59pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | 172.31.255.10 | root@atl1 | 3472858514b8 | Dec 2, 2021 at 4:57pm (UTC) | Nov 30, 2031 at 4:57pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | 192.168.111.181 | root@kriszos11 | 060cd5fb9f5e | Dec 2, 2021 at 4:57pm (UTC) | Nov 30, 2031 at 4:57pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.3d5a401ec032a47c5884478420ba20cbe877b63a8cf343f24c3f1d9592e081f8 | root@nazwa1 | 3d5a401ec032 | Nov 8, 2020 at 3:56pm (UTC) | Nov 6, 2030 at 3:56pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.5aa4cfb5baa02ea83ac61c171fab60aef588c4139884dfe3e150dd753f6f77ea | root@nazwa1 | 5aa4cfb5baa0 | Nov 8, 2020 at 12:17am (UTC) | Nov 6, 2030 at 12:17am (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.8cedb938ddd3799f70a318eae19dee9d6e2db5352d9434f0617b9aa56c4f25b9 | root@CT11 | 8cedb938ddd3 | Nov 11, 2020 at 5:34pm (UTC) | Nov 9, 2030 at 5:34pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.97fd1dff295cc9202c5d36339d49296f285107e429f897d8c1c9f682bd85a68b | root@KRISZOS11 | 97fd1dff295c | Nov 11, 2020 at 10:36pm (UTC) | Nov 9, 2030 at 10:36pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.7193c3e42d3025e62291e3cb38d6ce8ab84654134afa1196365bbd63d3ae1bfc | root@ovh1 | 7193c3e42d30 | Oct 16, 2020 at 11:33pm (UTC) | Oct 14, 2030 at 11:33pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.32059ae454dcaddb850950869edc55a75247b97f182e7da24970a3cd489439c5 | root@ovh1 | 32059ae454dc | Oct 18, 2020 at 3:19pm (UTC) | Oct 16, 2030 at 3:19pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.a76005e6c86f14ab513792fb853cd495e7943352610db7f1a1cb6cafbdeb9485 | root@nazwa1 | a76005e6c86f | Nov 8, 2020 at 4:38pm (UTC) | Nov 6, 2030 at 4:38pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.b4bbcf8ada9bcbbf8faeed8f45b413da5568eae6c99fd8e88d5a20643ea71fe0 | root@nazwa1 | b4bbcf8ada9b | Feb 13, 2021 at 8:55pm (UTC) | Feb 11, 2031 at 8:55pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.cfc2eb3ee4b31d70588f46953f040bf64f1a32a25dc8a1e488773966f2f7faaa | root@nazwa1 | cfc2eb3ee4b3 | Nov 8, 2020 at 12:43am (UTC) | Nov 6, 2030 at 12:43am (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| client | lxd.cluster.fed6f814e01e3933202655a90c80f1d572c1612c6e9b7b173c955a22ae77f64e | root@nazwa1 | fed6f814e01e | Nov 8, 2020 at 1:05am (UTC) | Nov 6, 2030 at 1:05am (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| server | atl1 | root@atl1 | 8ea259e54402 | Aug 17, 2021 at 10:21pm (UTC) | Aug 15, 2031 at 10:21pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| server | kriszos11 | root@KRISZOS11 | 51867ccd19ca | Oct 5, 2021 at 5:47pm (UTC) | Oct 3, 2031 at 5:47pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| server | naslxd | root@naslxd | 1edd24502197 | Aug 20, 2020 at 7:29pm (UTC) | Aug 18, 2030 at 7:29pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| server | nazwa1 | root@nazwa1 | 1ea5d1d59ecb | Jul 29, 2021 at 2:31pm (UTC) | Jul 27, 2031 at 2:31pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
| server | pc | root@pc | 70ce4eb54aef | Nov 11, 2020 at 6:51pm (UTC) | Nov 9, 2030 at 6:51pm (UTC) |
+--------+------------------------------------------------------------------------------+-----------------------------------------+--------------+-------------------------------+-------------------------------+
tomp
(Thomas Parrott)
December 6, 2021, 11:14am
11
@TomvB @kriszos which version of LXD were you upgrading from and to when this started?
tomp
(Thomas Parrott)
December 6, 2021, 11:16am
12
Can you also confirm the system time is correct on all servers.
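Clock skew matters here because certificates carry validity dates. A quick way to compare clocks is to diff epoch timestamps between two machines. This is a rough sketch: clock_skew is a made-up helper name, and the usage example assumes SSH access to the member, which the thread does not confirm.

```shell
# Absolute difference in seconds between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example (assumes SSH access to the member called atl1):
#   clock_skew "$(date -u +%s)" "$(ssh atl1 date -u +%s)"
```

A result of a few seconds is mostly SSH round-trip time; minutes or more would be worth fixing first.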
tomp
(Thomas Parrott)
December 6, 2021, 11:18am
13
The last change to authenticate was:
committed 08:10PM - 04 Aug 21 UTC
Allow metrics certificates for the endpoints /1.0 and /1.0/metrics.
Signed-off… -by: Thomas Hipp <thomas.hipp@canonical.com>
But this was in LXD 4.19.
kriszos
(Kriszos)
December 6, 2021, 12:08pm
15
Time is correct on all servers. Upgrade has been done via snap from 4.19
tomp
(Thomas Parrott)
December 6, 2021, 12:11pm
16
OK, can you enable debug on the servers affected using:
sudo snap set lxd daemon.debug=true; sudo systemctl reload snap.lxd.daemon
And then capture the output when running the affected command (on both source and target servers) from: /var/snap/lxd/common/lxd/logs/lxd.log
As I have no idea what could be causing this.
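With debug enabled, lxd.log gets noisy, so it may help to filter it while reproducing the error. A small sketch: the keyword list is a guess at what is relevant here, not a list of messages LXD is known to emit, and filter_migration_log is a made-up name.

```shell
# Keep only lines that mention the websocket, the handshake or migration.
# Reads from stdin so it can sit behind tail -f or cat.
filter_migration_log() {
    grep -iE 'websocket|handshake|migrat'
}

# Example, run on both source and target while repeating the failing command:
#   sudo tail -f /var/snap/lxd/common/lxd/logs/lxd.log | filter_migration_log
```

If the filter hides too much, dropping it and capturing the unfiltered log around the time of the failure is the safer option.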
TomvB
(Tom)
December 6, 2021, 7:14pm
18
Not sure, using auto updates. Time is correct on both servers.
tomp
(Thomas Parrott)
December 7, 2021, 12:20pm
20
@TomvB @kriszos can you advise whether any of the --mode options for lxc copy make a difference? It defaults to “pull”, so try “push” or “relay”.
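To try the modes one at a time, something like the following could be used. It only prints the commands rather than running them. The function name print_mode_commands is made up for this sketch, and the instance and remote names in the example are the ones from the lxc copy command quoted earlier in the thread.

```shell
# Print one lxc copy command per transfer mode for the given source and target.
print_mode_commands() {
    for mode in pull push relay; do
        echo "lxc copy $1 $2 --mode=$mode"
    done
}

# Example:
#   print_mode_commands CT host-2:CT-COPY
```

Noting which modes fail and which succeed should show which side of the transfer is unable to open the connection.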
@stgraber and I were wondering if this might be a network issue (perhaps MTU) that is interfering with the websocket upgrade, as the errors suggest that TLS certificate negotiation has succeeded but that it’s failing after that, during the websocket upgrade.
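To test the MTU theory, one option is to send pings with the don't-fragment bit set and a payload sized for the MTU under test. A rough sketch, assuming Linux iputils ping and IPv4; the function names are invented for illustration. The 28 bytes subtracted are the 20-byte IPv4 header plus the 8-byte ICMP header.

```shell
# ICMP payload size that exactly fills an IPv4 packet of the given MTU.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send 3 pings to host $1 with the don't-fragment bit set (-M do),
# sized for MTU $2, waiting up to 2 seconds for each reply.
probe_mtu() {
    ping -M do -c 3 -W 2 -s "$(icmp_payload_for_mtu "$2")" "$1"
}

# Example:
#   probe_mtu 192.168.1.2 1500
```

If 1500 fails while a lower value such as 1400 gets replies, something on the path probably has a smaller MTU, although blocked ICMP can produce the same symptom.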
tomp
(Thomas Parrott)
December 7, 2021, 12:21pm
21
Also can you confirm if you are using the same version of the client and server (i.e 4.20)?
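lxc version prints both the client and the server version. A small check is sketched below, assuming the output uses “Client version: X” and “Server version: X” lines; the helper name versions_match is hypothetical.

```shell
# Reads lxc version output on stdin; succeeds only when both versions
# are present and identical. The (x "") concatenation forces a string
# comparison so that 4.2 and 4.20 are not treated as the same number.
versions_match() {
    awk -F': ' '
        /^Client version/ { c = $2 }
        /^Server version/ { s = $2 }
        END { exit ((c != "" && (c "") == (s "")) ? 0 : 1) }
    '
}

# Example against a remote (remote name taken from the thread):
#   lxc version host-2: | versions_match && echo "client and server match"
```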