jrock
(john)
October 30, 2022, 8:56am
1
root@kube01:~# sudo snap list lxd
Name Version Rev Tracking Publisher Notes
lxd 5.7-c62733b 23893 latest/stable canonical✓ -
All nodes are on 5.7-c62733b, except kube02, which is down because its hardware is broken. I would like to take it out of the cluster,
but lxc cluster remove --force kube02 just keeps running without any feedback.
root@kube01:~# lxd sql global "SELECT * FROM nodes"
+----+--------+-------------+---------------------+--------+----------------+--------------------------------+-------+------+-------------------+
| id | name | description | address | schema | api_extensions | heartbeat | state | arch | failure_domain_id |
+----+--------+-------------+---------------------+--------+----------------+--------------------------------+-------+------+-------------------+
| 1 | kube01 | | 192.168.178.70:8443 | 66 | 332 | 2022-10-30T08:51:26.255385898Z | 0 | 4 | <nil> |
| 2 | kube02 | | 192.168.178.78:8443 | 66 | 327 | 2022-10-17T21:04:55.763750682Z | 0 | 4 | <nil> |
| 3 | kube05 | | 192.168.178.74:8443 | 66 | 332 | 2022-10-30T08:51:23.891626379Z | 0 | 4 | <nil> |
| 4 | kube06 | | 192.168.178.69:8443 | 66 | 332 | 2022-10-30T08:04:56.88908946Z | 0 | 4 | <nil> |
| 5 | kube04 | | 192.168.178.73:8443 | 66 | 332 | 2022-10-30T08:51:22.58399337Z | 0 | 4 | <nil> |
| 6 | kube03 | | 192.168.178.79:8443 | 66 | 332 | 2022-10-30T08:51:26.313618077Z | 0 | 4 | <nil> |
+----+--------+-------------+---------------------+--------+----------------+--------------------------------+-------+------+-------------------+
On all nodes, /var/snap/lxd/common/lxd/logs/lxd.log shows:
time="2022-10-30T08:30:07Z" level=warning msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
time="2022-10-30T08:30:16Z" level=warning msg="Wait for other cluster nodes to upgrade their versions, cluster not started yet"
time="2022-10-30T08:31:20Z" level=warning msg="Wait for other cluster nodes to upgrade their versions, cluster not started yet"
time="2022-10-30T08:32:23Z" level=warning msg="Wait for other cluster nodes to upgrade their versions, cluster not started yet"
time="2022-10-30T08:49:14Z" level=warning msg="Wait for other cluster nodes to upgrade their versions, cluster not started yet"
root@kube01:~# ps aux | grep lxd
root 914 0.0 0.0 2060 1372 ? Ss 08:29 0:00 /bin/sh /snap/lxd/23893/commands/daemon.activate
root 1068 0.1 1.1 1354104 44228 ? Sl 08:29 0:03 lxd activateifneeded
root 1102 0.0 0.0 2060 1388 ? Ss 08:30 0:00 /bin/sh /snap/lxd/23893/commands/daemon.start
root 1279 0.0 0.0 152728 1508 ? Sl 08:30 0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root 1292 1.2 2.8 1575876 109304 ? Sl 08:30 0:19 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root 1293 0.0 0.9 1352796 36824 ? Sl 08:30 0:00 lxd waitready
root 1294 0.1 0.0 2060 696 ? S 08:30 0:02 /bin/sh /snap/lxd/23893/commands/daemon.start
root 3186 0.0 0.0 6420 1840 pts/1 S+ 08:56 0:00 grep --color=auto lxd
My problem is that I cannot do anything useful right now. I can reboot all nodes, but as soon as I run something like “lxc list” or “lxc network list”, the command just hangs.
tomp
(Thomas Parrott)
October 30, 2022, 2:41pm
2
Try running this on each cluster member:
sudo snap refresh lxd --cohort="+"
This should ensure they are all running the same version, which is required for cluster operation.
See also Bug #1990954 “snap info <package> sometimes shows conflicting ve...” : Bugs : Snap Store Server
jrock
(john)
October 30, 2022, 3:19pm
3
I applied it to all nodes and rebooted, but the problem persists:
installed: 5.7-c62733b (23893) 138MB in-cohort
time="2022-10-30T15:17:16Z" level=warning msg="Wait for other cluster nodes to upgrade their versions, cluster not started yet"
jrock
(john)
October 30, 2022, 3:22pm
4
I suspect a network config problem, but since I can’t run anything like lxc network … I am out of ideas.
tomp
(Thomas Parrott)
October 30, 2022, 4:39pm
5
Please show the full output of “snap info lxd” on every member.
jrock
(john)
October 30, 2022, 5:11pm
6
kube01:~$ snap info lxd
snap-id: J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking: latest/stable
refresh-date: yesterday at 13:39 UTC
channels:
latest/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
latest/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
latest/beta: ↑
latest/edge: git-ff53106 2022-10-28 (23902) 138MB -
5.7/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
5.7/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
5.7/beta: ↑
5.7/edge: ↑
5.6/stable: 5.6-794016a 2022-09-28 (23687) 137MB -
5.6/candidate: 5.6-794016a 2022-09-23 (23687) 137MB -
5.6/beta: ↑
5.6/edge: ↑
5.5/stable: 5.5-37534be 2022-08-27 (23543) 112MB -
5.5/candidate: 5.5-37534be 2022-08-19 (23543) 112MB -
5.5/beta: ↑
5.5/edge: ↑
5.4/stable: 5.4-1ff8d34 2022-08-13 (23371) 106MB -
5.4/candidate: 5.4-3bf11b7 2022-08-12 (23456) 107MB -
5.4/beta: ↑
5.4/edge: ↑
5.3/stable: 5.3-91e042b 2022-07-06 (23274) 106MB -
5.3/candidate: 5.3-91e042b 2022-07-03 (23274) 106MB -
5.3/beta: ↑
5.3/edge: ↑
5.0/stable: 5.0.1-9dcf35b 2022-08-24 (23545) 106MB -
5.0/candidate: 5.0.1-9dcf35b 2022-08-19 (23545) 106MB -
5.0/beta: ↑
5.0/edge: git-13e1e53 2022-08-21 (23567) 111MB -
4.0/stable: 4.0.9-8e2046b 2022-03-26 (22761) 63MB -
4.0/candidate: 4.0.9-dea944b 2022-09-27 (23697) 65MB -
4.0/beta: ↑
4.0/edge: git-407205d 2022-03-31 (22805) 65MB -
3.0/stable: 3.0.4 2019-10-10 (11376) 49MB -
3.0/candidate: 3.0.4 2019-10-10 (11376) 49MB -
3.0/beta: ↑
3.0/edge: git-81b81b9 2019-10-10 (11378) 49MB -
installed: 5.7-c62733b (23893) 138MB in-cohort
kube02 is down (hardware defect)
kube03:~$ snap info lxd
snap-id: J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking: latest/stable
refresh-date: yesterday at 13:39 UTC
channels:
latest/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
latest/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
latest/beta: ↑
latest/edge: git-ff53106 2022-10-28 (23902) 138MB -
5.7/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
5.7/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
5.7/beta: ↑
5.7/edge: ↑
5.6/stable: 5.6-794016a 2022-09-28 (23687) 137MB -
5.6/candidate: 5.6-794016a 2022-09-23 (23687) 137MB -
5.6/beta: ↑
5.6/edge: ↑
5.5/stable: 5.5-37534be 2022-08-27 (23543) 112MB -
5.5/candidate: 5.5-37534be 2022-08-19 (23543) 112MB -
5.5/beta: ↑
5.5/edge: ↑
5.4/stable: 5.4-1ff8d34 2022-08-13 (23371) 106MB -
5.4/candidate: 5.4-3bf11b7 2022-08-12 (23456) 107MB -
5.4/beta: ↑
5.4/edge: ↑
5.3/stable: 5.3-91e042b 2022-07-06 (23274) 106MB -
5.3/candidate: 5.3-91e042b 2022-07-03 (23274) 106MB -
5.3/beta: ↑
5.3/edge: ↑
5.0/stable: 5.0.1-9dcf35b 2022-08-24 (23545) 106MB -
5.0/candidate: 5.0.1-9dcf35b 2022-08-19 (23545) 106MB -
5.0/beta: ↑
5.0/edge: git-13e1e53 2022-08-21 (23567) 111MB -
4.0/stable: 4.0.9-8e2046b 2022-03-26 (22761) 63MB -
4.0/candidate: 4.0.9-dea944b 2022-09-27 (23697) 65MB -
4.0/beta: ↑
4.0/edge: git-407205d 2022-03-31 (22805) 65MB -
3.0/stable: 3.0.4 2019-10-10 (11376) 49MB -
3.0/candidate: 3.0.4 2019-10-10 (11376) 49MB -
3.0/beta: ↑
3.0/edge: git-81b81b9 2019-10-10 (11378) 49MB -
installed: 5.7-c62733b (23893) 138MB in-cohort
kube04:~$ snap info lxd
snap-id: J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking: latest/stable
refresh-date: yesterday at 13:39 UTC
channels:
latest/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
latest/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
latest/beta: ↑
latest/edge: git-ff53106 2022-10-28 (23902) 138MB -
5.7/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
5.7/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
5.7/beta: ↑
5.7/edge: ↑
5.6/stable: 5.6-794016a 2022-09-28 (23687) 137MB -
5.6/candidate: 5.6-794016a 2022-09-23 (23687) 137MB -
5.6/beta: ↑
5.6/edge: ↑
5.5/stable: 5.5-37534be 2022-08-27 (23543) 112MB -
5.5/candidate: 5.5-37534be 2022-08-19 (23543) 112MB -
5.5/beta: ↑
5.5/edge: ↑
5.4/stable: 5.4-1ff8d34 2022-08-13 (23371) 106MB -
5.4/candidate: 5.4-3bf11b7 2022-08-12 (23456) 107MB -
5.4/beta: ↑
5.4/edge: ↑
5.3/stable: 5.3-91e042b 2022-07-06 (23274) 106MB -
5.3/candidate: 5.3-91e042b 2022-07-03 (23274) 106MB -
5.3/beta: ↑
5.3/edge: ↑
5.0/stable: 5.0.1-9dcf35b 2022-08-24 (23545) 106MB -
5.0/candidate: 5.0.1-9dcf35b 2022-08-19 (23545) 106MB -
5.0/beta: ↑
5.0/edge: git-13e1e53 2022-08-21 (23567) 111MB -
4.0/stable: 4.0.9-8e2046b 2022-03-26 (22761) 63MB -
4.0/candidate: 4.0.9-dea944b 2022-09-27 (23697) 65MB -
4.0/beta: ↑
4.0/edge: git-407205d 2022-03-31 (22805) 65MB -
3.0/stable: 3.0.4 2019-10-10 (11376) 49MB -
3.0/candidate: 3.0.4 2019-10-10 (11376) 49MB -
3.0/beta: ↑
3.0/edge: git-81b81b9 2019-10-10 (11378) 49MB -
installed: 5.7-c62733b (23893) 138MB in-cohort
kube05:~$ snap info lxd
snap-id: J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking: latest/stable
refresh-date: yesterday at 13:40 UTC
channels:
latest/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
latest/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
latest/beta: ↑
latest/edge: git-ff53106 2022-10-28 (23902) 138MB -
5.7/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
5.7/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
5.7/beta: ↑
5.7/edge: ↑
5.6/stable: 5.6-794016a 2022-09-28 (23687) 137MB -
5.6/candidate: 5.6-794016a 2022-09-23 (23687) 137MB -
5.6/beta: ↑
5.6/edge: ↑
5.5/stable: 5.5-37534be 2022-08-27 (23543) 112MB -
5.5/candidate: 5.5-37534be 2022-08-19 (23543) 112MB -
5.5/beta: ↑
5.5/edge: ↑
5.4/stable: 5.4-1ff8d34 2022-08-13 (23371) 106MB -
5.4/candidate: 5.4-3bf11b7 2022-08-12 (23456) 107MB -
5.4/beta: ↑
5.4/edge: ↑
5.3/stable: 5.3-91e042b 2022-07-06 (23274) 106MB -
5.3/candidate: 5.3-91e042b 2022-07-03 (23274) 106MB -
5.3/beta: ↑
5.3/edge: ↑
5.0/stable: 5.0.1-9dcf35b 2022-08-24 (23545) 106MB -
5.0/candidate: 5.0.1-9dcf35b 2022-08-19 (23545) 106MB -
5.0/beta: ↑
5.0/edge: git-13e1e53 2022-08-21 (23567) 111MB -
4.0/stable: 4.0.9-8e2046b 2022-03-26 (22761) 63MB -
4.0/candidate: 4.0.9-dea944b 2022-09-27 (23697) 65MB -
4.0/beta: ↑
4.0/edge: git-407205d 2022-03-31 (22805) 65MB -
3.0/stable: 3.0.4 2019-10-10 (11376) 49MB -
3.0/candidate: 3.0.4 2019-10-10 (11376) 49MB -
3.0/beta: ↑
3.0/edge: git-81b81b9 2019-10-10 (11378) 49MB -
installed: 5.7-c62733b (23893) 138MB in-cohort
kube06:~$ snap info lxd
snap-id: J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking: latest/stable
refresh-date: yesterday at 16:35 CEST
channels:
latest/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
latest/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
latest/beta: ↑
latest/edge: git-ff53106 2022-10-28 (23902) 138MB -
5.7/stable: 5.7-c62733b 2022-10-29 (23893) 138MB -
5.7/candidate: 5.7-c62733b 2022-10-28 (23893) 138MB -
5.7/beta: ↑
5.7/edge: ↑
5.6/stable: 5.6-794016a 2022-09-28 (23687) 137MB -
5.6/candidate: 5.6-794016a 2022-09-23 (23687) 137MB -
5.6/beta: ↑
5.6/edge: ↑
5.5/stable: 5.5-37534be 2022-08-27 (23543) 112MB -
5.5/candidate: 5.5-37534be 2022-08-19 (23543) 112MB -
5.5/beta: ↑
5.5/edge: ↑
5.4/stable: 5.4-1ff8d34 2022-08-13 (23371) 106MB -
5.4/candidate: 5.4-3bf11b7 2022-08-12 (23456) 107MB -
5.4/beta: ↑
5.4/edge: ↑
5.3/stable: 5.3-91e042b 2022-07-06 (23274) 106MB -
5.3/candidate: 5.3-91e042b 2022-07-03 (23274) 106MB -
5.3/beta: ↑
5.3/edge: ↑
5.0/stable: 5.0.1-9dcf35b 2022-08-24 (23545) 106MB -
5.0/candidate: 5.0.1-9dcf35b 2022-08-19 (23545) 106MB -
5.0/beta: ↑
5.0/edge: git-13e1e53 2022-08-21 (23567) 111MB -
4.0/stable: 4.0.9-8e2046b 2022-03-26 (22761) 63MB -
4.0/candidate: 4.0.9-dea944b 2022-09-27 (23697) 65MB -
4.0/beta: ↑
4.0/edge: git-407205d 2022-03-31 (22805) 65MB -
3.0/stable: 3.0.4 2019-10-10 (11376) 49MB -
3.0/candidate: 3.0.4 2019-10-10 (11376) 49MB -
3.0/beta: ↑
3.0/edge: git-81b81b9 2019-10-10 (11378) 49MB -
installed: 5.7-c62733b (23893) 138MB in-cohort
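All five reachable members report the same installed line. On a larger cluster that comparison could be scripted. A minimal sketch, assuming the snap info lxd output of each member has already been collected into a dict keyed by host name (the helper names installed_revision and members_agree are made up for this example and are not part of snap or LXD):

```python
import re

# Matches the "installed:" line of `snap info`, for example:
#   installed: 5.7-c62733b (23893) 138MB in-cohort
INSTALLED_RE = re.compile(r"^installed:\s+(\S+)\s+\((\d+)\)", re.MULTILINE)


def installed_revision(snap_info_text):
    """Return (version, revision) from `snap info` output, or None if absent."""
    match = INSTALLED_RE.search(snap_info_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def members_agree(outputs):
    """True when every member reports the same installed (version, revision)."""
    seen = {installed_revision(text) for text in outputs.values()}
    return len(seen) == 1 and None not in seen


# Sample data: only the relevant lines of the outputs posted above.
outputs = {
    host: "tracking: latest/stable\n"
          "installed: 5.7-c62733b (23893) 138MB in-cohort\n"
    for host in ("kube01", "kube03", "kube04", "kube05", "kube06")
}
print(members_agree(outputs))  # True
```

This only covers members that can still be queried. kube02 is unreachable, so matching snap revisions on the other five say nothing about what the cluster database still records for it.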
tomp
(Thomas Parrott)
October 30, 2022, 5:19pm
7
Ah ok that missing machine will be the problem then.
Are you happy to remove it and its instances permanently from the cluster?
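In the nodes table from the first post, kube02 is the only row with api_extensions 327 (the others have 332), and its heartbeat stopped on 2022-10-17. That fits the "Wait for other cluster nodes to upgrade their versions" warning. A minimal sketch of a query that picks out such rows, run here against an in-memory SQLite copy of the posted values rather than the live cluster database (the scratch table keeps only four of the columns, and that the same SELECT behaves identically through lxd sql global is an assumption):

```python
import sqlite3

# (name, schema, api_extensions, heartbeat) copied from the table in the first post.
ROWS = [
    ("kube01", 66, 332, "2022-10-30T08:51:26.255385898Z"),
    ("kube02", 66, 327, "2022-10-17T21:04:55.763750682Z"),
    ("kube05", 66, 332, "2022-10-30T08:51:23.891626379Z"),
    ("kube06", 66, 332, "2022-10-30T08:04:56.88908946Z"),
    ("kube04", 66, 332, "2022-10-30T08:51:22.58399337Z"),
    ("kube03", 66, 332, "2022-10-30T08:51:26.313618077Z"),
]

# Members whose recorded schema or API extension count is below the highest
# value recorded for any member.
LAGGING_SQL = """
SELECT name, api_extensions FROM nodes
WHERE schema < (SELECT MAX(schema) FROM nodes)
   OR api_extensions < (SELECT MAX(api_extensions) FROM nodes)
ORDER BY name
"""


def lagging_members(rows):
    """Load rows into a scratch database and return the lagging members."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (name TEXT, schema INTEGER, "
        "api_extensions INTEGER, heartbeat TEXT)"
    )
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(LAGGING_SQL).fetchall()
    conn.close()
    return result


print(lagging_members(ROWS))  # [('kube02', 327)]
```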
tomp
(Thomas Parrott)
October 30, 2022, 5:23pm
8
jrock
(john)
October 30, 2022, 5:36pm
9
Yes, but “lxc cluster remove --force kube02” just keeps running without any output.
tomp
(Thomas Parrott)
October 31, 2022, 1:32pm
10
I see, yes that makes sense as LXD isn’t able to start properly.
Please can you show the output of sudo lxd sql global 'select * from nodes'?
jrock
(john)
October 31, 2022, 5:36pm
11
+----+--------+-------------+---------------------+--------+----------------+--------------------------------+-------+------+-------------------+
| id | name | description | address | schema | api_extensions | heartbeat | state | arch | failure_domain_id |
+----+--------+-------------+---------------------+--------+----------------+--------------------------------+-------+------+-------------------+
| 1 | kube01 | | 192.168.178.70:8443 | 66 | 332 | 2022-10-31T17:36:22.632751814Z | 0 | 4 | |
| 2 | kube02 | | 192.168.178.78:8443 | 66 | 327 | 2022-10-17T21:04:55.763750682Z | 0 | 4 | |
| 3 | kube05 | | 192.168.178.74:8443 | 66 | 332 | 2022-10-31T17:36:24.060297351Z | 0 | 4 | |
| 4 | kube06 | | 192.168.178.69:8443 | 66 | 332 | 2022-10-30T18:41:40.399802969Z | 0 | 4 | |
| 5 | kube04 | | 192.168.178.73:8443 | 66 | 332 | 2022-10-31T17:36:23.553059306Z | 0 | 4 | |
| 6 | kube03 | | 192.168.178.79:8443 | 66 | 332 | 2022-10-31T17:36:25.58072139Z | 0 | 4 | |
+----+--------+-------------+---------------------+--------+----------------+--------------------------------+-------+------+-------------------+
tomp
(Thomas Parrott)
October 31, 2022, 9:24pm
12
Thanks, can you try running:
sudo lxd sql global 'DELETE FROM nodes WHERE name = "kube02"'
This should then allow the cluster to recover.
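The effect of a DELETE like this can be rehearsed on a scratch copy before touching the real database. A sketch using the values from the table posted above in an in-memory SQLite database; it only illustrates what the statement does to the rows, not how LXD itself reacts, and the helper name remove_member is made up for this example:

```python
import sqlite3


def remove_member(conn, name):
    """Delete one member row by name and return how many rows were removed."""
    with conn:  # commits on success, rolls back if the statement fails
        cursor = conn.execute("DELETE FROM nodes WHERE name = ?", (name,))
    return cursor.rowcount


# Scratch copy holding only the columns needed for the check.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, api_extensions INTEGER)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [
        (1, "kube01", 332),
        (2, "kube02", 327),
        (3, "kube05", 332),
        (4, "kube06", 332),
        (5, "kube04", 332),
        (6, "kube03", 332),
    ],
)
conn.commit()

removed = remove_member(conn, "kube02")
remaining = conn.execute("SELECT DISTINCT api_extensions FROM nodes").fetchall()
print(removed, remaining)  # 1 [(332,)]
```

After the row is gone, every remaining member records the same api_extensions value, which is the state the surviving members were waiting for.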
jrock
(john)
November 1, 2022, 4:19am
13
Great, thanks a lot! That recovered the cluster!
If I want to change the IP addresses of some members of the cluster, would running the following for each node
sudo lxd sql global "update nodes set address='10.100.100.2' where id=1"
be the only config I would need to change?
stgraber
(Stéphane Graber)
November 1, 2022, 8:36am
14
No, address changes are quite a bit more difficult to do as they’re not just stored in the LXD database but also in the local database and in the dqlite headers.
The “How to recover a cluster” page in the LXD documentation covers what needs to be done for that part.