I have had to shut down a three-machine cluster a couple of times recently because of power outages and scheduled outages. Each time I was able to shut down all three machines gracefully, but after bringing them back up I am seeing errors from lxc commands that touch the cluster/raft database.
For example:
$ lxc list
Error: failed to begin transaction: not an error
$ lxc cluster list
Error: failed to begin transaction: not an error
In this state, at least one machine does respond to ‘lxc cluster list’, but the output shows no database nodes:
$ lxc cluster list
+----------------+-------------------------+----------+--------+-------------------+
| NAME           | URL                     | DATABASE | STATE  | MESSAGE           |
+----------------+-------------------------+----------+--------+-------------------+
| node-ctl01     | https://10.0.5.190:8443 | NO       | ONLINE | fully operational |
+----------------+-------------------------+----------+--------+-------------------+
| node03         | https://10.0.5.203:8443 | NO       | ONLINE | fully operational |
+----------------+-------------------------+----------+--------+-------------------+
| node04         | https://10.0.5.204:8443 | NO       | ONLINE | fully operational |
+----------------+-------------------------+----------+--------+-------------------+
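To make the symptom easy to re-check after each restart, here is a rough sketch that counts how many members report DATABASE = YES. It parses the table rows pasted above rather than calling lxc itself; the variable names (table, db_count) are just ones I made up for illustration, and it assumes the column order shown in the output.

```shell
#!/bin/sh
# Rows copied from the 'lxc cluster list' output above (whitespace collapsed).
table='| node-ctl01 | https://10.0.5.190:8443 | NO | ONLINE | fully operational |
| node03 | https://10.0.5.203:8443 | NO | ONLINE | fully operational |
| node04 | https://10.0.5.204:8443 | NO | ONLINE | fully operational |'

# With "|" as the separator, field 4 is the DATABASE column.
# Strip the padding spaces and count the rows that say YES.
db_count=$(printf '%s\n' "$table" |
  awk -F'|' '{ gsub(/ /, "", $4); if ($4 == "YES") n++ } END { print n + 0 }')

echo "database members: $db_count"
```

For the rows above this reports zero database members, which is what looks wrong to me for a cluster that is otherwise ONLINE.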
I have already run ‘snap restart lxd’ on all machines multiple times, but it didn’t fix anything.
$ snap version
snap 2.42.5
snapd 2.42.5
series 16
ubuntu 18.04
kernel 4.15.0-66-generic
$ snap info lxd
name: lxd
…
snap-id: J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking: stable
refresh-date: 43 days ago, at 18:38 PST
…
installed: 3.18 (12631) 57MB -
The nodes appear to be exchanging heartbeats just fine:
$ lxc monitor --pretty --type=logging
DBUG[01-14|11:37:40] New event listener: 89460967-cdca-4dc9-9512-64b4c5e8bf0b
DBUG[01-14|11:37:43] Starting heartbeat round
DBUG[01-14|11:37:43] Heartbeat updating local raft nodes to [{ID:1 Address:10.0.5.203:8443} {ID:2 Address:10.0.5.204:8443} {ID:3 Address:10.0.5.190:8443}]
DBUG[01-14|11:37:45] Sending heartbeat to 10.0.5.190:8443
DBUG[01-14|11:37:45] Sending heartbeat request to 10.0.5.190:8443
DBUG[01-14|11:37:45] Successful heartbeat for 10.0.5.190:8443
DBUG[01-14|11:37:49] Sending heartbeat to 10.0.5.203:8443
DBUG[01-14|11:37:49] Sending heartbeat request to 10.0.5.203:8443
DBUG[01-14|11:37:49] Successful heartbeat for 10.0.5.203:8443
DBUG[01-14|11:37:49] Completed heartbeat round
DBUG[01-14|11:37:53] Starting heartbeat round
DBUG[01-14|11:37:53] Heartbeat updating local raft nodes to [{ID:1 Address:10.0.5.203:8443} {ID:2 Address:10.0.5.204:8443} {ID:3 Address:10.0.5.190:8443}]
DBUG[01-14|11:37:56] Sending heartbeat request to 10.0.5.203:8443
DBUG[01-14|11:37:56] Sending heartbeat to 10.0.5.203:8443
DBUG[01-14|11:37:56] Successful heartbeat for 10.0.5.203:8443
DBUG[01-14|11:38:00] Sending heartbeat to 10.0.5.190:8443
DBUG[01-14|11:38:00] Sending heartbeat request to 10.0.5.190:8443
DBUG[01-14|11:38:00] Successful heartbeat for 10.0.5.190:8443
DBUG[01-14|11:38:00] Completed heartbeat round
DBUG[01-14|11:38:03] Starting heartbeat round
DBUG[01-14|11:38:03] Heartbeat updating local raft nodes to [{ID:1 Address:10.0.5.203:8443} {ID:2 Address:10.0.5.204:8443} {ID:3 Address:10.0.5.190:8443}]
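As a cross-check, the raft node list in the log can be compared with the addresses that actually got a successful heartbeat. This is only a sketch over a few lines copied from the log above; the variable names (log, raft_nodes, beat_ok, missing) are hypothetical, and it relies on grep -o, which GNU and BSD grep both support.

```shell
#!/bin/sh
# Sample lines copied from the 'lxc monitor' output above.
log='DBUG[01-14|11:37:43] Heartbeat updating local raft nodes to [{ID:1 Address:10.0.5.203:8443} {ID:2 Address:10.0.5.204:8443} {ID:3 Address:10.0.5.190:8443}]
DBUG[01-14|11:37:45] Successful heartbeat for 10.0.5.190:8443
DBUG[01-14|11:37:49] Successful heartbeat for 10.0.5.203:8443'

# Addresses listed as raft nodes in the "updating local raft nodes" line.
raft_nodes=$(printf '%s\n' "$log" |
  grep -o 'Address:[0-9.]*:[0-9]*' | sed 's/^Address://' | sort -u)

# Addresses that logged a successful heartbeat.
beat_ok=$(printf '%s\n' "$log" |
  sed -n 's/.*Successful heartbeat for \(.*\)$/\1/p' | sort -u)

# Collect raft nodes that never appear in a "Successful heartbeat" line.
missing=""
for addr in $raft_nodes; do
  if ! printf '%s\n' "$beat_ok" | grep -qxF "$addr"; then
    missing="$missing $addr"
  fi
done

echo "raft nodes without a logged heartbeat:$missing"
```

On these lines the only address without a logged heartbeat is 10.0.5.204:8443, which is presumably just the node running the heartbeat round (that part is my assumption). So the log lists all three machines as raft nodes even though ‘lxc cluster list’ shows DATABASE = NO for every one of them.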
Any idea what I should try next?