When I reboot or turn off LXD host-1, the cluster reports:
lxc cluster list
Error: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found
How can I set a leader (LXD host-3)? I also noticed that host-2 has no database. Is this normal?
The fact that host-2 is not a database node is not normal, and might be a bug.
Usually with 3 nodes you have 3 database nodes, so if you reboot or shut down one of the three, the other two remain fully operational. In your case you have only 2 database nodes, so turning off one of them makes the cluster unavailable.
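For reference, a healthy 3-node cluster should show YES in the DATABASE column of lxc cluster list for all three members, roughly like this (addresses taken from your messages, columns trimmed, exact layout may vary by version):

+--------+----------------------------+----------+--------+
|  NAME  |            URL             | DATABASE | STATE  |
+--------+----------------------------+----------+--------+
| host-1 | https://192.168.100.1:8443 | YES      | ONLINE |
| host-2 | https://192.168.100.2:8443 | YES      | ONLINE |
| host-3 | https://192.168.100.3:8443 | YES      | ONLINE |
+--------+----------------------------+----------+--------+

The arithmetic behind it: raft needs a majority, so with 3 database nodes the quorum is 2 and one member can be offline. With only 2 database nodes, losing one leaves 1 of 2, which is not a majority, hence the "no available dqlite leader server found" error.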
On all nodes (snap):
lxc --version
3.10
lxd --version
3.10
When host-1 is offline:
DBUG[02-18|09:48:50] Start database node id=3 address=192.168.100.3:8443
EROR[02-18|09:48:55] Failed to start the daemon: Failed to create raft factory: failed to create bolt store for raft logs: timeout
INFO[02-18|09:48:55] Starting shutdown sequence
DBUG[02-18|09:48:55] Not unmounting temporary filesystems (containers are still running)
INFO[02-18|09:48:55] Saving simplestreams cache
INFO[02-18|09:48:55] Saved simplestreams cache
Error: Failed to create raft factory: failed to create bolt store for raft logs: timeout
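As a side note, that bolt store timeout usually means the daemon couldn't take the lock on the raft database under /var/snap/lxd/common/lxd/database/ (for instance because another LXD process is still holding it). To check which members this node believes are part of the raft cluster, you can query its local database; I believe recent builds expose this as (treat the exact command as an assumption, it may depend on your LXD version):

lxd sql local "SELECT * FROM raft_nodes"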
Do you remember how you built this cluster? I would expect that you added and removed some nodes at some point. It would be useful to know the detailed lifecycle of the cluster, i.e. which nodes were added or removed, and when.
Ran lxd init on host-1 as the first node in the cluster, with the default answers. (LXD 3.1)
Reinstalled host-2 yesterday (after removing it with --force).
host-3 has been untouched since it joined the cluster.
apt-get update -y
apt-get upgrade -y
adduser user
apt remove --purge lxd lxd-client   # drop the deb version before switching to the snap
groupadd --system lxd
usermod -aG lxd user
snap install lxd
apt install zfsutils-linux          # ZFS tooling for the storage backend
lxd init                            # run as the sudo user
lxc remote add host-1 192.168.100.1
lxc remote add host-2 192.168.100.2
lxc remote add host-3 192.168.100.3
I’m confused by the fact that you mention both apt and snap. Are you using apt or snap? Also, was this cluster always at version 3.10 or did you upgrade from 3.0.x?
After this, lxc cluster list shows that all nodes are database nodes.
If you don’t have containers that you care about on host-2, the simplest solution would be to remove host-2, wipe it, and join it again (rough sketch below). I’m not sure what went wrong the first time.
Otherwise, if you have containers on host-2 that you want to preserve, we’ll need to figure out some manual repair, but that might be tricky.
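For the remove-and-rejoin route, the sequence would be roughly this (a sketch, adjust names and addresses to your setup):

From host-1 or host-3, force-remove the broken member:
lxc cluster remove host-2 --force

Then on host-2, wipe the local LXD state and rejoin:
snap remove lxd    # drops all local LXD state, including any containers still on host-2
snap install lxd
lxd init           # answer "yes" to clustering and join the existing cluster

The --force flag is needed because host-2 can't hand over its state cleanly, and it's the snap remove step that makes this destructive, hence the caveat about containers you care about.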