I’m currently administering an 11-node lxd cluster (Ubuntu 18.04 with LXD 3.18),
and I get the following when I run lxc cluster ls:
$ lxc cluster ls
+---------+--------------------------+----------+--------+-------------------+
|  NAME   |           URL            | DATABASE | STATE  |      MESSAGE      |
+---------+--------------------------+----------+--------+-------------------+
| chino | https://172.16.0.6:8443 | NO | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| cocoa | https://172.16.0.7:8443 | NO | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| hitagi | https://172.16.0.13:8443 | YES | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| mayoi | https://172.16.0.16:8443 | NO | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| nadeko | https://172.16.0.18:8443 | NO | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| rize | https://172.16.0.8:8443 | NO | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| shinobu | https://172.16.0.15:8443 | NO | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| suruga | https://172.16.0.17:8443 | NO | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| tippy | https://172.16.0.5:8443 | NO | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| tsubasa | https://172.16.0.14:8443 | YES | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
| tsukihi | https://172.16.0.20:8443 | NO | ONLINE | fully operational |
+---------+--------------------------+----------+--------+-------------------+
From what I gather from https://discuss.linuxcontainers.org/t/when-turning-off-the-first-lxd-cluster-node-no-available-dqlite-leader-server-found/4084,
having only two database nodes in a cluster is not intended behavior.
As reported in the linked post, turning off a database node makes lxc commands
unresponsive on all nodes, with similar "no available dqlite leader server found" error messages.
I can’t recall the exact sequence in which I set up the cluster, but from what I remember
I suspect it is related to the (planned) third node hanging when I tried to join it to
the cluster via lxd init --preseed.
The hung node was unresponsive to all the usual attempts to fix it
(stopping lxd, uninstalling it via snap), so we eventually reinstalled Ubuntu from scratch.
I suspect the hang happens because an empty host in core.https_address
(i.e. lxc config set core.https_address :8443
rather than lxc config set core.https_address 172.16.0.14:8443)
is valid for standalone nodes but invalid for cluster nodes, and
running lxd init --preseed with a cluster config on a node configured
that way hangs lxd.
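For context, the join preseed I mean looked roughly like this (I’m reconstructing it from memory, so the node name, addresses, and password below are placeholders, not the exact values we used):

```yaml
config:
  # Explicit host:port; my suspicion is that an empty host (":8443")
  # here is what triggers the hang when combined with the cluster section.
  core.https_address: 172.16.0.99:8443
cluster:
  enabled: true
  server_name: placeholder-node          # name of the joining node
  server_address: 172.16.0.99:8443       # address of the joining node
  cluster_address: 172.16.0.13:8443      # an existing cluster member
  cluster_certificate: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  cluster_password: placeholder-password
```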
I haven’t got around to reproducing this, so I’ll update this post once I do.
In the meantime, I would love to be able to restore the third database node; can anybody help?
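If it helps with diagnosis, I can dump what the surviving database nodes think the raft membership is. I believe (though I haven’t double-checked the exact table names on 3.18) this can be done with lxd sql on a database node:

```shell
# On a database node (hitagi or tsubasa): list the raft members
# the local dqlite layer knows about.
lxd sql local "SELECT * FROM raft_nodes"

# And the cluster members as recorded in the global database:
lxd sql global "SELECT * FROM nodes"
```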