Hangs when joining 3rd lxd node to 2 node cluster

When running "sudo lxd init" with the snap version of LXD 4.0.1, using the settings below:

ubuntu@cn-svr-man-03 ~> sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=cn-svr-man-03]: 
What IP address or DNS name should be used to reach this node? [default=2a02:8010:6538:112:60af:98ff:fe22:ec95]: 172.28.185.42
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: 172.28.185.40
Cluster fingerprint: 3ba8cdfbe591d79423bd55135316974db6cf5c764071ad9d85abc1a91113306c
You can validate this fingerprint by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "source" property for storage pool "local": rpool/lxd
Choose "zfs.pool_name" property for storage pool "local": 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

It hangs indefinitely. The output of the log from the joining node is below.

ubuntu@cn-svr-man-03 ~> sudo cat /var/snap/lxd/common/lxd/logs/lxd.log
t=2020-05-02T15:14:42+0100 lvl=info msg="LXD 4.0.1 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2020-05-02T15:14:42+0100 lvl=info msg="Kernel uid/gid map:"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - u 0 0 4294967295"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - g 0 0 4294967295"
t=2020-05-02T15:14:42+0100 lvl=info msg="Configured LXD uid/gid map:"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - u 0 1000000 1000000000"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - g 0 1000000 1000000000"
t=2020-05-02T15:14:42+0100 lvl=info msg="Kernel features:"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - uevent injection: yes"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - seccomp listener: yes"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - seccomp listener continue syscalls: yes"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - unprivileged file capabilities: yes"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - cgroup layout: hybrid"
t=2020-05-02T15:14:42+0100 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"
t=2020-05-02T15:14:42+0100 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - shiftfs support: disabled"
t=2020-05-02T15:14:42+0100 lvl=info msg="Initializing local database"
t=2020-05-02T15:14:42+0100 lvl=info msg="Starting /dev/lxd handler:"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2020-05-02T15:14:42+0100 lvl=info msg="REST API daemon:"
t=2020-05-02T15:14:42+0100 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2020-05-02T15:14:42+0100 lvl=info msg="Initializing global database"
t=2020-05-02T15:14:45+0100 lvl=eror msg="Firewall nftables unable to parse existing ruleset: invalid character '^' looking for beginning of value"
t=2020-05-02T15:14:45+0100 lvl=info msg="Firewall loaded driver \"xtables\""
t=2020-05-02T15:14:45+0100 lvl=info msg="Initializing storage pools"
t=2020-05-02T15:14:45+0100 lvl=info msg="Initializing daemon storage mounts"
t=2020-05-02T15:14:45+0100 lvl=info msg="Initializing networks"
t=2020-05-02T15:14:45+0100 lvl=info msg="Pruning leftover image files"
t=2020-05-02T15:14:45+0100 lvl=info msg="Done pruning leftover image files"
t=2020-05-02T15:14:45+0100 lvl=info msg="Loading daemon configuration"
t=2020-05-02T15:14:45+0100 lvl=info msg="Started seccomp handler" path=/var/snap/lxd/common/lxd/seccomp.socket
t=2020-05-02T15:14:45+0100 lvl=info msg="Pruning expired images"
t=2020-05-02T15:14:45+0100 lvl=info msg="Done pruning expired images"
t=2020-05-02T15:14:45+0100 lvl=info msg="Pruning expired instance backups"
t=2020-05-02T15:14:45+0100 lvl=info msg="Done pruning expired instance backups"
t=2020-05-02T15:14:45+0100 lvl=info msg="Updating images"
t=2020-05-02T15:14:45+0100 lvl=info msg="Done updating images"
t=2020-05-02T15:14:45+0100 lvl=info msg="Expiring log files"
t=2020-05-02T15:14:45+0100 lvl=info msg="Done expiring log files"
t=2020-05-02T15:14:45+0100 lvl=info msg="Updating instance types"
t=2020-05-02T15:14:45+0100 lvl=info msg="Done updating instance types"
t=2020-05-02T15:15:23+0100 lvl=info msg="Update network address"
t=2020-05-02T15:15:23+0100 lvl=info msg=" - binding TCP socket" socket=172.28.185.42:8443
t=2020-05-02T15:15:26+0100 lvl=eror msg="Failed to get leader node address: Node is not clustered"
t=2020-05-02T15:15:26+0100 lvl=info msg="Joining dqlite raft cluster" address=172.28.185.42:8443 id=4 role=voter
t=2020-05-02T15:15:26+0100 lvl=warn msg="Dqlite client proxy TLS -> Unix: read tcp 172.28.185.42:60556->172.28.185.40:8443: use of closed network connection"
t=2020-05-02T15:15:26+0100 lvl=warn msg="Dqlite client proxy TLS -> Unix: read tcp 172.28.185.42:43940->172.28.185.41:8443: use of closed network connection"
t=2020-05-02T15:15:45+0100 lvl=warn msg="Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=172.28.185.40:8443 attempt=0"
t=2020-05-02T15:15:45+0100 lvl=warn msg="Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=172.28.185.42:8443 attempt=0"
t=2020-05-02T15:15:45+0100 lvl=warn msg="Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=172.28.185.40:8443 attempt=1"
t=2020-05-02T15:15:46+0100 lvl=warn msg="Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=172.28.185.42:8443 attempt=1"
t=2020-05-02T15:15:46+0100 lvl=warn msg="Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=172.28.185.40:8443 attempt=2"
t=2020-05-02T15:15:46+0100 lvl=warn msg="Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=172.28.185.42:8443 attempt=2"
t=2020-05-02T15:15:47+0100 lvl=warn msg="Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=172.28.185.40:8443 attempt=3"
t=2020-05-02T15:15:47+0100 lvl=warn msg="Dqlite: server unavailable err=failed to establish network connection: 503 Service Unavailable address=172.28.185.42:8443 attempt=3"

Both nodes have port 8443 open on their IPv4 addresses and can ping each other, so I'm not sure why they are failing to communicate.

The only explanation I can think of is that neither 172.28.185.40 (one of your existing nodes) nor 172.28.185.42 is the leader, and 172.28.185.41 (which I presume is your other existing node) is not being considered as a possible leader because it should still be a stand-by node.

Assuming that’s the explanation (again, I’m not sure it is), the question is how you ended up there, since it shouldn’t be possible: at the moment you join the 3rd node there should be just one leader, which is the initial cluster node.

In any case, I’ve added some fixes to LXD and dqlite that should handle this situation as well, hopefully that will help in the future (they’ll be available in 4.0.2 and 4.1).

For your situation, I recommend destroying the cluster and starting over from scratch, taking note of all the commands you run from the very start. I’d be surprised if it happened again, but in that case at least we’d have more details.
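If you do start over, the interactive answers from the transcript at the top can be captured in a preseed, which makes the join reproducible and leaves a record of exactly what was run. A hedged sketch (the certificate and password values are placeholders, not real values; everything else is taken from the transcript above):

```shell
# Sketch of a non-interactive cluster join on the new node, feeding a
# preseed to "lxd init". The certificate block and trust password below
# are placeholders and must be replaced with your cluster's real values.
cat <<EOF | sudo lxd init --preseed
cluster:
  enabled: true
  server_name: cn-svr-man-03
  server_address: 172.28.185.42:8443
  cluster_address: 172.28.185.40:8443
  cluster_certificate: |
    -----BEGIN CERTIFICATE-----
    (paste the cluster certificate here)
    -----END CERTIFICATE-----
  cluster_password: (cluster trust password here)
  member_config:
  - entity: storage-pool
    name: local
    key: source
    value: rpool/lxd
EOF
```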

If starting from scratch is not an option, let me know and we’ll see if it’s possible to recover your cluster.

I would like to try and recover the cluster, as I have 18 containers currently configured and would like to avoid importing them into a new cluster. If waiting until 4.0.2 will possibly fix this then I can wait. Do you know when it’s likely to be released?

First we need to check that your cluster didn’t get into some inconsistent state. What does:

lxd sql local "SELECT * FROM raft_nodes"

return on the two nodes that currently belong to the cluster?

172.28.185.40
lxd sql local "SELECT * FROM raft_nodes"
+----+--------------------+------+
| id | address            | role |
+----+--------------------+------+
| 3  | 172.28.185.40:8443 | 0    |
| 4  | 172.28.185.41:8443 | 0    |
+----+--------------------+------+

172.28.185.41
lxd sql local "SELECT * FROM raft_nodes;"
+----+--------------------+------+
| id | address            | role |
+----+--------------------+------+
| 3  | 172.28.185.40:8443 | 0    |
| 4  | 172.28.185.41:8443 | 0    |
+----+--------------------+------+

This seems fine.

What about:

lxd sql global "SELECT * FROM nodes"

?

If it shows a spurious entry for the third node (172.28.185.42), you could try to clean it up with:

lxd sql global "DELETE FROM nodes WHERE address = '172.28.185.42'"

and then just attempt to join the third node (172.28.185.42) again and see how it goes.
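Putting the check and cleanup together, the sequence might look like this (a sketch; note the exact WHERE value is an assumption — the `nodes` table stores the address with the port, so copy the address exactly as the SELECT prints it):

```shell
# On an existing cluster member: inspect the nodes table first, and only
# run the DELETE if a stale row for the third node actually appears.
lxd sql global "SELECT * FROM nodes"

# The address value here is an assumption; match it to what the SELECT
# shows (it may include the :8443 port suffix).
lxd sql global "DELETE FROM nodes WHERE address = '172.28.185.42:8443'"

# Then, on the third node, retry the interactive join:
sudo lxd init
```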

Thanks for your help with this.

lxd sql global "SELECT * FROM nodes"
+----+---------------+-------------+--------------------+--------+----------------+-------------------------------------+---------+------+
| id | name          | description | address            | schema | api_extensions | heartbeat                           | pending | arch |
+----+---------------+-------------+--------------------+--------+----------------+-------------------------------------+---------+------+
| 2  | cn-svr-man-01 |             | 172.28.185.40:8443 | 28     | 185            | 2020-05-06T23:13:02.335024044+01:00 | 0       | 2    |
| 3  | cn-svr-man-02 |             | 172.28.185.41:8443 | 28     | 185            | 2020-05-06T23:13:02.335293124+01:00 | 0       | 2    |
+----+---------------+-------------+--------------------+--------+----------------+-------------------------------------+---------+------+

Ok, this looks fine too, you don’t have spurious entries for 172.28.185.42.

You can try to completely wipe 172.28.185.42 (e.g. with snap remove lxd; snap install lxd) and then attempt to join it to the cluster again.

The only unusual thing is that both nodes in your current cluster have the voter role, which usually isn’t the case for a 2-node cluster. It shouldn’t be a problem, though.

Sorry, I’ve run snap remove lxd; snap install lxd many times while troubleshooting this, and it always hangs during "sudo lxd init".

Okay, that means I might be able to reproduce the issue. I’ll do a few tests and get back to you.

It seems I can reproduce it. I will work on a fix.

Actually, I take it back, at least partially, as I’m not quite able to reproduce your exact situation. Could you please try the following:

  1. Remove the lxd snap from your third node, 172.28.185.42, which does not belong to the cluster

  2. Stop both of your current cluster nodes (172.28.185.40 and 172.28.185.41) using systemctl stop snap.lxd.daemon (and then make sure the daemon is no longer running, e.g. with ps aux | grep lxd)

  3. Make sure that debug logging is enabled on both of your cluster nodes: snap set lxd daemon.debug=true

  4. Start LXD again on those two nodes, using systemctl start snap.lxd.daemon

  5. Install the snap again on the third node, set debug mode on it (snap set lxd daemon.debug=true) and try to join again.

Assuming that joining fails, could you please send me the logs from all three nodes? You’ll find them under /var/snap/lxd/common/lxd/logs/lxd.log.
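The steps above can be sketched as a command sequence (a sketch, assuming the standard snap service name and paths from this thread):

```shell
# Step 1 -- on the third node (172.28.185.42): remove LXD entirely.
sudo snap remove lxd

# Steps 2-4 -- on each current cluster member (.40 and .41):
sudo systemctl stop snap.lxd.daemon
ps aux | grep lxd                     # verify the daemon really stopped
sudo snap set lxd daemon.debug=true   # enable debug logging
sudo systemctl start snap.lxd.daemon

# Step 5 -- back on the third node: reinstall, enable debug, retry the join.
sudo snap install lxd
sudo snap set lxd daemon.debug=true
sudo lxd init
```

If the join still hangs, the debug-level logs in /var/snap/lxd/common/lxd/logs/lxd.log on all three nodes are what to collect.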

The cluster join worked this time. The existing node 172.28.185.41 did hang on "sudo systemctl stop snap.lxd.daemon", and after leaving it for 10 minutes I ended up having to kill the PID before I could start it again.

How long does a cluster have to get quorum during startup? I originally had a 2-node cluster, which I now know is bad practice and is why I was trying to add a 3rd. Is it possible that if a 2-node cluster is not started at the same time then it’ll fail to get quorum? What happens when a single node is rebooted in a 2-node cluster?

Thanks
Mat

Normally in a 2-node cluster you get only one voting database node: you start a new cluster with a single node which will be your voting database node, then you add a second node which will not be a voting database node immediately. The second node is promoted to be a voting database node only when you add a third node, at that point you have 3 voting database nodes.
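Once the third node has joined, the promotion described above can be checked directly by inspecting the raft roles again (per the output earlier in the thread, role 0 corresponds to a voter):

```shell
# Run on any cluster member; with three members you'd expect three
# rows with role 0 (voter) once the promotion has happened.
lxd sql local "SELECT * FROM raft_nodes"
```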

I’m not sure what happened in your situation, probably something went wrong when you added the third node the first time.

Thanks for your help getting me up and running. I think I was rebooting each of the 2 nodes independently and then losing the cluster leader. The strange thing is I was still able to manage the containers with the 2-node cluster, so it looks like I was losing the cluster leader but not quorum.