Lxc commands hang. Containers are not running

Hi,
Since yesterday, all my containers have stopped working. Restarting the server and the service did not solve the issue, and no errors are displayed.
Commands like lxc list hang (for at least several hours).
Help welcome :slight_smile: Thank you!

Here is a partial log (after this point, it repeats forever):

# cat /var/snap/lxd/common/lxd/logs/lxd.log
t=2021-12-08T23:01:15+0100 lvl=info msg="LXD is starting" mode=normal path=/var/snap/lxd/common/lxd version=4.20
t=2021-12-08T23:01:15+0100 lvl=info msg="Kernel uid/gid map:" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - u 0 0 4294967295" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - g 0 0 4294967295" 
t=2021-12-08T23:01:15+0100 lvl=info msg="Configured LXD uid/gid map:" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - u 0 1000000 1000000000" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - g 0 1000000 1000000000" 
t=2021-12-08T23:01:15+0100 lvl=info msg="Kernel features:" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - closing multiple file descriptors efficiently: no" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - netnsid-based network retrieval: no" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - pidfds: no" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - core scheduling: no" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - uevent injection: no" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - seccomp listener: no" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - seccomp listener continue syscalls: no" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - seccomp listener add file descriptors: no" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - attach to namespaces via pidfds: no" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - safe native terminal allocation : yes" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - unprivileged file capabilities: yes" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - cgroup layout: hybrid" 
t=2021-12-08T23:01:15+0100 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored" 
t=2021-12-08T23:01:15+0100 lvl=info msg=" - shiftfs support: disabled" 
t=2021-12-08T23:01:16+0100 lvl=warn msg="Instance type not operational" driver=qemu err="vhost_vsock kernel module not loaded: Failed to run: modprobe -b vhost_vsock: modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy" type=virtual-machine
t=2021-12-08T23:01:16+0100 lvl=info msg="Initializing local database" 
t=2021-12-08T23:01:16+0100 lvl=info msg="Set client certificate to server certificate" fingerprint=6066c4380317964b507fb4f121010d19d02cbdd3d3ee138c945a9f1919c4b78a
t=2021-12-08T23:01:16+0100 lvl=info msg="Starting cluster handler:" 
t=2021-12-08T23:01:16+0100 lvl=info msg="Starting /dev/lxd handler:" 
t=2021-12-08T23:01:16+0100 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2021-12-08T23:01:16+0100 lvl=info msg="REST API daemon:" 
t=2021-12-08T23:01:16+0100 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2021-12-08T23:01:16+0100 lvl=info msg=" - binding TCP socket" socket=192.168.1.10:8443
t=2021-12-08T23:01:16+0100 lvl=info msg="Initializing global database" 
t=2021-12-08T23:01:16+0100 lvl=info msg="Connecting to global database" 
t=2021-12-08T23:01:16+0100 lvl=warn msg="Dqlite: attempt 1: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:16+0100 lvl=warn msg="Dqlite: attempt 2: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:16+0100 lvl=warn msg="Dqlite: attempt 3: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:17+0100 lvl=warn msg="Dqlite: attempt 4: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:18+0100 lvl=warn msg="Dqlite: attempt 5: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:19+0100 lvl=warn msg="Dqlite: attempt 6: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:20+0100 lvl=warn msg="Dqlite: attempt 7: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:21+0100 lvl=warn msg="Dqlite: attempt 8: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:22+0100 lvl=warn msg="Dqlite: attempt 9: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:23+0100 lvl=warn msg="Dqlite: attempt 10: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:24+0100 lvl=warn msg="Dqlite: attempt 11: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:25+0100 lvl=warn msg="Dqlite: attempt 12: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:28+0100 lvl=info msg="Connecting to global database" 
t=2021-12-08T23:01:28+0100 lvl=warn msg="Dqlite: attempt 1: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:28+0100 lvl=warn msg="Dqlite: attempt 2: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:29+0100 lvl=warn msg="Dqlite: attempt 3: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:29+0100 lvl=warn msg="Dqlite: attempt 4: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:30+0100 lvl=warn msg="Dqlite: attempt 5: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:31+0100 lvl=warn msg="Dqlite: attempt 6: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:32+0100 lvl=warn msg="Dqlite: attempt 7: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:33+0100 lvl=warn msg="Dqlite: attempt 8: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:34+0100 lvl=warn msg="Dqlite: attempt 9: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:35+0100 lvl=warn msg="Dqlite: attempt 10: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:36+0100 lvl=warn msg="Dqlite: attempt 11: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:37+0100 lvl=warn msg="Dqlite: attempt 12: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:40+0100 lvl=info msg="Connecting to global database" 
t=2021-12-08T23:01:40+0100 lvl=warn msg="Dqlite: attempt 1: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:41+0100 lvl=warn msg="Dqlite: attempt 2: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:41+0100 lvl=warn msg="Dqlite: attempt 3: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:42+0100 lvl=warn msg="Dqlite: attempt 4: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:43+0100 lvl=warn msg="Dqlite: attempt 5: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:44+0100 lvl=warn msg="Dqlite: attempt 6: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:45+0100 lvl=warn msg="Dqlite: attempt 7: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:46+0100 lvl=warn msg="Dqlite: attempt 8: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:47+0100 lvl=warn msg="Dqlite: attempt 9: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:48+0100 lvl=warn msg="Dqlite: attempt 10: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:49+0100 lvl=warn msg="Dqlite: attempt 11: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:50+0100 lvl=warn msg="Dqlite: attempt 12: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:53+0100 lvl=info msg="Connecting to global database" 
t=2021-12-08T23:01:53+0100 lvl=warn msg="Dqlite: attempt 1: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:53+0100 lvl=warn msg="Dqlite: attempt 2: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:53+0100 lvl=warn msg="Dqlite: attempt 3: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:54+0100 lvl=warn msg="Dqlite: attempt 4: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:55+0100 lvl=warn msg="Dqlite: attempt 5: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:56+0100 lvl=warn msg="Dqlite: attempt 6: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:57+0100 lvl=warn msg="Dqlite: attempt 7: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:58+0100 lvl=warn msg="Dqlite: attempt 8: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:01:59+0100 lvl=warn msg="Dqlite: attempt 9: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:00+0100 lvl=warn msg="Dqlite: attempt 10: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:01+0100 lvl=warn msg="Dqlite: attempt 11: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:02+0100 lvl=warn msg="Dqlite: attempt 12: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:05+0100 lvl=info msg="Connecting to global database" 
t=2021-12-08T23:02:05+0100 lvl=warn msg="Dqlite: attempt 1: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:05+0100 lvl=warn msg="Dqlite: attempt 2: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:06+0100 lvl=warn msg="Dqlite: attempt 3: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:07+0100 lvl=warn msg="Dqlite: attempt 4: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:08+0100 lvl=warn msg="Dqlite: attempt 5: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:09+0100 lvl=warn msg="Dqlite: attempt 6: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:10+0100 lvl=warn msg="Dqlite: attempt 7: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:11+0100 lvl=warn msg="Dqlite: attempt 8: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:12+0100 lvl=warn msg="Dqlite: attempt 9: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:13+0100 lvl=warn msg="Dqlite: attempt 10: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:14+0100 lvl=warn msg="Dqlite: attempt 11: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:15+0100 lvl=warn msg="Dqlite: attempt 12: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:18+0100 lvl=info msg="Connecting to global database" 
t=2021-12-08T23:02:18+0100 lvl=warn msg="Dqlite: attempt 1: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:18+0100 lvl=warn msg="Dqlite: attempt 2: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:18+0100 lvl=warn msg="Dqlite: attempt 3: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:19+0100 lvl=warn msg="Dqlite: attempt 4: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:20+0100 lvl=warn msg="Dqlite: attempt 5: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:21+0100 lvl=warn msg="Dqlite: attempt 6: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:22+0100 lvl=warn msg="Dqlite: attempt 7: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:23+0100 lvl=warn msg="Dqlite: attempt 8: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:24+0100 lvl=warn msg="Dqlite: attempt 9: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:25+0100 lvl=warn msg="Dqlite: attempt 10: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:26+0100 lvl=warn msg="Dqlite: attempt 11: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:27+0100 lvl=warn msg="Dqlite: attempt 12: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:30+0100 lvl=info msg="Connecting to global database" 
t=2021-12-08T23:02:30+0100 lvl=warn msg="Dqlite: attempt 1: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:30+0100 lvl=warn msg="Dqlite: attempt 2: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:31+0100 lvl=warn msg="Dqlite: attempt 3: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:31+0100 lvl=warn msg="Dqlite: attempt 4: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:32+0100 lvl=warn msg="Dqlite: attempt 5: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:33+0100 lvl=warn msg="Dqlite: attempt 6: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:34+0100 lvl=warn msg="Dqlite: attempt 7: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:35+0100 lvl=warn msg="Dqlite: attempt 8: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:36+0100 lvl=warn msg="Dqlite: attempt 9: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:37+0100 lvl=warn msg="Dqlite: attempt 10: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:38+0100 lvl=warn msg="Dqlite: attempt 11: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:39+0100 lvl=warn msg="Dqlite: attempt 12: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:40+0100 lvl=eror msg="Failed connecting to global database" attempt=6 err="failed to create dqlite connection: no available dqlite leader server found"
t=2021-12-08T23:02:42+0100 lvl=info msg="Connecting to global database" 
t=2021-12-08T23:02:42+0100 lvl=warn msg="Dqlite: attempt 1: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:43+0100 lvl=warn msg="Dqlite: attempt 2: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:43+0100 lvl=warn msg="Dqlite: attempt 3: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:44+0100 lvl=warn msg="Dqlite: attempt 4: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:45+0100 lvl=warn msg="Dqlite: attempt 5: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:46+0100 lvl=warn msg="Dqlite: attempt 6: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:47+0100 lvl=warn msg="Dqlite: attempt 7: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:48+0100 lvl=warn msg="Dqlite: attempt 8: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:49+0100 lvl=warn msg="Dqlite: attempt 9: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:50+0100 lvl=warn msg="Dqlite: attempt 10: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:51+0100 lvl=warn msg="Dqlite: attempt 11: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:52+0100 lvl=warn msg="Dqlite: attempt 12: server 1: dial: Failed to connect to HTTP endpoint: dial tcp: address 1: missing port in address" 
t=2021-12-08T23:02:53+0100 lvl=eror msg="Failed connecting to global database" attempt=7 err="failed to create dqlite connection: no available dqlite leader server found"

OS: Ubuntu 18.04
Installed using snap.
LXD v4.20 rev. 21902
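
For anyone reading a log like the one above, collapsing the retry lines makes the distinct messages easier to see. This is a minimal sketch, not an LXD tool: `summarize_lxd_log` is a hypothetical helper name, and it assumes every line carries a `msg="..."` field as in the excerpt.

```shell
# Hypothetical helper: count each distinct msg="..." value in an LXD log.
# The attempt number is replaced with N so all retries collapse into one row.
summarize_lxd_log() {
  grep -o 'msg="[^"]*"' "$1" \
    | sed -E 's/attempt [0-9]+/attempt N/' \
    | sort | uniq -c | sort -rn
}
```

Usage would be along the lines of `summarize_lxd_log /var/snap/lxd/common/lxd/logs/lxd.log`, which lists the most frequent messages first.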

Can you show the output of sqlite3 "SELECT * FROM config" /var/snap/lxd/common/lxd/database/local.sql? (You may need to install sqlite3.)

local.sql is an empty file :confused:

Sorry, please try:

sudo sqlite3 /var/snap/lxd/common/lxd/database/local.db "SELECT * FROM config" 
# sudo sqlite3 /var/snap/lxd/common/lxd/database/local.db "SELECT * FROM config"
1|cluster.https_address|192.168.1.10:8443
2|core.https_address|192.168.1.10:8443

Is this a cluster setup or standalone server?

It used to be a cluster. I removed the other node a long time ago.

Interesting, how did you remove it?

It’s been a while, so I don’t remember exactly, but from my shell history, I can find

lxc cluster remove server2

Please can you show output of:

sudo sqlite3 /var/snap/lxd/common/lxd/database/local.db "SELECT * FROM raft_nodes"

and

sudo sqlite3 /var/snap/lxd/common/lxd/database/global/db.bin "SELECT * FROM nodes"
% sudo sqlite3 /var/snap/lxd/common/lxd/database/local.db "SELECT * FROM raft_nodes"
1|1|0|
% sudo sqlite3 /var/snap/lxd/common/lxd/database/global/db.bin "SELECT * FROM nodes"
1|server||192.168.1.10:8443|53|270|2021-11-27 07:59:21.817415788+01:00|0|2|

Any ideas @stgraber @mbordere ?

Where could the member address for that error be stored?

It looks like address 1 comes from the raft_nodes table (second column). See incus/lxd/db/node/update.go at 8e6a5ea574ab1c89a6886478f3f94e7438406c5f · lxc/incus · GitHub
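
To make that check repeatable, here is a small sketch that flags raft_nodes rows whose second column has no port. The message "address 1: missing port in address" has the shape Go's net package produces when asked to split a host:port string that contains no port, which fits a stored address of just `1`. `rows_missing_port` is a hypothetical helper name, and it assumes the pipe-separated output format that sqlite3 printed earlier in this thread.

```shell
# Hypothetical helper: read "id|address|..." rows on stdin and print
# only the rows whose second field does not end in :<port>.
rows_missing_port() {
  awk -F'|' '$2 !~ /:[0-9]+$/ { print }'
}
```

It could be fed from the same query used above, for example `sudo sqlite3 /var/snap/lxd/common/lxd/database/local.db "SELECT * FROM raft_nodes" | rows_missing_port`. With the row shown in this thread (`1|1|0|`), that row would be printed.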

Ah good spot, do you know how that can happen (it sounds like at some point the cluster has been downgraded back to a single member) and how it can be fixed?

I’m not sure how it can happen; I’m going through the LXD code. The call that fails is https://github.com/canonical/go-dqlite/blob/9a7ab78cafd7a4106b7a213eb5f7da1f731aed6e/driver/driver.go#L256, which tries to contact the servers in its store, a NodeStore. The store is set here https://github.com/lxc/lxd/blob/8e6a5ea574ab1c89a6886478f3f94e7438406c5f/lxd/cluster/gateway.go#L830 and https://github.com/lxc/lxd/blob/8e6a5ea574ab1c89a6886478f3f94e7438406c5f/lxd/cluster/gateway.go#L881.

The store is probably populated by the raft_nodes table, and maybe we’re hitting a weird case due to the cluster downgrade to a single member, don’t know yet.

In my standalone LXD server, the raft_nodes table in /var/snap/lxd/common/lxd/database/local.db is empty.

If that is what you would expect for a standalone server after the last member was removed, @mbordere, then we could ask @Clem to stop LXD and remove that row from raft_nodes (after taking a backup of that file, of course).

In a standalone LXD server, this line also does not exist in the config table. The existence of this line triggers logic in incus/lxd/node/raft.go at 8e6a5ea574ab1c89a6886478f3f94e7438406c5f · lxc/incus · GitHub and incus/lxd/cluster/info.go at 8e6a5ea574ab1c89a6886478f3f94e7438406c5f · lxc/incus · GitHub that could lead to the observed behavior.

@Clem Has this server been running for a long while and was recently restarted? Was this the first restart since the 2nd node was removed from the cluster? It could explain why this has suddenly started occurring.


It’s been a while since the 2nd node was removed from the cluster, more or less one year.
The server has been running and rebooting without any issue since.
This error appeared maybe a week, maybe more, after the last reboot.

Was LXD recently updated?

LXD is installed using snap, so it gets fairly frequent updates.

% snap list
Name                 Version    Rev    Tracking       Publisher   Notes
canonical-livepatch  10.1.2     126    latest/stable  canonical✓  -
core                 16-2.52.1  11993  latest/stable  canonical✓  core
core18               20211028   2253   latest/stable  canonical✓  base
core20               20211129   1270   latest/stable  canonical✓  base
distrobuilder        2.0        1125   latest/stable  stgraber    classic
hello-world          6.4        29     latest/stable  canonical✓  -
lxd                  4.20       21902  latest/stable  canonical✓  in-cohort
% snap info lxd | grep refresh-date
refresh-date: 27 days ago, at 21:11 CET

I said earlier that it appeared a week after the last reboot, but it was maybe 4 weeks, i.e. there has been no reboot since the last update.