LXD snap no longer starts - Invalid configuration key: Wildcard addresses aren’t allowed key=cluster.https_address

huepf · December 3, 2021, 11:30am

@stgraber sorry for tagging you directly but I urgently need help.
My snap LXD installation fails to start; this is the error log:

root@bob:/home/fkeclik# lxd --debug --group lxd
DBUG[12-03|12:21:56] Connecting to a local LXD over a Unix socket 
DBUG[12-03|12:21:56] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
INFO[12-03|12:21:56] LXD is starting                          version=4.0.8 mode=normal path=/var/snap/lxd/common/lxd
INFO[12-03|12:21:56] Kernel uid/gid map: 
INFO[12-03|12:21:56]  - u 0 0 4294967295 
INFO[12-03|12:21:56]  - g 0 0 4294967295 
INFO[12-03|12:21:56] Configured LXD uid/gid map: 
INFO[12-03|12:21:56]  - u 0 1000000 1000000000 
INFO[12-03|12:21:56]  - g 0 1000000 1000000000 
INFO[12-03|12:21:56] Kernel features: 
INFO[12-03|12:21:56]  - closing multiple file descriptors efficiently: no 
INFO[12-03|12:21:56]  - netnsid-based network retrieval: yes 
INFO[12-03|12:21:56]  - pidfds: yes 
INFO[12-03|12:21:56]  - core scheduling: no 
INFO[12-03|12:21:56]  - uevent injection: yes 
INFO[12-03|12:21:56]  - seccomp listener: yes 
INFO[12-03|12:21:56]  - seccomp listener continue syscalls: yes 
INFO[12-03|12:21:56]  - seccomp listener add file descriptors: no 
INFO[12-03|12:21:56]  - attach to namespaces via pidfds: no 
INFO[12-03|12:21:56]  - safe native terminal allocation : yes 
INFO[12-03|12:21:56]  - unprivileged file capabilities: yes 
INFO[12-03|12:21:56]  - cgroup layout: hybrid 
WARN[12-03|12:21:56]  - Couldn't find the CGroup blkio.weight, disk priority will be ignored 
WARN[12-03|12:21:56]  - Couldn't find the CGroup memory swap accounting, swap limits will be ignored 
INFO[12-03|12:21:56]  - shiftfs support: yes 
INFO[12-03|12:21:56] Initializing local database 
DBUG[12-03|12:21:56] Refreshing local trusted certificate cache 
INFO[12-03|12:21:56] Set client certificate to server certificate fingerprint=c79af4bfa82848b543d4d42cb4a4c52bc1e00384eaa948aae4bb2fb34b30e7b8
DBUG[12-03|12:21:56] Initializing database gateway 
EROR[12-03|12:21:56] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
INFO[12-03|12:21:56] Starting database node                   id=1 address=1 role=voter
EROR[12-03|12:21:57] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
EROR[12-03|12:21:57] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
EROR[12-03|12:21:57] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
DBUG[12-03|12:21:57] Connecting to a local LXD over a Unix socket 
DBUG[12-03|12:21:57] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
DBUG[12-03|12:21:57] Detected stale unix socket, deleting 
INFO[12-03|12:21:57] Starting /dev/lxd handler: 
INFO[12-03|12:21:57]  - binding devlxd socket                 socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[12-03|12:21:57] REST API daemon: 
INFO[12-03|12:21:57]  - binding Unix socket                   socket=/var/snap/lxd/common/lxd/unix.socket
INFO[12-03|12:21:57]  - binding TCP socket                    socket=[::]:8443
INFO[12-03|12:21:57] Initializing global database 
INFO[12-03|12:21:57] Connecting to global database 
DBUG[12-03|12:21:57] Dqlite: attempt 1: server 1: connected 
INFO[12-03|12:21:57] Connected to global database 
DBUG[12-03|12:21:57] Database error: failed to update node version info: updated 0 rows instead of 1 
EROR[12-03|12:21:57] Failed to start the daemon               err="Failed to initialize global database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1"
INFO[12-03|12:21:57] Starting shutdown sequence               signal=interrupt
EROR[12-03|12:21:57] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
WARN[12-03|12:21:57] Could not handover member's responsibilities err="Node is not clustered"
DBUG[12-03|12:21:57] Cancel ongoing or future gRPC connection attempts 
DBUG[12-03|12:21:57] Cancel ongoing or future gRPC connection attempts 
INFO[12-03|12:21:57] Stop database gateway 
INFO[12-03|12:21:57] Stopping REST API handler: 
INFO[12-03|12:21:57]  - closing socket                        socket=[::]:8443
INFO[12-03|12:21:57]  - closing socket                        socket=/var/snap/lxd/common/lxd/unix.socket
INFO[12-03|12:21:57] Stopping /dev/lxd handler: 
INFO[12-03|12:21:57]  - closing socket                        socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[12-03|12:21:57] Not unmounting temporary filesystems (instances are still running) 
INFO[12-03|12:21:57] Daemon stopped 
Error: Failed to initialize global database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1

I also had a look at a different machine where LXD is still running, but running lxc list there returns this:
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused

This is worrying me a lot!

huepf · December 3, 2021, 12:38pm

Setting cluster.https_address via lxc config set doesn’t work because I get this error:
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory

huepf · December 3, 2021, 1:08pm

It looks like this old issue, which was supposed to have been fixed a long time ago. I am using LXD v4.0.8.

Can I safely run this database change?

huepf · December 3, 2021, 3:02pm

With some research and the help of this article I was able to temporarily get lxd running.

But now it is failing again with:
Failed to start the daemon err="Failed to start dqlite server: raft_start(): io: closed segment 0000000001048683-0000000001048701 is past last snapshot snapshot-1-1048387-21758482"

long version:

lxd --verbose --debug
DBUG[12-03|16:01:04] Connecting to a local LXD over a Unix socket 
DBUG[12-03|16:01:04] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
INFO[12-03|16:01:04] LXD is starting                          version=4.0.8 mode=normal path=/var/snap/lxd/common/lxd
INFO[12-03|16:01:04] Kernel uid/gid map: 
INFO[12-03|16:01:04]  - u 0 0 4294967295 
INFO[12-03|16:01:04]  - g 0 0 4294967295 
INFO[12-03|16:01:04] Configured LXD uid/gid map: 
INFO[12-03|16:01:04]  - u 0 1000000 1000000000 
INFO[12-03|16:01:04]  - g 0 1000000 1000000000 
INFO[12-03|16:01:04] Kernel features: 
INFO[12-03|16:01:04]  - closing multiple file descriptors efficiently: no 
INFO[12-03|16:01:04]  - netnsid-based network retrieval: yes 
INFO[12-03|16:01:04]  - pidfds: yes 
INFO[12-03|16:01:04]  - core scheduling: no 
INFO[12-03|16:01:04]  - uevent injection: yes 
INFO[12-03|16:01:04]  - seccomp listener: yes 
INFO[12-03|16:01:04]  - seccomp listener continue syscalls: yes 
INFO[12-03|16:01:04]  - seccomp listener add file descriptors: no 
INFO[12-03|16:01:04]  - attach to namespaces via pidfds: no 
INFO[12-03|16:01:04]  - safe native terminal allocation : yes 
INFO[12-03|16:01:04]  - unprivileged file capabilities: yes 
INFO[12-03|16:01:04]  - cgroup layout: hybrid 
WARN[12-03|16:01:04]  - Couldn't find the CGroup blkio.weight, disk priority will be ignored 
WARN[12-03|16:01:04]  - Couldn't find the CGroup memory swap accounting, swap limits will be ignored 
INFO[12-03|16:01:04]  - shiftfs support: yes 
INFO[12-03|16:01:04] Initializing local database 
DBUG[12-03|16:01:04] Refreshing local trusted certificate cache 
INFO[12-03|16:01:04] Set client certificate to server certificate fingerprint=c79af4bfa82848b543d4d42cb4a4c52bc1e00384eaa948aae4bb2fb34b30e7b8
DBUG[12-03|16:01:04] Initializing database gateway 
INFO[12-03|16:01:04] Starting database node                   id=1 address=1 role=voter
EROR[12-03|16:01:04] Failed to start the daemon               err="Failed to start dqlite server: raft_start(): io: closed segment 0000000001048683-0000000001048701 is past last snapshot snapshot-1-1048387-21758482"
INFO[12-03|16:01:04] Starting shutdown sequence               signal=interrupt
INFO[12-03|16:01:04] Not unmounting temporary filesystems (instances are still running) 
INFO[12-03|16:01:04] Daemon stopped 
Error: Failed to start dqlite server: raft_start(): io: closed segment 0000000001048683-0000000001048701 is past last snapshot snapshot-1-1048387-21758482
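
To read the two names in that error, here is a minimal sketch. It assumes (this is not stated anywhere in the thread) that closed segments are named `<first index>-<last index>` and snapshots `snapshot-<term>-<index>-<timestamp>`. Under that assumption, the message means the oldest closed segment starts after the index covered by the last snapshot, so some log entries in between are missing. The variable names are only for illustration.

```shell
# Names copied from the error message above.
segment="0000000001048683-0000000001048701"
snapshot="snapshot-1-1048387-21758482"

# Assumed layout: <first index>-<last index>. Take the first index and
# strip its leading zeros so the shell does not read it as octal.
seg_first=${segment%%-*}
seg_first=${seg_first#"${seg_first%%[!0]*}"}

# Assumed layout: snapshot-<term>-<index>-<timestamp>. Drop "snapshot-<term>-",
# then drop everything from the next dash onwards to keep the index.
snap_index=${snapshot#snapshot-*-}
snap_index=${snap_index%%-*}

# Number of log entries that are in neither the snapshot nor the segment.
gap=$((seg_first - snap_index - 1))

if [ "$gap" -gt 0 ]; then
    echo "gap of $gap entries between snapshot index $snap_index and segment start $seg_first"
else
    echo "segment is contiguous with the snapshot"
fi
```

This only interprets the file names; it does not touch or repair anything in the database directory.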

Maybe @freeekanayaka can help?

huepf · December 3, 2021, 6:08pm

With the help of this article I managed to get it working again:

github.com/canonical/dqlite

closed segment X is past last snapshot Y - due to disk full

opened 10:24AM - 04 Jun 21 UTC

closed 12:49PM - 04 Jun 21 UTC

dionysius

Hi, we've encountered a dqlite issue while using LXD. LXD can't start dqlite anymore after we ran out of disk space for another reason. That node is using LXD 3.23. I can't say which version of dqlite, sqlite and co. is compiled in, but this is the source tarball of lxd, which ships a vendored dqlite and sqlite: https://linuxcontainers.org/downloads/lxd/lxd-3.23.tar.gz

After freeing some disk space we tried to bring LXD back up: (full backup of current state available)

```
Error: Failed to start dqlite server: raft_start(): io: closed segment 0000000001570516-0000000001571328 is past last snapshot snapshot-1-1569792-5921177853
```

After `rm 0000000001570516-0000000001571328`: (error message would fit #190)

```
Error: Failed to start dqlite server: raft_start(): io: load closed segment 0000000001569703-0000000001570515: found 812 entries (expected 813)
```

After `rm 0000000001569703-0000000001570515`:

```
Error: Failed to start dqlite server: raft_start():
```

LXD is **not** clustered, so that should mean a single node dqlite. The `db.bin` sqlite file seems fine, so far I can find all data.

I don't know dqlite at all. My conceptual idea would be: Since this is a single node dqlite, can we repair or manually freshly setup the dqlite using the sqlite contents? I mean there is no transaction or snapshots to be required to sync to other nodes... and if there's tooling to do so...

I reset the global DB to the backup that gets created in the beginning. LXD is working again.
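
For reference, a reset like the one described above could be sketched roughly as follows. This is not the exact set of commands that was run. It assumes LXD is stopped first, that the backup is a directory named `global.bak` next to `global` inside the database directory, and that the directory passed in is the right one for the installation. The function name `restore_global_db` is hypothetical. Anything written to the database after the backup was taken would be lost.

```shell
# Hypothetical helper: replace a damaged global database with its backup copy.
# Assumptions: LXD is stopped, $1 is the LXD database directory, and the
# backup lives in a sibling directory named global.bak (verify before use).
restore_global_db() {
    db_dir=$1

    # Refuse to do anything if there is no backup to restore from.
    if [ ! -d "$db_dir/global.bak" ]; then
        echo "no global.bak found in $db_dir" >&2
        return 1
    fi

    # Keep the damaged copy under a timestamped name instead of deleting it.
    stamp=$(date +%Y%m%d-%H%M%S)
    if [ -d "$db_dir/global" ]; then
        mv "$db_dir/global" "$db_dir/global.broken-$stamp" || return 1
    fi

    # Copy rather than move, so the backup itself stays untouched.
    cp -a "$db_dir/global.bak" "$db_dir/global"
}
```

On a snap install this might be used as `snap stop lxd`, then `restore_global_db /var/snap/lxd/common/lxd/database`, then `snap start lxd`. The `database` subdirectory is an assumption based on the `path=/var/snap/lxd/common/lxd` shown in the logs above.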

I have a second server with the same issues (I expect the cluster cert has expired). Gonna work on that tomorrow and mark the issue as resolved when that one works too.