@stgraber sorry for tagging you directly but I urgently need help.
My snap LXD installation fails to start. Here is the debug log:
root@bob:/home/fkeclik# lxd --debug --group lxd
DBUG[12-03|12:21:56] Connecting to a local LXD over a Unix socket
DBUG[12-03|12:21:56] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
INFO[12-03|12:21:56] LXD is starting version=4.0.8 mode=normal path=/var/snap/lxd/common/lxd
INFO[12-03|12:21:56] Kernel uid/gid map:
INFO[12-03|12:21:56] - u 0 0 4294967295
INFO[12-03|12:21:56] - g 0 0 4294967295
INFO[12-03|12:21:56] Configured LXD uid/gid map:
INFO[12-03|12:21:56] - u 0 1000000 1000000000
INFO[12-03|12:21:56] - g 0 1000000 1000000000
INFO[12-03|12:21:56] Kernel features:
INFO[12-03|12:21:56] - closing multiple file descriptors efficiently: no
INFO[12-03|12:21:56] - netnsid-based network retrieval: yes
INFO[12-03|12:21:56] - pidfds: yes
INFO[12-03|12:21:56] - core scheduling: no
INFO[12-03|12:21:56] - uevent injection: yes
INFO[12-03|12:21:56] - seccomp listener: yes
INFO[12-03|12:21:56] - seccomp listener continue syscalls: yes
INFO[12-03|12:21:56] - seccomp listener add file descriptors: no
INFO[12-03|12:21:56] - attach to namespaces via pidfds: no
INFO[12-03|12:21:56] - safe native terminal allocation : yes
INFO[12-03|12:21:56] - unprivileged file capabilities: yes
INFO[12-03|12:21:56] - cgroup layout: hybrid
WARN[12-03|12:21:56] - Couldn't find the CGroup blkio.weight, disk priority will be ignored
WARN[12-03|12:21:56] - Couldn't find the CGroup memory swap accounting, swap limits will be ignored
INFO[12-03|12:21:56] - shiftfs support: yes
INFO[12-03|12:21:56] Initializing local database
DBUG[12-03|12:21:56] Refreshing local trusted certificate cache
INFO[12-03|12:21:56] Set client certificate to server certificate fingerprint=c79af4bfa82848b543d4d42cb4a4c52bc1e00384eaa948aae4bb2fb34b30e7b8
DBUG[12-03|12:21:56] Initializing database gateway
EROR[12-03|12:21:56] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
INFO[12-03|12:21:56] Starting database node id=1 address=1 role=voter
EROR[12-03|12:21:57] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
EROR[12-03|12:21:57] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
EROR[12-03|12:21:57] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
DBUG[12-03|12:21:57] Connecting to a local LXD over a Unix socket
DBUG[12-03|12:21:57] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
DBUG[12-03|12:21:57] Detected stale unix socket, deleting
INFO[12-03|12:21:57] Starting /dev/lxd handler:
INFO[12-03|12:21:57] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[12-03|12:21:57] REST API daemon:
INFO[12-03|12:21:57] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[12-03|12:21:57] - binding TCP socket socket=[::]:8443
INFO[12-03|12:21:57] Initializing global database
INFO[12-03|12:21:57] Connecting to global database
DBUG[12-03|12:21:57] Dqlite: attempt 1: server 1: connected
INFO[12-03|12:21:57] Connected to global database
DBUG[12-03|12:21:57] Database error: failed to update node version info: updated 0 rows instead of 1
EROR[12-03|12:21:57] Failed to start the daemon err="Failed to initialize global database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1"
INFO[12-03|12:21:57] Starting shutdown sequence signal=interrupt
EROR[12-03|12:21:57] Invalid configuration key: Wildcard addresses aren't allowed key=cluster.https_address
WARN[12-03|12:21:57] Could not handover member's responsibilities err="Node is not clustered"
DBUG[12-03|12:21:57] Cancel ongoing or future gRPC connection attempts
DBUG[12-03|12:21:57] Cancel ongoing or future gRPC connection attempts
INFO[12-03|12:21:57] Stop database gateway
INFO[12-03|12:21:57] Stopping REST API handler:
INFO[12-03|12:21:57] - closing socket socket=[::]:8443
INFO[12-03|12:21:57] - closing socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[12-03|12:21:57] Stopping /dev/lxd handler:
INFO[12-03|12:21:57] - closing socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[12-03|12:21:57] Not unmounting temporary filesystems (instances are still running)
INFO[12-03|12:21:57] Daemon stopped
Error: Failed to initialize global database: failed to ensure schema: failed to update node version info: updated 0 rows instead of 1
I had a look at a different machine where lxd is still running, but running lxc list there returns this error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused
Setting cluster.https_address via lxc config set doesn’t work either, because I get this error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory
With some research and the help of this article I was able to temporarily get lxd running.
But now it is failing again with: Failed to start the daemon err="Failed to start dqlite server: raft_start(): io: closed segment 0000000001048683-0000000001048701 is past last snapshot snapshot-1-1048387-21758482"
Long version:
lxd --verbose --debug
DBUG[12-03|16:01:04] Connecting to a local LXD over a Unix socket
DBUG[12-03|16:01:04] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
INFO[12-03|16:01:04] LXD is starting version=4.0.8 mode=normal path=/var/snap/lxd/common/lxd
INFO[12-03|16:01:04] Kernel uid/gid map:
INFO[12-03|16:01:04] - u 0 0 4294967295
INFO[12-03|16:01:04] - g 0 0 4294967295
INFO[12-03|16:01:04] Configured LXD uid/gid map:
INFO[12-03|16:01:04] - u 0 1000000 1000000000
INFO[12-03|16:01:04] - g 0 1000000 1000000000
INFO[12-03|16:01:04] Kernel features:
INFO[12-03|16:01:04] - closing multiple file descriptors efficiently: no
INFO[12-03|16:01:04] - netnsid-based network retrieval: yes
INFO[12-03|16:01:04] - pidfds: yes
INFO[12-03|16:01:04] - core scheduling: no
INFO[12-03|16:01:04] - uevent injection: yes
INFO[12-03|16:01:04] - seccomp listener: yes
INFO[12-03|16:01:04] - seccomp listener continue syscalls: yes
INFO[12-03|16:01:04] - seccomp listener add file descriptors: no
INFO[12-03|16:01:04] - attach to namespaces via pidfds: no
INFO[12-03|16:01:04] - safe native terminal allocation : yes
INFO[12-03|16:01:04] - unprivileged file capabilities: yes
INFO[12-03|16:01:04] - cgroup layout: hybrid
WARN[12-03|16:01:04] - Couldn't find the CGroup blkio.weight, disk priority will be ignored
WARN[12-03|16:01:04] - Couldn't find the CGroup memory swap accounting, swap limits will be ignored
INFO[12-03|16:01:04] - shiftfs support: yes
INFO[12-03|16:01:04] Initializing local database
DBUG[12-03|16:01:04] Refreshing local trusted certificate cache
INFO[12-03|16:01:04] Set client certificate to server certificate fingerprint=c79af4bfa82848b543d4d42cb4a4c52bc1e00384eaa948aae4bb2fb34b30e7b8
DBUG[12-03|16:01:04] Initializing database gateway
INFO[12-03|16:01:04] Starting database node id=1 address=1 role=voter
EROR[12-03|16:01:04] Failed to start the daemon err="Failed to start dqlite server: raft_start(): io: closed segment 0000000001048683-0000000001048701 is past last snapshot snapshot-1-1048387-21758482"
INFO[12-03|16:01:04] Starting shutdown sequence signal=interrupt
INFO[12-03|16:01:04] Not unmounting temporary filesystems (instances are still running)
INFO[12-03|16:01:04] Daemon stopped
Error: Failed to start dqlite server: raft_start(): io: closed segment 0000000001048683-0000000001048701 is past last snapshot snapshot-1-1048387-21758482
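For anyone trying to read that raft_start() message: as far as I understand the raft library's on-disk naming (this is an assumption, not something confirmed in this thread), a closed segment is named after its first and last log index, and a snapshot is named snapshot-TERM-INDEX-TIMESTAMP. Under that reading, the error means the oldest closed segment starts after the entry that should directly follow the snapshot, so part of the log is missing. A small sketch that pulls the numbers out of the message and computes the gap:

```shell
# Error text copied from the log above.
err='closed segment 0000000001048683-0000000001048701 is past last snapshot snapshot-1-1048387-21758482'

# First index of the closed segment; the leading zeros are stripped by the
# greedy "0*" so the shell does not treat the number as octal.
seg_first=$(printf '%s\n' "$err" | sed -E 's/.*closed segment 0*([0-9]+)-[0-9]+.*/\1/')

# Snapshot index, assuming the name layout snapshot-TERM-INDEX-TIMESTAMP.
snap_index=$(printf '%s\n' "$err" | sed -E 's/.*snapshot-[0-9]+-([0-9]+)-[0-9]+.*/\1/')

# Entries that would be needed between the snapshot and the segment.
missing=$((seg_first - snap_index - 1))

echo "snapshot ends at index $snap_index, first closed segment starts at $seg_first"
echo "log entries missing in between: $missing"
```

With the values from this log, the segment starts at 1048683 and the snapshot covers up to 1048387, so 295 entries would be unaccounted for.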
With the help of this article I managed to get it working again.
I reset the global db to the backup that gets created in the beginning. LXD is working again.
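A rough sketch of what such a reset could look like, in case it helps someone else. The directory names global and global.bak under /var/snap/lxd/common/lxd/database are my assumption about the snap layout, and restore_global_db is a hypothetical helper, not an LXD command. Anything changed since the backup was taken would be lost, so the broken copy is moved aside rather than deleted.

```shell
# Hypothetical helper: replace the "global" database directory with the
# "global.bak" copy found next to it. The broken directory is kept under a
# timestamped name so it can still be inspected or put back.
restore_global_db() {
    db_dir=$1

    if [ ! -d "$db_dir/global.bak" ]; then
        echo "no global.bak found in $db_dir" >&2
        return 1
    fi

    stamp=$(date +%Y%m%d-%H%M%S)
    if [ -d "$db_dir/global" ]; then
        mv "$db_dir/global" "$db_dir/global.broken-$stamp" || return 1
    fi

    # -a preserves ownership, permissions and timestamps of the backup.
    cp -a "$db_dir/global.bak" "$db_dir/global"
}

# Possible usage on a snap install (run as root, with the daemon stopped;
# the path is an assumption, check it on your system first):
#   snap stop lxd
#   restore_global_db /var/snap/lxd/common/lxd/database
#   snap start lxd
```

Keeping global.bak untouched and copying from it means the step can be repeated if the first attempt does not bring the daemon back.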
I have a second server with the same issues (I expect the cluster cert has expired). Gonna work on that tomorrow and mark the issue as resolved when that one works too.