A backup-restored LXD won't start

Hi

I had problems upgrading an Ubuntu 18.04 machine to 20.04 and decided to start from scratch.

I’ve installed LXD from the snap and tried restoring either the whole /var/snap/lxd directory or just /var/snap/lxd/common/lxd/database. I’ve also tried jumping between channels to refresh the links (snap refresh lxd --edge && snap refresh lxd --stable).

But the closest I get to starting lxd is this:

# lxd --group lxd --debug
INFO[02-05|17:21:01] Initializing global database 
INFO[02-05|17:21:01] Connecting to global database 
DBUG[02-05|17:21:01] Dqlite: attempt 1: server 1: connected 
INFO[02-05|17:21:01] Connected to global database 
DBUG[02-05|17:21:01] Database error: failed to fetch current nodes versions: no such column: pending 
EROR[02-05|17:21:01] Failed to start the daemon err="Failed to initialize global database: failed to ensure schema: failed to fetch current nodes versions: no such column: pending"
INFO[02-05|17:21:01] Starting shutdown sequence signal=interrupt

So I assume I have a corrupt database? Is there any way to repair it?

Full start-log:

root@nas02:/var/snap# lxd --group lxd --debug
INFO[02-05|17:28:33] LXD is starting                          version=4.0.8 mode=normal path=/var/snap/lxd/common/lxd
INFO[02-05|17:28:33] Kernel uid/gid map: 
INFO[02-05|17:28:33]  - u 0 0 4294967295 
INFO[02-05|17:28:33]  - g 0 0 4294967295 
INFO[02-05|17:28:33] Configured LXD uid/gid map: 
INFO[02-05|17:28:33]  - u 0 1000000 1000000000 
INFO[02-05|17:28:33]  - g 0 1000000 1000000000 
INFO[02-05|17:28:33] Kernel features: 
INFO[02-05|17:28:33]  - closing multiple file descriptors efficiently: no 
INFO[02-05|17:28:33]  - netnsid-based network retrieval: yes 
INFO[02-05|17:28:33]  - pidfds: yes 
INFO[02-05|17:28:33]  - core scheduling: no 
INFO[02-05|17:28:33]  - uevent injection: yes 
INFO[02-05|17:28:33]  - seccomp listener: yes 
INFO[02-05|17:28:33]  - seccomp listener continue syscalls: yes 
INFO[02-05|17:28:33]  - seccomp listener add file descriptors: no 
INFO[02-05|17:28:33]  - attach to namespaces via pidfds: no 
INFO[02-05|17:28:33]  - safe native terminal allocation : yes 
INFO[02-05|17:28:33]  - unprivileged file capabilities: yes 
INFO[02-05|17:28:33]  - cgroup layout: hybrid 
WARN[02-05|17:28:33]  - Couldn't find the CGroup blkio.weight, disk priority will be ignored 
WARN[02-05|17:28:33]  - Couldn't find the CGroup memory swap accounting, swap limits will be ignored 
INFO[02-05|17:28:33]  - shiftfs support: yes 
INFO[02-05|17:28:33] Initializing local database 
DBUG[02-05|17:28:33] Refreshing local trusted certificate cache 
INFO[02-05|17:28:33] Set client certificate to server certificate fingerprint=1f751b72d27d5d3cb032fa7eefd7c2e37387382a287bcb5889722bf61e68b0e5
DBUG[02-05|17:28:33] Initializing database gateway 
INFO[02-05|17:28:33] Starting database node                   id=1 address=1 role=voter
INFO[02-05|17:28:33] Starting /dev/lxd handler: 
INFO[02-05|17:28:33]  - binding devlxd socket                 socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[02-05|17:28:33] REST API daemon: 
INFO[02-05|17:28:33]  - binding Unix socket                   socket=/var/snap/lxd/common/lxd/unix.socket
INFO[02-05|17:28:33] Initializing global database 
INFO[02-05|17:28:33] Connecting to global database 
DBUG[02-05|17:28:33] Dqlite: attempt 1: server 1: connected 
INFO[02-05|17:28:33] Connected to global database 
DBUG[02-05|17:28:33] Database error: failed to fetch current nodes versions: no such column: pending 
EROR[02-05|17:28:33] Failed to start the daemon               err="Failed to initialize global database: failed to ensure schema: failed to fetch current nodes versions: no such column: pending"
INFO[02-05|17:28:33] Starting shutdown sequence               signal=interrupt
DBUG[02-05|17:28:33] Cancel ongoing or future gRPC connection attempts 
INFO[02-05|17:28:33] Stop database gateway 
INFO[02-05|17:28:33] Stopping REST API handler: 
INFO[02-05|17:28:33]  - closing socket                        socket=/var/snap/lxd/common/lxd/unix.socket
INFO[02-05|17:28:33] Stopping /dev/lxd handler: 
INFO[02-05|17:28:33]  - closing socket                        socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[02-05|17:28:33] Not unmounting temporary filesystems (instances are still running) 
INFO[02-05|17:28:33] Daemon stopped 
Error: Failed to initialize global database: failed to ensure schema: failed to fetch current nodes versions: no such column: pending

Version info:

root@nas02:~# snap list
Name    Version   Rev    Tracking       Publisher   Notes
core18  20211215  2284   latest/stable  canonical✓  base
core20  20220114  1328   latest/stable  canonical✓  base
lxd     4.0.8     21835  4.0/stable     canonical✓  -
snapd   2.54.2    14549  latest/stable  canonical✓  snapd

Is my best option to forget my old setup and attempt an lxd recover for each container instead?

When I try recovering instead, the ZFS snapshots don’t match whatever backup.yaml it can find:

Scanning for unknown volumes...
Error: Failed validation request: Failed checking volumes on pool "default": Instance "minecraft" in project "default" has snapshot inconsistency: Snapshot count in backup config and storage device are different: Backup snapshots mismatch

This doesn’t surprise me, as I did some snapshot cleaning before this happened. Do I have to mount each dataset and edit the backup.yaml manually? It doesn’t seem that lxd recover has any --force option.
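
For anyone hitting the same mismatch, here is a minimal sketch of how the two snapshot lists could be compared before editing anything by hand. It assumes you have already written the snapshot names from backup.yaml into one file and the names found on the pool into another, one name per line. snapshot_diff is a hypothetical helper, not part of LXD or ZFS.

```shell
# Sketch: report snapshot names that appear in only one of two lists.
# Assumption: each input file holds one snapshot name per line.
# snapshot_diff is a hypothetical helper, not an LXD or ZFS command.
snapshot_diff() {
    yaml_list=$1   # names taken from backup.yaml
    zfs_list=$2    # names taken from the storage pool
    sorted_yaml=$(mktemp)
    sorted_zfs=$(mktemp)
    # comm needs sorted input; -u also drops accidental duplicates
    sort -u "$yaml_list" > "$sorted_yaml"
    sort -u "$zfs_list" > "$sorted_zfs"
    # comm -23: lines only in the first file; comm -13: only in the second
    comm -23 "$sorted_yaml" "$sorted_zfs" | sed 's/^/only in backup.yaml: /'
    comm -13 "$sorted_yaml" "$sorted_zfs" | sed 's/^/only in zfs: /'
    rm -f "$sorted_yaml" "$sorted_zfs"
}
```

The pool-side list could come from something like zfs list -H -t snapshot -o name -r on the instance's dataset, keeping only the part after the @. The exact dataset path and any prefix LXD puts on snapshot names depend on your setup, so check the real names before comparing.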

I didn’t have time to wait. I was lucky enough to have exported my most important containers, so I’m starting over and importing them. =)
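
In case it helps someone in the same spot, the re-import can be looped over a directory of exported tarballs. This is only a sketch: import_all is a hypothetical helper, and both the directory argument and the .tar.gz extension are assumptions about how the exports were saved with lxc export.

```shell
# Sketch: import every exported backup tarball found in a directory.
# import_all is a hypothetical helper; the *.tar.gz naming is an assumption.
import_all() {
    dir=$1
    for f in "$dir"/*.tar.gz; do
        # If the glob matched nothing, the pattern is left as-is; skip it
        [ -e "$f" ] || continue
        echo "importing $f"
        # Keep going on failure so one bad tarball does not stop the rest
        lxc import "$f" || echo "failed: $f" >&2
    done
}
```

Usage would be something like import_all /path/to/exports, run after lxd init has created a storage pool for the instances to land in.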