A backup-restored LXD won't start

lalala · February 5, 2022, 5:28pm

Hi

I had problems upgrading an Ubuntu 18.04 system to 20.04 and decided to start from scratch.

I’ve installed LXD from the snap and restored either the whole /var/snap/lxd directory or just /var/snap/lxd/common/lxd/database. I’ve also tried jumping between versions to refresh the links (snap refresh lxd --edge && snap refresh lxd --stable).

But the closest I get to starting lxd is this:

# lxd --group lxd --debug
INFO[02-05|17:21:01] Initializing global database 
INFO[02-05|17:21:01] Connecting to global database 
DBUG[02-05|17:21:01] Dqlite: attempt 1: server 1: connected 
INFO[02-05|17:21:01] Connected to global database 
DBUG[02-05|17:21:01] Database error: failed to fetch current nodes versions: no such column: pending 
EROR[02-05|17:21:01] Failed to start the daemon err="Failed to initialize global database: failed to ensure schema: failed to fetch current nodes versions: no such column: pending"
INFO[02-05|17:21:01] Starting shutdown sequence signal=interrupt

So I assume the database is corrupt? Is there any way to repair it?
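
For what it's worth, "no such column" is the standard SQLite wording for a query that names a column the table does not have, so one possible reading is that the restored database has an older schema than this LXD version queries, rather than being corrupt. That is only a guess. A minimal sketch that reproduces the same wording, assuming a hypothetical "nodes" table layout (not the real LXD schema):

```python
import sqlite3


def query_missing_column() -> str:
    """Return the SQLite error text for selecting a column the table lacks."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table WITHOUT a "pending" column; the real LXD
        # schema is not reproduced here.
        conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, address TEXT)")
        try:
            conn.execute("SELECT id, address, pending FROM nodes")
        except sqlite3.OperationalError as exc:
            return str(exc)
        return ""
    finally:
        conn.close()


if __name__ == "__main__":
    print(query_missing_column())
```

This only shows where the message text comes from; it says nothing about how to bring the restored database up to date.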

Full start-log:

root@nas02:/var/snap# lxd --group lxd --debug
INFO[02-05|17:28:33] LXD is starting                          version=4.0.8 mode=normal path=/var/snap/lxd/common/lxd
INFO[02-05|17:28:33] Kernel uid/gid map: 
INFO[02-05|17:28:33]  - u 0 0 4294967295 
INFO[02-05|17:28:33]  - g 0 0 4294967295 
INFO[02-05|17:28:33] Configured LXD uid/gid map: 
INFO[02-05|17:28:33]  - u 0 1000000 1000000000 
INFO[02-05|17:28:33]  - g 0 1000000 1000000000 
INFO[02-05|17:28:33] Kernel features: 
INFO[02-05|17:28:33]  - closing multiple file descriptors efficiently: no 
INFO[02-05|17:28:33]  - netnsid-based network retrieval: yes 
INFO[02-05|17:28:33]  - pidfds: yes 
INFO[02-05|17:28:33]  - core scheduling: no 
INFO[02-05|17:28:33]  - uevent injection: yes 
INFO[02-05|17:28:33]  - seccomp listener: yes 
INFO[02-05|17:28:33]  - seccomp listener continue syscalls: yes 
INFO[02-05|17:28:33]  - seccomp listener add file descriptors: no 
INFO[02-05|17:28:33]  - attach to namespaces via pidfds: no 
INFO[02-05|17:28:33]  - safe native terminal allocation : yes 
INFO[02-05|17:28:33]  - unprivileged file capabilities: yes 
INFO[02-05|17:28:33]  - cgroup layout: hybrid 
WARN[02-05|17:28:33]  - Couldn't find the CGroup blkio.weight, disk priority will be ignored 
WARN[02-05|17:28:33]  - Couldn't find the CGroup memory swap accounting, swap limits will be ignored 
INFO[02-05|17:28:33]  - shiftfs support: yes 
INFO[02-05|17:28:33] Initializing local database 
DBUG[02-05|17:28:33] Refreshing local trusted certificate cache 
INFO[02-05|17:28:33] Set client certificate to server certificate fingerprint=1f751b72d27d5d3cb032fa7eefd7c2e37387382a287bcb5889722bf61e68b0e5
DBUG[02-05|17:28:33] Initializing database gateway 
INFO[02-05|17:28:33] Starting database node                   id=1 address=1 role=voter
INFO[02-05|17:28:33] Starting /dev/lxd handler: 
INFO[02-05|17:28:33]  - binding devlxd socket                 socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[02-05|17:28:33] REST API daemon: 
INFO[02-05|17:28:33]  - binding Unix socket                   socket=/var/snap/lxd/common/lxd/unix.socket
INFO[02-05|17:28:33] Initializing global database 
INFO[02-05|17:28:33] Connecting to global database 
DBUG[02-05|17:28:33] Dqlite: attempt 1: server 1: connected 
INFO[02-05|17:28:33] Connected to global database 
DBUG[02-05|17:28:33] Database error: failed to fetch current nodes versions: no such column: pending 
EROR[02-05|17:28:33] Failed to start the daemon               err="Failed to initialize global database: failed to ensure schema: failed to fetch current nodes versions: no such column: pending"
INFO[02-05|17:28:33] Starting shutdown sequence               signal=interrupt
DBUG[02-05|17:28:33] Cancel ongoing or future gRPC connection attempts 
INFO[02-05|17:28:33] Stop database gateway 
INFO[02-05|17:28:33] Stopping REST API handler: 
INFO[02-05|17:28:33]  - closing socket                        socket=/var/snap/lxd/common/lxd/unix.socket
INFO[02-05|17:28:33] Stopping /dev/lxd handler: 
INFO[02-05|17:28:33]  - closing socket                        socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[02-05|17:28:33] Not unmounting temporary filesystems (instances are still running) 
INFO[02-05|17:28:33] Daemon stopped 
Error: Failed to initialize global database: failed to ensure schema: failed to fetch current nodes versions: no such column: pending

lalala · February 5, 2022, 7:57pm

Version info:

root@nas02:~# snap list
Name    Version   Rev    Tracking       Publisher   Notes
core18  20211215  2284   latest/stable  canonical✓  base
core20  20220114  1328   latest/stable  canonical✓  base
lxd     4.0.8     21835  4.0/stable     canonical✓  -
snapd   2.54.2    14549  latest/stable  canonical✓  snapd

lalala · February 5, 2022, 8:00pm

Is my best option to forget my old setup and attempt an lxd recover for each container instead?

lalala · February 5, 2022, 8:22pm

When recovering instead, the ZFS snapshots don’t match whatever backup.yaml it can find:

Scanning for unknown volumes...
Error: Failed validation request: Failed checking volumes on pool "default": Instance "minecraft" in project "default" has snapshot inconsistency: Snapshot count in backup config and storage device are different: Backup snapshots mismatch

This doesn’t surprise me, as I did some snapshot cleanup before this happened. Do I have to mount each dataset and edit backup.yaml manually? It doesn’t seem that lxd recover has any --force option.
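
To see which snapshots differ before editing anything by hand, a comparison along these lines might help. It is a rough sketch under assumptions: the snapshot names from backup.yaml are collected manually, the ZFS side is the text output of something like `zfs list -H -t snapshot -o name`, LXD is assumed to name its ZFS snapshots with a "snapshot-" prefix, and the dataset name in the example is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Assumption: LXD-created ZFS snapshots look like "<dataset>@snapshot-<name>".
LXD_SNAPSHOT_PREFIX = "snapshot-"


def snapshot_names_from_zfs(zfs_output: str, dataset: str) -> Set[str]:
    """Extract snapshot names for one dataset from `zfs list` style output."""
    names: Set[str] = set()
    for line in zfs_output.splitlines():
        line = line.strip()
        if "@" not in line:
            continue
        ds, snap = line.split("@", 1)
        if ds != dataset:
            continue
        if snap.startswith(LXD_SNAPSHOT_PREFIX):
            snap = snap[len(LXD_SNAPSHOT_PREFIX):]
        names.add(snap)
    return names


def compare_snapshots(
    backup_names: Iterable[str], zfs_names: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (only in backup.yaml, only on the storage pool), both sorted."""
    backup, on_disk = set(backup_names), set(zfs_names)
    return sorted(backup - on_disk), sorted(on_disk - backup)


if __name__ == "__main__":
    # Hypothetical dataset and snapshot names, for illustration only.
    sample = (
        "default/containers/minecraft@snapshot-snap0\n"
        "default/containers/minecraft@snapshot-snap1\n"
    )
    on_disk = snapshot_names_from_zfs(sample, "default/containers/minecraft")
    print(compare_snapshots(["snap0", "snap1", "snap2"], on_disk))
```

Names that appear only on the backup.yaml side would be the entries left over from the snapshot cleanup.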

lalala · February 5, 2022, 9:37pm

I didn’t have time to wait. I was lucky enough to have exported my most important containers, so I’m starting over and importing them. =)