LXD fails to start after rebooting

I noticed that my LXD service was offline after rebooting my machine. It was running before the reboot and my containers were fully functional. The LXD version is 3.0.3 on Ubuntu 18.04.

When I tried to start it manually, it waited for several minutes and then said:

fmello@turing:~$ sudo systemctl start lxd
Job for lxd.service failed because the control process exited with error code.
See "systemctl status lxd.service" and "journalctl -xe" for details.

Systemctl status says:

fmello@turing:~$ sudo systemctl status lxd
● lxd.service - LXD - main daemon
Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)
Active: activating (start-post) (Result: exit-code) since Thu 2022-03-17 11:27:32 -03; 4min 38s ago
Docs: man:lxd(1)
Process: 9488 ExecStart=/usr/bin/lxd --group lxd --logfile=/var/log/lxd/lxd.log (code=exited, status=2)
Process: 9470 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)
Main PID: 9488 (code=exited, status=2); Control PID: 9489 (lxd)
Tasks: 7
CGroup: /system.slice/lxd.service
└─9489 /usr/lib/lxd/lxd waitready --timeout=600

mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main_daemon.go:42 +0x52
mar 17 11:27:32 turing lxd[9488]: github.com/spf13/cobra.(*Command).execute(0xc42029ec80, 0xc4200cc050, 0x3, 0x3, 0xc42029ec80, 0xc4200cc050)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:762 +0x468
mar 17 11:27:32 turing lxd[9488]: github.com/spf13/cobra.(*Command).ExecuteC(0xc42029ec80, 0x0, 0xc4202b5180, 0xc4202b5180)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:852 +0x30a
mar 17 11:27:32 turing lxd[9488]: github.com/spf13/cobra.(*Command).Execute(0xc42029ec80, 0xc420299e08, 0x1)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:800 +0x2b
mar 17 11:27:32 turing lxd[9488]: main.main()
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main.go:160 +0xe26
mar 17 11:27:32 turing systemd[1]: lxd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

journalctl says:

fmello@turing:~$ journalctl -xe
mar 17 11:27:32 turing lxd[9488]: panic: log not found
mar 17 11:27:32 turing lxd[9488]: goroutine 1 [running]:
mar 17 11:27:32 turing lxd[9488]: github.com/hashicorp/raft.NewRaft(0xc4201ca6c0, 0x112daa0, 0xc42030e580, 0x1135e60, 0xc42030e3c0, 0x1132ca0,
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft/api.go:491 +0x11ba
mar 17 11:27:32 turing lxd[9488]: github.com/lxc/lxd/lxd/cluster.raftInstanceInit(0xc4202d34c0, 0xc42030e2c0, 0xc4202ec1c0, 0x4008000000000000
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/raft.go:191 +0x5
mar 17 11:27:32 turing lxd[9488]: github.com/lxc/lxd/lxd/cluster.newRaft(0xc4202d34c0, 0xc4202ec1c0, 0x4008000000000000, 0x0, 0x0, 0xc42030e1c
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/raft.go:71 +0x24
mar 17 11:27:32 turing lxd[9488]: github.com/lxc/lxd/lxd/cluster.(*Gateway).init(0xc4202c8fc0, 0xc42030e1c0, 0xc4200a52c0)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/gateway.go:448 +
mar 17 11:27:32 turing lxd[9488]: github.com/lxc/lxd/lxd/cluster.NewGateway(0xc4202d34c0, 0xc4202ec1c0, 0xc420299718, 0x2, 0x2, 0x0, 0x24, 0xf
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/gateway.go:57 +0
mar 17 11:27:32 turing lxd[9488]: main.(*Daemon).init(0xc420216c60, 0xc4202998d0, 0x40e936)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/daemon.go:502 +0x645
mar 17 11:27:32 turing lxd[9488]: main.(*Daemon).Init(0xc420216c60, 0xc420216c60, 0xc4200a4d80)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/daemon.go:390 +0x2f
mar 17 11:27:32 turing lxd[9488]: main.(*cmdDaemon).Run(0xc42020d300, 0xc42029ec80, 0xc420245920, 0x0, 0x3, 0x0, 0x0)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main_daemon.go:82 +0x37a
mar 17 11:27:32 turing lxd[9488]: main.(*cmdDaemon).Run-fm(0xc42029ec80, 0xc420245920, 0x0, 0x3, 0x0, 0x0)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main_daemon.go:42 +0x52
mar 17 11:27:32 turing lxd[9488]: github.com/spf13/cobra.(*Command).execute(0xc42029ec80, 0xc4200cc050, 0x3, 0x3, 0xc42029ec80, 0xc4200cc050)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:762 +0x468
mar 17 11:27:32 turing lxd[9488]: github.com/spf13/cobra.(*Command).ExecuteC(0xc42029ec80, 0x0, 0xc4202b5180, 0xc4202b5180)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:852 +0x30a
mar 17 11:27:32 turing lxd[9488]: github.com/spf13/cobra.(*Command).Execute(0xc42029ec80, 0xc420299e08, 0x1)
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:800 +0x2b
mar 17 11:27:32 turing lxd[9488]: main.main()
mar 17 11:27:32 turing lxd[9488]: /build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main.go:160 +0xe26
mar 17 11:27:32 turing systemd[1]: lxd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
mar 17 11:28:26 turing rtkit-daemon[1528]: Supervising 6 threads of 4 processes of 1 users.
mar 17 11:28:26 turing rtkit-daemon[1528]: Supervising 6 threads of 4 processes of 1 users.
mar 17 11:28:28 turing rtkit-daemon[1528]: Supervising 6 threads of 4 processes of 1 users.
mar 17 11:28:28 turing rtkit-daemon[1528]: Supervising 6 threads of 4 processes of 1 users.
mar 17 11:28:29 turing rtkit-daemon[1528]: Supervising 6 threads of 4 processes of 1 users.
mar 17 11:28:29 turing rtkit-daemon[1528]: Supervising 6 threads of 4 processes of 1 users.

There is a panic message in journalctl, but it doesn’t seem to be the issue. Does anyone have a tip on what might be going wrong?

Did you recently run out of disk space, or did the machine lose power suddenly?
It looks like your LXD DB got corrupted.

Can you show the output of ls -lh /var/lib/lxd/database/global/?

All partitions are at 48% to 52% usage, and I did not lose power suddenly.

fmello@turing:~$ ls -lh /var/lib/lxd/database/global/
total 17M
-rwxr-xr-x 1 fmello root 232K mar 15 18:58 db.bin
-rwxr-xr-x 1 fmello root 3,1M mar 15 18:58 db.bin-wal
-rw------- 1 root root 14M mar 17 11:57 logs.db
drwxr-xr-x 4 fmello root 4,0K mar 17 11:57 snapshots

OK. Can you send me a tarball of /var/lib/lxd/database to stgraber@ubuntu.com?

I’ll most likely need to rebuild your DB from that db.bin

Gmail didn’t allow me to send the tarball, so I uploaded it to Drive: https://drive.google.com/file/d/1o_Ytw-BFDiZ20G_AzwEOYF447u5jTR4_/view?usp=sharing

Take a look and see if it works for you.

Got it downloaded, will hopefully get to look at it soon.

root@bionic:~# lxc list
+-----------+---------+------+------+------------+-----------+
|   NAME    |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+-----------+---------+------+------+------------+-----------+
| adriano   | STOPPED |      |      | PERSISTENT | 0         |
+-----------+---------+------+------+------------+-----------+
| maisfluxo | STOPPED |      |      | PERSISTENT | 0         |
+-----------+---------+------+------+------------+-----------+
root@bionic:~# 

https://dl.stgraber.org/lxd/flavio-database.tar.gz

It worked perfectly! Thanks a lot!

May I ask what you did, so I can record here how to do it?

Not a particularly easy recovery; 3.0 has a very old dqlite that’s a pain to fix.

Basically, what I had to do was extract a DB dump from db.bin (as SQL), create a fresh database, alter the SQL dump to remove anything already present in the clean DB, and then have LXD load it as a patch.global.sql file, effectively populating that empty DB with the relevant records from the broken DB.
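For anyone wanting to picture the "remove anything already present in the clean DB" step, here is a minimal sketch in Python. It is not the procedure that was actually run here, which was done by hand and may well have differed. It assumes both databases can be dumped as SQL statements, uses a toy two-table schema rather than LXD's real one, and treats "already present" as an exact text match on INSERT statements. The function names and the in-memory databases are hypothetical stand-ins.

```python
import sqlite3


def dump_statements(conn):
    """Return the SQL dump of a SQLite connection as a list of statements."""
    return list(conn.iterdump())


def records_missing_from_clean(broken_dump, clean_dump):
    """Keep only the INSERT statements of the broken dump that the clean dump lacks."""
    already_present = set(clean_dump)
    return [
        stmt
        for stmt in broken_dump
        if stmt.startswith("INSERT INTO") and stmt not in already_present
    ]


# Toy schema for illustration only; this is NOT the real LXD schema.
SCHEMA = """
CREATE TABLE containers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE profiles (id INTEGER PRIMARY KEY, name TEXT);
"""

# Stand-in for the data recovered from the broken database.
broken = sqlite3.connect(":memory:")
broken.executescript(SCHEMA)
broken.execute("INSERT INTO profiles (name) VALUES ('default')")
broken.executemany(
    "INSERT INTO containers (name) VALUES (?)",
    [("adriano",), ("maisfluxo",)],
)
broken.commit()

# Stand-in for a freshly initialised database, which already has a default row.
clean = sqlite3.connect(":memory:")
clean.executescript(SCHEMA)
clean.execute("INSERT INTO profiles (name) VALUES ('default')")
clean.commit()

# Only rows the clean database does not already contain end up in the patch.
patch_statements = records_missing_from_clean(
    dump_statements(broken), dump_statements(clean)
)
patch_sql = "\n".join(patch_statements)
print(patch_sql)
```

In this toy run the two container rows survive the filter while the shared default profile row is dropped, so loading the result into the clean database would not collide with what is already there. A real dump would need more care, for example rows that differ only in their IDs or tables whose schema changed between versions.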

Thanks very much for your attention!