LXD 3.0.3 crashes on startup

Hi,
I have a server with standalone LXD 3.0.3 installed from the packages on Ubuntu 18.04 server. There are about 70 containers running.
Today, after a power outage in the data center, LXD started crashing on startup. The relevant excerpt from the syslog is at the end of this message.
As far as I can tell from the log, the problem is related to the database, yet I was able to run a ‘select’ against every table in db.bin.
I then reinstalled LXD, but that didn’t solve the problem. Next I deleted the database files and reinstalled LXD again; this time it started successfully. Finally, I imported the containers and the problem was solved.
Can somebody give me more details about the cause of the crash?

Thank you.

The Log:
Apr 19 10:38:48 mt1 systemd[1]: Starting LXD - main daemon…
Apr 19 10:38:48 mt1 kernel: [ 1526.155392] audit: type=1400 audit(1618828728.354:86): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=6776 comm="apparmor_parser"
Apr 19 10:38:48 mt1 kernel: [ 1526.172240] audit: type=1400 audit(1618828728.374:87): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default" pid=6780 comm="apparmor_parser"
Apr 19 10:38:48 mt1 kernel: [ 1526.172245] audit: type=1400 audit(1618828728.374:88): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-cgns" pid=6780 comm="apparmor_parser"
Apr 19 10:38:48 mt1 kernel: [ 1526.172248] audit: type=1400 audit(1618828728.374:89): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-mounting" pid=6780 comm="apparmor_parser"
Apr 19 10:38:48 mt1 kernel: [ 1526.172251] audit: type=1400 audit(1618828728.374:90): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-nesting" pid=6780 comm="apparmor_parser"
Apr 19 10:38:48 mt1 lxd[6781]: t=2021-04-19T10:38:48+0000 lvl=warn msg=“CGroup memory swap accounting is disabled, swap limits will be ignored.”
Apr 19 10:38:48 mt1 lxd[6781]: unexpected fault address 0x7f0f154ec944
Apr 19 10:38:48 mt1 lxd[6781]: fatal error: fault
Apr 19 10:38:48 mt1 lxd[6781]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x7f0f154ec944 pc=0x462dac]
Apr 19 10:38:48 mt1 lxd[6781]: goroutine 1 [running]:
Apr 19 10:38:48 mt1 lxd[6781]: runtime.throw(0x101098c, 0x5)
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/runtime/panic.go:616 +0x81 fp=0xc4202c2be0 sp=0xc4202c2bc0 pc=0x435d61
Apr 19 10:38:48 mt1 lxd[6781]: runtime.sigpanic()
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/runtime/signal_unix.go:395 +0x211 fp=0xc4202c2c30 sp=0xc4202c2be0 pc=0x44b5c1
Apr 19 10:38:48 mt1 lxd[6781]: runtime.cmpbody()
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/runtime/asm_amd64.s:1615 +0xec fp=0xc4202c2c38 sp=0xc4202c2c30 pc=0x462dac
Apr 19 10:38:48 mt1 lxd[6781]: github.com/boltdb/bolt.(*Cursor).searchPage.func1(0x1830, 0x0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/boltdb/bolt/cursor.go:302 +0x96 fp=0xc4202c2c88 sp=0xc4202c2c38 pc=0xafea46
Apr 19 10:38:48 mt1 lxd[6781]: sort.Search(0x3061, 0xc4202c2d00, 0xaf048e)
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/sort/search.go:66 +0x58 fp=0xc4202c2cc0 sp=0xc4202c2c88 pc=0x4c1678
Apr 19 10:38:48 mt1 lxd[6781]: github.com/boltdb/bolt.(*Cursor).searchPage(0xc4202c2ef0, 0x17a6ad0, 0x4, 0x4, 0x7f0ee0151000)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/boltdb/bolt/cursor.go:299 +0xd1 fp=0xc4202c2d50 sp=0xc4202c2cc0 pc=0xaf1e21
Apr 19 10:38:48 mt1 lxd[6781]: github.com/boltdb/bolt.(*Cursor).search(0xc4202c2ef0, 0x17a6ad0, 0x4, 0x4, 0x151)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/boltdb/bolt/cursor.go:271 +0x18b fp=0xc4202c2e08 sp=0xc4202c2d50 pc=0xaf19eb
Apr 19 10:38:48 mt1 lxd[6781]: github.com/boltdb/bolt.(*Cursor).seek(0xc4202c2ef0, 0x17a6ad0, 0x4, 0x4, 0x0, 0x0, 0x180b808, 0x0, 0x0, 0xc420338000, …)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/boltdb/bolt/cursor.go:159 +0xa5 fp=0xc4202c2e58 sp=0xc4202c2e08 pc=0xaf11b5
Apr 19 10:38:48 mt1 lxd[6781]: github.com/boltdb/bolt.(*Bucket).CreateBucket(0xc4203301d8, 0x17a6ad0, 0x4, 0x4, 0xc420000180, 0xc4200dc230, 0xaf3d6b)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/boltdb/bolt/bucket.go:172 +0xfa fp=0xc4202c2f58 sp=0xc4202c2e58 pc=0xaed81a
Apr 19 10:38:48 mt1 lxd[6781]: github.com/boltdb/bolt.(*Bucket).CreateBucketIfNotExists(0xc4203301d8, 0x17a6ad0, 0x4, 0x4, 0xc4202c3010, 0x18, 0xc4202c3008)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/boltdb/bolt/bucket.go:206 +0x4d fp=0xc4202c2fb8 sp=0xc4202c2f58 pc=0xaedc1d
Apr 19 10:38:48 mt1 lxd[6781]: github.com/boltdb/bolt.(*Tx).CreateBucketIfNotExists(0xc4203301c0, 0x17a6ad0, 0x4, 0x4, 0x0, 0xf32840, 0xc42030ef01)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/boltdb/bolt/tx.go:115 +0x4f fp=0xc4202c3000 sp=0xc4202c2fb8 pc=0xafb62f
Apr 19 10:38:48 mt1 lxd[6781]: github.com/hashicorp/raft-boltdb.(*BoltStore).initialize(0xc420332400, 0x0, 0x0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft-boltdb/bolt_store.go:98 +0xae fp=0xc4202c3050 sp=0xc4202c3000 pc=0xb009ae
Apr 19 10:38:48 mt1 lxd[6781]: github.com/hashicorp/raft-boltdb.New(0xc42003be60, 0x24, 0xc420334220, 0xc42003be00, 0x24, 0x0, 0x0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft-boltdb/bolt_store.go:81 +0x103 fp=0xc4202c30b8 sp=0xc4202c3050 pc=0xb00843
Apr 19 10:38:48 mt1 lxd[6781]: github.com/lxc/lxd/lxd/cluster.raftInstanceInit(0xc4202fd500, 0xc420332300, 0xc4203121c0, 0x4008000000000000, 0x1, 0x70f7aa, 0xc420288da0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/raft.go:161 +0x3c3 fp=0xc4202c3280 sp=0xc4202c30b8 pc=0xb0d5a3
Apr 19 10:38:48 mt1 lxd[6781]: github.com/lxc/lxd/lxd/cluster.newRaft(0xc4202fd500, 0xc4203121c0, 0x4008000000000000, 0x0, 0x0, 0xc420332200)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/raft.go:71 +0x24d fp=0xc4202c32f8 sp=0xc4202c3280 pc=0xb0d09d
Apr 19 10:38:48 mt1 lxd[6781]: github.com/lxc/lxd/lxd/cluster.(*Gateway).init(0xc4202ecfc0, 0xc420332200, 0xc4200c3260)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/gateway.go:448 +0x84 fp=0xc4202c33b0 sp=0xc4202c32f8 pc=0xb06224
Apr 19 10:38:48 mt1 lxd[6781]: github.com/lxc/lxd/lxd/cluster.NewGateway(0xc4202fd500, 0xc4203121c0, 0xc4202c3718, 0x2, 0x2, 0x0, 0x24, 0xf7efc0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/gateway.go:57 +0x1f2 fp=0xc4202c3428 sp=0xc4202c33b0 pc=0xb04f32
Apr 19 10:38:48 mt1 lxd[6781]: main.(*Daemon).init(0xc42021ad80, 0xc4202c38d0, 0x40e936)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/daemon.go:502 +0x645 fp=0xc4202c3888 sp=0xc4202c3428 pc=0xc60ed5
Apr 19 10:38:48 mt1 lxd[6781]: main.(*Daemon).Init(0xc42021ad80, 0xc42021ad80, 0xc4200c2d80)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/daemon.go:390 +0x2f fp=0xc4202c38e0 sp=0xc4202c3888 pc=0xc607ef
Apr 19 10:38:48 mt1 lxd[6781]: main.(*cmdDaemon).Run(0xc420237440, 0xc4202c8c80, 0xc4202718f0, 0x0, 0x3, 0x0, 0x0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main_daemon.go:82 +0x37a fp=0xc4202c3b10 sp=0xc4202c38e0 pc=0xc92b8a
Apr 19 10:38:48 mt1 lxd[6781]: main.(*cmdDaemon).Run-fm(0xc4202c8c80, 0xc4202718f0, 0x0, 0x3, 0x0, 0x0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main_daemon.go:42 +0x52 fp=0xc4202c3b58 sp=0xc4202c3b10 pc=0xddbc92
Apr 19 10:38:48 mt1 lxd[6781]: github.com/spf13/cobra.(*Command).execute(0xc4202c8c80, 0xc4200300d0, 0x3, 0x3, 0xc4202c8c80, 0xc4200300d0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:762 +0x468 fp=0xc4202c3c48 sp=0xc4202c3b58 pc=0xb8a088
Apr 19 10:38:48 mt1 lxd[6781]: github.com/spf13/cobra.(*Command).ExecuteC(0xc4202c8c80, 0x0, 0xc4202dd180, 0xc4202dd180)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:852 +0x30a fp=0xc4202c3d88 sp=0xc4202c3c48 pc=0xb8aa9a
Apr 19 10:38:48 mt1 lxd[6781]: github.com/spf13/cobra.(*Command).Execute(0xc4202c8c80, 0xc4202c3e08, 0x1)
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:800 +0x2b fp=0xc4202c3db8 sp=0xc4202c3d88 pc=0xb8a76b
Apr 19 10:38:48 mt1 lxd[6781]: main.main()
Apr 19 10:38:48 mt1 lxd[6781]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main.go:160 +0xe26 fp=0xc4202c3f88 sp=0xc4202c3db8 pc=0xc90c06
Apr 19 10:38:48 mt1 lxd[6781]: runtime.main()
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/runtime/proc.go:198 +0x212 fp=0xc4202c3fe0 sp=0xc4202c3f88 pc=0x4375d2
Apr 19 10:38:48 mt1 lxd[6781]: runtime.goexit()
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc4202c3fe8 sp=0xc4202c3fe0 pc=0x4635d1
Apr 19 10:38:48 mt1 lxd[6781]: goroutine 5 [syscall]:
Apr 19 10:38:48 mt1 lxd[6781]: os/signal.signal_recv(0x0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/runtime/sigqueue.go:139 +0xa6
Apr 19 10:38:48 mt1 lxd[6781]: os/signal.loop()
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/os/signal/signal_unix.go:22 +0x22
Apr 19 10:38:48 mt1 lxd[6781]: created by os/signal.init.0
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/os/signal/signal_unix.go:28 +0x41
Apr 19 10:38:48 mt1 lxd[6781]: goroutine 7 [select]:
Apr 19 10:38:48 mt1 lxd[6781]: database/sql.(*DB).connectionOpener(0xc4202f6320, 0x1132620, 0xc4202f8bc0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/database/sql/sql.go:935 +0x119
Apr 19 10:38:48 mt1 lxd[6781]: created by database/sql.OpenDB
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/database/sql/sql.go:634 +0x178
Apr 19 10:38:48 mt1 lxd[6781]: goroutine 8 [select]:
Apr 19 10:38:48 mt1 lxd[6781]: database/sql.(*DB).connectionResetter(0xc4202f6320, 0x1132620, 0xc4202f8bc0)
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/database/sql/sql.go:948 +0x12a
Apr 19 10:38:48 mt1 lxd[6781]: created by database/sql.OpenDB
Apr 19 10:38:48 mt1 lxd[6781]: #011/usr/lib/go-1.10/src/database/sql/sql.go:635 +0x1ae
Apr 19 10:38:48 mt1 systemd[1]: lxd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

Looks like the bolt database got corrupted somehow.
We’ve long since moved away from it to our own database layer (in LXD 4.0 and higher), so this particular issue is extremely unlikely to happen on a modern LXD. We also have a lot more experience and tooling for data recovery on our current implementation.

From what you’re saying, it sounds like you managed to recover everything through lxd import. Alternatively, we could probably have recovered the data for you by running some pretty targeted queries and then dumping the results into a clean database.
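
Before any targeted recovery, it helps to see what the SQLite snapshot still contains. The Python sketch below is only an illustration, not LXD tooling. It assumes db.bin is an ordinary SQLite file, and `summarize_snapshot` is a hypothetical helper name. A clean result only describes the snapshot; the trace above faults inside github.com/boltdb/bolt (reached through raft-boltdb), which is a separate store that this check does not touch.

```python
import sqlite3
from pathlib import Path


def summarize_snapshot(path):
    """Return (integrity_result, {table_name: row_count}) for a SQLite file."""
    # Open read-only so the snapshot itself is never modified.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # integrity_check returns the single row "ok" when no damage is found.
        integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in tables:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return integrity, counts
    finally:
        conn.close()


# Example call (path taken from this thread; work on a copy, not the live file):
# summarize_snapshot("/var/lib/lxd/database/global/db.bin")
```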

Thank you, Stephane.
Where can I find the queries?
We are planning to switch to LXD 4.x during this year but we still have tens of 3.0.3 servers in production…
Best regards,
Leonid

So I honestly don’t remember the layout on 3.0.x. What do you see in /var/lib/lxd/database/global/?

In modern LXD, we have a db.bin in there, which is a read-only snapshot of the DB state (sqlite3) that you can dump to .sql. You can then wipe the global database, which will make LXD create an empty one, and feed it the missing data through /var/lib/lxd/database/patch.global.sql.
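
The "dump to .sql" step could look roughly like the Python sketch below. It assumes db.bin is a plain SQLite file as described above; `dump_snapshot` and the output file name are hypothetical, and this is not an official LXD procedure.

```python
import sqlite3
from pathlib import Path


def dump_snapshot(snapshot_path, sql_path):
    """Write a SQLite file's contents as SQL text; return the statement count."""
    # Read-only connection: the snapshot is only read, never changed.
    uri = Path(snapshot_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # iterdump() yields the schema and data as individual SQL statements.
        statements = list(conn.iterdump())
    finally:
        conn.close()
    Path(sql_path).write_text("\n".join(statements) + "\n", encoding="utf-8")
    return len(statements)


# Example call (paths are illustrative; run it against a copy of db.bin):
# dump_snapshot("/root/db.bin.copy", "/root/global-dump.sql")
```

The resulting file contains the full schema plus every row, so it still needs trimming before any of it is fed back to LXD.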

It’s a tiny bit involved because some records will be pre-created and you don’t want to try to inject those again, but it’s not too bad.
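
One way to avoid re-injecting the pre-created records is to compare the old dump against a dump of the freshly created, empty database and keep only the rows that are missing. The sketch below is a simplification under stated assumptions: `missing_inserts` is a hypothetical helper, it only skips rows that match verbatim (a pre-created row with the same key but different values would still need manual review), and it keeps INSERT statements only, on the assumption that the fresh database already has its schema.

```python
import sqlite3


def missing_inserts(old_conn, fresh_conn):
    """Return INSERT statements found in old_conn's dump but not in fresh_conn's."""
    # Rows the fresh database already contains (the pre-created records).
    existing = {
        stmt for stmt in fresh_conn.iterdump() if stmt.startswith("INSERT INTO")
    }
    patch = []
    for stmt in old_conn.iterdump():
        if not stmt.startswith("INSERT INTO"):
            continue  # schema and transaction statements are not needed
        if stmt.startswith('INSERT INTO "sqlite_'):
            continue  # skip SQLite's internal bookkeeping tables
        if stmt not in existing:
            patch.append(stmt)
    return patch


# Example (paths are illustrative copies, not live LXD files):
# old = sqlite3.connect("/root/db.bin.copy")
# fresh = sqlite3.connect("/root/fresh-db.bin.copy")
# print("\n".join(missing_inserts(old, fresh)))
```

The returned statements are candidates for patch.global.sql; checking them by hand before use is still advisable.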

That assumes db.bin exists on 3.0.x, though. If it doesn’t, maybe your broken server was still able to run lxd sql global .dump, which would get you similar SQL output.

There is a db.bin and also a snapshots folder. What I don’t see is patch.global.sql.

That’s normal: patch.global.sql is consumed if present and then deleted.

Ok, thanks.
Leonid