It seems LXD cannot start. It fails with this message when I run:
lxd --debug --group lxd
DBUG[03-18|13:51:01] Not unmounting temporary filesystems (containers are still running)
Error: Failed to start dqlite server: raft_start(): io: load closed segment 0000000000010645-0000000000010659: entries batch 28 starting at byte 326168: entries count in preamble is zero
I restarted my laptop in the meantime; I’m not sure whether that could have caused it.
I tried deleting the problematic segment, but that didn’t work.
I tried restoring from a database backup, and that brought LXD up. However, since the backup was old, recent changes (a new storage pool) were missing.
I searched for the LXD config so I could remove the storage pool configuration, but didn’t find anything, so I assume the config is stored in the database itself.
Since I’m in a hurry, I went with the nuke option and will restore my containers from backups I have.
Look at /var/lib/lxd/database/global. There are a bunch of segments in there, in sequential order, and the last one is likely corrupted. Move it aside and try launching LXD again.
That should fix it. Then make sure you didn’t lose any of your config or containers in the process.
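A minimal Python sketch of the "move it aside" step, under stated assumptions: judging from the segment names in the error messages, closed segments are assumed to be named START-END with 16-digit, zero-padded indexes, and the newest one is assumed to be the one with the highest END index. The function names and the backup directory are hypothetical, not part of LXD. Stop LXD first; the file is moved rather than deleted so it can be put back.

```python
import re
import shutil
from pathlib import Path

# Assumption: closed segments look like 0000000000010645-0000000000010659,
# i.e. START-END with two 16-digit, zero-padded decimal indexes.
CLOSED_SEGMENT = re.compile(r"^(\d{16})-(\d{16})$")


def last_closed_segment(names):
    """Return the closed-segment name with the highest END index, or None."""
    best = None
    best_end = -1
    for name in names:
        m = CLOSED_SEGMENT.match(name)
        if m and int(m.group(2)) > best_end:
            best, best_end = name, int(m.group(2))
    return best


def move_last_segment_aside(db_dir, backup_dir):
    """Move the newest closed segment from db_dir into backup_dir.

    Returns the new path of the moved file, or None if no file in db_dir
    matches the START-END pattern. Other files are left untouched.
    """
    db_dir, backup_dir = Path(db_dir), Path(backup_dir)
    name = last_closed_segment(p.name for p in db_dir.iterdir() if p.is_file())
    if name is None:
        return None
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / name
    # Move instead of delete, so the segment can be restored if needed.
    shutil.move(str(db_dir / name), str(target))
    return target
```

For example, move_last_segment_aside("/var/lib/lxd/database/global", "/root/lxd-segment-backup") (the second path is only an example) moves the newest segment out of the way before LXD is started again. Any file that doesn’t match the START-END pattern is ignored.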
This is the first time we’ve seen this, and it’s very weird, since 0000000000000002-0000000000000002 is essentially the very first segment that we write (excluding 0000000000000001-0000000000000001, which just holds bootstrap data).
Removing 0000000000000002-0000000000000002 should help, as @stgraber suggests, but if you recall anything unusual happening before this error, please let us know.
This is a fresh install of the OS, so I don’t know what could have gone wrong. I’ve actually done a fresh install a couple of times on this machine, and each time I’ve hit the same problem.
Okay, then there is definitely something going on. The first step is to figure out which versions of LXD and of its dependencies (libdqlite and libraft) you are using.
@ironlenny if it’s an option for you, I’d also recommend trying to install lxd using the snap, since we know exactly what’s in there and that’s what we test most. It’d be interesting to know if that fails in the same way.
So I tried removing 0000000000000002-0000000000000002 and got a new error: Error: failed to open cluster database: failed to ensure schema: disk I/O error
Assuming that /var/snap/lxd and /var/lib/lxd are on the same disk and the same partition, the only difference I can think of between the LXD snap and the system LXD package is the version of LXD or of its dependencies. That’s still weird, since we have never seen the error you reported.