Failed to start the daemon: Failed to start dqlite server: raft_start()

LXD fails to start with the message below when I run:

lxd --debug --group lxd

DBUG[03-18|13:51:01] Not unmounting temporary filesystems (containers are still running)
Error: Failed to start dqlite server: raft_start(): io: load closed segment 0000000000010645-0000000000010659: entries batch 28 starting at byte 326168: entries count in preamble is zero

Please advise.

Can you list the content of database/global?

Is that a hand built lxd?

@freeekanayaka

Did you experience a crash or something like that?

No, I ran into this problem after changing the backups destination:

Error: Get http://unix.socket/1.0: dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused

I restarted the laptop in the meantime; I’m not sure whether that could have caused it.
I tried deleting the problematic segment, but that didn’t work.
I tried restoring from a database backup, and that brought LXD up. However, since the backup was old, the recent changes for the new storage pool were missing.
I searched for an LXD config file so I could remove the storage pool config, but didn’t find anything, so I guessed the config is in the database itself.
Since I’m in a hurry, I went with the nuke option and will restore my containers from the backups I have.

I am having a similar issue:

Error: Failed to start dqlite server: raft_start(): io: load closed segment 0000000000000002-0000000000000002: unexpected format version 8095768602490157155

I’m running the latest openSUSE Tumbleweed version of LXD. LXD always fails with the above message.

Look at /var/lib/lxd/database/global. There are a bunch of segments in there in sequential order, and the last one is likely corrupted. Move it aside and try launching LXD again.

That should fix it, then make sure you didn’t lose any of your config/containers in the process.
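
A minimal sketch of the "move it aside" step, in case it helps. It assumes closed segments are named <first-index>-<last-index> with zero-padded digits, as in the error messages in this thread. The function name and the backup directory are made up for illustration and are not part of LXD. Run it as root while LXD is stopped.

```shell
# Move the highest-numbered closed segment out of a dqlite data directory.
# Assumption: closed segments are named "<first-index>-<last-index>" using
# zero-padded digits, so a plain lexicographic sort puts them in index order.
move_last_segment_aside() {
    dir=$1
    backup_dir=${2:-"$dir.aside"}   # hypothetical location for the moved file

    # Keep only names made of two digit runs joined by a dash; other files
    # in the directory are ignored.
    last=$(ls "$dir" 2>/dev/null | grep -E '^[0-9]+-[0-9]+$' | sort | tail -n 1)
    if [ -z "$last" ]; then
        echo "no closed segments found in $dir" >&2
        return 1
    fi

    # Move rather than delete, so the segment can be put back if needed.
    mkdir -p "$backup_dir" || return 1
    mv "$dir/$last" "$backup_dir/" || return 1
    echo "$last"
}

# Usage:
#   move_last_segment_aside /var/lib/lxd/database/global /root/lxd-segments-aside
```

Moving the file instead of deleting it keeps the option of restoring it later, and the printed name tells you which segment was set aside.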

This is the first time we’ve seen this, and it’s very weird, since 0000000000000002-0000000000000002 is essentially the very first segment that we write (excluding 0000000000000001-0000000000000001, which just has bootstrap data).

Trying to remove 0000000000000002-0000000000000002 should help, as @stgraber suggests, but if you recall anything weird happening before this error, please let us know.

This is a fresh install of the OS, so I don’t know what could have gone wrong. I’ve actually done a fresh install a couple of times on this machine, and each time I have the same problem.

Okay, then there is definitely something going on. First thing would be to figure out what version of LXD and of its dependencies (libdqlite and libraft) you are using.

How should I go about that?

I think this involves zypper but that’s about the extent of my opensuse knowledge.
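
Something along these lines might work as a starting point. It is only a sketch: the name fragments in the filter are guesses, since I don’t know what the openSUSE packages for libdqlite and libraft are called, and the helper function is made up for illustration.

```shell
# Filter a package or library listing (one entry per line on stdin) down to
# entries that mention LXD, dqlite or raft.
# Assumption: the distribution uses these fragments in its package names.
# Unrelated names containing "raft" will also match.
filter_lxd_packages() {
    grep -i -E 'lxd|dqlite|raft'
}

# Usage on an RPM-based system:
#   lxd --version
#   rpm -qa | filter_lxd_packages
# If the libraries are linked dynamically, this lists the ones that get loaded:
#   ldd "$(command -v lxd)" | filter_lxd_packages
```

If neither listing shows the libraries, they may be bundled or statically linked into the package, which would be worth mentioning here.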

@cyphar may be able to help there as he’s done the packaging.

So the LXD version is 4.0.0. I can’t find the libraries in zypper.

@ironlenny if it’s an option for you, I’d also recommend trying to install lxd using the snap, since we know exactly what’s in there and that’s what we test most. It’d be interesting to know if that fails in the same way.

So I tried removing 0000000000000002-0000000000000002 and got a new error: Error: failed to open cluster database: failed to ensure schema: disk I/O error

Can you try:

LXD_DIR=/tmp/lxd sudo -E lxd --verbose

?

This should sanity check that things are fine in general and this is really disk-related.

The snap version is working as well.

Ran fine.

Assuming that /var/snap/lxd and /var/lib/lxd are on the same disk and on the same partition, the only difference I can think of between the lxd snap and the system lxd package is the versions of lxd or of its dependencies. That’s still weird, since we have never seen the error you reported.
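
The same-partition assumption can be checked with something like the sketch below. It relies on GNU stat, and the function name is made up for illustration.

```shell
# Succeeds (exit status 0) when both paths are on the same mounted filesystem.
# GNU stat's "%d" format prints the device number of the filesystem that
# holds the file. Note that on btrfs, separate subvolumes report different
# device numbers even when they share one partition.
same_filesystem() {
    dev_a=$(stat -c '%d' "$1") || return 2
    dev_b=$(stat -c '%d' "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}

# Usage:
#   same_filesystem /var/snap/lxd /var/lib/lxd && echo "same filesystem"
#   df -T /var/snap/lxd /var/lib/lxd   # also shows the filesystem type
```

The filesystem type from df -T matters here too, since a disk I/O error that only shows up under one directory could come from how that directory is mounted rather than from LXD itself.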

/var/snap/lxd and /var/lib/lxd are on the same disk.

What’s the next step?