Daemon failed to start

hi,

Recently I ran into a dead daemon on one of my servers: snap start lxd failed to start it.
In the logs I found error messages complaining about dqlite.
On the forum I found the suggestion to move the last dqlite segment file out of the way, and it helped:

root@tummy:/var/snap/lxd/common/lxd/database/global# mv 0000000000085677-0000000000085691 0000000000085677-0000000000085691~
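In case it helps others, a quick way to see which segment file is the newest before moving anything (this is from my server, adjust the path to yours):

cd /var/snap/lxd/common/lxd/database/global
ls -lv    # the NNNN-NNNN files are the closed segments and sort in order, newest last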

Awesome.
But what actually happened here, and why?

10x
tamas

Something bad happened to the database: it could have been a daemon crash during shutdown, your system crashing, running out of disk space, … In any case, the last DB transaction segment got corrupted and LXD couldn’t read it back during startup.
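For reference, the startup failure should be visible in the daemon logs; with the snap that would be something like:

snap logs lxd
cat /var/snap/lxd/common/lxd/logs/lxd.log

The dqlite complaint should show up there right before the daemon gives up.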

We now have @mbordere on our team, who’s working through quite a backlog of dqlite issues and doing a bunch of stress testing to track down problems like this.

Removing the last segment did not fix everything, though.

lxc start efop

Error: Failed preparing container for start: Failed to run: zfs mount tank/lxd/containers/efop: cannot mount 'tank/lxd/containers/efop': filesystem already mounted
Try lxc info --show-log efop for more info

Any advice on this?

If it’s an option, a reboot of your system will take care of that cleanly.

If not, can you show me the output of grep containers/efop /proc/*/mountinfo?
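For reference, a couple of read-only host-side checks can also help narrow down who thinks the dataset is mounted (dataset name taken from your error above):

zfs get mounted,mountpoint tank/lxd/containers/efop
findmnt | grep containers/efop

The mountinfo grep is the important one though, since findmnt only shows your own mount namespace while the stale entry may live in another process’ namespace.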

I can reboot it, but I would rather find a more permanent, engineered solution :)

A couple of months ago I had a very similar issue. I can’t find the solution from back then, but your instructions definitely helped (it was something about cleaning up namespaces…).

root@tummy:~# grep containers/efop /proc/*/mountinfo
/proc/2096/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/2318/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/2417/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/2709135/mountinfo:6039 6216 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
grep: /proc/2854549/mountinfo: No such file or directory
grep: /proc/2854553/mountinfo: No such file or directory
grep: /proc/2854809/mountinfo: No such file or directory
grep: /proc/2854810/mountinfo: No such file or directory
grep: /proc/2854811/mountinfo: No such file or directory
grep: /proc/2854812/mountinfo: No such file or directory
grep: /proc/2854813/mountinfo: No such file or directory
grep: /proc/2854814/mountinfo: No such file or directory
/proc/2895/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/3274/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/3515/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/3793/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/4109630/mountinfo:6039 6216 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/4109760/mountinfo:6039 6216 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/4109863/mountinfo:6039 6216 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/4405/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/4595/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/4861/mountinfo:6037 762 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl
/proc/532341/mountinfo:6039 6216 0:89 / /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop rw,noatime shared:197 - zfs tank/lxd/containers/efop rw,xattr,posixacl

Thanks,

tamas

A reboot would get you a clean mount table, which should avoid further issues. What I can give you to avoid a reboot are just workarounds, and they may break things for the next container.

In this case, you can do:
nsenter -t 532341 umount -l /var/snap/lxd/common/shmounts/
nsenter -t 532341 umount /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop
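If you want to double-check before running those, you can compare the mount namespace of that PID (the one from the commands above) against the host’s; if the two inodes differ, the process holds its own mount namespace where the stale entry lives:

readlink /proc/532341/ns/mnt
readlink /proc/1/ns/mnt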

root@tummy:~# nsenter -t 532341 umount -l /var/snap/lxd/common/shmounts/
umount: /var/snap/lxd/common/shmounts/: not mounted.
root@tummy:~# nsenter -t 532341 umount /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop
umount: /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop: no mount point specified.

I can reboot the machine. I’m just worried that this issue will come back again in the future.

What could we do to avoid it?

Or do you expect these kinds of issues to also be solved by the dqlite fixes you mentioned in your first message?

10x

t

This has nothing to do with dqlite; it’s a mount namespace bug that has been affecting LXD for a few years. We’re still trying to come up with a solid reproducer so we can fix the remaining edge cases.

Sadly, once the bug hits there isn’t a lot we can do. We can clear some mount entries to make LXD happy again, but that in turn may affect other containers.
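For completeness, “clearing the mount entries” means entering the mount namespace of each process that still shows the stale entry and lazily unmounting it there, roughly like this (a sketch, not something to run blindly; note the -m flag, which tells nsenter to actually join the target’s mount namespace rather than run in your own, and which may be why the earlier attempt reported “not mounted”):

nsenter -t 532341 -m umount -l /var/snap/lxd/common/shmounts/storage-pools/default/containers/efop

You’d have to repeat that for each PID from the mountinfo grep that still exists, which is exactly the part that tends to upset other containers.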

That’s why I usually lead with just rebooting the machine, as that guarantees you’re back in a sane state. It’s only when that’s not an option that we start offering the workarounds :wink: