LXD failed after refreshing to 5.21

Hi there,
My LXD cluster is not accessible after refreshing to 5.21.

Here is the output of systemctl status snap.lxd.daemon.service:

● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static; vendor preset: enabled)
Active: active (running) since Mon 2024-04-15 11:23:18 HKT; 740ms ago
TriggeredBy: ● snap.lxd.daemon.unix.socket
Main PID: 193717 (daemon.start)
Tasks: 0 (limit: 96511)
Memory: 5.7M
CGroup: /system.slice/snap.lxd.daemon.service
‣ 193717 /bin/sh /snap/lxd/28155/commands/daemon.start

Apr 15 11:23:18 bvlgari lxd.daemon[146675]: Reloaded LXCFS
Apr 15 11:23:19 bvlgari lxd.daemon[193717]: => Re-using existing LXCFS
Apr 15 11:23:19 bvlgari lxd.daemon[193717]: ==> Reloading LXCFS
Apr 15 11:23:19 bvlgari lxd.daemon[193717]: => Starting LXD
Apr 15 11:23:19 bvlgari lxd.daemon[193882]: time="2024-04-15T11:23:19+08:00" level=warning msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
Apr 15 11:23:19 bvlgari lxd.daemon[193882]: time="2024-04-15T11:23:19+08:00" level=warning msg="Dqlite: attempt 1: server 192.168.1.16:8443: no known leader"
Apr 15 11:23:19 bvlgari lxd.daemon[193882]: time="2024-04-15T11:23:19+08:00" level=warning msg="Dqlite: attempt 1: server 192.168.1.17:8443: dial: Failed connecting to HTTP endpoint "192.168.1.17:8443": dial tcp 192.168>
Apr 15 11:23:19 bvlgari lxd.daemon[193882]: time="2024-04-15T11:23:19+08:00" level=warning msg="Dqlite: attempt 1: server 192.168.1.21:8443: no known leader"
Apr 15 11:23:19 bvlgari lxd.daemon[193882]: time="2024-04-15T11:23:19+08:00" level=warning msg="Dqlite: attempt 1: server 192.168.1.22:8443: no known leader"
Apr 15 11:23:19 bvlgari lxd.daemon[193882]: time="2024-04-15T11:23:19+08:00" level=warning msg="Dqlite: attempt 1: server 192.168.1.23:8443: no known leader"

I believe the dqlite database nodes are having difficulty connecting to each other.

All containers are not down now. It is a production system, so it is critical to get LXD back up ASAP.

How can I get LXD back up, please?

Thank you in advance.

Regards,
Terry

I would ask this question on the LXD Discourse forum and not here, because this is an Incus forum. Incus is an LXD fork created when Canonical took over the project.


Canonical broke ZFS 0.8 in lxd 5.21; I doubt that’s the issue here, but it’s possible they broke other stuff too. Unfortunately, they also made it impossible to roll back from 5.21 to 5.20, because they modified the database schema and the change is irreversible.

Personally, I run standalone incus nodes. The risk of clustering breaking (and the difficulty of fixing it if it does) is too high for me, and outweighs the benefit of a single API view of all containers.

But worse is the auto-updating behaviour of snap. If you must run lxd, which means you have to run snap(*), then it’s critical to prevent snap auto-updating. At the very least you should pin your software to a fixed channel:

snap refresh lxd --channel=5.21/stable
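
Pinning the channel still lets snap refresh within that channel. As a rough sketch, assuming snapd 2.58 or newer (where `snap refresh --hold` exists), you can also hold refreshes for the lxd snap. The function name `hold_lxd_refresh` is just an illustrative wrapper, not part of snap:

```shell
#!/bin/sh
# Sketch only. Assumes snapd >= 2.58, which added `snap refresh --hold`.
# hold_lxd_refresh is a hypothetical wrapper name, not a snap command.
hold_lxd_refresh() {
    # Pin to the 5.21 track first (same command as above)...
    snap refresh lxd --channel=5.21/stable &&
    # ...then stop snapd from auto-refreshing this one snap.
    snap refresh --hold lxd
}

# Only act when run as root on a host that actually has snap.
if command -v snap >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    hold_lxd_refresh && snap list lxd
else
    echo "Run this as root on the LXD host."
fi
```

To undo the hold later, `snap refresh --unhold lxd` should release it.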

But if you migrate to incus, you can be free of snap completely - happy days!

(*) Actually, Debian packaged lxd 5.0 as native deb packages, but that was before the lxd license change, and it’s not going to be maintained going forward.

Thank you for the information.
As mentioned by qkiel, this topic should be posted on the LXD Discourse forum instead of here. However, I think I should share what I found that resolved this issue.

I am using Ubuntu 20.04 and want to share with everyone that my issue was resolved by installing the Hardware Enablement (HWE) stack from Canonical.

As for the Incus project, I will look into it seriously.

Thanks!

Terry

Thank you for giving the resolution. FWIW, that’s the same fix as is required for the ZFS issue.

Fixed it for me as well, on a server running focal.

sudo apt-get install --install-recommends linux-generic-hwe-20.04
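
The new kernel only takes effect after a reboot. Here is a quick sketch for checking which kernel series is actually running, assuming (as far as I know) that the focal GA kernel is the 5.4 series and the HWE stack provides 5.15. `kernel_series` is just an illustrative helper name:

```shell
#!/bin/sh
# Sketch: check whether the running kernel is still the 20.04 GA series.
# Assumption: GA kernel on focal is 5.4; the HWE stack ships 5.15.

# kernel_series (hypothetical helper): "5.4.0-176-generic" -> "5.4"
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

running="$(uname -r)"
echo "Running kernel: $running (series $(kernel_series "$running"))"

if [ "$(kernel_series "$running")" = "5.4" ]; then
    echo "Still on the GA kernel; reboot into the HWE kernel after installing it."
fi
```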
