LXD Snap stuck on “Undoing”

I run an LXD cluster and haven't been able to access it for a couple of days because the LXD versions don't match across the cluster: one node has been upgraded to 4.3, but the other is still on 4.2.

I tried to upgrade it manually with sudo snap refresh lxd, but it returns:

error: snap "lxd" has "auto-refresh" change in progress

I checked the status with snap changes, which shows:

ID   Status   Spawn                     Ready  Summary
57   Undoing  yesterday at 05:24 +0430  -      Auto-refresh snap "lxd"
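To spot stuck changes without reading the table by eye, here is a minimal Python sketch that parses the text printed by snap changes and lists the changes snapd has not finished yet. It assumes the whitespace-separated layout shown above, with the change ID and status as the first two columns. The set of "unfinished" status names is our own reading of snapd's status names, not an official list.

```python
from typing import List, Tuple

# Assumption: snapd status names that mean the change is not finished yet.
UNFINISHED = {"Do", "Doing", "Undo", "Undoing", "Abort", "Wait"}


def unfinished_changes(output: str) -> List[Tuple[int, str]]:
    """Return (change ID, status) for every unfinished row of `snap changes`."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header and blank lines: data rows start with a numeric ID.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        change_id, status = int(fields[0]), fields[1]
        if status in UNFINISHED:
            rows.append((change_id, status))
    return rows


# The output quoted in the question.
SAMPLE = """\
ID   Status   Spawn                     Ready  Summary
57   Undoing  yesterday at 05:24 +0430  -      Auto-refresh snap "lxd"
"""

if __name__ == "__main__":
    print(unfinished_changes(SAMPLE))  # [(57, 'Undoing')]
```

On a live system you could feed it subprocess.run(["snap", "changes"], capture_output=True, text=True).stdout instead of the sample, and then inspect the individual tasks of a change with snap tasks 57.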

Trying to abort it with sudo snap abort 57 has no effect; the change is still there.

How can I manage this?

Could you please attach the /var/snap/lxd/common/lxd/logs/lxd.log file from every member of your cluster?

node-1 (192.168.139.17): https://paste.ubuntu.com/p/BjNn33BmZY/
node-2 (192.168.139.18): https://paste.ubuntu.com/p/QhBpPkMFZw/

Weird. Did you ever have more than two nodes in this cluster? You should be able to recover by following this procedure.

By the way, if you want to run an LXD cluster, please make sure you eventually grow it to at least 3 nodes. Running 2 nodes is discouraged because we can't provide high availability in that case.

From the documentation:

It is strongly recommended that the number of nodes in the cluster be at least three, so the cluster can survive the loss of at least one node and still be able to establish quorum for its distributed state (which is kept in a SQLite database replicated using the Raft algorithm). If the number of nodes is less than three, then only one node in the cluster will store the SQLite database. When the third node joins the cluster, both the second and third nodes will receive a replica of the database.
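The reason three is the minimum comes from Raft's majority rule: a write or leader election needs a strict majority of the voting members, so the number of members you can lose is the total minus that majority. The sketch below is the generic Raft arithmetic, not LXD code; note that per the quote above, a 2-node LXD cluster keeps the database on only one node, so losing that node is fatal either way.

```python
def raft_quorum(voters: int) -> int:
    """Smallest strict majority of `voters` needed to commit or elect a leader."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return voters - raft_quorum(voters)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} voters: quorum {raft_quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With 2 voters the quorum is 2 and no failure can be tolerated; with 3 voters the quorum is still 2, so the cluster survives losing one node.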

There was never a third node. I'll add more nodes to this cluster later, when we can afford them.

You might have hit a bug that can occur when rebooting a 2-node cluster; it has now been fixed in master and rolled out to the 4.0.x series. Please make sure you upgrade to 4.4 once it's released and, as mentioned, preferably grow to 3 nodes.


Now both nodes are stuck at “Doing”:

ID   Status  Spawn                 Ready  Summary
55   Doing   today at 03:05 +0430  -      Auto-refresh snap "lxd"

Logs?

Now both are done and on 4.3 (16044), but no lxc command works; it hangs here:

$ lxc --debug list
DBUG[07-09|16:52:19] Connecting to a local LXD over a Unix socket 
DBUG[07-09|16:52:19] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=

Logs:

  1. https://paste.ubuntu.com/p/GYwgHTRf9K/
  2. https://paste.ubuntu.com/p/MBMjvfP2RW/
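The debug output shows lxc waiting on GET /1.0 over the local Unix socket, i.e. the daemon accepts nothing useful back. A minimal Python sketch to check that directly with a timeout, instead of hanging, is below. The socket path is an assumption (the usual location for the LXD snap); adjust it if your installation differs.

```python
import socket
from typing import Optional

# Assumption: default Unix socket location for the LXD snap.
LXD_SOCKET = "/var/snap/lxd/common/lxd/unix.socket"


def probe_lxd(path: str = LXD_SOCKET, timeout: float = 5.0) -> Optional[str]:
    """Send the same GET /1.0 request lxc sends and return the HTTP status line,
    or None if the socket is missing, refuses the connection, or does not
    answer within `timeout` seconds."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(path)
            s.sendall(b"GET /1.0 HTTP/1.0\r\nHost: lxd\r\n\r\n")
            data = s.recv(4096)
    except OSError:  # covers FileNotFoundError, ConnectionRefusedError, timeouts
        return None
    if not data:
        return None
    return data.split(b"\r\n", 1)[0].decode("ascii", "replace")


if __name__ == "__main__":
    print(probe_lxd())
```

A None result after the timeout means the daemon is reachable only in the sense that lxc hangs too; the daemon logs are then the place to look.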

I ran into the same symptoms. After rebooting into single-user mode and running

systemctl start snapd.service

I was able to abort the change stuck in “Doing”.
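The steps above can be sketched in Python as an ordered list of commands, printed by default and only executed when dry_run is turned off. This is an illustration under assumptions: the helper names are hypothetical, the commands must run as root, and rebooting into single-user mode is a manual step that is not scripted here. The change ID 55 comes from the snap changes output earlier in the thread.

```python
import subprocess
from typing import List


def recovery_commands(change_id: int) -> List[List[str]]:
    """Hypothetical helper: the workaround's commands, in order.
    Start snapd first, then abort the stuck change."""
    return [
        ["systemctl", "start", "snapd.service"],
        ["snap", "abort", str(change_id)],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or run it (as root) when dry_run is False."""
    for argv in commands:
        if dry_run:
            print("would run:", " ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(recovery_commands(55))  # dry run: only prints the commands
```

Keeping dry-run as the default makes it easy to review the sequence before touching a system that is already in a fragile state.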

Hope this helps!