Snap lxd broken - can't do anything!

sudo lxc list
-just waits forever

sudo lxd init
-just waits forever

sudo snap remove lxd
error: snap “lxd” has “refresh-snap” change in progress

I have even tried to reboot.

I don’t need any data, config, storage pools, or networks, so I would like to reset LXD to ‘as new’ and set it up again. Previously I joined this LXD to a cluster, which I want to do again.

Is there a reset script I can run to undo the lxd init changes?

Something went wrong and it appears that the snap is stuck while updating.
snap info lxd should give you some info.
There are also other snap commands that give insight into what is happening.
snap stop lxd could help, or as a last resort, reboot the computer.
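For example, something along these lines can show what snapd is stuck on (a sketch; the change ID 123 is hypothetical, use whatever snap changes lists for lxd):

  • snap info lxd
  • snap changes (lists recent changes and their IDs)
  • snap change 123 (shows the tasks of that change)
  • snap abort 123 (tries to abort a stuck change)
  • snap stop lxd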

It is important to figure out what’s going on in your specific case because you would not want this issue to reappear in the future.

Ah, so using
sudo snap stop lxd
and starting again in debug mode with
sudo lxd --debug --group lxd
I saw an IP address that it was failing to talk to. This must have been the other cluster member; I checked and its IP had changed, so I put it back, rebooted, and it all came back to life.

It would be nice if it showed you what the config is and reported an error.

lxc cluster list and lxc cluster show should show the cluster details.
I have not used LXD with clustering, but I assume it should report if a node is down.
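For example (a sketch; de-db02 is just one of the member names that appears later in this thread):

  • lxc cluster list
  • lxc cluster show de-db02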

If you have just three nodes and one is down, then the cluster is stuck because they are too few.
Did you have such a case?

Did you see any relevant info in /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log?

In any case, the situation should be handled gracefully.

The issue is that if the host is part of a cluster and doesn’t have quorum, then there is nothing you can do to control LXD. Snap won’t disable it, remove it, or stop it, and none of the lxc or lxd commands work!
Rebooting the server does not change anything.

Starting LXD in debug mode gives this:
root@de-db02:~# lxd --debug --group lxd
DBUG[01-30|09:30:22] Connecting to a local LXD over a Unix socket
DBUG[01-30|09:30:22] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=

and it waits forever. The logs created in /var/snap/lxd/common/lxd/logs/lxd.log show:

t=2019-01-30T09:25:04+0000 lvl=info msg=“Initializing global database”
t=2019-01-30T09:25:08+0000 lvl=warn msg=“Raft: not part of stable configuration, aborting election”
t=2019-01-30T09:26:27+0000 lvl=warn msg=“Failed connecting to global database (attempt 6): failed to create dqlite connection: no available dqlite leader server found”

So how do I get out of this hole? How do I tell this host it’s not a member of a cluster anymore? I don’t have any data or containers that I need, so I can even remove the lot and set it all up again!

PS: I got in this mess because I used a laptop as the other member, and it had a DHCP-assigned IP address which changed over the weekend. I don’t know what the IP address would have been. But that node is now fixed, LXD removed and reinstalled (I don’t know what I did differently).

I believe you can uninstall the LXD snap and then install it again. By doing so, it should remove all remnants of the previous installation. Those remnants are in /var/snap/lxd/common/.
After you snap remove lxd, have a look into /var/snap/lxd/ and make sure it is empty.
Then, when you install the LXD snap again, you are starting over fresh.

(The snap commands do not require sudo if you first do snap login.)
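In other words, roughly this cycle (a sketch of the remove-and-reinstall described above, not a guaranteed fix):

  • sudo snap remove lxd
  • ls /var/snap/lxd (check that nothing is left behind)
  • sudo snap install lxd
  • sudo lxd init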

snap remove lxd won’t work:

root@de-db02:~# snap remove lxd
error: snap “lxd” has “disable-snap” change in progress

I previously tried to disable LXD before a reboot, with the thinking that if it was not running then I might be able to remove it. But LXD is hung, so now I can’t do anything.

Can I just remove all of /var/snap/lxd/? What will happen? Do you think snap will fix itself if LXD has gone?

I don’t know about this.

When you do snap stop lxd, does it work? Because if you stop the snap, you can then remove it.
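That is, something like this order (a sketch):

  • sudo snap stop lxd
  • sudo snap remove lxd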

I was looking at #5423, which has very similar LXD logs:

t=2019-01-30T15:58:08+0000 lvl=info msg=“Initializing global database”
t=2019-01-30T15:58:11+0000 lvl=warn msg=“Raft: not part of stable configuration, aborting election”
t=2019-01-30T15:59:36+0000 lvl=warn msg=“Failed connecting to global database (attempt 6): failed to create dqlite connection: no available dqlite leader server found”

I had made the local SQL IP address the same and even updated the global DB node's address.
I stopped LXD with:

  • sudo systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
  • sudo pkill -9 lxd
  • sudo lxd --debug --group lxd

but it was no different, so I rebooted and looked at the logs, and they didn’t look any different. Then your post popped up and I thought I’d get you a screenshot of the error reply, when, woo hoo, I managed to stop it with
snap stop lxd, then a quick snap remove lxd, and a look at /var/snap/ shows it’s gone!

So I still don’t really know what fixed it, but thanks for your help and encouragement, Simos.

When running lxd init, what do these questions mean?

  • Choose “source” property for storage pool “local”:
  • Choose “volatile.initial_source” property for storage pool “local”:
  • Choose “zfs.pool_name” property for storage pool “local”:

I dodged these questions. I don’t remember them coming up the first time, so I snap removed LXD on the first node too.
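For what it’s worth, judging by the answers that worked later in this thread, they seem to be asking for the block device to build the ZFS pool on and the name of the pool, e.g. (assuming /dev/sdb is the spare disk on these hosts):

  • Choose “source” property for storage pool “local”: /dev/sdb
  • Choose “zfs.pool_name” property for storage pool “local”: local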

So my takeaway from this is: if a node fails to join for whatever reason, remove the snap, reinstall, and try lxd init again with different options, and there are quite a few options sometimes.

Has anyone else managed to get clustering to work properly?

I now have a cluster of 3 real servers!

I have had to add a root disk to the default profile (a sketch of that step follows after the log below), and the storage won’t work on one host.
ZFS is on /dev/sdb on all three. I have cleared that out completely and rebuilt it manually; the same two servers work and containers start. Oh, and fan networking was added and worked, very cool. But de-db02 won’t create the storage. The logs say:

stev@de-db03:~$ sudo lxc start c2
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart c2 /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/c2/lxc.conf:
Try lxc info --show-log c2 for more info
stev@de-db03:~$ lxc info --show-log c2
Name: c2
Location: de-db02
Remote: unix://
Architecture: x86_64
Created: 2019/01/31 14:15 UTC
Status: Stopped
Type: persistent
Profiles: default

Log:

lxc c2 20190131142546.124 WARN conf - conf.c:lxc_map_ids:2970 - newuidmap binary is missing
lxc c2 20190131142546.124 WARN conf - conf.c:lxc_map_ids:2976 - newgidmap binary is missing
lxc c2 20190131142546.152 WARN conf - conf.c:lxc_map_ids:2970 - newuidmap binary is missing
lxc c2 20190131142546.152 WARN conf - conf.c:lxc_map_ids:2976 - newgidmap binary is missing
lxc c2 20190131142546.164 ERROR dir - storage/dir.c:dir_mount:198 - Permission denied - Failed to mount “/var/snap/lxd/common/lxd/containers/c2/rootfs” on “/varxc/”
lxc c2 20190131142546.164 ERROR conf - conf.c:lxc_mount_rootfs:1351 - Failed to mount rootfs “/var/snap/lxd/common/lxd/containers/c2/rootfs” onto "/var/snap/lxd options “(null)”
lxc c2 20190131142546.164 ERROR conf - conf.c:lxc_setup_rootfs_prepare_root:3498 - Failed to setup rootfs for
lxc c2 20190131142546.164 ERROR conf - conf.c:lxc_setup:3551 - Failed to setup rootfs
lxc c2 20190131142546.164 ERROR start - start.c:do_start:1279 - Failed to setup container “c2”
lxc c2 20190131142546.165 ERROR sync - sync.c:__sync_wait:62 - An error occurred in another process (expected sequence number 5)
lxc c2 20190131142546.165 WARN network - network.c:lxc_delete_network_priv:2589 - Operation not permitted - Failed to remove interface “eth0” with index 14
lxc c2 20190131142546.165 ERROR start - start.c:__lxc_start:1972 - Failed to spawn container “c2”
lxc c2 20190131142546.165 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:864 - Received container state “ABORTING” instead of “RUNNING”
lxc c2 20190131142546.165 WARN conf - conf.c:lxc_map_ids:2970 - newuidmap binary is missing
lxc c2 20190131142546.165 WARN conf - conf.c:lxc_map_ids:2976 - newgidmap binary is missing
lxc 20190131142546.169 WARN commands - commands.c:lxc_cmd_rsp_recv:132 - Connection reset by peer - Failed to receive response for command “get_state”
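For reference, the root disk mentioned at the top of this post was added with something along these lines (a sketch; the pool name local is taken from the lxd init answers later in this thread):

  • lxc profile device add default root disk path=/ pool=local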

The title of this thread is Snap lxd broken - can’t do anything!.
It’s better to start a new thread about clustering questions.
I suppose you have looked into https://lxd.readthedocs.io/en/latest/clustering/

Regarding this specific thread, you were not able to use LXD because, in clustering, you did not have enough cluster members to make it work (because one of them had a different IP). You can pick one of the replies and mark it as the summary for Solved, or summarize in a new post what happened.

Can you show the content of /var/snap/lxd/common/lxd/logs/c2/lxc.conf? The LXC error is pretty confusing, showing what looks like a bunch of invalid paths, so I’m wondering what’s going on there.

Thanks for your reply, but since yesterday I removed de-db02 from the cluster, removed the LXD snap, removed all the ZFS datasets and zpools, then re-installed the LXD snap and joined the cluster again,
answering the questions as:

  • Choose “source” property for storage pool “local”: /dev/sdb
  • Choose “volatile.initial_source” property for storage pool “local”: /dev/sdb
  • Choose “zfs.pool_name” property for storage pool “local”: local

And it’s working as a proper cluster now with 3 containers on 3 nodes. Great, but now name resolution is not working; I’ll open a new thread.
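For anyone following along, that clean-up was roughly this sequence (a sketch; de-db02, the local pool and /dev/sdb are this setup’s names, and lxc cluster remove has to be run from a healthy member):

  • lxc cluster remove de-db02 (run on another cluster member)
  • sudo snap remove lxd (on de-db02)
  • sudo zpool destroy local
  • sudo wipefs -a /dev/sdb
  • sudo snap install lxd
  • sudo lxd init (and answer the join questions as above)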

Back to the subject of the title of the thread: I do feel like if you do anything not 100% right, then you end up with a cluster that is broken, LXD is stuck, there is nothing you can do to fix it, and it doesn’t tell you why. I spent 2 days trying to get to a position where I could wipe and reinstall LXD, and I was lucky that I didn’t have any containers I needed. I have read the docs (and watched the videos), but find parts of them brief.
Thanks for your help.
