This is a bad start of the year. I did a snap remove lxd on one of the cluster nodes.
The LVS volumes are still present. But what should I do to bring this cluster node (and its containers) back to life?
You should be able to snap install the same version as the rest of the cluster, then use snap restore to restore the backup which snapd would have generated at deletion time.
With a bit of luck, that’s all you’ll need to get back online.
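The sequence above could be sketched roughly as follows. This is only an illustration, not a tested recovery procedure: the `run` and `recover_lxd` helpers are hypothetical names, and the channel and snapshot set ID are placeholders that you would take from `snap list lxd` on a healthy cluster member and from `snap saved` on the broken node. By default the sketch only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set,
# so the sequence can be reviewed before anything is changed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the steps described above.
#   $1: channel tracked by the other cluster members (assumption: look it up
#       with `snap list lxd` on a healthy node)
#   $2: snapshot set ID created by snapd at removal time (from `snap saved`)
recover_lxd() {
    channel="$1"
    snapshot_id="$2"
    run snap saved                             # list available snapshot sets
    run snap install lxd --channel="$channel"  # reinstall the matching version
    run snap restore "$snapshot_id" lxd        # restore LXD's saved data
}
```

For example, `recover_lxd 4.0/stable 7` prints the three commands for review (both values are made up here); setting `DRY_RUN=0` would actually execute them.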
Maybe I already screwed up. I did a snap install and then tried to run lxd init again. However, that failed because my LVS pool was already (still) initialized.
Now I did snap restore, which succeeded. However, with lxc ls I’m getting:
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory
Do I need to restart it somehow?
From another cluster node I can see the containers as “STOPPED”. When I try to start one, it fails with:
lxc start cov-connect
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart cov-connect /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/cov-connect/lxc.conf:
Try `lxc info --show-log cov-connect` for more info
# lxc info --show-log cov-connect
Name: cov-connect
Location: roer
Remote: unix://
Architecture: x86_64
Created: 2020/08/10 12:58 UTC
Status: Stopped
Type: container
Profiles: default_pub
Log:
lxc cov-connect 20210106115333.374 ERROR conf - conf.c:run_buffer:324 - Script exited with status 1
lxc cov-connect 20210106115333.374 ERROR start - start.c:lxc_init:798 - Failed to run lxc.hook.pre-start for container "cov-connect"
lxc cov-connect 20210106115333.374 ERROR start - start.c:__lxc_start:1945 - Failed to initialize container "cov-connect"
lxc cov-connect 20210106115333.623 ERROR conf - conf.c:run_buffer:324 - Script exited with status 1
lxc cov-connect 20210106115333.623 ERROR start - start.c:lxc_end:916 - Failed to run "lxc.hook.stop" hook
lxc cov-connect 20210106115333.588 ERROR conf - conf.c:run_buffer:324 - Script exited with status 1
lxc cov-connect 20210106115333.588 ERROR start - start.c:lxc_end:958 - Failed to run lxc.hook.post-stop for container "cov-connect"
lxc cov-connect 20210106115333.588 ERROR lxccontainer - lxccontainer.c:wait_on_daemonized_start:851 - No such file or directory - Failed to receive the container state
Stopping and starting as follows solved the missing unix.socket problem:
systemctl stop snap.lxd.daemon snap.lxd.daemon.unix.socket
systemctl start snap.lxd.daemon.unix.socket
Now the system is behaving normally again.
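In case it helps anyone else, here is a small sketch for checking that the socket really came back after the restart. The `wait_for_socket` function is a hypothetical helper of mine; the default path is the one from the error message above, and the 30 second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: poll once per second until a unix socket exists
# at the given path, or give up after the timeout.
#   $1: socket path (default: the path from the error message above)
#   $2: timeout in seconds (default: 30, an arbitrary value)
wait_for_socket() {
    sock="${1:-/var/snap/lxd/common/lxd/unix.socket}"
    timeout="${2:-30}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -S "$sock" ]; then   # -S is true only for socket files
            echo "socket present: $sock"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "socket still missing after ${timeout}s: $sock" >&2
    return 1
}
```

After the two systemctl commands, something like `wait_for_socket && lxc cluster list` should then show whether the node has rejoined the cluster.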
Thanks @stgraber