If I reinstall LXD with a previous ZFS pool, is there any way to recover it, or make it work?

(Tony Anytime) #1

Following on from my previous thread, I still have 4 servers in a cluster where LXD stopped working. On two servers the LXC containers are still running. On the other 2 servers, which I rebooted, lxc won’t run because LXD won’t start: it can’t find the socket. This happened to me before in another upgrade, and it was a permission issue changed by the upgrade. This seems different.

But anyway, the question is: is there any way to migrate these containers from a non-functioning cluster to a new cluster? The problem is that when you do lxd init it destroys your data, which is really dumb for exactly the problem I have here.
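
For the socket symptom, it can help to separate "the socket file is missing" from "the file is there but nothing is answering". Below is a minimal Python sketch of such a probe. It assumes the socket lives at one of the usual default locations (the snap path or the native package path), and the function name `probe_unix_socket` is made up for illustration. It only tests reachability and says nothing about the state of the LXD database.

```python
import os
import socket
import stat

# Assumed default socket locations; adjust for your install.
CANDIDATE_SOCKETS = [
    "/var/snap/lxd/common/lxd/unix.socket",  # snap install
    "/var/lib/lxd/unix.socket",              # native package install
]


def probe_unix_socket(path, timeout=2.0):
    """Classify a Unix socket path (hypothetical helper).

    Returns one of: 'missing', 'permission-denied', 'not-a-socket',
    'refused', 'listening'.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission-denied"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
    except PermissionError:
        # Socket exists but this user may not open it (e.g. not in the lxd group).
        return "permission-denied"
    except OSError:
        # Stale socket file, or the daemon is not accepting connections.
        return "refused"
    finally:
        s.close()
    return "listening"


if __name__ == "__main__":
    for p in CANDIDATE_SOCKETS:
        print(p, "->", probe_unix_socket(p))
```

A result of "missing" or "refused" suggests the daemon never came up, while "permission-denied" would point back at the kind of permission issue seen in the earlier upgrade.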

(Free Ekanayaka) #2

Tony, you shouldn’t have rebooted the two servers. I told you that the containers would not be restarted, because the LXD database is corrupted. Please see my replies in the other thread, and apply the solution I point to in the last comment. That should make all your nodes recover.

(Free Ekanayaka) #3

Hm, I see you actually tried that. We might try to recover the database manually, but it’s kind of hard work. If @stgraber has simpler options to do what you ask (recover from ZFS), it’s probably going to be quicker.

(Tony Anytime) #4

The first server, JOE, was not critical, so losing it was just a test to see if the upgrade would work.
I rebooted the second server to see if the problem was somehow something that would fix itself via the upgrade; the information on it, while important, is not totally irreplaceable. Servers 3 and 4 are critical and cannot go down without a working recovery plan.

Anyway, you keep saying the database is corrupt, but everywhere it tells me the problem is the unix.socket. It can’t find servers to do voting. If you look at the screen with the 4 servers in the other post, they want to talk but can’t, because they can’t find each other.

To me the simplest thing would be to manually tell each server that it is not part of a cluster and should just run independently. Then I can recover the containers, or create a new cluster and move them over. There should be a way to manually trick lxc into starting without LXD.

Last time something similar to this happened, I was able to fix it as in this thread: Resolved- LXD Cluster dying - Failed to fetch http://unix.socket/1.0: 500 Internal Server Error and my files are missing