I have three servers in a production cluster: moe, larry, and curly. They had been working fine, but after a reboot, lxc on curly won't work: lxc list never finishes. On the other two, lxc list gives Failed to fetch http://unix.socket/1.0: 500 Internal Server Error. That basically means I have three servers with hundreds of websites and applications at risk of being lost.
I don’t understand why everything stops working when one server goes down. What is the point? Do I need a fourth server?
Normally I would blow away the bad server and reinstall from backup. The weird part is that the backup files, both on a separate drive and in the places I copy them to, are also not showing. It is as if the zpool is corrupt, but why would that affect files outside of it? I know this doesn’t make sense. I copied /var/lib/lxd to a separate drive and the copy is there, but the files inside the containers are missing.
Anyway, I need to either restart the LXD cluster and get curly back online, or somehow get the data out of it and re-establish the cluster.
Any and all help is welcome. This has basically ruined my Sunday and perhaps the rest of the week. If I can’t figure this out, I may have to shut down the cluster idea and go back to individual lxc servers.