Why is the solution to many cluster problems erasing everything and starting all over? Is there a way to keep ZFS data without erasing it every time?

Ok, so now I am on version 3.20, and I have been using LXD since version 1, so while not an expert, I am getting pretty good at making this work. Or at least at the much-needed skill of reinstalling everything, and then reinstalling it again.
After running a test for several weeks, here is what I found. Every time I brought down more than three database servers, or more specifically shut down the whole cluster, I needed several restarts to get it going again. Version 3.20 is much better than previous ones, but it still has issues.
Today I booted up the servers again and the cluster is stuck; restarting won’t fix it. They are all dead as doornails. Luckily these units are only for testing for now. I am reinstalling the whole cluster again… and of course it says all data will be lost. That isn’t a problem since this is a test unit, but obviously none of this is acceptable on a production unit. I have four other servers running LXD 3.18 and I don’t dare reboot those, because it would turn into weeks of unnecessary work.

So the question, again, is: why do we have to lose the data in ZFS every time a member is added? Why can’t cluster members be added and removed at will without destroying the data?

We don’t want to have to merge databases and figure out what to replicate to the new node and what to push to the cluster. So whenever you join a cluster, the local database is deleted and the cluster database is pulled instead.
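To make the consequence concrete: the cluster database and the node-local database sit side by side on disk, and it is the local one that gets replaced on join. A quick way to see this (paths assume the snap package; a .deb install uses /var/lib/lxd instead):

```
# Node-local data (local.db) and the dqlite cluster database (global/)
# live under LXD's database directory; the snap path is assumed here.
ls -l /var/snap/lxd/common/lxd/database

# After joining, the member shows up with the cluster's copy of the database.
lxc cluster list
```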

One disaster recovery method you could use is to delete the database on the node, restart LXD, validate with lxc info that it’s functional in standalone mode, then follow the lxd import disaster recovery instructions to get the local containers imported into your now-standalone LXD.
Once that’s done and everything looks good, you can lxc move those containers onto another cluster node. Once they have all been moved away, you can reinstall the broken system and join it back into the cluster.
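With the snap package, that sequence might look roughly like this. It’s a sketch only; the container name c1, the cluster address and the trust password are made up, so adjust them to your setup:

```
# 1. Move the broken database aside (safer than deleting it) and restart LXD.
sudo snap stop lxd
sudo mv /var/snap/lxd/common/lxd/database /var/snap/lxd/common/lxd/database.broken
sudo snap start lxd

# 2. Check that LXD responds in standalone mode.
lxc info

# 3. Re-import a container that is still present on the local storage pool.
sudo lxd import c1

# 4. Add the healthy cluster as a remote and move the container over.
lxc remote add cluster 10.0.0.10 --accept-certificate --password=secret
lxc stop c1
lxc move c1 cluster:c1

# 5. With everything moved off, reinstall this machine and rejoin the cluster.
```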

Note that we really don’t expect clusters to ever need to be re-deployed. I run half a dozen production clusters and while I can’t say it hasn’t been a bumpy ride at times, I’ve never had to actually re-deploy any of them. They also all get rebooted for kernel security updates on a regular basis (and to make things more fun, half of them also run a Ceph cluster on the same hardware).

There obviously are bugs, and @freeekanayaka has been quite busy solving those we have reproducers for or, when corruption happens, database dumps that we can look at. Even with corrupted databases, I don’t think we’ve ever hit a case where we couldn’t revert a few transactions, get dqlite to load the result back up, and then manually put it in place on the relevant nodes and replicate it to the rest.
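If you do hit corruption, dumps of both databases are what make that kind of debugging possible. Assuming the lxd sql subcommand (available since LXD 3.0) and its .dump special query, something like:

```
# Dump the node-local and the cluster (global) databases for inspection.
lxd sql local .dump  > local-dump.sql
lxd sql global .dump > global-dump.sql
```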

I am talking about containers in ZFS. You lose them if the cluster fails. Yes, they can be recovered if you can get that node running, but most of the time you can’t reuse the container pool. It should not be a data-recovery export/import exercise; it should just be re-adding the node to the cluster.
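The datasets themselves are usually still sitting on the pool even when the cluster database is broken, which is exactly what makes this so frustrating (the pool name default below is just an example):

```
# LXD keeps each container in its own dataset under <pool>/containers/<name>.
zfs list -r default/containers
```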

The local/global database sync is a different thing. You require three members to vote, and when there are only two or one it freaks out. Read my comments on this here. How do you "upgrade" a non-database cluster member to a database member?
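For reference, which members currently hold a copy of the database shows up in the DATABASE column of the cluster listing, and as far as I can tell promotion happens on its own when a database member is removed (the member name below is a placeholder):

```
# The DATABASE column shows which members are currently database members.
lxc cluster list

# Removing a permanently dead member (--force skips the clean-leave checks)
# should let LXD promote one of the remaining members in its place.
lxc cluster remove node3 --force
```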

Can you make an LXD cluster work now? Sure. But if you do an apt upgrade and reboot all your servers, or have to power down for some reason, you will go through hell. And you should not have to. LXD reminds me of my old boat: every time I go to start it I have to go through this whole procedure. I should not have to. It should come up and go down 100% of the time. If one node, or all but one, doesn’t work, then the working nodes should keep working. And if I bring a node back, it should just work. It should not require black magic.

I don’t mind if adding, promoting or demoting a node is a manual thing. One should also be able to disable a cluster, just like you enable one. And everything should always work locally. Solving this problem will save you guys so much time on support, and give the system user-friendliness and fault tolerance.

There is also a problem in the booting process. My latest install requires all kinds of poking every time I boot it. You can’t keep adding features and forget about this basic problem; think like a user, not a programmer.
Believe me, Stephane, I’m not trying to be a pain. I really like this product and I want it to be the best.

Right, that’s our goal. You said that things have improved with 3.20 (let’s say it went from 60% to 90%, I’m making up numbers here). Please report here what you’re seeing and we’ll get it to 100%.

There are three types of problems… (making up numbers :wink: )
Problem A, the fact that it won’t come up after a shutdown. This has improved from, say, 60% to 80%. It will be OK if you leave at least two nodes running, and switching back and forth works better now.
Problem B, how easily you can recover when it won’t connect. This has gotten better, going from 60% to 90%. I just use my reset script (sketched below), but when that fails completely, about 1 out of 10 times, you fall into Problem C.
Problem C, needing to either reinstall completely and blow away the cluster, OR spend a day or two with you guys on tech support trying to figure out the issue and poking at the database. The latter is no fun for either of us.
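Roughly the kind of thing that reset script does, though this is just a sketch and not the exact script (node names and the snap install are assumptions):

```
#!/bin/sh
# Restart the LXD daemon on every node, then check that the cluster answers.
for node in node1 node2 node3; do
    ssh "$node" sudo systemctl restart snap.lxd.daemon
done
lxc cluster list
```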
Consequently, I am spending lots of time testing this whole process 200% before I am ready to deploy a new production cluster.