I don’t see us changing this logic any time soon. It’s at the core of every modern fault-tolerant system that uses consensus algorithms.
This kind of system completely prevents data corruption and impossible merges in the event of a network partition by always requiring a majority of the voters to agree to any transaction. This allows the cluster to keep operating when a node drops off, while also ensuring that the node which lost access to the cluster cannot keep performing any mutating database access.
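To make the majority rule concrete, here is a rough sketch in Python. It is not LXD or dqlite code; the names `quorum_size` and `can_commit` are made up for illustration, and it only models the counting, not the actual consensus protocol.

```python
def quorum_size(total_voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return total_voters // 2 + 1


def can_commit(reachable_voters: int, total_voters: int) -> bool:
    """A side of a partition may commit only if it can reach a majority.

    reachable_voters counts the voters on this side, including the node
    asking the question (hypothetical helper, for illustration only).
    """
    return reachable_voters >= quorum_size(total_voters)


if __name__ == "__main__":
    # 3 voters split 2 / 1: only the larger side keeps accepting writes.
    print(can_commit(2, 3), can_commit(1, 3))
    # 4 voters split 2 / 2: neither side has a majority.
    print(can_commit(2, 4), can_commit(2, 4))
```

Since two disjoint groups can never both hold a strict majority, at most one side of a partition can ever commit, which is why there is nothing to merge afterwards.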
When the node regains access to the cluster, no data merging is needed because it couldn’t perform any writes in the first place. Instead, it simply retrieves the transactions that happened since it last communicated with the other cluster nodes and moves on.
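The catch-up step can be pictured roughly like this. This is a deliberately simplified sketch, not how dqlite is implemented: the `Node` class and `catch_up` method are hypothetical, the log is just a list of strings, and a real Raft implementation also compares terms and may ship a snapshot instead of individual entries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Transactions this node has already applied, in commit order.
    log: List[str] = field(default_factory=list)

    def catch_up(self, leader_log: List[str]) -> List[str]:
        """Append whatever the leader committed while we were away.

        Assumes our log is a prefix of the leader's, which holds here
        because an isolated node could not commit any writes of its own.
        """
        missing = leader_log[len(self.log):]
        self.log.extend(missing)
        return missing


if __name__ == "__main__":
    leader_log = ["tx1", "tx2", "tx3", "tx4"]
    rejoining = Node(log=["tx1", "tx2"])  # lost contact after tx2
    print(rejoining.catch_up(leader_log))
    print(rejoining.log == leader_log)
```

The isolated node only ever appends; it never has local changes to reconcile.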
I’m not sure what you are referring to as far as users losing access to their containers.
LXD never shuts down the instances during updates or when losing access to the rest of the cluster. You won’t be able to use the LXD API to perform actions on a system which has lost access to the rest of the cluster, but the instances that were running on it will still be running.
As we start operating more and more production clusters for our own use and for our customers, we’ve been quite busy fixing a variety of issues. These days those issues are getting more and more niche, which is a good sign that by and large things are working.
For the past 4 or 5 LXD releases, all 5 clusters I’m running (ranging from 3 to 24 nodes) have been self-upgrading without a hitch and our daily cluster upgrade tests have been reflecting that too: https://jenkins.linuxcontainers.org/job/lxd-test-cluster/
We have some planned work to make manual reconfiguration of the raft cluster a bit easier, looking at something similar to editing the monmap in Ceph. That should save the LXD team a bit of support time, since for the most broken cases today we need the user to ship us a copy of the database so we can manually edit it…
We’ve also recently had @mbordere join the team. He will be taking over maintenance of libraft/libdqlite/go-dqlite with a focus on quality, testing and performance. That’s the clustering/database stack used by LXD, Anbox and microk8s.