Here we go again. My cluster controller died, leaving two machines not running.

It will take a few days before I get my third server back up. I was doing some upgrades to the fourth when the third overheated and died. What can I do in the meantime to get those two machines working non-clustered? Part of the problem is that when they reboot, they upgrade to version 6.4.

Working machines are Q1 and Q4

root@Q4:/home/ic2000# incus version
Client version: 6.4

root@Q1:/home/ic2000# incus version
Client version: 6.4
Server version: unreachable

root@Q1:/home/ic2000# incus admin sql global "SELECT * FROM nodes"
+----+------+-------------+------------------+--------+----------------+-------------------------------------+-------+------+-------------------+
| id | name | description | address          | schema | api_extensions | heartbeat                           | state | arch | failure_domain_id |
+----+------+-------------+------------------+--------+----------------+-------------------------------------+-------+------+-------------------+
| 1  | Q4   |             | 84.17.40.21:8443 | 74     | 412            | 2024-09-04T14:41:02.198972188-04:00 | 0     | 2    |                   |
| 2  | Q3   |             | 84.17.40.20:8443 | 73     | 395            | 2024-06-18T21:27:27.334257305-04:00 | 0     | 2    |                   |
| 3  | Q2   |             | 84.17.40.19:8443 | 73     | 406            | 2024-09-03T15:19:40.954906083-04:00 | 0     | 2    |                   |
| 4  | Q1   |             | 84.17.40.18:8443 | 74     | 412            | 2024-09-04T14:40:57.510368543-04:00 | 0     | 2    |                   |
+----+------+-------------+------------------+--------+----------------+-------------------------------------+-------+------+-------------------+

Your help is always appreciated.

You can do incus admin sql global "UPDATE nodes SET schema=74 api_extensions=412" to pretend that everyone is running 6.4.

I think the question was how to take a broken 4-node cluster, which has only 2 working nodes, and make those 2 work again - either by converting them to standalone nodes, or by turning the 4-node cluster into a 3-node cluster while two are offline.

I can’t answer that question. (Aside: does incus admin sql global work if there’s no quorum?)

All I can say is what I’ve said before: using incus clustering gives you lower overall availability, because it introduces cluster-wide failure modes which cause the whole system to fail and are hard to recover from. To be running with N nodes down, you need to have 2N+1 nodes in the cluster, which in this case means you would have needed a 5 node cluster. Also, all those nodes have to be on the same version.
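The quorum arithmetic behind that claim can be sketched in a few lines (a hypothetical helper, not part of incus): a Raft-style cluster of n voting members needs a majority of n // 2 + 1 members online, so tolerating f simultaneous failures requires 2f + 1 members.

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-member voting cluster."""
    return n // 2 + 1

def min_cluster_size(failures: int) -> int:
    """Members needed to still hold quorum with `failures` nodes down."""
    return 2 * failures + 1

# A 4-node cluster needs 3 members online, so 2 dead nodes lose quorum:
print(quorum(4))            # 3
# Surviving a 2-node outage would have required a 5-node cluster:
print(min_cluster_size(2))  # 5
```

Note that going from 4 nodes to 5 raises the quorum from 3 to 3, i.e. an even-numbered cluster tolerates no more failures than the odd-sized cluster one node smaller, which is why odd cluster sizes are the usual advice.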

If you care about availability, then running without clustering gives you the highest availability for any number of nodes, since you can be working (at least partially) with as few as 1 node.

If there’s some reason you need clustering features, but you also want high availability, then I’d suggest you run two independent clusters with an odd number of nodes each, which would mean 6 nodes minimum.

Then distribute your applications over the two clusters: either so that they are running independently live-live and can withstand loss of one cluster; or replicate them from one cluster to the other, so that the replica on the second cluster can be brought up easily if the first cluster dies.

If you are running some application which uses its own Raft-style replication with its own quorum (e.g. etcd or cockroachdb) then you need three clusters for high availability, which may as well be three standalone nodes. Of course, those applications will themselves stop working if they don’t have a quorum of working nodes, so arguably it’s not much different than running under a single incus cluster. But I’d say that having incus nodes which remain manageable allows you to fix the problem more easily.


This worked
incus admin sql global "UPDATE nodes SET schema=74, api_extensions=412"

slight typo

+------+--------------------------+-----------------+--------------+----------------+-------------+---------+------------------------------------------------------------------------------------+
| NAME | URL                      | ROLES           | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATUS  | MESSAGE                                                                            |
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+------------------------------------------------------------------------------------+
| Q1   | https://84.17.40.18:8443 | database-leader | x86_64       | default        |             | ONLINE  | Fully operational                                                                  |
|      |                          | database        |              |                |             |         |                                                                                    |
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+------------------------------------------------------------------------------------+
| Q2   | https://84.17.40.19:8443 | database        | x86_64       | default        |             | OFFLINE | No heartbeat for 38h25m39.310974772s (2024-09-03 15:19:40.954906083 -0400 -0400)   |
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+------------------------------------------------------------------------------------+
| Q3   | https://84.17.40.20:8443 |                 | x86_64       | default        |             | OFFLINE | No heartbeat for 1880h17m52.931612145s (2024-06-18 21:27:27.334257305 -0400 -0400) |
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+------------------------------------------------------------------------------------+
| Q4   | https://84.17.40.21:8443 | database        | x86_64       | default        |             | ONLINE  | Fully operational                                                                  |
+------+--------------------------+-----------------+--------------+----------------+-------------+---------+------------------------------------------------------------------------------------+

Thank you tremendously stgraber,

To Brian's point, I have been saying the same thing since 2018. You need to be able to add and remove machines from a cluster ad hoc while everything is still running. And if you reboot and something is wrong, the individual machines still need to be running Incus. The container side will stay running in both LXD and Incus, but once you reboot the machine they are dead. There should be an easy way to add and remove members from a cluster, even from within the same machine, even while it is still running.

I may not re-cluster these machines, or may leave it at two in a new cluster. Every time this happens it is days of downtime for my sites while I come up with a way to fix them.

This time the servers overheated because the colocation facility's HVAC died over the holiday weekend; I didn't realize it myself until I came back from holiday. Then I rebooted the three machines, which solved the supposed memory issues in one, but then another failed to reboot. I haven't had time to really fix it; it will probably need a new motherboard. Anyway, the end result of the story: I was down for 5 days, definitely not high availability.

I agree. Or to put it another way: the node where a container is running should be the authoritative source of information about that container, and that node should be able to manage and update the state of that container without permission from any other nodes.

Replicating a view of the container onto the other nodes is nice, but could be done in an eventually-consistent way.

To move the container from node X to node Y, only those two nodes need to participate in the discussion.

One problem remains: if node X dies (or becomes unreachable), you want to restart the container safely on node Y. The current quorum mechanism ensures all remaining nodes agree on where the container is to be started, if there are enough of them.

However, the current clustering system doesn’t actually prevent split brain, because you have no way of knowing whether node X has turned off, or has just become isolated. If you wanted the current clustering system to be safe according to Raft principles, then as soon as any node loses quorum, it should shut down all its containers and VMs. But that doesn’t happen. (In the case of your 2-out-of-4 node cluster, everything should stop running immediately)

The alternative is simply to run incus non-clustered, and you get pretty much all of the above benefits. The incus API makes it straightforward to copy and move containers between nodes, even if they are not in the same cluster.
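As a sketch of what that workflow looks like between two standalone hosts (the remote name q4 and the container name web1 are placeholders for your own setup; the address is taken from the cluster listing above):

```shell
# On the source host, register the destination incus daemon as a remote.
incus remote add q4 https://84.17.40.21:8443

# Copy the container to the destination over the API:
incus copy web1 q4:web1

# Or move it outright instead of copying:
incus move web1 q4:web1
```

Because each host is its own authority, either side being down only affects copies involving that host, never the manageability of the others.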

This is the correct explanation for any kind of shared-nothing architecture. Losing quorum should shut down the rest of the cluster. Having worked in this space for many years, one best practice is to always set up an odd number of nodes. Spread your nodes across different racks or zones (you name it) to avoid a total outage, etc. This is more important for smaller clusters than for bigger ones. Sometimes it just needs an additional small arbiter node to keep things in order.

Usually it isn't the software's fault when things go south; it is rather a missing bit of information or configuration. Don't get me wrong, it is what you learn over time. Shared nothing is great but sometimes requires some special attention :wink: