How many nodes can an Incus cluster support?

Hi there,
I’m looking for a production-capable LXC orchestrator that can cluster hundreds of nodes.

Incus seems to check all the boxes, but I’m not sure how many nodes it can support in a single cluster.

The Incus 6.0 LTS announcement says “cluster up to 50 servers together”, but the introduction page for Incus, under “Scalability”, says “from containers on your laptop to clusters of thousands of compute nodes”.

So I’m just wondering: which statement is true?

Thanks!

The former is closer to the truth, as extremely large environments (100+ servers) tend to be split across multiple clusters rather than run as a single very large cluster.

The recommended maximum cluster size is 50, but that’s an arbitrary number, mostly meant to get people thinking about something like one cluster per rack in a physical environment. In practice we’ve had clusters of 100-150 servers run just fine, and there’s no hardcoded limit that would prevent scaling to larger sizes either.
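
If you want a quick programmatic check of how large a cluster has grown, here’s a minimal sketch using the Incus Go client. The `github.com/lxc/incus/v6/client` module path is an assumption based on the 6.x series; adjust it to match your release.

```go
package main

import (
	"fmt"
	"log"

	incus "github.com/lxc/incus/v6/client" // assumed 6.x module path
)

func main() {
	// Connect to the local Incus daemon over its unix socket
	// (an empty path means the default socket location).
	server, err := incus.ConnectIncusUnix("", nil)
	if err != nil {
		log.Fatalf("failed to connect to Incus: %v", err)
	}

	// List all cluster members known to this server.
	members, err := server.GetClusterMembers()
	if err != nil {
		log.Fatalf("failed to list cluster members: %v", err)
	}

	fmt.Printf("cluster has %d members\n", len(members))

	// 50 is the rough per-cluster guideline discussed above, not a hard limit.
	if len(members) > 50 {
		fmt.Println("consider splitting into multiple clusters (e.g. one per rack)")
	}
}
```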

The main issue with very large clusters is the control plane blast radius. That is, if something fails and jams the control plane (API), a lot of your infrastructure is down at once. If you split things into chunks of 50 servers, something going very wrong won’t take everything down.

The other aspect of this is upgrades. Up until recently (last week), any API addition required all servers to be upgraded concurrently before the API would come back online. Doing this across 100+ servers gets annoying pretty quickly, not to mention the elevated risk of one of those servers running into some other issue.

But this has now been tweaked so that the full-cluster upgrade is only required when a database schema change occurs. That basically makes LTS releases safe in that regard, as we don’t change the schema within an LTS release.
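
When rolling an upgrade across a large cluster, it can help to verify that every member has come back before moving on. Here’s a minimal sketch along the same lines as the one above (again assuming the 6.x Go client; the exact `Status` strings are worth double-checking against your version):

```go
package main

import (
	"fmt"
	"log"

	incus "github.com/lxc/incus/v6/client" // assumed 6.x module path
)

func main() {
	server, err := incus.ConnectIncusUnix("", nil)
	if err != nil {
		log.Fatalf("failed to connect to Incus: %v", err)
	}

	members, err := server.GetClusterMembers()
	if err != nil {
		log.Fatalf("failed to list cluster members: %v", err)
	}

	// Report any member that isn't back online yet, e.g. one still
	// waiting on the rest of the cluster to finish upgrading.
	allOnline := true
	for _, m := range members {
		if m.Status != "Online" { // assumed healthy status string
			allOnline = false
			fmt.Printf("%s: %s (%s)\n", m.ServerName, m.Status, m.Message)
		}
	}

	if allOnline {
		fmt.Println("all cluster members are online")
	}
}
```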
