We haven’t run the full test in a while, but we have been optimizing a lot of the cluster logic recently. There’s still one big piece left in fixing past scalability issues: the eventhub role that @tomp has been working on. We should have that in LXD 4.23 (mid-February).
Until then, the largest production cluster I’m running stands at 52 hosts and has been behaving pretty well.
With very large clusters, spawn time shouldn’t really increase as long as the machines themselves aren’t particularly busy. I’d expect the biggest issue to be dealing with updates, as LXD requires all machines to be on the exact same version. When dealing with hundreds of systems, it may take a while for all of them to detect that they need to pull an update and reload, causing significant API downtime (the instances themselves will be fine though).
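
To make the update concern a bit more concrete, here is a rough Python sketch of how you could spot which machines are still behind during a rollout. It doesn't talk to LXD at all: the `member_versions` mapping, the host names and the `upgrade_status` helper are all made up for illustration, and you'd have to collect the per-host versions yourself by whatever means you prefer.

```python
def parse_version(version):
    """Turn a dotted version string such as "4.23" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def upgrade_status(member_versions):
    """Summarize a rollout from a {member_name: version} mapping.

    Hypothetical helper: the mapping is assumed to be gathered elsewhere.
    The newest version seen is treated as the rollout target.
    """
    target = max(member_versions.values(), key=parse_version)
    lagging = sorted(
        name for name, version in member_versions.items() if version != target
    )
    return {
        "target": target,
        "lagging": lagging,
        # Mirrors the "all machines on the exact same version" requirement.
        "consistent": not lagging,
    }


if __name__ == "__main__":
    # Made-up example data: one host has not picked up the update yet.
    status = upgrade_status({"host01": "4.23", "host02": "4.22", "host03": "4.23"})
    print(status)
```

With hundreds of hosts, the `lagging` list is the thing to watch: going by the description above, the API downtime lasts until it is empty.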