On cluster instance placement and project limits

This post is mostly a discussion of the current state of instance placement scheduling in a cluster and how it relates to project limits.

Currently, the default instance placement scheduler places a new instance on the cluster node with the fewest other instances, counted across all projects. This is not ideal in two scenarios, both of which affect my current cluster.

  1. The first scenario is one project that runs heavy workload instances while all other projects run low workload instances. Under this constraint, it is important that instances in the heavy workload project are placed on the node with the fewest other instances from that project only, independent of other projects, so that the heavy workload is distributed among the nodes. With the current behavior, on a 3-node cluster, for example, if I repeat three times the sequence of launching one instance in the heavy workload project and then two instances in other projects, all 3 heavy workload instances end up on the same cluster node, which is not desired (see the sketch after this list).
  2. The second scenario is a project for running an HA service (OVN, for example). Again, it is important that instances in this project are spread across different cluster nodes, rather than having instance placement in other projects affect this.
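
To make the first scenario concrete, here is a minimal sketch of the counting logic in plain Python (not incus code; the node names and the deterministic tie-break are my simplification): counting instances across all projects stacks the heavy instances on one node, while counting only the heavy project would spread them.

```python
# Minimal illustration (not incus code) of the placement counting logic.

def pick_node(counts):
    # Pick the node with the fewest counted instances (ties go to the first node).
    return min(counts, key=counts.get)

nodes = ["node1", "node2", "node3"]
global_counts = {n: 0 for n in nodes}  # all projects together (default behavior)
heavy_counts = {n: 0 for n in nodes}   # heavy workload project only

for _ in range(3):
    # One heavy workload instance followed by two instances in other projects.
    target = pick_node(global_counts)
    global_counts[target] += 1
    heavy_counts[target] += 1
    print("heavy instance placed on", target)

    for _ in range(2):
        other = pick_node(global_counts)
        global_counts[other] += 1

print("heavy instances per node:", heavy_counts)
# With the global count, all 3 heavy instances land on the same node.
# Picking min(heavy_counts) instead would place one heavy instance per node.
```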

There are a few ways to solve this problem.

  • Use --target for every instance. This works, but it is cumbersome for the user to decide each instance’s placement by hand. Furthermore, an automatic solution is desirable here.
  • Provide an instance placement scriptlet. This might work, but it is tricky for two reasons. First, I am not sure how to reimplement even the current scheduler with the tools provided: we can get a lot of information about the cluster nodes, which I did find, but I could not get, for example, the number of other instances running on a node, or where instances are running, so as to check whether another instance from the same project is already on that node. Second, there is no scheduler per project, only a global scheduler, so if I want to change the scheduler for one project I must rewrite it for all projects. This is particularly unfortunate when the default scheduler works just fine for the others. A rough sketch of what such a scriptlet looks like today follows this list.
  • I once thought that project limits could solve this problem. If I could, for example, set a project limit of 20GB per cluster node and have the default instance profile give instances 20GB, then no two such instances would be allowed on the same cluster node. Sadly, I mentioned this before and it does not work. This is also frustrating because those are exactly the limits I want on my cluster: I have 4 machines with 32GB each and I would like the heavy workload project to use up to 20GB per node. The only project limit I can set is 80GB across all nodes, which does not prevent several 20GB instances from ending up on the same node, for example, which would be really bad.
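
For reference, here is roughly what an instance placement scriptlet looks like today. This is only a sketch based on my reading of the documentation: the instance_placement(request, candidate_members) entry point, set_target() and log_info() are documented, while the exact request and member fields used here (request.name, request.project, server_name) are my assumption. The point is that nothing in it lets me count the instances already running on each candidate, which is the missing piece for the per-project scenarios above.

```python
# Sketch of an instance placement scriptlet (Starlark).

def instance_placement(request, candidate_members):
    # request describes the instance being created; candidate_members is the
    # list of cluster members considered for placement.
    log_info("placing ", request.name, " from project ", request.project)

    if len(candidate_members) == 0:
        # Nothing to choose from; let the server handle it.
        return

    # Today I can only decide based on member-level information. I cannot ask
    # "how many instances of request.project already run on this member?",
    # which is what the heavy workload and HA scenarios need.
    set_target(candidate_members[0].server_name)
    return
```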

Currently, the only viable solution is to use --target, but I think incus should provide a better one.

  • Project limits per node would work (this is probably necessary anyway, since a total project limit for the cluster does not make much sense when it is higher than what a single node has).
  • Being able to change the default scheduler behavior per project, so that it only considers instances in the same project, would also work.
  • Having a way for the instance placement scriptlet to get the list of instances per node, along with their projects, would also work.

In my view, all of these are desirable for incus, although I’m not sure whether any of them would interest the developers. So this post is mostly to get this discussion out there.

While balancing instances within a project makes sense in some cases, we have other cases where the exact opposite is desired and leads to better performance, so we’re not very likely to be adding extra logic to the default scheduler for that.

In general, the plan is definitely to rely more heavily on the scriptlet-based scheduler for any case that needs some amount of customization. We’d love to ship some more examples of such scriptlets and are definitely open to exporting more functions and data where it makes sense.

Does the scriptlet have access to all instances, or at least to all instances in the same project? I did not find this. If so, it can be done right now and it was just bad research on my part.

We don’t seem to have functions for that, but it’d be pretty easy to add a get_instances function to internal/server/scriptlet/instance_placement.go which would take optional project and/or location kwargs. That’d then get you an equivalent of []Instance.
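
For illustration, a per-project variant of the default behavior could then be written as a short scriptlet. This is only a sketch: get_instances() is the proposed function (it does not exist at the time of writing, and its exact signature and return value are assumptions), while instance_placement() and set_target() follow the documented scriptlet API.

```python
# Sketch of a per-project placement scriptlet, assuming the proposed
# get_instances(project=..., location=...) function returning the
# equivalent of []Instance.

def instance_placement(request, candidate_members):
    best_member = None
    best_count = -1

    for member in candidate_members:
        # Count only instances of the same project already on this member.
        count = len(get_instances(project=request.project, location=member.server_name))
        if best_member == None or count < best_count:
            best_member = member
            best_count = count

    if best_member != None:
        # Place the instance on the member with the fewest same-project instances.
        set_target(best_member.server_name)

    return
```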

If that’d sort things out for you, feel free to file a feature request at Issues · lxc/incus · GitHub

Added that request in this issue. I also asked for a function to get the cluster members inside a cluster group, which is currently not available.

While checking the documentation, I also found a small issue which could be fixed and filed that here.