[Feature request] Better instance placement scriptlet control for enforcing custom limits

Although I could file this directly as a GitHub issue, I think a feature request like this one deserves broader discussion, so I prefer to start the discussion here and only post it there if approved by @stgraber.

I have mentioned before that I would like finer control over project limits in an incus cluster, and it was recommended that I do this with a placement scriptlet. I have done so, and the scriptlet has been great to use.

I run an incus cluster for students to run computational experiments and some services. Since experiments consume a lot of machine resources, I need to isolate them to prevent starvation. Currently, I create a project for each set of machines with the same resources, and then control resource isolation through a placement scriptlet driven by project keys.

For example, in the following project I enforce that:

  • every instance must pin its CPUs, and only within vcores 0-5,8-13;
  • different instances on the same cluster node may not use the same vcores;
  • the total amount of RAM reserved per cluster node is limited to 24GB;
  • each student must set a user.responsavel key identifying who is responsible for the instance;
  • experiments run by different people cannot share the same machine.

And through normal incus configuration, the project is restricted to the cluster group amd-5700g.

$ incus project show amd-5700g
config:
(...)
  restricted: "true"
  restricted.cluster.groups: amd-5700g
  user.node.limits.cpu: 0-5,8-13
  user.node.limits.cpu.unique: "true"
  user.node.limits.memory: 24GB
  user.node.represented: "true"
  user.node.represented.unique: "true"

After this, with a best-fit approach to machine allocation, multiple experiments run by the same person can automatically be assigned to the same machine while keeping their resources isolated.
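To give an idea of what this looks like, here is a stripped-down sketch of such a best-fit scriptlet. It is not my actual script: the 24GB limit is hardcoded instead of being read from the project keys above, member_reserved_memory() is a placeholder for whatever bookkeeping tracks existing reservations, and I am assuming the instance_placement() entry point, the set_target() builtin and the server_name field from the placement scriptlet documentation.

# Minimal best-fit sketch (Starlark). Limits are hardcoded for illustration;
# a real scriptlet would read them from the project configuration shown above.

NODE_MEMORY_LIMIT = 24 * 1000 * 1000 * 1000  # user.node.limits.memory: 24GB

def parse_memory(value):
    # Tiny parser for values like "4GB" or "512MB"; a real one handles
    # more suffixes (GiB, MiB, ...) and bad input.
    for suffix, mult in [("GB", 1000 * 1000 * 1000), ("MB", 1000 * 1000)]:
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * mult
    return int(value)

def member_reserved_memory(member):
    # Placeholder: sum the memory already reserved by instances placed on
    # this member, using whatever builtins/state your scriptlet relies on.
    return 0

def instance_placement(request, candidate_members):
    # Assuming request.config is exposed as a dict of the instance's config.
    requested = parse_memory(request.config.get("limits.memory", "0"))

    # Best fit: pick the candidate with the least remaining memory that
    # still fits the request, so experiments pack tightly.
    best = None
    best_free = -1
    for member in candidate_members:
        free = NODE_MEMORY_LIMIT - member_reserved_memory(member)
        if free < requested:
            continue
        if best == None or free < best_free:
            best = member
            best_free = free

    if best == None:
        return "No cluster member in this group has enough free memory"

    set_target(best.server_name)
    return None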

I have to say this functionality is amazing. The only issue with the placement scriptlet is that these limits are only enforced during instance creation and cannot be enforced afterwards, the way incus enforces the restricted.* project configuration keys. I wanted to discuss some possible improvements to this approach, mostly related to the ways in which these configuration keys can currently be bypassed.

  • When moving an instance between projects, the scriptlet is not called, so this configuration is ignored. I posted this as an issue which was recognized as a bug.
  • When creating an instance with the --target flag pointing at a cluster node, the scriptlet is not called, so custom project limits are not checked. It must be noted that the scriptlet is called when --target points at a cluster group. I posted this as an issue here.
  • When changing instance or profile configuration keys, the limits enforced by the scriptlet are not checked at all, so anything can be set or unset. It must be noted that restricted.* project keys are checked on instance or profile configuration key changes.

What I wanted to discuss is a solution for the last two items, which are currently not enforced. But first of all, is there a simple way to enforce them? I actually think so. Instance or profile configuration key changes could be enforced by calling the placement scriptlet instance_placement(request, candidate_members) on each affected instance, with candidate_members containing a single cluster member: the node the instance currently resides on. If the scriptlet fails for any affected instance, the configuration key change should be rejected. The same approach can be used when --target is used to place an instance on a particular cluster node.
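In pseudocode, the proposal is roughly the following. This is just a sketch of the idea, not how incus is actually implemented; build_placement_request() and current_member_of() are made-up helpers.

# Pseudocode of the proposed check, not actual incus code.

def validate_config_change(affected_instances, new_config):
    for instance in affected_instances:
        request = build_placement_request(instance, new_config)  # made-up helper
        # Only one candidate: the node the instance already lives on.
        candidates = [current_member_of(instance)]               # made-up helper
        error = instance_placement(request, candidates)
        if error != None:
            # Any failure rejects the whole configuration change.
            return "Refusing config change: " + str(error)
    return None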

I think these changes are simple enough to be considered, and they would greatly improve incus' fine-tuned control over custom project limits. The main argument I can see against them is that people already using the scriptlet may not have scripts adapted to this workflow: for example, when a configuration key changes, the instance is both already present on the cluster node and being requested for placement there, which can make a scriptlet error out over double-counted resources. This could be solved by adding scriptlet configuration keys like instances.placement.scriptlet.profiles, with a default value of false, to opt into this check on profile key changes.
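The adaptation itself would be small: roughly, the scriptlet just needs to skip the instance being re-checked when it sums up the resources already in use on its node. Something like the following, where instances_on() and memory_of() are made-up helpers standing in for however a given scriptlet tracks usage.

# Sketch of avoiding the double-counting problem when re-checking an
# existing instance on its current node.

def member_reserved_memory(member, exclude_instance_name):
    total = 0
    for instance in instances_on(member):       # made-up helper
        if instance.name == exclude_instance_name:
            # Don't count the instance we are currently re-validating.
            continue
        total += memory_of(instance)             # made-up helper
    return total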

So my main question is, is this feature request something desirable for incus?

I think having the placement scriptlet be called at all placement decisions makes sense, so that means the cross-project move case and the --target case.

Now, having it be called when instance configuration changes, either directly or through profiles, feels out of scope for the current scriptlet, as there is no placement decision to be made there.

It would be more of a job for an instances.update.scriptlet, but I'm not yet convinced that introducing that is a good idea. It would certainly help handle your case here, but there are a LOT of places where an instance config can change (either directly or through a profile), and having to properly handle them all, some of which should go through the scriptlet and some of which shouldn't, may make things significantly more complex.

So for now I think we’ll want to focus on sorting out the rough edges of the current placement scriptlet, basically:

  • Ensure that it’s called at all placement decisions
  • Make it easier to return a clear error to users

For a while, I was convinced not to bring up the issue of instance configuration changes bypassing limits that are enforced through the placement scriptlet. But when I was looking at the scriptlet's behaviour with the --target flag, I started to think about it again and decided a discussion was needed. And having thought about it for a while, I agree that an instances.update.scriptlet is not a good idea.

To see why it might not be a good idea, take my use case as an example. I need a CPU string parser, a memory string parser, a vcore counter (for the best-fit allocation), a set of error messages and some extra logic to check that the custom limits are respected. If an update scriptlet were available, the only difference would be that I would not need the allocator; all the parsers would still be needed, all the parser error messages would be the same and the extra logic would be exactly the same. An update scriptlet would essentially mean duplicating the same code, minus the scheduling part. Duplicating the code creates maintenance problems: either I maintain two code bases and have to update everything twice, or I keep the same code in two configuration keys, which increases the chance of human error.
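To give an idea of how much of that code is shared plumbing, here is a rough sketch of the kind of helpers I mean (simplified Starlark; my real error messages and checks are more elaborate):

# Shared helpers that both a placement and a hypothetical update scriptlet
# would need, sketched roughly.

def parse_cpu_set(value):
    # Parse a pinning string like "0-5,8-13" into a list of vcore numbers.
    cores = []
    for part in value.split(","):
        if "-" in part:
            bounds = part.split("-")
            cores.extend(range(int(bounds[0]), int(bounds[1]) + 1))
        else:
            cores.append(int(part))
    return cores

def count_vcores(value):
    # Vcore counter used by the best-fit allocation.
    return len(parse_cpu_set(value))

def check_cpu_within_limit(instance_cpu, allowed_cpu):
    # Extra logic: limits.cpu must be set and every pinned vcore must be
    # inside the allowed set (user.node.limits.cpu).
    if instance_cpu == "":
        return "limits.cpu must be set to a pinning string like 0-5,8-13"
    allowed = parse_cpu_set(allowed_cpu)
    for core in parse_cpu_set(instance_cpu):
        if core not in allowed:
            return "vcore %d is outside the allowed set %s" % (core, allowed_cpu)
    return None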

The other thing worth looking at is that the placement scriptlet is, in some sense, already flexible enough to act as an update scriptlet. In particular, the request object has a reason field which can be one of new, evacuation or relocation. It would be bad to shoehorn the placement scriptlet into a usage it was not intended for, but the reason field solves this problem. We just need a new reason, which could be config_update, or maybe even separate project_update and profile_update values, although I think a single config_update should be enough.
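For example, the same scriptlet could simply branch on the reason field. Here config_update is the hypothetical new value being proposed, not an existing incus reason, and check_limits() and place_best_fit() stand in for the shared limit checks and the existing allocator:

# Sketch of dispatching on request.reason.

def instance_placement(request, candidate_members):
    if request.reason == "config_update":
        # Hypothetical new reason: a single candidate is expected, the
        # instance's current node. Only validate the limits, there is no
        # real placement decision to make.
        error = check_limits(request, candidate_members[0])
        if error != None:
            return error
        set_target(candidate_members[0].server_name)
        return None

    # "new", "evacuation" and "relocation" keep the existing behaviour:
    # validate the limits, then pick a node with the best-fit allocator.
    return place_best_fit(request, candidate_members)

def check_limits(request, member):
    # Placeholder for the shared limit checks described above.
    return None

def place_best_fit(request, candidate_members):
    # Placeholder for the existing best-fit allocation logic.
    set_target(candidate_members[0].server_name)
    return None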

Since these reasons did not exist before, note that this change of behaviour should not be automatic, but could be controlled with incus config keys. Say something like instances.placement.scriptlet.check_on_instance_config_change and instances.placement.scriptlet.check_on_project_config_change: when an instance or project configuration change occurs, the placement scriptlet would be called for every affected instance, targeted to the node it currently resides on. Maybe another key could decide the behaviour on failure: either stop the offending instance, or deny the configuration change if any instance fails.

Just as a side note, I don't think my use case is all that specific. The current incus project limits seem to have been designed with non-clustered usage in mind. Most of the project limits make little sense for clusters the way they are currently verified (as a total, not per machine). Those limits are really nice to have on projects, but they just don't work on a cluster without a placement scriptlet to enforce them per node rather than as a sum over all nodes. It would be nice to be able to enforce such custom limits, but right now it just isn't possible.

By the way, this discussion is mostly to see whether this makes sense as a feature request. I wouldn't want to file the feature request on GitHub without it being properly discussed first, since I can clearly see there may be issues and adaptations involved in this request.