Although I could post this directly as a GitHub issue, I think a feature request such as this one deserves a broader discussion, so I prefer to start the discussion here and only post it there if approved by @stgraber.
I have mentioned before some of my desires for better control over project limits in an incus cluster. It was recommended that I do this with a placement scriptlet. I have done so, and the scriptlet usage is great.
I run an incus cluster for students to run computational experiments and some services. Since experiments consume a lot of machine resources, I need to isolate them to prevent starvation. Currently, I create a project for each set of machines with the same resources. I can then control machine resource isolation through a placement scriptlet driven by project keys.
For example, the project below enforces that:

- every instance must set `limits.cpu` and may only be allocated vcores 0-5,8-13;
- different instances on the same cluster node may not use the same vcores;
- the total amount of RAM reserved per cluster node is limited to 24GB;
- each student must set a `user.responsavel` key naming who is responsible for the instance;
- experiments running on the same machine cannot belong to different people.

Through normal incus configuration, the project is also restricted to the cluster group amd-5700g:
```
$ incus project show amd-5700g
config:
  (...)
  restricted: "true"
  restricted.cluster.groups: amd-5700g
  user.node.limits.cpu: 0-5,8-13
  user.node.limits.cpu.unique: "true"
  user.node.limits.memory: 24GB
  user.node.represented: "true"
  user.node.represented.unique: "true"
```
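Roughly, a simplified version of such a scriptlet looks like the following. This is a sketch, not my production script: it uses the scriptlet functions documented for incus (`instance_placement`, `get_instances`, `set_target`, `log_error`), simplifies best fit to first fit, hard-codes the allowed vcore set instead of reading the project key, and omits the memory accounting. As I understand the docs, returning an error string is what rejects the request.

```python
# Sketch of a scriptlet enforcing the project keys shown above.
# Best fit is simplified to first fit; 24GB memory accounting omitted.

def parse_cpu_set(spec):
    # Expand a "0-5,8-13" style range list into a list of vcore numbers.
    cores = []
    for part in spec.split(","):
        if "-" in part:
            bounds = part.split("-")
            for i in range(int(bounds[0]), int(bounds[1]) + 1):
                cores.append(i)
        elif part != "":
            cores.append(int(part))
    return cores

def instance_placement(request, candidate_members):
    new_cores = parse_cpu_set(request.config.get("limits.cpu", ""))
    owner = request.config.get("user.responsavel", "")
    if len(new_cores) == 0 or owner == "":
        return "limits.cpu and user.responsavel must be set"

    # user.node.limits.cpu: only vcores 0-5,8-13 may be used.
    # (Hard-coded here; the real script would read the project key.)
    allowed = parse_cpu_set("0-5,8-13")
    for core in new_cores:
        if core not in allowed:
            return "limits.cpu outside the allowed vcore set"

    for member in candidate_members:
        instances = get_instances(member.server_name, request.project)
        ok = True
        for inst in instances:
            # user.node.represented.unique: one person per cluster node.
            if inst.config.get("user.responsavel", "") != owner:
                ok = False
                break
            # user.node.limits.cpu.unique: no vcore sharing on a node.
            for core in parse_cpu_set(inst.config.get("limits.cpu", "")):
                if core in new_cores:
                    ok = False
                    break
            if not ok:
                break
        if ok:
            set_target(member.server_name)
            return None

    log_error("no candidate member satisfies the project limits")
    return "placement rejected by project limits"
```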
With this in place, using a best-fit approach to machine allocation, multiple experiments run by the same person can be automatically assigned to the same machine while keeping their resources isolated.
I have to say this functionality is amazing. The only issue with the placement scriptlet is that these limits are only enforced during instance creation; afterwards they cannot be enforced, unlike what incus does with the `restricted.*` project configuration keys. I wanted to discuss some possible improvements to this approach, mostly related to the ways in which these configuration keys can currently be bypassed.
- When moving an instance between projects, the scriptlet is not called, so this configuration is ignored. I posted this as an issue which was recognized as a bug.
- When creating an instance using the `--target` flag pointing at a cluster node, the scriptlet is not called, so custom project limits are not checked. It must be noted that the scriptlet is called when using `--target` with a cluster group. I posted this as an issue here.
- When changing instance or profile configuration keys, the limits set in the scriptlet are completely unchecked, so anything can be set or unset. It must be noted that `restricted.*` project keys are checked on instance or profile configuration key changes.
What I wanted to discuss is a solution to the last two items, which are currently not enforced. But first of all, is there a simple way for them to be enforced? I actually think so. Instance or profile configuration key changes can be enforced by calling the placement scriptlet `instance_placement(request, candidate_members)` on each affected instance, with `candidate_members` set to a single cluster member: the node the instance currently runs on. If the scriptlet fails for any affected instance, the configuration key change should be rejected. The same approach can be used when `--target` is used to place an instance on a particular cluster node. The sketch below illustrates the idea.
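In pseudocode, the control flow I am proposing would be roughly the following. To be clear, none of these names exist in incus; they are hypothetical stand-ins for whatever the server would actually do internally.

```python
# Hypothetical pseudocode only: validate_config_change and
# placement_request_for are not real incus internals.
def validate_config_change(affected_instances, scriptlet):
    for inst in affected_instances:
        # Single candidate: the cluster member the instance already lives on.
        candidates = [inst.location]
        err = scriptlet.instance_placement(placement_request_for(inst), candidates)
        if err != None:
            # One failing instance blocks the whole configuration change.
            return "change rejected for " + inst.name + ": " + err
    return None
```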
I think these changes are simple enough to be considered, and they would greatly improve incus' fine-tuned control over custom project limits. The main obstacle I can see to their inclusion is that people already using the scriptlet may not have scripts adapted to this workflow. For example, when a configuration key changes, the instance is both already on the cluster node and being requested for placement there, so an existing scriptlet could error out on double-counted resource usage. This can be solved by adding scriptlet configuration keys like `instances.placement.scriptlet.profiles`, defaulting to `false`, to opt into this check on profile key changes.
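If such a key were added, enabling the new behavior would then be an explicit opt-in, something like the following (to be clear, this key does not exist today; it is the proposal above):

```
$ incus config set instances.placement.scriptlet.profiles=true
```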
So my main question is: is this feature request something desirable for incus?