[LXD] Scriptlet based instance placement scheduler

Project LXD
Status Draft
Author(s) @tomp
Approver(s) @stgraber
Release LXD 5.x
Internal ID LXXXX

Abstract

Allow users to provide a Starlark scriptlet that decides on the cluster target at instance creation time.

The scriptlet would be provided with information about the requested instance, as well as a list of candidate cluster members (those that are online and compatible with the request) and details about those members and their existing instances, to inform the decision.

Rationale

This allows custom logic to control instance placement, rather than relying on the very basic placement logic LXD currently has (pick the cluster member with the fewest instances).

Specification

Design

The instance placement scriptlet would be provided to LXD by way of a global configuration option called instances.placement_scriptlet. This would be stored in the global database and available to all cluster members.

When a request for a new instance arrives at POST /1.0/instances without a target URL parameter specified, the instance placement scriptlet (if defined) would be executed on the cluster member that received the request.

The scriptlet environment will be provided with the following items:

  • Instance create request.
  • Profiles due to be used.
  • Expanded instance config and devices after profiles are applied.
  • List of candidate cluster members and their config (excluding offline members and members with incompatible architectures).
  • Ability to retrieve a cluster member’s state metrics (including system load, storage pool state, etc.).
  • Ability to retrieve a cluster member’s resources info.
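For illustration, a placement scriptlet could then look something like the sketch below. The entry point name (instance_placement), the helper functions (get_cluster_member_state, get_cluster_member_resources) and the return convention are placeholders used for this example only; the final names and signatures are not defined by this spec.

def instance_placement(request, candidate_members):
    # request: the instance create request, including expanded config and devices.
    # candidate_members: online, architecture-compatible cluster members and their config.
    best_member = None
    best_score = None
    for member in candidate_members:
        # Hypothetical helper functions exposed by LXD to the scriptlet environment.
        state = get_cluster_member_state(member.server_name)
        resources = get_cluster_member_resources(member.server_name)
        # Simple example heuristic: prefer the member with the lowest load per CPU.
        score = state.sysinfo.load_averages[0] / resources.cpu.total
        if best_score == None or score < best_score:
            best_member = member
            best_score = score
    # Return the chosen member's name (placeholder return convention).
    return best_member.server_name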

API changes

A new server config key called instances.placement_scriptlet will be added along with an API extension called instances_placement_scriptlet.
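Since this would be a regular server configuration key, it could be loaded with the existing lxc config set command. For example (assuming the scriptlet has been saved to a file called placement.star):

lxc config set instances.placement_scriptlet "$(cat placement.star)"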

CLI changes

No CLI changes are expected.

Database changes

No DB changes are expected.

Upgrade handling

As this is a new feature, no upgrade handling is required.

Further information

Additional information on the particular design decisions made in the document, alternative designs that may have been considered as well as links to additional information.


Would you ever consider allowing the script to make network requests? I can imagine a use case where I’d like to ask a cloud provider’s API for information about where my instance should be placed.

Starlark doesn’t support external communication, from what I can tell; I think that’s because it’s designed to run embedded inside other applications and to prevent scriptlets from blocking the host application for unexpected amounts of time.

It does allow for the application to provide functions into it, so LXD will be providing some functions to access cluster and resource information.

I’m not quite following the use case of querying an external system for where an instance should be placed. This feature is only for placing instances on existing cluster members, not for provisioning new cluster members (which is where involving the hosting provider would make sense).

Here is a scenario - perhaps niche - that comes to mind.

Let’s say I have an LXD cluster entirely within a single AWS VPC (in us-east-1), and I have 10 cluster members spread across 3 subnets. In AWS, a subnet is deployed within a single “availability zone”. My subnets could be in AZs us-east-1a, us-east-1b, and us-east-1c.

Some AWS managed services, like RDS, require you to provision subnets across multiple availability zones to enable High Availability configurations. Traffic within a single AZ also has lower latency.

Currently, we have cluster member groups which could be assigned based on something like AZ/subnet, and presumably this information will be passed to our Starlark script. And honestly I love this feature as proposed, because it’s a gentle evolution of what exists.

I just want to point out that any scheduling algorithm that needs information from my business domain - e.g. not metrics LXD tracks, but my own customer database, my own service topology - is probably best served by a dedicated external service of my own design.


I am reminded of Envoy’s global rate limit service integration. The idea is you can configure a service that gets called on every request to determine whether rate limits are being hit.

The more I think about it, the more it seems like a Starlark script isn’t the place for this, even if the functionality could be exposed, but it could be implemented more like a webhook that gets called on POST /1.0/instances, and a response header or body can be used for placement information, or perhaps that response can be passed to Starlark, etc.

Yeah, we’ve considered webhooks for this in the past, though part of the issue with them was around which machine should call them and how to handle errors/timeouts/…

I think there’s still room to add webhooks to LXD in the future though. Maybe expose a function to the scriptlet that’s specifically meant for making a generic webhook call or something?

If your placement callback accepted a generic context-ish object where I could hang my own properties, as well as member info, it might look something like:

# Hypothetical: placement hook that may call out to an external service.
# (client: HTTPClient, userinfo: opaque user object, members: list of Member; returns a member ID)
def get_placement_dynamic(client, userinfo, members):
    u = userinfo
    result = client.do("GET", u.my_url, ca=u.my_tls.ca, timeout=10)
    return result.ID

# Hypothetical: pure variant with no network access.
# (userinfo: opaque user object, members: list of Member; returns a member ID)
def get_placement_static(userinfo, members):
    # only a pure function is allowed here
    return something  # placeholder for whatever member the pure logic picks

Of course, this all depends on what’s elegant in Starlark.

FWIW, this proposal can (and probably should) be implemented with pure functions that don’t make network requests. But I wanted to flag this use-case, because perhaps your design can be extended later.

I agree. We do plan on providing the cluster members and their config to the Starlark scriptlet.
This will include group membership and any failure domains they are part of (which it sounds like you could use to ensure LXD spreads its own cluster roles out over the availability zones correctly).

Additionally, almost all LXD entities (including cluster members) support custom user config fields (starting with user.*), and those will also be made available to the scriptlet.

So you could mark each cluster member as being part of a particular AZ or subnet (or anything you like really) and then use that for placement logic.
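As a rough sketch of how that could look (the entry point name and the user.aws_az key are purely illustrative here, not part of the spec):

def instance_placement(request, candidate_members):
    # Hypothetical: the desired AZ is taken from the new instance's own user config.
    wanted_az = request.config.get("user.aws_az", "")
    for member in candidate_members:
        # Each cluster member has been tagged with its AZ via a user.aws_az config key.
        if wanted_az == "" or member.config.get("user.aws_az", "") == wanted_az:
            return member.server_name
    # Fall back to the first candidate if nothing matches.
    return candidate_members[0].server_name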