[LXD] Cluster server grouping

Project LXD
Status Implemented
Author(s) @monstermunchkin
Approver(s) @stgraber
Release 4.21
Internal ID LX010

Abstract

This adds support for grouping cluster members.

Rationale

LXD currently doesn’t support grouping cluster members. Adding this feature will allow targeting groups instead of single members. This will also add project restrictions for groups and members.

Specification

Design

Initially, there will be an undeletable default group to which all members belong. Cluster groups can be created using lxc cluster group create. There will also be delete, edit, and show commands for groups.

Cluster members can change their group membership by editing their groups config option. A member must always belong to at least one group.
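A minimal Go sketch of the membership rule above, for illustration only. validateMemberGroups is a hypothetical helper, not part of LXD. Only the at-least-one-group requirement comes from this spec; rejecting empty and duplicate names is an assumption (duplicates would also violate the UNIQUE (node_id, group_id) constraint in the database changes).

```go
package main

import (
	"errors"
	"fmt"
)

// validateMemberGroups checks a member's proposed "groups" value before it is saved.
// Hypothetical helper: the spec only requires at least one group; the empty-name
// and duplicate checks are assumptions made for this sketch.
func validateMemberGroups(groups []string) error {
	if len(groups) == 0 {
		return errors.New("a cluster member must belong to at least one group")
	}
	seen := make(map[string]bool, len(groups))
	for _, g := range groups {
		if g == "" {
			return errors.New("group names cannot be empty")
		}
		if seen[g] {
			return fmt.Errorf("duplicate group %q", g)
		}
		seen[g] = true
	}
	return nil
}

func main() {
	fmt.Println(validateMemberGroups([]string{"default", "gpu"})) // <nil>
	fmt.Println(validateMemberGroups(nil))                        // a cluster member must belong to at least one group
}
```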

The existing scheduler.instance config option will gain a new group value. With this value, the member can be targeted either directly or through one of its groups. If the value is all, targeting through a group using the @ prefix works as well.

When using a group as a target, it needs to be prefixed with @. Example:

lxc launch images:ubuntu/20.04 c1 --target=@foobar

This will create a new container on one of the cluster members in the foobar group.

Since group targeting will use the @ prefix, cluster member names will no longer be able to start with @.
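The @ convention can be sketched in Go as follows. parseTarget and validateMemberName are hypothetical helpers written for this example; they only show why a leading @ unambiguously marks a group once member names may no longer start with it.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseTarget splits a --target value into either a cluster group name
// (prefixed with "@") or a cluster member name. Hypothetical helper.
func parseTarget(target string) (group string, member string, err error) {
	if strings.HasPrefix(target, "@") {
		group = strings.TrimPrefix(target, "@")
		if group == "" {
			return "", "", errors.New("missing group name after @")
		}
		return group, "", nil
	}
	return "", target, nil
}

// validateMemberName rejects member names that would be ambiguous with group targets.
func validateMemberName(name string) error {
	if strings.HasPrefix(name, "@") {
		return fmt.Errorf("cluster member name %q cannot start with @", name)
	}
	return nil
}

func main() {
	g, m, _ := parseTarget("@foobar")
	fmt.Printf("group=%q member=%q\n", g, m) // group="foobar" member=""
	g, m, _ = parseTarget("node1")
	fmt.Printf("group=%q member=%q\n", g, m) // group="" member="node1"
}
```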

There will also be a new project restriction, restricted.cluster.groups, which, when set, restricts the user to targeting those groups of cluster members (either by group name or by the name of a cluster member that's part of those groups).
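One way to read the restricted.cluster.groups check, sketched in Go under stated assumptions: targetAllowed is a hypothetical helper, the restriction is taken as a list of group names, and memberGroups stands in for however member-to-group membership is actually looked up. An unset (empty) restriction allows every target, matching "when set" above. Placement without an explicit target is not covered.

```go
package main

import (
	"fmt"
	"strings"
)

// targetAllowed reports whether a project with the given restricted.cluster.groups
// value may use an explicit --target. memberGroups maps member names to their groups.
// Hypothetical helper; an empty restriction is treated as "not set".
func targetAllowed(target string, restricted []string, memberGroups map[string][]string) bool {
	if len(restricted) == 0 {
		return true
	}
	allowed := make(map[string]bool, len(restricted))
	for _, g := range restricted {
		allowed[g] = true
	}
	// Group target: the group itself must be in the restriction list.
	if strings.HasPrefix(target, "@") {
		return allowed[strings.TrimPrefix(target, "@")]
	}
	// Member target: the member must be part of at least one allowed group.
	for _, g := range memberGroups[target] {
		if allowed[g] {
			return true
		}
	}
	return false
}

func main() {
	members := map[string][]string{"node1": {"default", "gpu"}, "node2": {"default"}}
	restricted := []string{"gpu"}
	fmt.Println(targetAllowed("@gpu", restricted, members))     // true
	fmt.Println(targetAllowed("@default", restricted, members)) // false
	fmt.Println(targetAllowed("node1", restricted, members))    // true
	fmt.Println(targetAllowed("node2", restricted, members))    // false
}
```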

API changes

Create and edit a cluster group

POST /1.0/cluster/groups
POST /1.0/cluster/groups/<name>
PUT /1.0/cluster/groups/<name>
PATCH /1.0/cluster/groups/<name>

Using the following new API structures, respectively (PUT and PATCH both take ClusterGroupPut):

type ClusterGroupsPost struct {
	// The name of the cluster group
	// Example: group1
	GroupName string `json:"group_name" yaml:"group_name"`
}

type ClusterGroupPost struct {
	// The new name of the cluster group
	// Example: group1
	GroupName string `json:"group_name" yaml:"group_name"`
}

type ClusterGroupPut struct {
	// The description of the cluster group
	// Example: amd64 servers
	Description string `json:"description" yaml:"description"`

	// List of members in this group
	// Example: ["node1", "node3"]
	Members []string `json:"members" yaml:"members"`
}

Delete a cluster group

DELETE /1.0/cluster/groups/<name>

List cluster groups

GET /1.0/cluster/groups
GET /1.0/cluster/groups/<name>

Returns a list or single record (respectively) of this new ClusterGroup structure:

type ClusterGroup struct {
	ClusterGroupPut  `yaml:",inline"`
	ClusterGroupPost `yaml:",inline"`
}

If GET /1.0/cluster/groups is used with recursion=1, the full records are returned. If recursion=0 (the default), only the cluster group names are returned.

Other

The ClusterMemberPut struct will gain a new Groups field:

type ClusterMemberPut struct {
	// ...

	// List of cluster groups this member belongs to
	// Example: ["group1", "group2"]
	Groups []string `json:"groups" yaml:"groups"`
}

CLI changes

The CLI will gain the following new commands:

  • Create new group:
    lxc cluster group create <group_name>
  • Delete existing group:
    lxc cluster group delete <group_name>
  • Show information on group:
    lxc cluster group show <group_name>
  • Change group fields and members:
    lxc cluster group edit <group_name>
  • Assign member to group:
    lxc cluster group assign <member_name> <group_name>
  • Remove member from group:
    lxc cluster group remove <member_name> <group_name>
  • Rename group:
    lxc cluster group rename <old_name> <new_name>
  • List groups:
    lxc cluster group list
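A sketch of how assign and remove could act on a member's Groups list, written in Go. Both helpers are hypothetical. The assumption here is that assign adds a single group, as the command line above suggests; the discussion below proposes making it work like lxc profile assign, which may instead mean replacing the whole list. Removal refuses to leave a member without any group, per the rule in the Design section.

```go
package main

import (
	"errors"
	"fmt"
)

// assignGroup adds group to a member's groups if it is not already present.
// Assumption: assign appends one group rather than replacing the list.
func assignGroup(groups []string, group string) []string {
	for _, g := range groups {
		if g == group {
			return groups
		}
	}
	return append(groups, group)
}

// removeGroup drops group from a member's groups, refusing to leave it with none.
func removeGroup(groups []string, group string) ([]string, error) {
	out := make([]string, 0, len(groups))
	for _, g := range groups {
		if g != group {
			out = append(out, g)
		}
	}
	if len(out) == len(groups) {
		return groups, fmt.Errorf("member is not in group %q", group)
	}
	if len(out) == 0 {
		return groups, errors.New("a cluster member must belong to at least one group")
	}
	return out, nil
}

func main() {
	groups := []string{"default"}
	groups = assignGroup(groups, "gpu")
	fmt.Println(groups) // [default gpu]
	groups, _ = removeGroup(groups, "default")
	fmt.Println(groups) // [gpu]
	_, err := removeGroup(groups, "gpu")
	fmt.Println(err) // a cluster member must belong to at least one group
}
```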

Database changes

Two new tables will be added: cluster_groups and nodes_cluster_groups.

CREATE TABLE "cluster_groups" (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    name TEXT NOT NULL,
    description TEXT,
    UNIQUE (name)
);

CREATE TABLE nodes_cluster_groups (
    node_id INTEGER NOT NULL,
    group_id INTEGER NOT NULL,
    FOREIGN KEY (node_id) REFERENCES nodes (id) ON DELETE CASCADE,
    FOREIGN KEY (group_id) REFERENCES cluster_groups (id) ON DELETE CASCADE,
    UNIQUE (node_id, group_id)
);

Upgrade handling

For clusters, all existing cluster members will be placed in the default cluster group.
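As a rough illustration of the upgrade step, this Go sketch builds the nodes_cluster_groups rows that would place every existing member in the default group. The membership type and defaultGroupRows helper are hypothetical, and the default group's ID is passed in rather than read from cluster_groups.

```go
package main

import "fmt"

// membership mirrors one row of the nodes_cluster_groups table (hypothetical type).
type membership struct {
	NodeID  int64
	GroupID int64
}

// defaultGroupRows returns one row per existing member, all pointing at the
// default group. Hypothetical helper: defaultGroupID is supplied by the caller.
func defaultGroupRows(nodeIDs []int64, defaultGroupID int64) []membership {
	rows := make([]membership, 0, len(nodeIDs))
	for _, id := range nodeIDs {
		rows = append(rows, membership{NodeID: id, GroupID: defaultGroupID})
	}
	return rows
}

func main() {
	fmt.Println(defaultGroupRows([]int64{1, 2, 3}, 1)) // [{1 1} {2 1} {3 1}]
}
```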

Further information


Using --target @foobar will always work so long as the project is allowed to use the group. The interaction with scheduler.instance is:

  • scheduler.instance=all: the member receives new instances when either --target @foobar is passed or no target is passed at all.
  • scheduler.instance=manual: the member only receives new instances if --target member-name is used. Targeting one of its groups will not cause it to take new instances.
  • scheduler.instance=group: the member takes new instances when either --target member-name or --target @foobar is used.
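The three rules above can be condensed into a small Go predicate. memberEligible is a hypothetical helper; it assumes that targeting a member directly by name works for every scheduler.instance value (stated above for manual and group, assumed for all), and it leaves out the project restriction check and the scheduler's choice among eligible members.

```go
package main

import (
	"fmt"
	"strings"
)

// memberEligible reports whether a member may receive a new instance for the
// given --target value ("" means no target). Hypothetical helper.
func memberEligible(name, schedulerInstance string, groups []string, target string) bool {
	// No target: only members with scheduler.instance=all take part.
	if target == "" {
		return schedulerInstance == "all"
	}
	// Group target: manual members never take group-targeted instances.
	if strings.HasPrefix(target, "@") {
		if schedulerInstance == "manual" {
			return false
		}
		group := strings.TrimPrefix(target, "@")
		for _, g := range groups {
			if g == group {
				return true
			}
		}
		return false
	}
	// Direct target: assumed to work regardless of scheduler.instance.
	return target == name
}

func main() {
	groups := []string{"default", "gpu"}
	for _, mode := range []string{"all", "manual", "group"} {
		fmt.Println(mode,
			memberEligible("node1", mode, groups, ""),
			memberEligible("node1", mode, groups, "@gpu"),
			memberEligible("node1", mode, groups, "node1"))
	}
	// all true true true
	// manual false false true
	// group false true true
}
```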

The idea is that you could create a group called gpu and move the servers that have GPUs into it. Out of the box, you'll then be able to use --target @gpu to get a suitable server, while those servers also take some non-GPU workloads.
If you don't want to “waste” resources on those machines with non-GPU workloads, set scheduler.instance=group on all of them; they will then only take workloads that are sent to @gpu.

Looks wrong; it should be POST/GET on /1.0/cluster/groups and GET/PUT/PATCH/POST/DELETE on /1.0/cluster/groups/NAME.

For edit, I guess that’s primarily so we can edit the group description, but maybe the API should also return the list of all members in the group and allow changing that list by editing the group directly.

We also need a list in there I think.

For that matter, I think we should add an lxc cluster group assign command that works like lxc profile assign, so this can be scripted more easily.

I think we should also show the groups in lxc cluster list.

Looks good to me, approved.