[LXD] Cluster member evacuation

Project: LXD
Status: Implemented
Author(s): @monstermunchkin
Approver(s): @stgraber
Release: 4.17
Internal ID: LX002

Abstract

Cluster member evacuation temporarily moves instances and storage volumes from one cluster member to another, allowing maintenance to be performed on the evacuated member.

Rationale

It’s pretty common for system administrators to have to take a node temporarily offline, either to reconfigure storage/network or to reboot it for a kernel update.

Currently, the only way to do this is to manually move or stop instances and then shut down LXD. This process is tedious and can easily go wrong due to user error.

The goal of this work is to have LXD handle this internally, moving or stopping instances as needed and, when moving them, automatically picking a suitable alternative cluster member for the duration of the evacuation.

Specification

Design

The user calls lxc cluster evacuate <name>, where <name> is the cluster member that needs to be taken offline. To prevent accidental evacuation, the user will need to confirm the action (unless --force is passed).

The cluster member state is then set to EVACUATED, preventing any new instances from being created on it.

Instances can be configured through the cluster.evacuate config key, which takes one of auto (default), stop or migrate. In auto mode, the instance is migrated if it has no other local dependencies; otherwise it is stopped.
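
For example, to pin an instance so that it is always stopped rather than migrated:

    lxc config set <instance> cluster.evacuate stop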

When migrating, instances are moved to a randomly chosen alternative cluster member, with the volatile.evacuate.origin key recording the origin cluster member. The remaining instances are cleanly shut down (similar to the logic used on LXD shutdown).

Once LXD is done and shut down, the user can perform maintenance.

To restore the evacuated cluster member, the user needs to call lxc cluster restore <name>. The previously moved instances will then be restored, and the cluster member’s state is switched back to ONLINE.

API changes

The following API endpoint will be added:

  • POST /1.0/cluster/members/<name>/state

It takes the following JSON input:

{
  "action": "evacuate|restore"
}
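
On the Go side, this maps to a small request struct along these lines (a sketch; the exact struct name and field tags are assumptions following LXD's usual shared/api conventions):

    // ClusterMemberStatePost represents the fields required to evacuate
    // or restore a cluster member.
    type ClusterMemberStatePost struct {
        // Action to perform: either "evacuate" or "restore".
        Action string `json:"action" yaml:"action"`
    }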

CLI changes

The following command will be added:

  • lxc cluster evacuate <member> (long version)
  • lxc cluster evac <member> (short version)
  • lxc cluster restore <member>

The evacuate sub-command is used to evacuate a cluster member. The user will need to confirm this action (to prevent accidental evacuation) or provide the --force flag.
The restore sub-command doesn’t have any specific flags.
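
A typical maintenance cycle would then look like this (the confirmation prompt wording is illustrative):

    $ lxc cluster evacuate node1
    Are you sure you want to evacuate cluster member "node1"? (yes/no): yes
    $ # ... perform maintenance, reboot, etc. ...
    $ lxc cluster restore node1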

Database changes

No database changes.

Upgrade handling

No upgrade handling.

Further Information

No further information.

Could you please add a struct explanation of what would be accepted by this API endpoint (if anything)?

Can you expand a little on how this would work? Would there be a separate goroutine waiting for the member to come back online, or would it be initiated by the member that left when it rejoins (via a local DB entry)? If initiated by another cluster member, which one would it be (to avoid multiple members attempting the move)?

Would the moved instances have recorded where they came from in their config somehow?

Not really my place, but I have also been wondering how you would identify these instances. Even a key in user-data would probably be a world of help, so other interfaces could identify these instances as “risky to use” since they could be migrated back at any point (which I assume will interrupt any running processes/commands).

I suspect we’ll use a volatile.evacuate.origin key or something similar to record where the instance was so it can be moved back when the cluster member comes back up.

The way I see it, it should look something like this:

  • [user] lxc cluster evacuate NAME
  • [lxd] sets cluster member state to EVACUATED (prevents any new instance from being created on it)
  • [lxd] moves instances that aren’t pinned to other cluster members, records origin in volatile key
  • [lxd] stops the rest
  • [user] performs maintenance (update, reboot, …)
  • [lxd] when LXD starts back up, notices it was in EVACUATED mode, flips back to ONLINE, starts local instances back up and starts moving the rest back (see the sketch below)
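
A rough sketch of that last startup step (every helper name here is hypothetical, not an actual LXD function; later in this thread the step becomes user-driven via lxc cluster restore instead):

    // restoreAfterEvacuation would run at startup and undo a previous evacuation.
    func restoreAfterEvacuation(memberName string) error {
        state, err := getMemberState(memberName) // hypothetical DB lookup
        if err != nil {
            return err
        }

        if state != "EVACUATED" {
            return nil // Nothing to do.
        }

        // Flip back to ONLINE so new instances can be placed here again.
        if err := setMemberState(memberName, "ONLINE"); err != nil {
            return err
        }

        // Start the instances that were stopped in place.
        if err := startLocalInstances(memberName); err != nil {
            return err
        }

        // Move back the instances whose volatile.evacuate.origin points here.
        return migrateBackEvacuees(memberName)
    }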

There are a few things we need for this:

  • Config option to mark an instance as pinned (stopped instead of moved)
  • Volatile key to record where the instance originally was
  • New DB cluster member state
  • API to trigger it
  • CLI

One thing I’m wondering about now is whether we should make the restoration user-driven. That is, LXD won’t automatically exit the EVACUATED state but would instead need a lxc cluster restore (don’t really like the name as that’s usually related to snapshots) to move back the instances and allow new ones to be created.

So maybe something like:

  • lxc cluster evacuate NAME
  • lxc cluster evacuate NAME --restore
  • lxc cluster evacuate NAME --restore --no-move

Better suggestions would be most welcome there 🙂

But from an API point of view, we need to be able to flip to evacuate, flip back to online and have a way to flip without causing instance movement.

So maybe /1.0/cluster/members/<NAME>/state with something similar to /1.0/instances/<NAME>/state where we can pass an action (evacuate/restore) and additional config to control whether we want to move instances or not?
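
For example, exercised through lxc query (illustrative only):

    lxc query --request POST --data '{"action": "evacuate"}' /1.0/cluster/members/node1/state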

Should that be “[lxd] moves instances that aren’t pinned to other members, records origin in volatile key”?

Indeed, fixed 🙂

+1 from me for keeping it all under one command, so it’s easy to see the full lifecycle and options from the --help flag.

As the action would potentially be quite disruptive, and we certainly don’t want to run it accidentally, how about:

    lxc cluster maintain NAME --evacuate
    lxc cluster maintain NAME --restore
    lxc cluster maintain NAME --restore --no-move

I don’t know, it’s a bit different from anything we’ve done before.

I think I’d prefer to have lxc cluster evacuate prompt the user to confirm unless a --force is passed rather than need to use a -- option to perform the default action.

OK, yeah, if it asks for confirmation by default then that’s fine.

So effectively you do “evacuation” followed by “evacuation restore”. Makes sense.

Some more ideas for you

$ lxc cluster maintenance evacuate NAME
$ lxc cluster maintenance admit NAME

OR

$ lxc cluster evacuate start NAME
$ lxc cluster evacuate finish NAME

Rather than throwing an error, we could ask the user whether it’d be OK for those instances to be stopped instead. What do you think?

@stgraber @tomp any suggestions regarding the naming of this config option? I’m not good at making up names 😃

Perhaps evacuation.pinned?

Doing things interactively is always tricky with the API.
If we do an early check when we get the request and fail before we’ve changed status or have stopped/moved anything, then the user can easily go set the config key on the instances that shouldn’t be migrated and then try again.

Maybe cluster.evacuate taking one of:

  • auto (default, will migrate what it can, stop the rest)
  • stop (always stop the instance)
  • migrate (always migrate the instance)

So this would change my earlier comment: the default of auto would mean that nothing fails if an instance can’t be migrated off; it would just be stopped. Unless it was directly configured to migrate, in which case a validation error would be thrown during evacuation if the instance cannot be migrated.

I actually expect an additional live-migrate value to be added soon enough, with auto then trying live-migrate when possible, then standard migrate, then stop (see the sketch below).
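
A minimal sketch of that per-instance decision, assuming a hypothetical evacuateAction helper (this is not LXD's actual implementation):

    package main

    import "fmt"

    // evacuateAction picks what to do with one instance during evacuation.
    // "mode" is the instance's cluster.evacuate value; the booleans describe
    // what this particular instance supports.
    func evacuateAction(mode string, canMigrate bool, canLiveMigrate bool) (string, error) {
        switch mode {
        case "", "auto":
            // Prefer live migration, fall back to stateless migration, then stop.
            if canLiveMigrate {
                return "live-migrate", nil
            }
            if canMigrate {
                return "migrate", nil
            }
            return "stop", nil
        case "stop":
            return "stop", nil
        case "migrate", "live-migrate":
            // Explicitly configured to migrate: fail validation during evacuation.
            if !canMigrate {
                return "", fmt.Errorf("instance is set to %q but cannot be moved off this member", mode)
            }
            return mode, nil
        default:
            return "", fmt.Errorf("invalid cluster.evacuate value %q", mode)
        }
    }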

Yeah, I’m not super fond of this, as a sub-command in the CLI usually implies a separate API object, which isn’t the case here.

So I’m pretty sure we want lxc cluster evacuate as that’s an action against a cluster member. Now we can either do lxc cluster evacuate --restore, lxc cluster evacuate --complete, … or we can have a separate command.

Overall, I like the idea of:

  • lxc cluster evacuate NAME
  • lxc cluster restore NAME

But I don’t know if that’s going to sound too confusing because of lxc restore and lxc storage volume restore which instead refer to snapshots…

Looking at a synonyms dictionary, we could use lxc cluster reinstate, but that one is quite typo-prone for me, though I guess I could get used to it 😉

  • lxc cluster evacuate node1 | lxc cluster evacuate --restore node1
  • lxc cluster evacuate node1 | lxc cluster evacuate --complete node1
  • lxc cluster evacuate node1 | lxc cluster restore node1
  • lxc cluster evacuate node1 | lxc cluster reinstate node1

I think lxc cluster restore node1 is the most intuitive.
