Project | LXD |
Status | Implemented |
Author(s) | @monstermunchkin |
Approver(s) | @stgraber |
Release | 4.17 |
Internal ID | LX002 |
Abstract
Cluster member evacuation allows the temporary move of instances and storage volumes from one cluster member to another. This allows for maintenance on cluster members.
Rationale
It’s pretty common for system administrators to have to take a node temporarily offline, either to reconfigure storage/network or to reboot it for a kernel update.
Currently the only way to do this is by manually moving or stopping instances and then shutting down LXD. This is a tedious process which can easily go wrong due to user error.
The goal of this work is to have LXD handle this internally: it moves or stops the instances as needed and, when moving them, automatically picks a suitable alternative cluster member for the duration of the evacuation.
Specification
Design
The user calls lxc cluster evacuate <name>, where <name> is the cluster member that needs to be taken offline. To prevent accidental evacuation, the user will need to confirm the action (unless --force is passed).
The cluster member state is then set to EVACUATED, preventing any new instances from being created on it.
Instances can be configured through cluster.evacuate with one of auto (default), stop or migrate. In auto mode, the instance is migrated if it doesn’t have other local dependencies; otherwise it is stopped.
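The mode resolution described above can be sketched as follows. This is only an illustration, not LXD's implementation: the function name resolve_evacuate_action and the has_local_dependencies flag are hypothetical, and how LXD actually detects local dependencies is not covered here.

```python
def resolve_evacuate_action(config, has_local_dependencies):
    """Return "migrate" or "stop" for a single instance.

    config: the instance's configuration as a dict; cluster.evacuate
        may be absent, in which case "auto" applies.
    has_local_dependencies: hypothetical flag standing in for LXD's
        own check for local dependencies.
    """
    mode = config.get("cluster.evacuate", "auto")  # auto is the default
    if mode not in ("auto", "stop", "migrate"):
        raise ValueError(f"invalid cluster.evacuate value: {mode!r}")
    if mode == "auto":
        # auto: migrate when possible, otherwise stop
        return "stop" if has_local_dependencies else "migrate"
    return mode
```

An explicit stop or migrate setting is returned unchanged; only auto consults the dependency flag.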
When migrating, instances are moved to a different, randomly chosen cluster member, and volatile.evacuate.origin is set to the origin cluster member. The rest of the instances are cleanly shut down (similar to the logic used on LXD shutdown).
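As a rough sketch of this step, the following models instances as plain dicts. The function name evacuate_member, the dict layout and the "action" and "status" fields are assumptions made for illustration; only the volatile.evacuate.origin key and the random choice of target come from the description above.

```python
import random


def evacuate_member(instances, members, origin, rng=random):
    """Move or stop every instance located on `origin` (in place).

    instances: list of dicts with "location", "config", "action"
        ("migrate" or "stop") and "status" keys (hypothetical layout).
    members: names of all cluster members, including `origin`.
    rng: source of randomness; pass random.Random(seed) for
        reproducible runs.
    """
    candidates = [m for m in members if m != origin]
    for inst in instances:
        if inst["location"] != origin:
            continue  # instances on other members are left alone
        if inst["action"] == "migrate":
            if not candidates:
                raise RuntimeError("no other cluster member available")
            inst["location"] = rng.choice(candidates)
            # Record where the instance came from so it can be restored
            inst["config"]["volatile.evacuate.origin"] = origin
        else:
            inst["status"] = "Stopped"
    return instances
```

Stopped instances stay on the evacuated member; only migrated ones carry the origin marker.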
Once LXD is done and shut down, the user can perform maintenance.
To restore the evacuated cluster member, the user needs to call lxc cluster restore <name>. The previously moved instances will then be restored, and the cluster member’s state is switched back to ONLINE.
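The restore step could look roughly like this, again using plain dicts. The function name restore_member and the data layout are hypothetical, and clearing volatile.evacuate.origin after the move back is an assumption rather than something this specification states.

```python
def restore_member(instances, member_states, origin):
    """Move instances evacuated from `origin` back and mark it ONLINE.

    instances: list of dicts with "location" and "config" keys
        (hypothetical layout).
    member_states: dict mapping member name to its state string.
    """
    for inst in instances:
        if inst["config"].get("volatile.evacuate.origin") == origin:
            inst["location"] = origin
            # Assumption: the marker is dropped once the instance is back
            del inst["config"]["volatile.evacuate.origin"]
    member_states[origin] = "ONLINE"
    return instances
```

Instances without the marker, or with a marker naming another member, are not touched.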
API changes
The following API endpoint will be added:
POST /1.0/cluster/members/<name>/state
It takes the following JSON input:
{
"action": "evacuate|restore"
}
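For illustration, a client might assemble the request along these lines. The helper name build_state_request is hypothetical, and the sketch only builds the method, path and body; actually sending it (typically over the LXD unix socket or HTTPS) is left out.

```python
import json
from urllib.parse import quote


def build_state_request(member, action):
    """Return (method, path, body) for a cluster member state change.

    action must be exactly one of "evacuate" or "restore".
    """
    if action not in ("evacuate", "restore"):
        raise ValueError(f"unsupported action: {action!r}")
    # Percent-encode the member name so it is safe as a path segment
    path = f"/1.0/cluster/members/{quote(member, safe='')}/state"
    body = json.dumps({"action": action})
    return "POST", path, body
```

Calling build_state_request("node1", "evacuate") yields a POST to /1.0/cluster/members/node1/state with the action set to evacuate.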
CLI changes
The following commands will be added:
lxc cluster evacuate <member> (long version)
lxc cluster evac <member> (short version)
lxc cluster restore <member>
The evacuate sub-command is used to evacuate a cluster member. The user will need to confirm this action (to prevent accidental evacuation) or provide the --force flag.
The restore sub-command doesn’t have any specific flags.
Database changes
No database changes.
Upgrade handling
No upgrade handling.
Further Information
No further information.