The API will be an async operation and should include state updates (UpdateMetadata) to indicate which instance is being evacuated ("Stopping instance XYZ in project XYZ", "Migrating instance XYZ in project XYZ to XYZ", "Starting instance XYZ in project XYZ").
This will then show up in the CLI when you run `lxc cluster evacuate`.
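As a usage sketch (assuming a cluster member named `node1`; exact output varies by LXD version):

```shell
# Evacuate all instances off node1 before doing maintenance on it
lxc cluster evacuate node1

# After maintenance, bring the evacuated instances back to node1
lxc cluster restore node1
```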
Marked this as implemented. The code was merged today and I just sent an additional PR to improve documentation coverage for it. It'll be available to users in LXD 4.17.
I've been waiting for something like a "maintenance mode" for a long time, and I'm excited to see it happening!
Can you give some technical details surrounding the requirements for this?
I'm wondering if this is some sort of backend `lxc move` where the container has to stop, move, and then start. Or if it's a live migration where I need to figure out CRIU and Ceph ahead of time for each node in my cluster.
Currently there are 3 options for `cluster.evacuate`:

- `auto` (will migrate unless the instance relies on devices which can't be moved)
- `migrate` (forces a migration but may fail if devices are incompatible with the target)
- `stop` (just stops the instance, won't attempt to move it)
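The mode is set per instance through the `cluster.evacuate` config key, for example (the instance name `c1` is just a placeholder):

```shell
# Don't attempt to move this instance during evacuation, just stop it
lxc config set c1 cluster.evacuate=stop

# Force migration for this instance even if it could fail
lxc config set c1 cluster.evacuate=migrate
```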
Currently all our migrations are cold migrations, so the evacuation will perform a clean shutdown of the instance (waiting up to `boot.host_shutdown_timeout`), then move it to a new cluster member and start it back up. Stopped instances are also moved away.
If you're on Ceph, the downtime is much shorter as no data needs to be transferred over the network. Instances that aren't on Ceph will get migrated the same way `lxc move` would do it.
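The clean-shutdown timeout mentioned above is the instance's `boot.host_shutdown_timeout` setting (in seconds), which can be tuned per instance, e.g. (again, `c1` is a placeholder name):

```shell
# Allow c1 up to 120 seconds to shut down cleanly before evacuation
# stops waiting for a clean shutdown
lxc config set c1 boot.host_shutdown_timeout=120
```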
In our case, the intent is to add live migration once we have it working with VMs, and have `auto` default to live-migrate for any instance matching the required config, then `migrate` for those that don't, and finally `stop` for those that can't be migrated.
That way you should always get the best possible behavior.
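The planned decision order for `auto` can be sketched roughly like this (all function and parameter names here are hypothetical illustrations, not LXD's actual internals):

```python
def pick_evacuation_action(mode, supports_live_migration, devices_migratable):
    """Return the action taken for one instance during evacuation.

    mode is the instance's cluster.evacuate setting:
    "auto", "migrate" or "stop".
    """
    if mode == "stop":
        # Just stop the instance, never attempt to move it.
        return "stop"
    if mode == "migrate":
        # Forced migration; may still fail later if devices turn out
        # to be incompatible with the target member.
        return "migrate"
    # mode == "auto": prefer live migration, fall back to cold
    # migration, and stop the instance only as a last resort.
    if supports_live_migration:
        return "live-migrate"
    if devices_migratable:
        return "migrate"
    return "stop"


print(pick_evacuation_action("auto", False, True))  # migrate
```

This mirrors the "best possible behavior" fallback described above: live-migrate when the instance's config allows it, cold-migrate when it doesn't, and stop only when migration isn't possible at all.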