LXD Fault tolerance

The instances won’t keep running, but you also won’t lose any data.
They’ll go into ERROR state, since the machine they were running on is now dead.

From there, you can run lxc move NAME --target OTHER-MACHINE and then lxc start NAME to get them back online.
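
If you have several instances to bring back, the two commands above can be wrapped in a small helper. This is only a rough sketch: recover_instance and the LXC variable are names I made up for illustration, not part of LXD. Setting LXC to "echo lxc" just prints the commands instead of running them, which is handy for checking what would happen first.

```shell
#!/bin/sh
# Hypothetical helper, not part of LXD: moves an instance off a dead
# cluster member and starts it again on another one.
# LXC is an assumed override for the client command (dry run: LXC="echo lxc").
LXC="${LXC:-lxc}"

recover_instance() {
    ri_name="$1"    # instance name (NAME in the post)
    ri_target="$2"  # surviving cluster member (OTHER-MACHINE in the post)

    if [ -z "$ri_name" ] || [ -z "$ri_target" ]; then
        echo "usage: recover_instance NAME OTHER-MACHINE" >&2
        return 2
    fi

    # Only start the instance if the move succeeded.
    $LXC move "$ri_name" --target "$ri_target" && $LXC start "$ri_name"
}

# Example call (placeholders, as in the post):
#   recover_instance NAME OTHER-MACHINE
```

The start is chained with && on purpose, so a failed move doesn't end up trying to start the instance on the dead machine.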

We’re working on automating that part so that you can configure a threshold after which LXD will consider a machine to be “dead enough” to move its workloads elsewhere.

In all cases, the instance will have to be restarted: there’s no way to keep the CPU state itself in sync between multiple systems, which is what seamless recovery in this kind of situation would require.
