How to implement "self healing" for stateful containers on LXD clusters?

I was wondering which approach you guys would take to add “self-healing” capabilities to an LXD cluster.

My initial thoughts are:

  • Define a frequent snapshot schedule for all containers.
  • Deploy two HAProxy instances with a heartbeat.
  • Check the health of containers with heartbeat checks.
  • On a failed heartbeat check: a) stop taking snapshots; b) launch the last snapshot for the container.
  • Keep track of the last snapshot used, so you don’t eternally re-load a failing snapshot.
  • Update HAProxy with the new IP of the container.
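
Roughly, the “launch the last snapshot and repoint HAProxy” step could look something like the sketch below; it just shells out to the `lxc` CLI, and the container names, snapshot name, eth0 interface and the fixed wait are all placeholders:

```python
#!/usr/bin/env python3
"""Rough sketch: create a replacement container from the failing container's
last snapshot and print its new IP so the HAProxy backend can be rewritten."""
import json
import subprocess
import time

BROKEN = "dom1-web-1"         # placeholder: the container that failed its check
REPLACEMENT = "dom1-web-1b"   # placeholder: name for the replacement
SNAPSHOT = "snap7"            # placeholder: last snapshot taken before the failure

def run(*cmd):
    subprocess.check_call(list(cmd))

# Create and start a new container from the old container's snapshot.
run("lxc", "copy", f"{BROKEN}/{SNAPSHOT}", REPLACEMENT)
run("lxc", "start", REPLACEMENT)
time.sleep(10)  # crude wait for an address; a real script would poll

# Read the replacement's IPv4 address so HAProxy can be updated.
raw = subprocess.check_output(
    ["lxc", "list", REPLACEMENT, "--format", "json"], text=True
)
info = json.loads(raw)[0]
addresses = info["state"]["network"]["eth0"]["addresses"]
ipv4 = next(a["address"] for a in addresses if a["family"] == "inet")
print(f"update HAProxy backend: {BROKEN} -> {ipv4}")
```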

I would question the usefulness of snapshot / image “self-healing”; my main problems being:

  1. How do you know which snapshot is good / bad?
  2. How do you know snapshot X doesn’t lead to the same problem in N time? (causing an infinite loop of re-create, make available, lose data again)
  3. The data loss from N minutes of change surely matters (in a stateful container)?
  4. How do you define, group & monitor redeployment of containers? (in looping situations)
  5. What if you lose a whole cluster member (can a stateful container be re-deployed to another host?)

Could the container not be re-created from a known “good” state (e.g. a base image like Ubuntu / CentOS) with cloud-init applied on top, as opposed to guessing which snapshots are in a good working state?

How do you mark a snapshot as “last used & good” without manual intervention (should a snapshot be checked manually every N minutes?)

The problems with stateful containers (like MySQL) can easily be removed by setting the data directory to a shared disk that would be available to re-created containers (across many hosts, not just one member).
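
For example (just a sketch; the pool, volume and container names are made up, and sharing across hosts needs a remote storage backend such as Ceph rather than a node-local pool):

```python
# Sketch: keep MySQL's data directory on a custom storage volume that a
# re-created container can simply re-attach. Names are placeholders.
import subprocess

def run(*cmd):
    subprocess.check_call(list(cmd))

# One-off: create the volume (on a shared/remote pool if replacement
# containers may land on a different cluster member).
run("lxc", "storage", "volume", "create", "shared-pool", "dom1-db-data")

# Attach it to the database container as its data directory.
run("lxc", "config", "device", "add", "dom1-db-1", "data", "disk",
    "pool=shared-pool", "source=dom1-db-data", "path=/var/lib/mysql")
```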

Understood @turtle0x1; anyway, let me reply with what I had in mind…

1, 2. How do you know which snapshot is good / bad?
The latest one is good unless it fails; if so, we load the previous one, and so on (let’s say after 5 attempts the process stops and the sysadmin is notified) (more complex options below, based on your suggestions).
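
In (pseudo-)code, the walk-back I have in mind is roughly the following; `healthy()` is whatever probe we settle on, `notify_sysadmin()` stands in for real alerting, and the names and the 30-second wait are placeholders:

```python
# Sketch: try snapshots newest-first, stop after five attempts, alert a human.
import json
import subprocess
import time
import urllib.request

MAX_ATTEMPTS = 5

def healthy(url, timeout=3.0):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def notify_sysadmin(message):
    print("ALERT:", message)  # placeholder: mail / Slack / pager in real life

def snapshots_newest_first(container):
    raw = subprocess.check_output(
        ["lxc", "query", f"/1.0/instances/{container}/snapshots?recursion=1"],
        text=True,
    )
    snaps = json.loads(raw)
    snaps.sort(key=lambda s: s["created_at"], reverse=True)
    return [s["name"].split("/")[-1] for s in snaps]

def heal(container, health_url):
    for snap in snapshots_newest_first(container)[:MAX_ATTEMPTS]:
        subprocess.call(["lxc", "stop", container, "--force"])  # ok if already stopped
        subprocess.check_call(["lxc", "restore", container, snap])
        subprocess.check_call(["lxc", "start", container])
        time.sleep(30)                      # give the service time to come up
        if healthy(health_url):
            return snap                     # remember this as "last used & good"
    notify_sysadmin(f"{container}: no healthy snapshot after {MAX_ATTEMPTS} attempts")
    return None
```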

3. The data loss from N minutes of change surely matters (in a stateful container)?
It certainly does, but I don’t think this can be prevented with LXD, self-healing or not. HAProxy should be taking care of service continuity and DB replication should take care of data integrity; I just want to restore the “service” to its intended level of performance, Kubernetes style, where you define a number of containers (pods) and K8s keeps an eye on these and makes sure to bring up new containers if required. (Maybe I should check how their algorithm defines healthy and unhealthy.)

4. How do you define, group & monitor redeployment of containers?
The idea is to have a maximum of one container per node for each “service”, so each failing container should be re-created on the same node or on some other node without an instance of that particular service.

Let’s say we have 5 nodes, and assume we host 3 Apache containers per domain and 3 DB containers too; each DB container is “tied” to one particular Apache container (preferably the DB container is on the same host as its corresponding Apache container).
(Or maybe Apache and MySQL should both run within the same container: LXD Performance: Stack per container VS service per container.)
When the dom1.com container dies, a new one should be deployed, either on the same node or, probably even better, on a node which doesn’t yet have an instance of dom1.com.
Afterwards (not urgent, maybe a different script/process) we check whether the DB container is on the same host as its corresponding HTTP container and, if not, we move it.
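
As a sketch of that placement rule (it assumes containers are named after their service, e.g. dom1-web-*, and uses a made-up image alias):

```python
# Sketch: find a cluster member that does not yet run a container for a given
# service and launch the replacement there. Naming convention is an assumption.
import json
import subprocess

def instances():
    raw = subprocess.check_output(["lxc", "list", "--format", "json"], text=True)
    return json.loads(raw)

def members():
    raw = subprocess.check_output(
        ["lxc", "cluster", "list", "--format", "json"], text=True
    )
    return [m["server_name"] for m in json.loads(raw)]

def place_replacement(service, image="ubuntu:22.04"):
    busy = {i["location"] for i in instances() if i["name"].startswith(service)}
    free = [m for m in members() if m not in busy]
    target = free[0] if free else sorted(busy)[0]   # fall back to a busy node
    name = f"{service}-replacement"
    subprocess.check_call(["lxc", "launch", image, name, "--target", target])
    return name, target

# e.g. place_replacement("dom1-web")
```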

NOTE: Now that I lay that structure out, I think it might make sense, in some situations, to consider cloning one of the other two running containers for dom1.com instead of restoring the latest snapshot…

On the DB side, if a container fails, we launch a DB container without data on it and then ask one of the other 2 containers to replicate the data.
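
Something like the following is what I mean by asking another container to replicate the data (only a sketch: hosts and credentials are placeholders, it assumes GTID-based replication, and in practice you would usually seed the new replica from a backup or the clone plugin first):

```python
# Sketch: configure a freshly launched, empty MySQL container to replicate
# from one of the two healthy DB containers. MySQL < 8.0.22 would use
# CHANGE MASTER TO / START SLAVE instead.
import mysql.connector  # pip install mysql-connector-python

replica = mysql.connector.connect(host="dom1-db-new", user="root", password="secret")
cur = replica.cursor()
cur.execute("""
    CHANGE REPLICATION SOURCE TO
        SOURCE_HOST = 'dom1-db-2',
        SOURCE_USER = 'repl',
        SOURCE_PASSWORD = 'repl-secret',
        SOURCE_AUTO_POSITION = 1
""")
cur.execute("START REPLICA")
replica.close()
```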

5. What if you lose a whole cluster member?
Yes, that is a huge concern indeed; that’s why a minimum of 3 (but 5 if possible) nodes would be required.

Could the container not be re-created from a known “good” state?
Taking note of the cloud-init possibility. I guess if we could somehow know what the issue was that caused the disruption, we would be able to determine the best course of action (snapshot vs. cloning an active container vs. cloud-init).
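
For reference, the cloud-init route would be something like this (the image alias and the user-data contents are placeholders; older LXD versions use the `user.user-data` key instead of `cloud-init.user-data`):

```python
# Sketch: re-create the container from a known-good base image and apply
# cloud-init user-data on top, instead of trusting a snapshot.
import subprocess

USER_DATA = """#cloud-config
packages:
  - apache2
runcmd:
  - systemctl enable --now apache2
"""

subprocess.check_call([
    "lxc", "launch", "ubuntu:22.04", "dom1-web-1",
    "--config", f"cloud-init.user-data={USER_DATA}",
])
```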

How do you mark a snapshot as “last used & good” without manual intervention?
HAProxy “checks” (or something similar, maybe just an HTTP request that calls an “am I healthy?” script on the HTTP container, or a simple query for the DB containers), plus some DB to store that data?
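
The “am I healthy?” script could be as small as this (a sketch; the ports and the choice of checks are placeholders):

```python
# Sketch: a tiny health endpoint living inside the container. It answers 200
# only if Apache's port and the local MySQL port both accept connections.
from http.server import BaseHTTPRequestHandler, HTTPServer
import socket

def port_open(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        ok = port_open("127.0.0.1", 80) and port_open("127.0.0.1", 3306)
        self.send_response(200 if ok else 503)
        self.end_headers()
        self.wfile.write(b"ok" if ok else b"unhealthy")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Health).serve_forever()
```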

The problems with stateful containers (like MySQL) can easily be removed by setting the data directory to a shared disk.
Understood, do you know a good guide on how to do that?

Thanks a lot for your detailed answer @turtle0x1!

PS: What about having “dormant” (stopped) copies of a container on the nodes where a given service is not currently running, for an even shorter MTTR?
If dom1.com is running on nodes 1, 3 and 5, we keep stopped clones of that container on nodes 2 & 4; those clones are (supposed to be) healthy instances of the service.
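
As a sketch of keeping those dormant copies around (names and target nodes are placeholders; depending on the LXD version you may need to copy from a snapshot rather than from a running container):

```python
# Sketch: maintain stopped clones of dom1.com's container on the two nodes
# that do not currently run it, so failover is just "lxc start".
import subprocess

SOURCE = "dom1-web-1"                 # healthy, running instance (placeholder)
STANDBY_NODES = ["node2", "node4"]    # members without a live dom1.com instance

for node in STANDBY_NODES:
    standby = f"{SOURCE}-standby-{node}"
    # --refresh transfers only the differences when the standby already exists;
    # the standby stays stopped until it is needed.
    subprocess.check_call([
        "lxc", "copy", SOURCE, standby, "--target", node, "--refresh",
    ])

# On failure of SOURCE, recovery is just starting one of the standbys, e.g.:
#   lxc start dom1-web-1-standby-node2
```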