LXD clustering, Ceph storage and live migration

geodb27 · April 17, 2019, 7:47am

Hi!
I’ve set up an LXD cluster with three nodes. Alongside it, for container storage, I’ve set up a complete Ceph cluster and defined it as a storage pool on my LXD cluster.
Here is the output of lxc storage list:

lxdpool ceph created 4

And the output of lxc storage show lxdpool:

config:
  ceph.cluster_name: ceph
  ceph.osd.pg_num: "40"
  ceph.osd.pool_name: cephfs
  ceph.user.name: admin
  volatile.pool.pristine: "true"
description: ""
name: lxdpool
driver: ceph
used_by:
- /1.0/containers/lxc-drupal3
- /1.0/containers/lxc-drupal4
- /1.0/containers/lxc-drupal5
- /1.0/images/663f6663aed66a22dd708c4b07514748221522b810008c55002fcc1dd81af377
status: Created
locations:
- clusterlxd1
- clusterlxd3
- clusterlxd2

Now, here is the output of lxc list:

| lxc-drupal3 | RUNNING | 192.168.220.3 (eth0) | fd42:bdae:88e5:5d3c:216:3eff:fedf:966a (eth0) | PERSISTENT |           | clusterlxd2 |
+-------------+---------+----------------------+-----------------------------------------------+------------+-----------+-------------+
| lxc-drupal4 | RUNNING | 192.168.220.4 (eth0) | fd42:bdae:88e5:5d3c:216:3eff:fe81:6a85 (eth0) | PERSISTENT |           | clusterlxd1 |
+-------------+---------+----------------------+-----------------------------------------------+------------+-----------+-------------+
| lxc-drupal5 | RUNNING | 192.168.220.5 (eth0) | fd42:bdae:88e5:5d3c:216:3eff:fe52:5a0a (eth0) | PERSISTENT |           | clusterlxd3 |
+-------------+---------+----------------------+-----------------------------------------------+------------+-----------+-------------+

I expected the LXD cluster to automatically restart a container on a live node of the cluster if the node it runs on goes down.
For example, if I stop the LXD daemon on the host named clusterlxd3 in my setup, I expected lxc-drupal5 to be started on either clusterlxd1 or clusterlxd2.
Yet it doesn’t work at all.
What’s more, as long as clusterlxd3 is down in one way or another, there is no way to start my container, nor to move it to another node of the cluster.
I searched a lot before posting here and found this:

in which stgraber stated in his first comment:

The exception to this is if you’re using CEPH as your storage backend, in that case, since your storage is over the network and not tied to any of the nodes, you will be able to move a container from one node to another and restart it there even when the source node has gone offline.

So my question is: what did I do wrong, or what is missing in my setup?

freeekanayaka · April 17, 2019, 8:33am

Regarding automatic re-scheduling of containers, that’s beyond the scope of LXD clustering at the moment. The best way to think of LXD clustering is as a vSphere replacement using system containers instead of VMs, not as an application container orchestration manager (like k8s).

What error are you seeing when trying to move your container? What you describe should definitely work.

freeekanayaka · April 17, 2019, 8:38am

Here you can find some examples of how moving containers between nodes works:

https://github.com/lxc/lxd/blob/f084b59bfea5266251d95875e41d443ba039919b/test/suites/clustering.sh#L457




# Update the storage pool
if [ "${driver}" = "dir" ]; then
  LXD_DIR="${LXD_ONE_DIR}" lxc storage set pool1 rsync.bwlimit 10
  LXD_DIR="${LXD_TWO_DIR}" lxc storage show pool1 | grep rsync.bwlimit | grep -q 10
  LXD_DIR="${LXD_TWO_DIR}" lxc storage unset pool1 rsync.bwlimit
  ! LXD_DIR="${LXD_ONE_DIR}" lxc storage show pool1 | grep -q rsync.bwlimit || false
fi


if [ "${driver}" = "ceph" ]; then
  # Test migration of ceph-based containers
  LXD_DIR="${LXD_TWO_DIR}" ensure_import_testimage
  LXD_DIR="${LXD_ONE_DIR}" lxc launch --target node2 -s pool1 testimage foo


  # The container can't be moved if it's running
  ! LXD_DIR="${LXD_TWO_DIR}" lxc move foo --target node1 || false


  # Stop the container and create a snapshot
  LXD_DIR="${LXD_ONE_DIR}" lxc stop foo --force
  LXD_DIR="${LXD_ONE_DIR}" lxc snapshot foo backup

Basically the rules are:

You can move the container cheaply (no data copy) between two nodes if the container is stopped.
You can move the container cheaply (no data copy) between two nodes if the current node it’s running on is detected as offline.
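
The first rule can be sketched as a small shell helper: stop the container, move it, then start it again on the target node. This is only an illustration, not something LXD ships. The function name relocate and the LXC override variable are made up here; setting LXC=echo gives a dry run that prints the commands instead of running them. It does not cover the second rule, where the source node is already offline and the stop step would not apply.

```shell
#!/bin/sh
# Hypothetical helper, not part of LXD: relocate a Ceph-backed container.
# LXC is an assumed override hook; it defaults to the real `lxc` client.
LXC="${LXC:-lxc}"

relocate() {
  name="$1"
  target="$2"
  # A running container cannot be moved, so stop it first.
  $LXC stop "$name" &&
    # With Ceph storage no data copy is needed for the move.
    $LXC move "$name" --target "$target" &&
    $LXC start "$name"
}
```

For example, relocate lxc-drupal5 clusterlxd2 would stop the container, move it to clusterlxd2 and start it there.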

geodb27 · April 17, 2019, 9:53am

Hi @freeekanayaka, and thanks a lot for your answers and advice.
I took my time to reply because I wanted to check something first.
So, here is what went wrong, after investigation:
I had started by creating the LXD cluster. For this, I had set up a (local) ZFS pool on each node with lxd init.
Only afterwards did I create a Ceph storage pool for the cluster.
I think that’s what made things go wrong. So I destroyed the whole cluster and recreated it, this time specifying the Ceph remote storage driver to lxd init.
Things look far better.
I can now do:
clusterlxd3 $ snap stop lxd
clusterlxd1 $ lxc move lxc-drupal5 --target clusterlxd2
clusterlxd1 $ lxc start lxc-drupal5
All of this works reasonably well.
There is just one thing left that annoys me:
before running snap stop lxd on clusterlxd3, I had set the IP address of the container this way:
clusterlxd2 $ lxc network attach lxdbr0 lxc-drupal5 eth0 && lxc config device set lxc-drupal5 eth0 ipv4.address 192.168.220.5
However, after all the above commands to move the container to clusterlxd2, the container starts fine but doesn’t get its configured address.
I expected the config to be on the Ceph filesystem (or in the shared DB), so that it would persist across the cluster’s hosts, but it doesn’t seem so.
I’ll continue to investigate…

geodb27 · April 17, 2019, 9:55am

A follow-up to my previous post…
These commands don’t work at all:
clusterlxd3 $ snap start lxd
clusterlxd1 $ lxc move lxc-drupal5 --target clusterlxd3
Error: Migration API failure: Container is running
With Ceph as the default and only storage backend, I expected this to work…

freeekanayaka · April 17, 2019, 10:24am

Please re-read my first reply. You can move a container from one node to another only if:

The container is stopped.

OR

The node where the container is running is detected as offline.

Regarding the network configuration issue, I would expect that to work too. What network configuration are you seeing in the container after moving it and starting it? @stgraber might help here.
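
One way to answer that question is to compare the address set on the device with what the container actually reports. A rough sketch, assuming the device is named eth0 as in the commands above; the helper name has_address is made up for illustration and is not an LXD command:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the IPv4 column printed by `lxc list`
# (e.g. "192.168.220.5 (eth0)") contains the expected address as a whole word.
has_address() {
  expected="$1"
  reported="$2"
  case " $reported " in
    *" $expected "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs a working cluster, so left commented out):
#   want=$(lxc config device get lxc-drupal5 eth0 ipv4.address)
#   have=$(lxc list lxc-drupal5 -c 4 --format csv)
#   has_address "$want" "$have" || echo "expected $want, got: $have"
```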

geodb27 · April 17, 2019, 11:10am

I had read your first answer carefully, but I may have misunderstood the second part. I supposed (but I can see now that I was wrong) that stopping the LXD daemon on the host would cause it to be detected as offline.
Maybe that is not the case, and offline means the machine is completely down.
Anyway, thanks for your kind help!

freeekanayaka · April 17, 2019, 11:16am

Stopping the daemon is enough. It takes a few seconds for the detection to happen. You can use:

lxc cluster list

to see which nodes are online and which have been detected as offline.
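
If you want to script the wait, something along these lines could check the table before attempting a move. It is only a sketch: node_is_offline is a made-up helper, and it assumes the node name is the first column of the table and that an offline node's row contains the word OFFLINE.

```shell
#!/bin/sh
# Hypothetical helper: reads `lxc cluster list` table output on stdin and
# succeeds if the row for the given node shows OFFLINE.
# Assumes the node name is the first column of a |-delimited table.
node_is_offline() {
  awk -F'|' -v node="$1" '
    { name = $2; gsub(/ /, "", name) }
    name == node && /OFFLINE/ { found = 1 }
    END { exit !found }
  '
}

# Usage (needs a working cluster, so left commented out):
#   lxc cluster list | node_is_offline clusterlxd3 &&
#     lxc move lxc-drupal5 --target clusterlxd2
```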

geodb27 · April 18, 2019, 12:41pm

Is there any existing solution that does what I was wrongly trying to achieve with my LXD cluster? I mean the container orchestration manager you talked about.

CyrusTheVirusG · May 5, 2019, 6:23am

I don’t think so. I always make sure that I have redundant containers running on all hosts, with haproxy directing traffic. That way, if a container goes down, I don’t need to worry about moving it and can just wait for the host to come back up.

Additionally, I’m using ucarp to provide virtual IP addresses for applications like haproxy that need to share a single IP address and failover to a node that’s alive if the host goes down.

Imagine a scenario where you have a large redis dataset and LXD automatically moves it to another live host that is already hosting a redundant container with that same large dataset. You might run that host out of RAM.

I would take care not to introduce such a scenario into your environment.

sophware · June 10, 2024, 3:47am

Even in 2019, vSphere supported the VM equivalent of what this poster was asking about.

In 2024 does LXD (or Incus) support automatically restarting a replacement for a container when a host fails?

EDIT: I see that live migration is not really functional (check the video for the Incus 6.1 release). This probably means “no” for automatic recovery/restart of a container on a different host.

stgraber · June 11, 2024, 3:12am

You don’t need live migration to handle a host failure. In fact, this is a case where you can never make use of live migration, as live migration requires a functional source host.

When using remote storage, so either Ceph or clustered LVM, Incus can automatically recover from the failure of a host by simply starting those instances back up on another system.

In such an environment, setting cluster.healing_threshold to the desired delay will have Incus automatically relocate the instances to other servers and start them back up there.
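
As a rough sketch of what that looks like on the command line: the wrapper name set_healing_threshold and the INCUS override variable are made up for illustration (INCUS=echo prints the command instead of running it), and the value is assumed to be a delay in seconds, with 0 leaving healing disabled.

```shell
#!/bin/sh
# INCUS is an assumed override hook (INCUS=echo gives a dry run);
# it defaults to the real `incus` client.
INCUS="${INCUS:-incus}"

# Hypothetical wrapper: set how long (assumed to be in seconds) a server
# may stay offline before its instances are relocated and restarted.
set_healing_threshold() {
  $INCUS config set cluster.healing_threshold "$1"
}
```

For example, set_healing_threshold 30 would ask for relocation after roughly 30 seconds offline.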