LXD clustering, Ceph storage and live migration

Hi!
I’ve set up an LXD cluster with three nodes. Alongside it, for container storage, I’ve set up a complete Ceph cluster and, of course, defined it as a storage pool on my LXD cluster.
Here is the output of lxc storage list:

lxdpool ceph created 4

And the output of lxc storage show lxdpool:

    config:
      ceph.cluster_name: ceph
      ceph.osd.pg_num: "40"
      ceph.osd.pool_name: cephfs
      ceph.user.name: admin
      volatile.pool.pristine: "true"
    description: ""
    name: lxdpool
    driver: ceph
    used_by:
    - /1.0/containers/lxc-drupal3
    - /1.0/containers/lxc-drupal4
    - /1.0/containers/lxc-drupal5
    - /1.0/images/663f6663aed66a22dd708c4b07514748221522b810008c55002fcc1dd81af377
    status: Created
    locations:
    - clusterlxd1
    - clusterlxd3
    - clusterlxd2

Now, here is the output of lxc list:

    +-------------+---------+----------------------+-----------------------------------------------+------------+-----------+-------------+
    |    NAME     |  STATE  |         IPV4         |                     IPV6                      |    TYPE    | SNAPSHOTS |  LOCATION   |
    +-------------+---------+----------------------+-----------------------------------------------+------------+-----------+-------------+
    | lxc-drupal3 | RUNNING | 192.168.220.3 (eth0) | fd42:bdae:88e5:5d3c:216:3eff:fedf:966a (eth0) | PERSISTENT |           | clusterlxd2 |
    +-------------+---------+----------------------+-----------------------------------------------+------------+-----------+-------------+
    | lxc-drupal4 | RUNNING | 192.168.220.4 (eth0) | fd42:bdae:88e5:5d3c:216:3eff:fe81:6a85 (eth0) | PERSISTENT |           | clusterlxd1 |
    +-------------+---------+----------------------+-----------------------------------------------+------------+-----------+-------------+
    | lxc-drupal5 | RUNNING | 192.168.220.5 (eth0) | fd42:bdae:88e5:5d3c:216:3eff:fe52:5a0a (eth0) | PERSISTENT |           | clusterlxd3 |
    +-------------+---------+----------------------+-----------------------------------------------+------------+-----------+-------------+

Now, I expected the LXD cluster to restart a container “automatically” on a live node of the cluster if one goes down.
For example, if I stop the LXD daemon on the host named clusterlxd3 in my setup, I expected lxc-drupal5 to be started on either clusterlxd1 or clusterlxd2.
Yet, it doesn’t work at all.
Moreover, as long as clusterlxd3 is down, one way or another, there is no way to start my container or to move it to another node of the cluster.
I searched a lot before posting here and found this:

in which stgraber stated in his first comment:

The exception to this is if you’re using CEPH as your storage backend, in that case, since your storage is over the network and not tied to any of the nodes, you will be able to move a container from one node to another and restart it there even when the source node has gone offline.

So, my question is: what did I do wrong, or what is missing in my setup?

Regarding automatic re-scheduling of containers, that’s beyond the scope of LXD clustering at the moment. The best way to think of LXD clustering is as a vSphere replacement using system containers instead of VMs, not as an application container orchestration manager (like k8s).

What error are you seeing when trying to move your container? What you describe should definitely work.

Here you can find some examples of how moving containers between nodes works:

Basically the rules are:

  1. You can move the container cheaply (no data copy) between two nodes if the container is stopped.
  2. You can move the container cheaply (no data copy) between two nodes if the current node it’s running on is detected as offline.
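The two rules boil down to a single predicate: a cheap move is possible unless the container is running on a node that is still online. A minimal Python sketch, assuming the container lives on a shared Ceph pool; MoveRequest and can_move_without_copy are hypothetical names, not part of LXD:

```python
from dataclasses import dataclass


@dataclass
class MoveRequest:
    container_running: bool   # is the container currently running?
    source_node_online: bool  # is its current node still detected as online?


def can_move_without_copy(req: MoveRequest) -> bool:
    """Encode the two rules for a container on a shared (Ceph) pool.

    Rule 1: a stopped container can be moved cheaply.
    Rule 2: a container whose node is detected as offline can be moved cheaply.
    """
    return (not req.container_running) or (not req.source_node_online)


# A running container on an online node is the case that gets refused.
print(can_move_without_copy(MoveRequest(container_running=True, source_node_online=True)))
```

The refused case (running container, online node) is the situation that produces the “Container is running” error reported further down in this thread.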

Hi @freeekanayaka, and thanks a lot for your answers and advice.
I took some time to reply because I wanted to check something first.
So, here is what went wrong, after some investigation:
I started by creating the LXD cluster. To do so, I set up a local ZFS pool on each node with lxd init.
Only afterwards did I create a Ceph storage pool for the cluster.
I think that’s what went wrong. So I destroyed the whole cluster and recreated it, selecting the Ceph remote storage driver in lxd init.
Things look far better.
I can now do:

    clusterlxd3 $ snap stop lxd
    clusterlxd1 $ lxc move lxc-drupal5 --target clusterlxd2
    clusterlxd1 $ lxc start lxc-drupal5

All of this works reasonably well.
There is just one thing left that annoys me.
Before running snap stop lxd on clusterlxd3, I had set up the IP address of the container this way:

    clusterlxd2 $ lxc network attach lxdbr0 lxc-drupal5 eth0 && lxc config device set lxc-drupal5 eth0 ipv4.address 192.168.220.5

However, after running all the above commands to move the container to clusterlxd2, the container starts fine but doesn’t get its configured address.
I expected the configuration to be stored on the Ceph filesystem (or in the shared database), so that it would persist across the cluster’s hosts, but that doesn’t seem to be the case.
I’m still investigating…

A follow-up to my previous post:
these commands don’t work at all:

    clusterlxd3 $ snap start lxd
    clusterlxd1 $ lxc move lxc-drupal5 --target clusterlxd3
    Error: Migration API failure: Container is running

With Ceph as the default and only storage backend, I expected this to work…

Please re-read my first reply. You can move a container from one node to another only if:

  1. The container is stopped.

OR

  2. The node where the container is running is detected as offline.

Regarding the network configuration issue, I would expect that to work too. What network configuration are you seeing in the container after moving it and starting it? @stgraber might help here.

I had read your first answer carefully, but I may have misunderstood the second part. I assumed (wrongly, as I now see) that stopping the LXD daemon on the host would get it detected as offline.
Maybe that’s not the case, and offline means the machine is completely down.
Anyway, thanks for your kind help!

Stopping the daemon is enough. It takes a few seconds for the detection to happen. You can use:

lxc cluster list

to see what nodes are online and what have been detected as offline.
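Putting the detection step together with the move and start commands from earlier in the thread, an operator-run helper could look like the Python sketch below. It assumes your lxc client supports lxc cluster list --format json and that each member object carries "server_name" and "status" keys with "Offline" as the value for a failed node; verify this against your LXD version. The function names are hypothetical.

```python
import json
import subprocess
import time


def offline_members(members: list[dict]) -> list[str]:
    """Return the names of members whose status is "Offline".

    Assumes each entry has "server_name" and "status" keys, as in the JSON
    printed by `lxc cluster list --format json` (check on your LXD version).
    """
    return [m["server_name"] for m in members if m.get("status") == "Offline"]


def list_members() -> list[dict]:
    """Query the cluster through the lxc client and parse its JSON output."""
    result = subprocess.run(
        ["lxc", "cluster", "list", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def fail_over(container: str, source: str, target: str,
              timeout: float = 60.0, poll: float = 2.0) -> None:
    """Wait until `source` is detected as offline, then move and start `container`."""
    deadline = time.monotonic() + timeout
    # Detection takes a few seconds, so poll until the source node is reported offline.
    while source not in offline_members(list_members()):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{source} was not detected as offline within {timeout}s")
        time.sleep(poll)
    # With a shared Ceph pool this move should not copy any container data.
    subprocess.run(["lxc", "move", container, "--target", target], check=True)
    subprocess.run(["lxc", "start", container], check=True)
```

For example, fail_over("lxc-drupal5", source="clusterlxd3", target="clusterlxd2") waits for clusterlxd3 to drop out before moving. This is still a manual step triggered by an operator, not the automatic re-scheduling discussed above.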

Is there any existing solution for what I was wrongly trying to achieve with my LXD cluster? I mean the kind of container orchestration manager you mentioned?

I don’t think so. I always make sure I have redundant containers running on all hosts, with HAProxy directing traffic. That way, if a container goes down, I don’t need to worry about moving it; I just wait for the host to come back up.

Additionally, I’m using ucarp to provide virtual IP addresses for applications like haproxy that need to share a single IP address and failover to a node that’s alive if the host goes down.

Imagine a scenario where you have a large Redis dataset and LXD automatically moves that container to another live host that is already running a redundant container with the same large dataset. You might run that host out of RAM.

I would take care not to introduce such a scenario into your environment.
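The RAM concern is plain arithmetic: the surviving host must hold its own containers plus the one being moved. A minimal Python sketch of that check; fits_in_ram is a hypothetical helper and the memory figures in the example are made up for illustration:

```python
def fits_in_ram(host_total_mib: int, host_used_mib: int, incoming_mib: int,
                headroom_mib: int = 0) -> bool:
    """Return True if the host can take the incoming container's memory.

    host_used_mib already includes any redundant copy of the same dataset;
    headroom_mib reserves memory for the host itself.
    """
    return host_used_mib + incoming_mib + headroom_mib <= host_total_mib


# Hypothetical host: 32 GiB total, 20000 MiB already used by a redundant Redis,
# and a moved-in Redis container needing 16000 MiB: 36000 > 32768.
print(fits_in_ram(32768, 20000, 16000))
```

Any automatic failover policy would need a check like this before placing a container, which is one reason blind re-scheduling can make an outage worse.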