Enable live migration?


#1

Hi,

I’d like to try out live migration, but somehow I can’t get it to work.

I’m running Ubuntu 18.04 and installed lxd 3.0.1 and criu 3.6 via apt.

I’ve set up my two Ubuntu hosts, infra1 and infra2, as an LXD cluster. The storage pool is based on Ceph.

I can create a container, e.g. bionic, and move it from one server to the other as long as it is stopped:

lxc stop bionic
lxc move bionic --target infra2
lxc start bionic

If I try this when the container is running I get:

Error: Container is running

It seems it doesn’t even try to migrate the container. Do I have to enable live migration in LXD somehow?

BTW: I can take a stateful snapshot, move the container to the other machine, and restore this stateful snapshot there, and it runs just fine.
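For reference, the stateful-snapshot workaround could look roughly like this. It is only a sketch: the function name `move_with_state`, the default snapshot name `pre-move`, and the `LXC` override variable are made up for illustration, and it assumes CRIU is installed on both cluster members.

```shell
#!/bin/sh
# Sketch only. LXC can be overridden (e.g. LXC=echo) for a dry run;
# this variable is an assumption of the example, not an LXD feature.
LXC="${LXC:-lxc}"

# move_with_state NAME TARGET [SNAPSHOT]
# Hypothetical helper: checkpoint, stop, move within the cluster, restore.
move_with_state() {
    name="$1"
    target="$2"
    snap="${3:-pre-move}"

    # Take a snapshot that includes the running state (uses CRIU).
    "$LXC" snapshot "$name" "$snap" --stateful &&
    # The cluster move only works on a stopped container.
    "$LXC" stop "$name" &&
    "$LXC" move "$name" --target "$target" &&
    # Restore the saved runtime state on the new cluster member.
    "$LXC" restore "$name" "$snap" --stateful
}
```

Called as `move_with_state bionic infra2`, this stops at the first failing step, so a CRIU dump error leaves the container running where it was.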

Regards

Chris


(Stéphane Graber) #2

So the main issue here is that LXD right now can’t move a container from one node to another without also renaming it, and renaming isn’t supported when using CRIU.

We do have planned work to get a new API specifically for moving containers within the cluster. This could internally be made to trigger CRIU for running containers. I’m not sure whether we’ll do that part, given the current state of CRIU (we pretty much can’t test it, since it can’t migrate a whole lot these days).


#3

Ah, OK, so live migration in an LXD cluster is just not supported at the moment. I was starting to feel quite stupid for not getting it to work.

Well, I’m quite happy that offline migration on an LXD cluster with Ceph storage works so easily. Live migration would have been a nice bonus, but since I’m getting weird ext4 errors in the one container I moved via stateful snapshots, it might be a good idea to stay away from live migration anyway.

Thanks for the clarification.

Regards

Chris


(Shantur Rathore) #4

@stgraber: Is there any alternative for moving live containers in a Ceph environment?


(Stéphane Graber) #5

Nope, moving live processes can only be done with CRIU. There aren’t any other kernel features or userspace tools that can do that at this time.

Your only other alternative is to temporarily shut down the container and move it while stopped.


(Shantur Rathore) #6

@stgraber Thanks.
I will wait for you guys to implement this support. Is it planned for any release right now?


(Stéphane Graber) #7

No. CRIU is the only option, and we’re not the ones developing it; it’s an external project.

It’s an extremely complex piece of work and not something we intend to work on ourselves. We do make sure that LXD interacts properly with CRIU, but we don’t work on adding missing features to CRIU.


(Shantur Rathore) #8

Time to bug CRIU devs then :grinning:


(Shantur Rathore) #9

Apologies for being a pain. I just wanted to understand what LXD needs from CRIU to do live migrations. I am trying to open an issue and start a discussion there.
Also, in theory, if LXD could move a container from one node to another without renaming it, would CRIU work in that case?


(Stéphane Graber) #10

Moving from one node to another without renaming is indeed an LXD issue and is being worked on.

The CRIU issues will show up after that: you’ll more than likely find a wide range of processes in your containers using kernel features that CRIU doesn’t know how to handle.

In another issue I noticed that your profile allows containers to use /dev/kvm. That’s an example of something CRIU doesn’t know how to serialize, so as long as anything in the container uses /dev/kvm, live migration will fail. Current systemd is also a problem and can’t be live migrated.

The easiest way for you to see what will happen once we fix the LXD move issue would be to set up a node that is not in your cluster and doesn’t use Ceph, then attempt to move one of your containers over to it. LXD will then use good old rsync for the data, which is slow but should work fine, and call CRIU for you. CRIU will most likely fail and give you some hints as to what’s missing for your particular use case.
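A rough sketch of that experiment, assuming a standalone LXD host that already listens on the network and accepts new clients. The remote name `standalone`, the example address, the function name `try_live_move`, and the `LXC` override variable are placeholders for illustration, not anything taken from this thread:

```shell
#!/bin/sh
# Sketch only. LXC can be overridden (e.g. LXC=echo) for a dry run;
# this variable is an assumption of the example, not an LXD feature.
LXC="${LXC:-lxc}"

# try_live_move NAME REMOTE ADDRESS
# Hypothetical helper: register a non-clustered host as a remote, then
# move the still-running container to it so that LXD invokes CRIU.
try_live_move() {
    name="$1"
    remote="$2"
    addr="$3"

    # Register the standalone host (interactive: certificate and password).
    "$LXC" remote add "$remote" "$addr" &&
    # Move without stopping first; data goes over rsync, state over CRIU.
    "$LXC" move "$name" "${remote}:${name}"
}
```

Called as `try_live_move bionic standalone 192.0.2.10`, the second step is the one expected to fail, and its error output is where the CRIU hints should show up.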