LXD move: how to reduce downtime without live migration

Good evening :slight_smile:
I’m running into a problem with LXD live migration, so I can’t use it.
I get this kind of error:

error: Error transferring container data: migration restore failed
(00.018254) Warn  (criu/apparmor.c:421): apparmor namespace /sys/kernel/security/apparmor/policy/namespaces/lxd-router-gw_<var-lib-lxd> already exists, restoring into it
(00.025064) Warn  (criu/cr-restore.c:853): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
(00.246825)      1: Warn  (criu/autofs.c:77): Failed to find pipe_ino option (old kernel?)
(00.249058) Error (criu/cr-restore.c:1024): 15755 killed by signal 11: Segmentation fault
(00.268144) Error (criu/cr-restore.c:1024): 15755 killed by signal 9: Killed
(00.288290) Error (criu/mount.c:3275): mnt: Can't remount root with MS_PRIVATE: No such file or directory
(00.288297) Error (criu/mount.c:3285): mnt: Can't unmount .criu.mntns.K2nNqG: No such file or directory
(00.288301) Error (criu/mount.c:3290): mnt: Can't remove the directory .criu.mntns.K2nNqG: No such file or directory
(00.288627) Error (criu/cr-restore.c:1890): Restoring FAILED.

I thought it was because of the kernel, but it looks supported: 4.4.0-47-generic

So, I would like to move my containers. Moving without live migration has a big impact as the container has to be stopped. And the bigger the container, the longer the downtime ^^

I’m using ZFS as backend storage.
LXD/LXC version 2.8 on source
LXD/LXC version 2.12 on destination

Do you have any idea how I can move them and reduce the downtime?
Something like doing an rsync while the container is running, then stopping the container and doing another rsync while it is stopped, so I save time.

My idea was to create the container on the destination node with the same name and the same configuration,
rsync the content of the storage from source to destination twice (once while running, once while stopped),
then start the container on the destination node, roughly as sketched below.
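
Something like this (the container name, image and destination host are just placeholders, and the paths assume the default /var/lib/lxd layout):

# on the destination: create an empty container "shell" with the same name/config
lxc init ubuntu:16.04 ct1

# first rsync pass while the source container is still running
rsync -aAXH --numeric-ids /var/lib/lxd/containers/ct1/rootfs/ dest:/var/lib/lxd/containers/ct1/rootfs/

# stop the source, then a short second pass to catch what changed
lxc stop ct1
rsync -aAXH --numeric-ids --delete /var/lib/lxd/containers/ct1/rootfs/ dest:/var/lib/lxd/containers/ct1/rootfs/

# finally, start it on the destination
lxc start ct1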

If you have any ideas, they would be appreciated.

Regards,
Benoît

One possibility is to match up the LXD versions by installing this PPA on the older (xenial?) system; then the live migration might work.

deb http://ppa.launchpad.net/ubuntu-lxc/lxd-stable/ubuntu xenial main
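
For example, the usual way to add it on a xenial host (the exact package version you end up with depends on what is in the PPA at the time):

sudo add-apt-repository ppa:ubuntu-lxc/lxd-stable
sudo apt-get update
sudo apt-get install lxd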

I’m moving the containers away so I can do the update.
This new version needs some “reconfiguration” of the manageable network/storage, if I remember well from my last upgrade, so I know it can take me some time to re-configure. But yes, it’s an option.

Your rsync option sounds like the best approach. I have used this multiple times in the past when moving large VMs from one host to another. As you pointed out, create the container “shell” on the destination server, do an initial rsync of the “rootfs” directory, shut down the primary container, do another rsync, then start the container on the destination.

As someone else pointed out (@tamas), you might use zfs send/receive since you are using ZFS volumes. That might be faster than rsync.
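
A rough sketch of the zfs send/receive variant (pool, container and host names are placeholders, and it assumes the container has already been defined on the destination as above):

# on the source: snapshot and full send while the container is still running
zfs snapshot lxd/containers/ct1@move1
zfs send lxd/containers/ct1@move1 | ssh dest zfs receive -F lxd/containers/ct1

# stop the container, snapshot again and send only the delta
lxc stop ct1
zfs snapshot lxd/containers/ct1@move2
zfs send -i lxd/containers/ct1@move1 lxd/containers/ct1@move2 | ssh dest zfs receive -F lxd/containers/ct1

# on the destination: start the copy
lxc start ct1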

Hope this helps.


Right, so the problem is definitely with CRIU here.
There are quite a number of known issues in CRIU when migrating containers, some of which are being worked on, some aren’t.

We (Canonical LXD team) used to be pretty active in CRIU upstream to try and improve things there, but we’ve recently had to refocus on pure LXD work as we unfortunately don’t have any paying customers for the live migration bits…

In your specific case, it looks like the serialization and deserialization of AppArmor-protected processes is what’s failing. There are a few things you can do:

  • Disable AppArmor in the container (/etc/init.d/apparmor teardown), then try migrating again
  • Attempt to upgrade to a more recent CRIU like https://launchpad.net/ubuntu/+source/criu/2.12.1-2ubuntu1/+build/12473819
  • Do what you said with creating a new container and rsyncing, or using zfs send/receive as suggested. Do however note that you may run into problems if the uid/gid ranges of the two machines don’t match.
  • Stop your container, then migrate it (so a cold migration), which would avoid the uid/gid problem but would indeed cause quite a bit of downtime (see the short sketch after this list).
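
For that last option, it’s basically the following (container and remote names are placeholders; the destination must already be added as a remote):

lxc stop ct1
lxc move ct1 dest:ct1   # or "lxc copy" to keep the source until you have verified the result
lxc start dest:ct1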

One thing based on the comments I see above:

  • Please avoid recommending production users upgrade to the non-LTS branch (so 2.12 instead of 2.0.9) as both branches are equally supported and the very fast release pace of the non-LTS branch (monthly releases) is often incompatible with critical production environments.

It’s getting a bit confusing with the double-posting to the ML and here, but per the conversation on the ML, what about hosting /var/lib/lxd on ZFS itself? If that is on ZFS and the containers are on ZFS, assuming you have the bridge on the failover host, is there anything else needed to be able to restart the container?
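
To be clear, what I mean by hosting /var/lib/lxd on ZFS itself is roughly this (the pool name is just an example):

# dedicated dataset mounted where LXD keeps its state
zfs create -o mountpoint=/var/lib/lxd tank/lxd
# with the container datasets living on the same pool, so state and containers can be cloned/sent together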

Also, what about stateful snapshots? I haven’t dug into those too deeply yet; are they stored on the same ZFS dataset?

cheers,

Spike

I thought, as the forum is new, I’d post on both ^^
So far, I haven’t tried any high availability. It’s one thing that I would like to work on.
Having two or more servers used as LXD hypervisors, and being able to manage containers on those servers and start them on any of the LXD hypervisors.

Same configuration on all hypervisors, with LXD (same uid/gid range), network and storage.
That would be nice. In case of node failure, starting on the new hypervisor in a second.

Thanks @stgraber, I will see how I can manage that.

Cheers,
Benoît

This isn’t HA though; I’m talking about cold spares. Especially if you clone them (via zfs, along with /var/lib/lxd), the instances would have the same MAC address, so you would most definitely not be able to run both at the same time.

Still, it’d be plenty for us here as a way to recover very quickly: the time to run lxc start (which could be automated as a Nagios handler if we really wanted to).

I think it’s a mix. Definitely not HA, yes.
This would never support running the same containers at the same time on different nodes, as you said, but still, you’d have the ability to start a container wherever you want.
The only question is always the same: how to share the file system across all the nodes.
Dealing with ZFS clones is one way to go, I guess, if you only want a cold spare ready to take over on primary node failure.
So you sync during normal usage, and after a failure the failed node can become the DR spare for the new primary once it is back.
If you can host your LXD + containers on a shared/cluster FS, then it’s more flexible: no need for migration at all, just power the container on from the node that will host it, and back up from the cluster FS.
You can balance your load over your LXD “cluster”, and in case of node failure, split those containers across all the other nodes, as easy as lxc start node1:ct1.
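
Something like this, assuming node1 is reachable (the remote name and address are placeholders):

# one-time setup: register the other hypervisor as a remote
lxc remote add node1 192.0.2.10
# after a node failure, bring the container up on node1
lxc start node1:ct1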

Not to get too far off topic here, but the core problem with LXD and HA is that all files under /var/lib/lxd are “per-computer”. In other words, if you have 5 nodes and want real HA, the files under /var/lib/lxd must somehow be shared between the nodes. Perhaps a future version of LXD could support an HA solution with a manager server, a number of worker nodes, and a shared storage technology (GFS, NFS, etc.).

For example:

  • Session Manager node(s) that control the LXD database, profiles, worker nodes, etc.
  • Worker nodes that register to the session manager
  • Highly available shared storage technology (GFS, etc) for all worker nodes

The result would be a highly scalable and redundant container solution. You could easily load balance containers from one physical host to another w/out much downtime (1-2sec). And, with the right configuration per worker node, you could have a very cool multi-tenant solution ready for the data center.

I would love to contribute to such an open-source project, but I don’t have the coding depth to make it happen. I do, however, have enough architecture, network, compute, and storage background to help guide a project like this.

Any takers? :slight_smile:
