LXD 5.0 live migration: "Unable to perform container live migration. CRIU isn't installed", but it is installed and criu.enable is set to true

I was super keen to try live migration as soon as I saw the changelog for 5.0, but the dreaded “CRIU isn’t installed” error is back, and this time it won’t go away by conventional means :wink:

RHEL 8 (actually CentOS Stream 8, but RHEL has the same error)

$ lxc mv thorough-bengal --target lxd10
Error: Failed stopping instance "thorough-bengal": Unable to perform container live migration. CRIU isn't installed

$ lxdrun rpm -q criu
lxd1: criu-3.15-3.module_el8.6.0+926+8bef8ae7.x86_64
lxd10: criu-3.15-3.module_el8.6.0+926+8bef8ae7.x86_64
lxd2: criu-3.15-3.module_el8.6.0+926+8bef8ae7.x86_64
lxd11: criu-3.15-3.module_el8.6.0+926+8bef8ae7.x86_64
lxd13: criu-3.15-3.module_el8.6.0+926+8bef8ae7.x86_64
lxd12: criu-3.15-3.module_el8.6.0+926+8bef8ae7.x86_64

$ lxdrun snap get lxd criu.enable
lxd1: true
lxd2: true
lxd10: true
lxd11: true
lxd13: true
lxd12: true

I have even tried hard-linking /usr/sbin/criu into /usr/bin, but no luck.

That’s live migration of containers, which is basically in the same state in 5.0 as it was in 4.0. What we added in 5.0 is live migration of virtual machines, which is quite different.

I’m a bit confused as to why CRIU doesn’t get run in your environment though. Setting criu.enable=true on the snap and then reloading LXD should cause criu to be added to LXD’s PATH.

root@dakara:~# cat /proc/$(cat /var/snap/lxd/common/lxd.pid)/environ | tr '\0' '\n' | grep criu

Ah, the reload was the key. I was convinced that a reload happens every time you do snap set. Something does happen, since I see the progress bar briefly every time, but obviously it’s not a reload.
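For reference, the sequence that worked here can be sketched roughly as below. This is only a sketch: the helper names enable_criu and criu_in_environ are made up for illustration, and it assumes the snap's daemon unit is snap.lxd.daemon and that the environ file is read from the PID in /var/snap/lxd/common/lxd.pid, as in the command above.

```shell
# Hypothetical helper: turn on CRIU support in the snap, then reload LXD
# so the daemon picks up the new environment (snap set alone did not).
enable_criu() {
    snap set lxd criu.enable=true
    systemctl reload snap.lxd.daemon
}

# Hypothetical helper: succeed if the given environ file mentions criu.
# Pass /proc/<lxd pid>/environ; entries in that file are NUL-separated.
criu_in_environ() {
    tr '\0' '\n' < "$1" | grep -q criu
}
```

Run as root, something like `enable_criu && criu_in_environ "/proc/$(cat /var/snap/lxd/common/lxd.pid)/environ" && echo "criu visible to LXD"` should print the message once the reload has gone through. The daemon may take a moment to come back, so the check might need a retry.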

Make sure you don’t do a reload on all cluster members at the same time, as I learned the hard way yesterday :wink:
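One way to avoid that is to reload the members strictly one at a time. A minimal sketch, assuming root SSH access to each member and that `lxd` is on the remote PATH (it may need to be /snap/bin/lxd for non-interactive sessions); rolling_reload is a made-up name, and using `lxd waitready` as the "member is back" check is my assumption rather than anything confirmed in this thread:

```shell
# Reload LXD on each host given as an argument, strictly one after another.
# Stops at the first failure so the remaining members are left untouched.
rolling_reload() {
    for host in "$@"; do
        echo "reloading LXD on $host"
        ssh "$host" systemctl reload snap.lxd.daemon || return 1
        # Assumption: lxd waitready blocks until the daemon answers again.
        ssh "$host" lxd waitready --timeout=120 || return 1
    done
}
```

For the cluster above that would be `rolling_reload lxd1 lxd2 lxd10 lxd11 lxd12 lxd13`.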

Anyway, live migration of containers would have been a killer feature, and it’s sad that it doesn’t get more attention. It is the one feature I have missed the most since migrating away from OpenVZ. I guess there are not too many converts like me who really enjoyed being able to rebalance clusters without users noticing.

Did you use the ploop backend? I didn’t when using OpenVZ, and I found that live migration would often cause issues with applications taking umbrage at the fact that the inodes of open files changed after live migration.

Yeah, OpenVZ was doing the live-migration in-kernel which was a lot more reliable.
As we can’t do that these days, CRIU is the way to go (and is written by ex-OpenVZ people), but it’s far less reliable and only really works well with workloads that are designed for it (it’s actively used by Google and others).

No, I always had ZFS as a backend. Although I know about the inode issue, in 10 years of using OpenVZ I never stumbled upon an application that cared.

IIRC Postfix spluttered every time, but that was a while ago now.