I’ve been trying to test stateful snapshotting via CRIU, but the common Ubuntu images don’t seem to work with CRIU out of the box. I was wondering what success other people have had with this feature, so I can find something that works for me.
(Note that I’m not saying that this is particularly useful, just that based on current known limitations and issues with CRIU, I suspect that this would be a working case.)
Your suggestion worked until I tried to actually run something in it. I’ve just been putting a sleep process into containers to convince myself this works. When I do, it fails:
--> lxc stop a1 --stateful
Error: snapshot dump failed
(00.694770) Warn (compel/arch/x86/src/lib/infect.c:281): Will restore 19918 with interrupted system call
(01.135229) Warn (compel/arch/x86/src/lib/infect.c:281): Will restore 20255 with interrupted system call
(01.136364) Error (criu/files-reg.c:1372): Can't lookup mount=638 for fd=0 path=/dev/pts/3
(01.136408) Error (criu/cr-dump.c:1348): Dump files (pid: 20255) failed with -1
(01.144135) Error (criu/cr-dump.c:1764): Dumping FAILED.
Try `lxc info --show-log a1` for more info
From what I remember, CRIU doesn’t like processes which were started in the container from an lxc exec session. Instead try starting your sleep from an init script, that should have a better chance of serializing.
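For anyone wanting to try that, here is a minimal sketch of one way to do it. It assumes the container uses systemd as its init (as the stock Ubuntu images do) and is named a1 as in the log above. The unit name sleep-test.service and the RUN_LXC switch are made up for illustration. The failing dump above complains about fd 0 pointing at /dev/pts/3, which looks consistent with a terminal inherited from an exec session, but there is no guarantee that moving the process under init is enough for CRIU to dump it.

```shell
#!/bin/sh
# Sketch: start a test workload from the container's init system (systemd is
# assumed) instead of from an `lxc exec` session.
set -eu

CONTAINER="${CONTAINER:-a1}"              # container name used in the thread
UNIT_NAME="sleep-test.service"            # hypothetical unit name
UNIT_FILE="${UNIT_FILE:-./$UNIT_NAME}"    # local path the unit is written to

# Write a minimal systemd unit that keeps a sleep process running.
write_unit() {
    cat > "$UNIT_FILE" <<'EOF'
[Unit]
Description=Long-running sleep for stateful snapshot testing

[Service]
ExecStart=/bin/sleep infinity

[Install]
WantedBy=multi-user.target
EOF
}

# Copy the unit into the container, enable it, and restart the container so
# that init (not an exec session) is the parent of the sleep process.
install_unit() {
    lxc file push "$UNIT_FILE" "$CONTAINER/etc/systemd/system/$UNIT_NAME"
    lxc exec "$CONTAINER" -- systemctl enable "$UNIT_NAME"
    lxc restart "$CONTAINER"
}

write_unit

# Only touch a real container when explicitly asked to (RUN_LXC=1) and when
# the lxc client is actually installed.
if [ "${RUN_LXC:-0}" = "1" ] && command -v lxc >/dev/null 2>&1; then
    install_unit
fi
```

Run it with RUN_LXC=1 to install the unit into the container, then retry `lxc stop a1 --stateful` and check `lxc info --show-log a1` if the dump still fails.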
We’re still supporting and occasionally updating the glue code between LXD/LXC and CRIU but we no longer have a full-time engineer working on CRIU.
Our goal back then was to get CRIU working with most common workloads on modern distributions (at the time, Ubuntu 16.04 LTS). This turned into a bit of a losing battle: every time we added support for some new kernel feature in CRIU, the upstream Linux kernel would grow support for a dozen more features that CRIU didn’t understand.
So CRIU is certainly viable for very specific environments where the user is in complete control of the workload and distribution they run it on, but that market isn’t sufficient for us to justify very costly engineering efforts.
So right now, we tend to redirect requests for missing CRIU features directly to upstream CRIU, where an active community of contributors eventually tackles the most common limitations. There is a fair amount of investment in CRIU coming from academia and HPC, and some big organizations like Google also actively use it and contribute fixes to it.
So it’s certainly not a dead end but for us as a generic container tool that focuses on generic workloads on modern distros, it’s not a current focus.
Thanks for this response; this is kind of what I needed to hear. Which is: “It’s there and it /can/ be useful, but it’s pretty fragile and you’re going to have to put a lot of energy in to make it work.”
It’s not a critical need for me, so I guess I’ll wait until CRIU works out of the box. Although at this point (and given what you said about the Linux kernel), an entire OS based on orthogonal persistence sounds more practical, unfortunately. Not something LXD can really fix.