Problem with stateful container (re)start

I’ve been experimenting with LXC/LXD for a while. I want to set up a number of containers that are suspended (“hibernated”) when the host machine is shut down or rebooted, and that resume where they left off once the host is up again. So far only half of this works: I can do a stateful stop of a container, but I cannot start it statefully again. The restore chokes on a number of errors from CRIU. This could easily be a configuration issue, as I’m pretty new to this, so any help will be greatly appreciated :slight_smile:

I took some advice that @stgraber gave in this discussion and started out with a very simple container image: lxc launch images:alpine/edge take2

With that container, “lxc stop take2 --stateful” followed by “lxc start take2” works fine. However, if I start a background process in the container and repeat the stop/start operations, “lxc start” fails with:

[root@guineapig bob]# lxc start take2
Error: snapshot restore failed
(00.012316) Warn (criu/cr-restore.c:1243): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
(00.381354) 1: Warn (criu/sk-unix.c:1756): sk unix: Can’t unlink stale socket 0x6b7b peer 0 (name /dev/log dir -)
(00.397694) 295: Error (criu/sockets.c:774): Unable to find a network namespace
(00.397734) 295: Error (criu/files.c:1194): Unable to open fd=15 id=0x13
(00.398080) 292: Error (criu/cr-restore.c:1387): 295 exited, status=1
(00.556028) Error (criu/cr-restore.c:2266): Restoring FAILED.
Try lxc info --show-log take2 for more info
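
For anyone trying to reproduce this, the sequence above boils down to roughly the following. This is only a sketch: the `reproduce` function name and the `LXC` override variable are illustrative additions, and it assumes `infinite.sh` is already available on the PATH inside the container.

```shell
#!/bin/sh
# Approximate reproduction of the failing sequence (a sketch, not a verified script).
# LXC can be overridden, e.g. LXC=echo, to print the commands instead of running them.
LXC="${LXC:-lxc}"

reproduce() {
    name="${1:-take2}"
    # Launch a minimal Alpine container.
    $LXC launch images:alpine/edge "$name"
    # Start a background process inside it.
    # Assumption: infinite.sh is on the container's PATH.
    $LXC exec "$name" -- sh -c 'nohup sh -c infinite.sh &'
    # Checkpoint the container, then attempt the restore.
    $LXC stop "$name" --stateful
    # This is the step that fails with "snapshot restore failed".
    $LXC start "$name"
}
```

Calling `reproduce take2` runs the four steps in order; setting `LXC=echo` first only prints them.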

lxc info says:

Name: take2
Remote: unix://
Architecture: x86_64
Created: 2018/07/28 15:00 UTC
Status: Stopped
Type: persistent
Profiles: default

Log:

lxc take2 20180731115324.483 ERROR lxc_criu - criu.c:do_restore:1089 - criu process exited 1, output:

lxc 20180731115324.505 WARN lxc_commands - commands.c:lxc_cmd_rsp_recv:130 - Connection reset by peer - Failed to receive response for command “get_state”
lxc take2 20180731115325.559 ERROR lxc_conf - conf.c:run_buffer:348 - Script exited with status 1
lxc take2 20180731115325.560 ERROR lxc_start - start.c:lxc_fini:975 - Failed to run lxc.hook.post-stop for container “take2”
lxc take2 20180731115325.568 ERROR lxc_criu - criu.c:__criu_restore:1414 - restore process died

The process that I started in the container was a simple script running an infinite loop that prints the current time once per second (i.e. I executed: nohup sh -c infinite.sh &).
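
The script itself is not reproduced here. A minimal sketch of a loop of that kind, assuming nothing beyond `date` and `sleep`, might look like the following. The function names `tick` and `run_loop` and the optional iteration cap are illustrative additions for trying it out, not part of the original script.

```shell
#!/bin/sh
# Hypothetical stand-in for infinite.sh; the original script was not posted.

# Print the current time as one line.
tick() {
    date '+%H:%M:%S'
}

# Print the time once per second.
# Optional first argument caps the number of iterations; 0 (the default) means no limit.
run_loop() {
    max="${1:-0}"
    i=0
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        tick
        # Only count iterations when a cap was requested.
        if [ "$max" -ne 0 ]; then
            i=$((i + 1))
        fi
        sleep 1
    done
}
```

Adding a final line `run_loop` makes the file loop forever like the script described above, while `run_loop 3` stops after three lines.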

Also, I can restart the container if I ignore the state (lxc start take2 --stateless), but that is not what I want :unamused: I want my running processes to resume where they left off. If anybody out there has suggestions or pointers to a possible solution, I would very much like to hear about it :smile:

The boring details about my setup:

Arch Linux 4.17.11.a-1-hardened x86_64 (Checkpoint enabled) built from AUR
lxd 3.3-1 (built from AUR)
criu 3.9-1 (built from AUR)
lxc 1:3.0.1-1 (plain pacman installation)
lxcfs 3.0.1-1 (plain pacman installation)

Output from lxc-checkconfig:

--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled

--- Control groups ---
Cgroups: enabled

Cgroup v1 mount points:
/sys/fs/cgroup/systemd
/sys/fs/cgroup/perf_event
/sys/fs/cgroup/blkio
/sys/fs/cgroup/cpu,cpuacct
/sys/fs/cgroup/freezer
/sys/fs/cgroup/rdma
/sys/fs/cgroup/net_cls,net_prio
/sys/fs/cgroup/pids
/sys/fs/cgroup/memory
/sys/fs/cgroup/hugetlb
/sys/fs/cgroup/cpuset
/sys/fs/cgroup/devices

Cgroup v2 mount points:
/sys/fs/cgroup/unified

Cgroup v1 clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled, loaded
Macvlan: enabled, not loaded
Vlan: enabled, loaded
Bridges: enabled, loaded
Advanced netfilter: enabled, not loaded
CONFIG_NF_NAT_IPV4: enabled, loaded
CONFIG_NF_NAT_IPV6: enabled, loaded
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, not loaded
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, not loaded
FUSE (for use with lxcfs): enabled, loaded

--- Checkpoint/Restore ---
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities: