Error with Stateful Restore

I am receiving the following error:

Error: Failed to run: /snap/lxd/current/bin/lxd forkmigrate mycontainer /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/mycontainer/lxc.conf /var/snap/lxd/common/lxd/containers/mycontainer/state true:

The setup is as follows:
I have two AWS Ubuntu 18.04 nodes, both with LXD 4.6 and CRIU installed. One is the source node, the other is the remote (destination). On the source, I am running a simple script that prints a number every few seconds and increments it. I can take a stateful snapshot and then do an incremental copy to the destination node. However, on the destination node, I am unable to restore the container statefully.

On the source:

$ lxc snapshot mycontainer --stateful
$ lxc copy mycontainer dest:mycontainer --refresh

On the destination:

$ lxc list
+-------------+---------+------+------+-----------+-----------+
|    NAME     |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-------------+---------+------+------+-----------+-----------+
| mycontainer | STOPPED |      |      | CONTAINER | 2         |
+-------------+---------+------+------+-----------+-----------+
$ lxc start mycontainer
$ lxc restore mycontainer snap1 --stateful
Error: Failed to run: /snap/lxd/current/bin/lxd forkmigrate mycontainer /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/mycontainer/lxc.conf /var/snap/lxd/common/lxd/containers/mycontainer/state true:

Any suggestions would be helpful. Thanks.

Look at /var/snap/lxd/common/lxd/logs/mycontainer for more detailed logs. I would expect something like restore_XYZ.log to be in there.

I can’t seem to find any logs of the format restore_XYZ.log. However, I am attaching the other logs present.
console.log: https://pastebin.com/pqjQiGWR
forkstart.log: empty
lxc.log:

lxc mycontainer 20201007132443.822 ERROR    criu - criu.c:criu_ok:872 - Found un-dumpable network: phys (eth0)
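
For anyone hitting the same error, the relevant line can be pulled out of the container's log directory with a small filter. This is only a sketch: `criu_errors` is a hypothetical helper name, and the default directory is simply the snap path quoted in this thread, so adjust it for other installs.

```shell
#!/bin/sh
# Hypothetical helper: print CRIU-related ERROR lines from every *.log file
# in a container's LXD log directory.
# Assumption: the default path is the snap location mentioned in this thread.
criu_errors() {
    log_dir="${1:-/var/snap/lxd/common/lxd/logs/mycontainer}"
    # -h drops file names, -i matches "criu" in any case; keep only ERROR lines.
    grep -hi 'criu' "$log_dir"/*.log 2>/dev/null | grep 'ERROR'
}
```

Calling `criu_errors` with no argument checks the path from this thread; pass another directory as the first argument to check a different container.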

Right, that’s the usual issue then.

It’s currently impossible to checkpoint/restore containers that have a network interface.
This is obviously a bit of an issue as pretty much all containers have one :)

The current workaround would be to unplug the network prior to migration and re-attach it afterwards, but that’s definitely unpleasant.

@brauner will be working on fixing this limitation in the near future as we’ve finally managed to get some checkpoint/restore work back onto our roadmap.

Ok. By unplugging, do you mean to physically disconnect from the network, or do I disable the network interface? Also, do I do this before stateful snapshot, or incremental copy?

Thanks.

I mean removing the NIC device from the container with lxc config device remove.

You’d need to do that before any stateful snapshot, stateful stop or lxc copy where the container is running and --stateless isn’t passed.

(To be clear, I realize this is a very annoying limitation and that it pretty much negates any benefit you may get from using CRIU, that’s why we’re actively looking at fixing this :))

Thanks for your immense help. However, I have to bother you again.

$ lxc config show mycontainer lists the following:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20200922)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20200922"
  image.type: squashfs
  image.version: "18.04"
  volatile.base_image: 39a93d0b355279d430e8ce21c689aa88515212ee99874276e77f7f31ad7bf810
  volatile.eth0.host_name: vethbafe49c2
  volatile.eth0.hwaddr: 00:16:3e:69:a9:ca
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""

$ lxc config show --expanded mycontainer lists the following:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20200922)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20200922"
  image.type: squashfs
  image.version: "18.04"
  volatile.base_image: 39a93d0b355279d430e8ce21c689aa88515212ee99874276e77f7f31ad7bf810
  volatile.eth0.host_name: vethbafe49c2
  volatile.eth0.hwaddr: 00:16:3e:69:a9:ca
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Both
$ lxc config device remove mycontainer nic
and
$ lxc config device remove mycontainer eth0
result in
Error: The device doesn't exist

The NIC device eth0 only appears when you pass the --expanded flag, which shows that it is applied to your container from a profile. So you cannot ‘remove’ the NIC device from your container using lxc config device remove, as the device isn’t actually configured against your container in the first place.

If you want to temporarily remove the NIC from your container without removing it from the profile using lxc profile device remove <profile> <device> (which would affect all containers using that profile), you can instead add a ‘fake’ empty device to your container with the same name as the device you want to remove, so that it overrides the NIC device from the profile.

E.g.

lxc init images:ubuntu/focal c1
lxc config show c1
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu focal amd64 (20201011_07:42)
  image.os: Ubuntu
  image.release: focal
  image.serial: "20201011_07:42"
  image.type: squashfs
  volatile.apply_template: create
  volatile.base_image: 62cf969fa90ca8f7e20a578465b886d7078d5567b6f35210c609f7c0de958ef8
  volatile.eth0.hwaddr: 00:16:3e:66:c6:54
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""
lxc config show c1 --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu focal amd64 (20201011_07:42)
  image.os: Ubuntu
  image.release: focal
  image.serial: "20201011_07:42"
  image.type: squashfs
  volatile.apply_template: create
  volatile.base_image: 62cf969fa90ca8f7e20a578465b886d7078d5567b6f35210c609f7c0de958ef8
  volatile.eth0.hwaddr: 00:16:3e:66:c6:54
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

Now add the special none device:

lxc config device add c1 eth0 none
lxc config show c1 --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu focal amd64 (20201011_07:42)
  image.os: Ubuntu
  image.release: focal
  image.serial: "20201011_07:42"
  image.type: squashfs
  volatile.apply_template: create
  volatile.base_image: 62cf969fa90ca8f7e20a578465b886d7078d5567b6f35210c609f7c0de958ef8
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
devices:
  eth0:
    type: none
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
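
Putting the pieces of this thread together, the workaround could be sketched roughly as below: mask the profile's NIC with a none device, take the stateful snapshot, then drop the override so the profile's NIC applies again. This is a sketch under assumptions, not a tested procedure: `stateful_snapshot_without_nic` and the `LXC_BIN` variable are hypothetical names introduced here (the latter only so the commands can be dry-run with `echo`), and it assumes the none override takes effect on a running container.

```shell
#!/bin/sh
# Assumption: LXC_BIN is a hypothetical knob for dry runs; it defaults to the real client.
LXC_BIN="${LXC_BIN:-lxc}"

# Hypothetical helper: take a stateful snapshot with the profile-inherited NIC masked.
stateful_snapshot_without_nic() {
    container="$1"
    nic="${2:-eth0}"

    # Override the profile's NIC with an empty "none" device on the container itself.
    "$LXC_BIN" config device add "$container" "$nic" none || return 1

    # Take the stateful snapshot while no NIC is attached.
    "$LXC_BIN" snapshot "$container" --stateful
    status=$?

    # Remove the local override so the NIC from the profile applies again.
    "$LXC_BIN" config device remove "$container" "$nic" || return 1
    return "$status"
}
```

With `LXC_BIN=echo`, calling `stateful_snapshot_without_nic mycontainer` only prints the three commands instead of running them. As noted earlier in the thread, an lxc copy of a running container without --stateless has the same restriction, so that copy would need to happen before the override is removed.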