"/var/snap/lxd/common/lxd/virtual-machines/machine-3/state": context canceled

I sometimes get this error when trying to restore a stateful snapshot for a VM:

Failed restoring state from "/var/snap/lxd/common/lxd/virtual-machines/machine-3/state": context canceled

LXD version: 5.14

What does the error mean, and in which cases can it occur?

Please capture the output of lxc monitor --pretty in a separate window while the restore is running, to see what's going on. The error suggests a timeout is being hit.


I tried to extract the important data from the log; worker-12 is the VM that fails with context canceled:

msg="Stateful checkpoint restore finished" instance=worker-12 instanceType=virtual-machine project=default source=/var/snap/lxd/common/lxd/virtual-machines/worker-12/state
msg="Instance operation lock finished" action=start err="Failed restoring state from \"/var/snap/lxd/common/lxd/virtual-machines/worker-12/state\": context canceled" instance=worker-12 project=default reusable=false
msg="Failed to unmount" attempt=0 err="device or resource busy" path=/var/snap/lxd/common/lxd/devices/worker-12/config.mount
msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0 username=ubuntu
msg="Handling API request" ip=@ method=GET protocol=unix url="/1.0/operations?recursion=1" username=ubuntu
msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0 username=ubuntu
msg="Handling API request" ip=@ method=GET protocol=unix url="/1.0/operations?recursion=1" username=ubuntu
msg="QMP monitor stopped" path=/var/snap/lxd/common/lxd/logs/worker-12/qemu.monitor
msg="Stopping device" device=root instance=worker-12 instanceType=virtual-machine project=default type=disk
msg="Stopping device" device=eth0 instance=worker-12 instanceType=virtual-machine project=default type=nic
msg="Got response struct from LXD"
msg="Sending request to LXD" etag= method=PUT url="https://custom.socket/1.0"
...
msg="UnmountInstance started" driver=zfs instance=worker-12 pool=zfspool project=default
msg="Unmounted ZFS dataset" dev=zfspool/virtual-machines/worker-12 driver=zfs path=/var/snap/lxd/common/lxd/storage-pools/zfspool/virtual-machines/worker-12 pool=zfspool volName=worker-12
msg="UnmountInstance finished" driver=zfs instance=worker-12 pool=zfspool project=default
msg="Deactivated ZFS volume" dev=zfspool/virtual-machines/worker-12.block driver=zfs pool=zfspool volName=worker-12
msg="Failure for operation" class=task description="Restoring snapshot" err="Failed restoring state from \"/var/snap/lxd/common/lxd/virtual-machines/worker-12/state\": context canceled" operation=8652e1f3-9fe1-4094-a3b6-778bb9b23d9c project=default
msg="Start finished" instance=worker-12 instanceType=virtual-machine project=default stateful=true
msg="ID: 8652e1f3-9fe1-4094-a3b6-778bb9b23d9c, Class: task, Description: Restoring snapshot" CreatedAt="2023-06-01 13:47:29.476236394 +0000 UTC" Err="Failed restoring state from \"/var/snap/lxd/common/lxd/virtual-machines/worker-12/state\": context canceled" Location=none MayCancel=false Metadata="map[]" Resources="map[instances:[/1.0/instances/worker-12]]" Status=Failure StatusCode=Failure UpdatedAt="2023-06-01 13:47:29.476236394 +0000 UTC"
msg="Event listener server handler stopped" listener=acf87b2d-2e3d-4762-b294-bd638a3fb90f local=/var/snap/lxd/common/lxd/unix.socket remote=@

5.14 fixed the I/O timeout issues I had, but this one persists.

I’ve not been able to reproduce the issue, but in theory it’s possible that the cancellation of the stateCtx was being picked up as an error before the stateful restore had fully finished. I’ve added a PR that should avoid this potential issue, which may be what you were encountering.


Once it is merged, I can try the change via sudo snap install lxd --edge, right?
Or by building from source:
https://linuxcontainers.org/lxd/docs/latest/installing/#installing-lxd-from-source

Yes, but give it a few days to reach edge, as it’s not instant.
