I’m running LXD 5.0.2 on Ubuntu 22.04 using ZFS as the storage backend. Sometimes, when attempting to restore an LXD container to a snapshot, I get the following timeout:
$ lxc restore mycontainer mysnapshot
Error: Instance "restore" operation timed out after 30s
What can I do to debug this and force reverting to this snapshot? Thanks!
We’ve tested this 8 times over the course of the past week. We are always using the same instance and trying to lxc restore to the same snapshot. Of the 8 runs, 6 have succeeded, 1 failed with the above error, and the last 1 failed with the following error:
Error: In use
Is there a way to force the restore or otherwise make it more reliably succeed?
If you’re getting stuck on “In Use” then reloading LXD will clear the reference counters without restarting the instances.
sudo systemctl reload snap.lxd.daemon
Ideally if you can reboot the host machine to get a fresh state though and then try doing a restore in one window whilst in another window run:
lxc monitor --type=logging --pretty
And post the output here, as it sounds like the ZFS restore is starting, but for some reason taking a long time and then LXD is giving up, with it still running in the background.
Also if you could get a sudo ps aux output while you’re running the restore process, that would be helpful too, along with the name of the instance and the snapshot name.
We’re still working on reproducing the timeout error but did successfully reproduce the Error: In use error and got the following output from lxc monitor:
DEBUG [2023-02-24T07:03:07-06:00] Start finished instance=mycontainer instanceType=container project=default stateful=false
DEBUG [2023-02-24T07:03:07-06:00] Instance operation lock finished action=start err="<nil>" instance=mycontainer project=default reusable=false
DEBUG [2023-02-24T07:03:07-06:00] Event listener server handler stopped listener=7d99f914-c88a-4ec4-a332-0d0d2d9433fd local=/var/snap/lxd/common/lxd/unix.socket remote=@
ERROR [2023-02-24T07:03:07-06:00] balance: Unable to set cpuset err="setting cgroup item for the container failed" name=mycontainer2 value="1,15,23,26,30,42,50,56"
DEBUG [2023-02-24T07:03:55-06:00] Container stopped, cleaning up instance=mycontainer instanceType=container project=default
DEBUG [2023-02-24T07:03:55-06:00] Stopping device device=root instance=mycontainer instanceType=container project=default type=disk
DEBUG [2023-02-24T07:03:55-06:00] UnmountInstance started instance=mycontainer project=default
DEBUG [2023-02-24T07:03:55-06:00] UnmountInstance finished instance=mycontainer project=default
DEBUG [2023-02-24T07:03:55-06:00] Skipping unmount as in use driver=zfs pool=default refCount=1 volName=mycontainer
INFO [2023-02-24T07:03:55-06:00] Restoring container created="2023-02-23 13:23:42.598150347 +0000 UTC" ephemeral=false instance=mycontainer instanceType=container project=default source=mycontainer/mysnap used="2023-02-24 13:03:06.869029524 +0000 UTC"
DEBUG [2023-02-24T07:03:55-06:00] Instance operation lock finished action=restore err="<nil>" instance=mycontainer project=default reusable=false
DEBUG [2023-02-24T07:03:55-06:00] Scheduler: container mycontainer stopped: re-balancing
DEBUG [2023-02-24T07:03:55-06:00] Instance operation lock created action=restore instance=mycontainer project=default reusable=false
DEBUG [2023-02-24T07:03:55-06:00] Stop finished instance=mycontainer instanceType=container project=default stateful=false
DEBUG [2023-02-24T07:03:55-06:00] MountInstance started instance=mycontainer project=default
DEBUG [2023-02-24T07:03:55-06:00] Mounting instance to check for CRIU state path existence instance=mycontainer instanceType=container project=default
DEBUG [2023-02-24T07:03:55-06:00] UnmountInstance started instance=mycontainer project=default
DEBUG [2023-02-24T07:03:55-06:00] MountInstance finished instance=mycontainer project=default
DEBUG [2023-02-24T07:03:55-06:00] UnmountInstance finished instance=mycontainer project=default
DEBUG [2023-02-24T07:03:55-06:00] Skipping unmount as in use driver=zfs pool=default refCount=1 volName=mycontainer
DEBUG [2023-02-24T07:03:55-06:00] Failure for operation class=task description="Restoring snapshot" err="In use" operation=d9b513b0-ec3e-45f3-a3e6-4e43da0842cc project=default
DEBUG [2023-02-24T07:03:55-06:00] Instance operation lock finished action=restore err="In use" instance=mycontainer project=default reusable=false
DEBUG [2023-02-24T07:03:55-06:00] Event listener server handler stopped listener=61acc074-12b3-4b9e-8adb-9cdd93f49688 local=/var/snap/lxd/common/lxd/unix.socket remote=@
ERROR [2023-02-24T07:03:55-06:00] balance: Unable to set cpuset err="setting cgroup item for the container failed" name=mycontainer2 value="0,11,20,39,42,54,55,6"