Container stop/restart actions often hang or take ages

kamzar1 · September 19, 2021, 7:09pm

Ubuntu 20.04
LXD 4.18 snap
I can't determine exactly which update/upgrade introduced this, but for a few months now, stopping or restarting a container often hangs or takes a very long time. After a stop, the container's network gets disabled, but the container itself still shows ‘Running’.
The console shows nothing, so naturally I can't log in to the container to see what is happening, and neither the LXD logs nor the container logs show any useful information.
In discussions over the past few months I've seen people complain about similar problems. The general answer is to use --force, but it seems --force hasn’t helped anyone.
Is there a “--SUPERFORCE” option to kill the container, rather than rebooting the server? Something like the virsh destroy command?
Because only a server reboot helps in this situation, and that has been happening a lot recently.
Also, snap restart lxd generally takes about 10 times longer than a restart did back when LXD (container management) was not subordinated to yet another container manager (snap) and, to make it even more obscure, pushed into user space.

stgraber · September 20, 2021, 8:37am

lxc stop --force NAME kills the container directly. If this hangs too, most likely your kernel hit a bug and the machine needs a reboot.
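
As a rough sketch of how this could be scripted: try a clean shutdown with a bounded wait, then fall back to --force, and report when even the forced stop does not return. The function name stop_container and the GRACE and HARD variables are hypothetical, and the timeout values are arbitrary defaults, not LXD recommendations. Note that the coreutils timeout wrapper only gives you your shell back; it does not unstick a container that the kernel cannot kill.

```shell
#!/bin/sh
# Try a clean shutdown first, then force-kill; report if even that fails.
# GRACE and HARD are hypothetical knobs with arbitrary defaults.
stop_container() {
    name="$1"
    grace="${GRACE:-30}"   # seconds to wait for a clean shutdown
    hard="${HARD:-60}"     # seconds to allow the forced stop to complete

    # lxc stop --timeout bounds how long LXD waits for a clean shutdown.
    if lxc stop "$name" --timeout "$grace"; then
        return 0
    fi

    # coreutils timeout exits with status 124 if the forced stop itself hangs.
    if timeout "$hard" lxc stop --force "$name"; then
        return 0
    fi

    echo "forced stop of $name failed or hung; check dmesg" >&2
    return 1
}
```

Called as stop_container NAME, a non-zero return suggests the situation described above, where the kernel side is stuck and a reboot is probably needed.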

If lxc stop --force NAME hangs, then run dmesg and ps fauxww to see what’s going on.
Chances are you’ll see a bunch of processes stuck in uninterruptible I/O state (D) and will see kernel errors that explain why they are stuck that way.
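
To narrow the ps output down to just the stuck processes, something like the following sketch may help. It assumes the procps version of ps found on Ubuntu; the function names filter_d_state and list_d_state are made up for illustration. The WCHAN column shows the kernel function each process is sleeping in, which can be matched against the errors in dmesg.

```shell
#!/bin/sh
# Keep the header row plus rows whose STAT column (field 3) starts with "D".
# Expects input shaped like: ps -eo pid,ppid,stat,wchan:32,args
filter_d_state() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# List processes currently in uninterruptible sleep, together with the
# kernel function they are waiting in (WCHAN).
list_d_state() {
    ps -eo pid,ppid,stat,wchan:32,args | filter_d_state
}
```

Run list_d_state, then look at the most recent kernel messages with dmesg -T | tail -n 50 (this may need sudo, depending on the system's dmesg restrictions).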

In general, we don’t recommend snap restart, as that restarts every single container. Instead, use systemctl reload snap.lxd.daemon, which only restarts the managing daemon (LXD) but keeps all instances running.