Moving from the issue back to the forums, I have spent some time debugging and written down my findings here. A lot of text and screenshots (screencopies?) to describe my findings.
I discovered that the VMs do a clean shutdown and that takes them roughly 90 seconds to achieve, and then I am immediately thrown the “operation timed out after 30 seconds” error once the qemu processes disappear if the specified timeout is longer than the time it takes to shut down those VMs.
No timeout:
root @ node1 # lxc stop kubew1 kubew2 kubew3 kubew4 kubew5 kubew6
kubew1: error: Instance stop operation timed out after 30 seconds
kubew1: Try `lxc info --show-log kubew1` for more info
kubew4: error: Instance stop operation timed out after 30 seconds
kubew4: Try `lxc info --show-log kubew4` for more info
kubew6: error: Instance stop operation timed out after 30 seconds
kubew6: Try `lxc info --show-log kubew6` for more info
kubew2: error: Instance stop operation timed out after 30 seconds
kubew2: Try `lxc info --show-log kubew2` for more info
kubew5: error: Instance stop operation timed out after 30 seconds
kubew5: Try `lxc info --show-log kubew5` for more info
kubew3: error: Instance stop operation timed out after 30 seconds
kubew3: Try `lxc info --show-log kubew3` for more info
I believe the above error is https://github.com/lxc/lxd/blob/d69e2134c3c88f6ea7d63604f16d6ad5616aebba/lxd/instance/operationlock/operationlock.go#L74
With (short) timeout:
lxc stop kubew1 kubew4 --timeout 60
kubew1: error: Instance was not shutdown after timeout
kubew1: Try `lxc info --show-log kubew1` for more info
kubew4: error: Instance was not shutdown after timeout
kubew4: Try `lxc info --show-log kubew4` for more info
The qemu processes stay up & running for ~90 seconds after the stop is issued. Kernel logs inside VM show that the shutdown was clean and initiated the same second I did a lxc stop
on one of my hosts.
If I add a timeout of 45 sec, it still takes the qemu processes 90-92 seconds to disappear.
~
root @ node3 # date; lxc stop kubew1 kubew4 --timeout 45; date
Mon 04 Oct 2021 05:36:37 PM CEST
kubew1: error: Instance was not shutdown after timeout
kubew1: Try `lxc info --show-log kubew1` for more info
kubew4: error: Instance was not shutdown after timeout
kubew4: Try `lxc info --show-log kubew4` for more info
Error: Some instances failed to stop
Mon 04 Oct 2021 05:37:23 PM CEST
an ugly while loop (while [ 1 ]; do echo "\n"; date;ps ax | grep qemu | grep kubew; sleep 1; done
) on the node that hosts these VMs shows the qemu processes disappearing roughly 90 seconds after I issued the lxc stop
Mon 04 Oct 2021 05:38:08 PM CEST
269431 ? Sl 4:03 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew4 -uuid 1667bfd9-dce7-4348-be06-85b9d36643f9 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew4/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew4/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew4/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew4/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
269436 ? Sl 12:30 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew1 -uuid bcb9ca73-dae7-413d-a937-504c7ded3786 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew1/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
Mon 04 Oct 2021 05:38:09 PM CEST
269431 ? Dl 4:04 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew4 -uuid 1667bfd9-dce7-4348-be06-85b9d36643f9 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew4/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew4/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew4/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew4/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
269436 ? Sl 12:30 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew1 -uuid bcb9ca73-dae7-413d-a937-504c7ded3786 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew1/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
Mon 04 Oct 2021 05:38:10 PM CEST
<nothing>
As you can see, roughly 90 seconds after the stop command.
The following is interesting. When I issue a lxc stop --timeout 300
the qemu processes still disappear after roughly 90 seconds, but I am thrown the instance lock timeout of 30 seconds error immediately after they are shutdown.
root @ node3 # date; lxc stop kubew1 kubew4 --timeout 300; date
Mon 04 Oct 2021 05:43:06 PM CEST
kubew4: error: Instance stop operation timed out after 30 seconds
kubew4: Try `lxc info --show-log kubew4` for more info
kubew1: error: Instance stop operation timed out after 30 seconds
kubew1: Try `lxc info --show-log kubew1` for more info
Error: Some instances failed to stop
Mon 04 Oct 2021 05:44:40 PM CEST
Mon 04 Oct 2021 05:44:38 PM CEST
274681 ? Sl 10:19 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew1 -uuid bcb9ca73-dae7-413d-a937-504c7ded3786 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew1/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
274685 ? Sl 3:16 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew4 -uuid 1667bfd9-dce7-4348-be06-85b9d36643f9 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew4/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew4/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew4/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew4/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
Mon 04 Oct 2021 05:44:39 PM CEST
274681 ? Sl 10:19 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew1 -uuid bcb9ca73-dae7-413d-a937-504c7ded3786 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew1/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
Mon 04 Oct 2021 05:44:40 PM CEST
<nothing>
So I double checked and the lxc stop
without a timeout takes roughly 90 seconds to complete instead of the 30 seconds that is stated in the error.
~
root @ node3 # time lxc stop kubew1 kubew2 kubew3 kubew4 kubew5 kubew6
kubew1: error: Instance stop operation timed out after 30 seconds
kubew1: Try `lxc info --show-log kubew1` for more info
kubew4: error: Instance stop operation timed out after 30 seconds
kubew4: Try `lxc info --show-log kubew4` for more info
kubew2: error: Instance stop operation timed out after 30 seconds
kubew2: Try `lxc info --show-log kubew2` for more info
kubew3: error: Instance stop operation timed out after 30 seconds
kubew3: Try `lxc info --show-log kubew3` for more info
kubew5: error: Instance stop operation timed out after 30 seconds
kubew5: Try `lxc info --show-log kubew5` for more info
kubew6: error: Instance stop operation timed out after 30 seconds
kubew6: Try `lxc info --show-log kubew6` for more info
Error: Some instances failed to stop
lxc stop kubew1 kubew2 kubew3 kubew4 kubew5 kubew6 0.15s user 0.21s system 0% cpu 1:32.40 total
Hooray I found a bug? 
If anything I have written down is unclear please let me know I’ll try to explain in more (or less?) detail. I just brought the subject up with a friend and he says he also experiences this issue with his (mostly) kubernetes VMs.