Instance stop operation timed out after 30 seconds

vosdev · October 4, 2021, 12:43pm

~
root @ node1 # lxc stop micro1 micro2 micro3

micro1: error: Instance stop operation timed out after 30 seconds
micro1: Try `lxc info --show-log micro1` for more info
micro3: error: Instance stop operation timed out after 30 seconds
micro3: Try `lxc info --show-log micro3` for more info

Error: Some instances failed to stop

~
root @ node1 # lxc profile show default
config:
  boot.autostart.delay: "3"
  boot.host_shutdown_timeout: "90"

~
root @ node1 # lxc config show micro1 -e
architecture: x86_64
config:
  boot.autostart.delay: "3"
  boot.host_shutdown_timeout: "90"
  cluster.evacuate: stop
  image.architecture: amd64
  image.description: Ubuntu focal amd64 (20210915_08:59)
  image.os: Ubuntu
  image.release: focal
  image.serial: "20210915_08:59"
  image.type: disk-kvm.img
  image.variant: cloud
  limits.cpu: "8"
  limits.memory: 8GB
  limits.memory.enforce: soft

I set my instance shutdown timeout to 90 seconds, but the reality is 30.

The name of the config option is confusing, as if it only counts when I shutdown my host? I could not find any other configuration option to set the instance shutdown timeout.

The description states Seconds to wait for instance to shutdown before it is force stopped so I think I have the right option.

The guests are 3 Ubuntu 20.04 VMs running microk8s and that takes a little over 30 seconds to shut down.

tomp · October 4, 2021, 12:55pm

According to this:

https://linuxcontainers.org/lxd/docs/master/api-extensions#container_host_shutdown_timeout

This setting is only supposed to take effect on LXD shutdown, rather than using lxc stop.

You can pass --timeout to lxc stop though.

vosdev · October 4, 2021, 1:19pm

Aah darn. Do you think this would be a good idea to make a global/instance configurable option?

lxc alias add stop 'stop --timeout 90'

Unfortunately the above workaround does not work . I’ll have to think of a new word to make that work.

tomp · October 4, 2021, 1:20pm

Best thing to do is open an issue over at https://github.com/lxc/lxd/issues and perhaps it would be feasible to make that setting also be the default for lxc stop.

tomp · October 4, 2021, 2:18pm

Issue created:

github.com/lxc/lxd

Configurable timeout for manually stopping instances

opened 01:35PM - 04 Oct 21 UTC

vosdev

# Required information * Distribution: Ubuntu 20.04 * LXD version: 4.18 … # Feature Request https://linuxcontainers.org/lxd/docs/master/api-extensions#container95host95shutdown95timeout The `boot.host_shutdown_timeout` option is only used on a clean shutdown of the LXD daemon. I would like to request a similar option (`boot.instance_shutdown_timeout`?) that is used for the timeout when stopping instances like `lxc stop c1`. Currently that timeout is only configurable through the `--timeout` option on the command itself like `lxc stop --timeout 90 c1`, with a default of 30 seconds. Basically all my VMs with kubernetes take little over 30 seconds to shutdown so I run into this problem every time I stop one of those instances and forget the --timeout option.

vosdev · October 4, 2021, 3:58pm

Moving from the issue back to the forums, I have spent some time debugging and written down my findings here. A lot of text and screenshots (screencopies?) to describe my findings.
I discovered that the VMs do a clean shutdown and that takes them roughly 90 seconds to achieve, and then I am immediately thrown the “operation timed out after 30 seconds” error once the qemu processes disappear if the specified timeout is longer than the time it takes to shut down those VMs.

No timeout:

root @ node1 # lxc stop kubew1 kubew2 kubew3 kubew4 kubew5 kubew6
kubew1: error: Instance stop operation timed out after 30 seconds
kubew1: Try `lxc info --show-log kubew1` for more info
kubew4: error: Instance stop operation timed out after 30 seconds
kubew4: Try `lxc info --show-log kubew4` for more info
kubew6: error: Instance stop operation timed out after 30 seconds
kubew6: Try `lxc info --show-log kubew6` for more info
kubew2: error: Instance stop operation timed out after 30 seconds
kubew2: Try `lxc info --show-log kubew2` for more info
kubew5: error: Instance stop operation timed out after 30 seconds
kubew5: Try `lxc info --show-log kubew5` for more info
kubew3: error: Instance stop operation timed out after 30 seconds
kubew3: Try `lxc info --show-log kubew3` for more info

I believe the above error is https://github.com/lxc/lxd/blob/d69e2134c3c88f6ea7d63604f16d6ad5616aebba/lxd/instance/operationlock/operationlock.go#L74

With (short) timeout:

lxc stop kubew1 kubew4 --timeout 60
kubew1: error: Instance was not shutdown after timeout
kubew1: Try `lxc info --show-log kubew1` for more info
kubew4: error: Instance was not shutdown after timeout
kubew4: Try `lxc info --show-log kubew4` for more info

The qemu processes stay up & running for ~90 seconds after the stop is issued. Kernel logs inside VM show that the shutdown was clean and initiated the same second I did a lxc stop on one of my hosts.

If I add a timeout of 45 sec, it still takes the qemu processes 90-92 seconds to disappear.

~
root @ node3 # date; lxc stop kubew1 kubew4  --timeout 45; date
Mon 04 Oct 2021 05:36:37 PM CEST
kubew1: error: Instance was not shutdown after timeout
kubew1: Try `lxc info --show-log kubew1` for more info
kubew4: error: Instance was not shutdown after timeout
kubew4: Try `lxc info --show-log kubew4` for more info

Error: Some instances failed to stop
Mon 04 Oct 2021 05:37:23 PM CEST

an ugly while loop (while [ 1 ]; do echo "\n"; date;ps ax | grep qemu | grep kubew; sleep 1; done) on the node that hosts these VMs shows the qemu processes disappearing roughly 90 seconds after I issued the lxc stop

Mon 04 Oct 2021 05:38:08 PM CEST
 269431 ?        Sl     4:03 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew4 -uuid 1667bfd9-dce7-4348-be06-85b9d36643f9 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew4/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew4/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew4/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew4/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
 269436 ?        Sl    12:30 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew1 -uuid bcb9ca73-dae7-413d-a937-504c7ded3786 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew1/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

Mon 04 Oct 2021 05:38:09 PM CEST
 269431 ?        Dl     4:04 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew4 -uuid 1667bfd9-dce7-4348-be06-85b9d36643f9 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew4/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew4/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew4/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew4/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
 269436 ?        Sl    12:30 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew1 -uuid bcb9ca73-dae7-413d-a937-504c7ded3786 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew1/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

Mon 04 Oct 2021 05:38:10 PM CEST
<nothing>

As you can see, roughly 90 seconds after the stop command.

The following is interesting. When I issue a lxc stop --timeout 300 the qemu processes still disappear after roughly 90 seconds, but I am thrown the instance lock timeout of 30 seconds error immediately after they are shutdown.

root @ node3 # date; lxc stop kubew1 kubew4  --timeout 300; date
Mon 04 Oct 2021 05:43:06 PM CEST
kubew4: error: Instance stop operation timed out after 30 seconds
kubew4: Try `lxc info --show-log kubew4` for more info
kubew1: error: Instance stop operation timed out after 30 seconds
kubew1: Try `lxc info --show-log kubew1` for more info

Error: Some instances failed to stop
Mon 04 Oct 2021 05:44:40 PM CEST

Mon 04 Oct 2021 05:44:38 PM CEST
 274681 ?        Sl    10:19 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew1 -uuid bcb9ca73-dae7-413d-a937-504c7ded3786 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew1/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
 274685 ?        Sl     3:16 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew4 -uuid 1667bfd9-dce7-4348-be06-85b9d36643f9 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew4/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew4/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew4/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew4/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

Mon 04 Oct 2021 05:44:39 PM CEST
 274681 ?        Sl    10:19 /snap/lxd/21497/bin/qemu-system-x86_64 -S -name kubew1 -uuid bcb9ca73-dae7-413d-a937-504c7ded3786 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/kubew1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/kubew1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/kubew1/qemu.pid -D /var/snap/lxd/common/lxd/logs/kubew1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

Mon 04 Oct 2021 05:44:40 PM CEST
<nothing>

So I double checked and the lxc stop without a timeout takes roughly 90 seconds to complete instead of the 30 seconds that is stated in the error.

~
root @ node3 # time lxc stop kubew1 kubew2 kubew3 kubew4 kubew5 kubew6
kubew1: error: Instance stop operation timed out after 30 seconds
kubew1: Try `lxc info --show-log kubew1` for more info
kubew4: error: Instance stop operation timed out after 30 seconds
kubew4: Try `lxc info --show-log kubew4` for more info
kubew2: error: Instance stop operation timed out after 30 seconds
kubew2: Try `lxc info --show-log kubew2` for more info
kubew3: error: Instance stop operation timed out after 30 seconds
kubew3: Try `lxc info --show-log kubew3` for more info
kubew5: error: Instance stop operation timed out after 30 seconds
kubew5: Try `lxc info --show-log kubew5` for more info
kubew6: error: Instance stop operation timed out after 30 seconds
kubew6: Try `lxc info --show-log kubew6` for more info

Error: Some instances failed to stop
lxc stop kubew1 kubew2 kubew3 kubew4 kubew5 kubew6  0.15s user 0.21s system 0% cpu 1:32.40 total

Hooray I found a bug?

If anything I have written down is unclear please let me know I’ll try to explain in more (or less?) detail. I just brought the subject up with a friend and he says he also experiences this issue with his (mostly) kubernetes VMs.

tomp · October 4, 2021, 4:41pm

Thanks for this, will go through it, but have chatted with @stgraber about this and we have a pretty good idea what the problem is.

tomp · October 5, 2021, 12:53pm

This should fix it:

vosdev · October 5, 2021, 7:34pm

Awesome! I see I opened a can of worms

Thank you both for your dedication