Anyone have any thoughts on what might cause this … experienced during a container restart … the container doesn’t restart, and then neither will incus …
Jun 09 10:05:50 lite kernel: INFO: task incusd:2939 blocked for more than 362 seconds.
Jun 09 10:05:50 lite kernel: Tainted: P O 6.12.25-v8-16k #1
Jun 09 10:05:50 lite kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 09 10:05:50 lite kernel: task:incusd state:D stack:0 pid:2939 tgid:2847 ppid:1 flags:0x0000000c
Jun 09 10:05:50 lite kernel: Call trace:
Jun 09 10:05:50 lite kernel: __switch_to+0xf0/0x150
Jun 09 10:05:50 lite kernel: __schedule+0x38c/0xdd8
Jun 09 10:05:50 lite kernel: schedule+0x3c/0x148
Jun 09 10:05:50 lite kernel: grab_super+0x158/0x1c0
Jun 09 10:05:50 lite kernel: sget+0x150/0x268
Jun 09 10:05:50 lite kernel: zpl_mount+0x134/0x2f8 [zfs]
Jun 09 10:05:50 lite kernel: legacy_get_tree+0x38/0x70
Jun 09 10:05:50 lite kernel: vfs_get_tree+0x30/0x100
Jun 09 10:05:50 lite kernel: path_mount+0x410/0xa98
Jun 09 10:05:50 lite kernel: __arm64_sys_mount+0x194/0x2c0
Jun 09 10:05:50 lite kernel: invoke_syscall+0x50/0x120
Jun 09 10:05:50 lite kernel: el0_svc_common.constprop.0+0x48/0xf0
Jun 09 10:05:50 lite kernel: do_el0_svc+0x24/0x38
Jun 09 10:05:50 lite kernel: el0_svc+0x30/0xd0
Jun 09 10:05:50 lite kernel: el0t_64_sync_handler+0x100/0x130
Jun 09 10:05:50 lite kernel: el0t_64_sync+0x190/0x198
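If it helps, this is the sort of thing I can run from another shell next time it happens (assuming magic sysrq is enabled on the node; the PID is the one from the hung-task message above):

# dump the kernel stacks of all blocked (D state) tasks into the kernel log
echo w | sudo tee /proc/sysrq-trigger
sudo dmesg | tail -n 100

# or look at the stuck incusd task directly
sudo cat /proc/2939/stack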
Update: it happened again on a different node, possibly associated with:
ovsdb-server[2598]: ovs|00034|raft|INFO|Transferring leadership to write a snapshot.
ovsdb-server[2598]: ovs|00035|raft|INFO|rejected append_reply (not leader)
ovsdb-server[2598]: ovs|00036|raft|INFO|rejected append_reply (not leader)
ovsdb-server[2598]: ovs|00037|raft|INFO|server 18e5 is leader for term 6
And
kernel: eth1: renamed from veth87299ae4
kernel: veth6f6b37eb: renamed from physn1QH2O
incusd[8150]: time="2025-06-09T12:39:03+01:00" level=warning msg="Could not find OVN Switch port associated to OVS interface" device=eth-1 driver=nic instance=kuma interface=vethf03f89dc project=default
This is bad news because it locks up the entire node, requiring a reboot.
Ok, I’ve not been able to reproduce it “exactly”, but it tends to happen when I’m changing an instance: either the profile (which implicitly changes the network) or adding/deleting interfaces, predominantly OVN networks and interfaces. I’ve had it 4 times today so far … the node won’t even shut down and needs the power button.
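Roughly this kind of sequence seems to do it; the device and network names below are just illustrative rather than my exact config (instance name taken from the log above):

incus profile assign kuma default,ovn-profile         # switch profiles, which implicitly changes the network
incus config device add kuma eth1 nic network=my-ovn  # hot-plug an extra OVN NIC
incus config device remove kuma eth1                  # ...then remove it again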
The error seems to indicate a kernel-level lock on a mount operation.
You can run ps fauxww on the system and look for processes in D state to get an idea of what’s currently stuck, but when you get that kind of message there’s nothing that Incus can do about it; it’s not going to run again until whatever syscall it’s stuck on finally completes.
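A minimal way to filter that down (STAT is the 8th column of ps fauxww output; wchan shows the kernel function the task is sleeping in):

# show only uninterruptible (D state) processes, keeping the header line
ps fauxww | awk 'NR==1 || $8 ~ /^D/'

# or include the kernel wait channel to see where each task is stuck
ps -eo pid,stat,wchan:32,cmd | awk 'NR==1 || $2 ~ /^D/'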
Sure, makes sense … but it doesn’t happen (at all) when I’m not messing with interfaces, which I’m doing through Incus. While I appreciate the problem is at a lower level somewhere in the kernel, it does appear to be triggered by Incus’ behavior … maybe the way or the order in which Incus is adding and removing things.