nils
(Nils Meyer)
March 25, 2025, 5:56am
1
I’m using incus through ansible which may start multiple “incus exec” processes. Every so often the incus API just hangs (on both the Unix socket as well as https) giving now response, while connections are still accepted on the port or socket there is no response to requests.
The incus daemon is still running, strace shows a lot of futex() calls. The only thing that reliably seems to fix this is rebooting the machine. The vms and containers are still working.
Currently running incus 6.10.1 on Ubuntu 24.10.
stgraber
(Stéphane Graber)
March 25, 2025, 2:46pm
2
Can you do incus config set core.debug_address 127.0.0.1:8444
Once that’s done, next time it hangs, try to do curl http://127.0.0.1:8444/debug/pprof/goroutine?debug=2
This will show us a dump of all internal goroutines and hopefully explain why it’s hanging. You may also want to look at dmesg
to make sure you’re not stuck on some kind of kernel or hardware bug.
1 Like
nils
(Nils Meyer)
March 25, 2025, 2:54pm
3
Nothing in the kernel log, I’ve added the debug address. Will post here once I’ve been able to provoke the error again.
nils
(Nils Meyer)
May 31, 2025, 7:03am
4
Took a while but I’ve triggered the problem again. The problem is that the curl task also hangs without response.
This is something I’ve noticed in the kernel log:
[Sat May 31 06:26:50 2025] ------------[ cut here ]------------
[Sat May 31 06:26:50 2025] WARNING: CPU: 27 PID: 5873 at set_task_cpu+0x197/0x1d0
[Sat May 31 06:26:50 2025] Modules linked in: vhost_net tap tun dm_snapshot wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel veth nft_masq dummy bridge stp llc snd_hrtimer vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfcomm cmac algif_hash algif_skcipher af_alg qrtr bnep overlay sg btusb btrtl btintel btbcm btmtk bluetooth vfat fat squashfs loop f2fs snd_acp_legacy_mach snd_acp_mach snd_soc_nau8821 snd_acp3x_rn snd_acp70 snd_acp_i2s snd_acp_pdm snd_soc_dmic snd_acp_pcm snd_sof_amd_acp70 snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_ctl_led snd_sof_amd_acp snd_sof_pci snd_hda_codec_realtek snd_sof_xtensa_dsp snd_hda_codec_generic
[Sat May 31 06:26:50 2025] snd_hda_scodec_component snd_sof snd_hda_codec_hdmi snd_sof_utils snd_pci_ps amd_atl intel_rapl_msr snd_soc_acpi_amd_match snd_amd_sdw_acpi intel_rapl_common snd_hda_scodec_cs35l56_spi soundwire_amd amdgpu snd_hda_intel soundwire_generic_allocation snd_intel_dspcfg soundwire_bus snd_intel_sdw_acpi snd_soc_sdca snd_hda_codec snd_soc_core mt7925e snd_compress snd_hda_core ac97_bus mt7925_common snd_seq_dummy snd_hwdep snd_pcm_dmaengine mt792x_lib snd_rpl_pci_acp6x snd_seq_oss drm_panel_backlight_quirks mt76_connac_lib snd_acp_pci amdxcp snd_seq_midi snd_acp_legacy_common mt76 drm_buddy sch_fq_codel snd_seq_midi_event snd_hda_scodec_cs35l56_i2c drm_exec snd_pci_acp6x kvm_amd mac80211 snd_rawmidi ecryptfs snd_hda_scodec_cs35l56 drm_suballoc_helper snd_pcm snd_hda_cirrus_scodec snd_seq parport_pc drm_ttm_helper kvm snd_soc_cs35l56_shared libarc4 hid_sensor_gyro_3d hid_sensor_prox snd_pci_acp5x ttm ppdev amd_pmf snd_seq_device snd_soc_cs_amp_lib hid_sensor_trigger snd_rn_pci_acp3x i2c_algo_bit spd5118 amdtee
[Sat May 31 06:26:50 2025] snd_timer cs_dsp drm_display_helper industrialio_triggered_buffer hp_wmi amdxdna irqbypass lp snd_acp_config kfifo_buf cfg80211 snd sparse_keymap i2c_piix4 snd_soc_acpi parport hid_sensor_iio_common rapl snd_pci_acp3x wmi_bmof cec ccp rfkill gpu_sched soundcore i2c_smbus serial_multi_instantiate wireless_hotkey platform_profile industrialio amd_pmc joydev mousedev mac_hid nfsd auth_rpcgss nfs_acl lockd grace nfs_localio nvme_fabrics nvme_keyring sunrpc nfnetlink zram zsmalloc 842_decompress 842_compress lz4hc_compress ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio hid_logitech_hidpp uas usb_storage hid_sensor_hub usbhid dm_crypt dm_mod cbc encrypted_keys trusted asn1_encoder tee raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 linear md_mod polyval_clmulni polyval_generic serio_raw ghash_clmulni_intel atkbd sha512_ssse3 libps2 sha256_ssse3 atlantic vivaldi_fmap nvme sha1_ssse3 igc macsec ucsi_acpi aesni_intel i8042 video nvme_core ptp
[Sat May 31 06:26:50 2025] crypto_simd typec_ucsi hid_multitouch hid_generic cryptd thunderbolt roles amd_sfh pps_core nvme_auth typec serio wmi i2c_hid_acpi i2c_hid
[Sat May 31 06:26:50 2025] Unloaded tainted modules: bbswitch(OE):1
[Sat May 31 06:26:50 2025] CPU: 27 UID: 0 PID: 5873 Comm: incusd Tainted: G W OE 6.14.9-x64v3-xanmod1 #0~20250529.g46d3999 7bf3c0ead03f6fbaa6484abcd6a440869a52a774
[Sat May 31 06:26:50 2025] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[Sat May 31 06:26:50 2025] Hardware name: HP HP ZBook Ultra G1a 14 inch Mobile Workstation PC/8D01, BIOS X89 Ver. 01.02.01 03/05/2025
[Sat May 31 06:26:50 2025] Sched_ext: bpfland (enabled+all), task: runnable_at=-4ms
[Sat May 31 06:26:50 2025] RIP: 0010:set_task_cpu+0x197/0x1d0
[Sat May 31 06:26:50 2025] Code: ff ff 48 81 bf 80 03 00 00 f0 b4 cb 95 0f 85 ad fe ff ff 8b 47 68 85 c0 0f 84 a2 fe ff ff 8b 47 68 83 f8 02 0f 84 96 fe ff ff <0f> 0b e9 8f fe ff ff 80 8b 84 0a 00 00 04 e9 f5 fe ff ff 0f 0b 66
[Sat May 31 06:26:50 2025] RSP: 0018:ffffd353030c3b40 EFLAGS: 00010046
[Sat May 31 06:26:50 2025] RAX: 0000000000002001 RBX: ffff8e4a09370000 RCX: 0000000000000004
[Sat May 31 06:26:50 2025] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8e4a09370000
[Sat May 31 06:26:50 2025] RBP: 0000000000000004 R08: 0000000000000004 R09: 0000000000000004
[Sat May 31 06:26:50 2025] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[Sat May 31 06:26:50 2025] R13: ffff8e4a09370dcc R14: 00000000000000a8 R15: 0000000000000287
[Sat May 31 06:26:50 2025] FS: 00007efb8d7fa6c0(0000) GS:ffff8e669e780000(0000) knlGS:0000000000000000
[Sat May 31 06:26:50 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat May 31 06:26:50 2025] CR2: 00007fa595453970 CR3: 0000000107cc9000 CR4: 0000000000f50ef0
[Sat May 31 06:26:50 2025] PKRU: 55555554
[Sat May 31 06:26:50 2025] Call Trace:
[Sat May 31 06:26:50 2025] <TASK>
[Sat May 31 06:26:50 2025] try_to_wake_up+0x1c7/0x780
[Sat May 31 06:26:50 2025] ? futex_wake+0x177/0x1a0
[Sat May 31 06:26:50 2025] wake_up_q+0x4e/0x90
[Sat May 31 06:26:50 2025] futex_wake+0x177/0x1a0
[Sat May 31 06:26:50 2025] do_futex+0x125/0x190
[Sat May 31 06:26:50 2025] __x64_sys_futex+0x127/0x1e0
[Sat May 31 06:26:50 2025] do_syscall_64+0x6d/0x140
[Sat May 31 06:26:50 2025] ? wake_up_q+0x4e/0x90
[Sat May 31 06:26:50 2025] ? futex_wake+0x177/0x1a0
[Sat May 31 06:26:50 2025] ? do_futex+0x125/0x190
[Sat May 31 06:26:50 2025] ? __x64_sys_futex+0x127/0x1e0
[Sat May 31 06:26:50 2025] ? syscall_exit_to_user_mode+0x3b/0x110
[Sat May 31 06:26:50 2025] ? do_syscall_64+0x79/0x140
[Sat May 31 06:26:50 2025] ? do_futex+0x125/0x190
[Sat May 31 06:26:50 2025] ? __x64_sys_futex+0x127/0x1e0
[Sat May 31 06:26:50 2025] ? syscall_exit_to_user_mode+0x3b/0x110
[Sat May 31 06:26:50 2025] ? do_syscall_64+0x79/0x140
[Sat May 31 06:26:50 2025] ? do_syscall_64+0x79/0x140
[Sat May 31 06:26:50 2025] ? ksys_read+0x96/0xe0
[Sat May 31 06:26:50 2025] ? syscall_exit_to_user_mode+0x3b/0x110
[Sat May 31 06:26:50 2025] ? do_syscall_64+0x79/0x140
[Sat May 31 06:26:50 2025] entry_SYSCALL_64_after_hwframe+0x6c/0x74
[Sat May 31 06:26:50 2025] RIP: 0033:0x48cc43
[Sat May 31 06:26:50 2025] Code: 24 20 c3 cc cc cc cc 48 8b 7c 24 08 8b 74 24 10 8b 54 24 14 4c 8b 54 24 18 4c 8b 44 24 20 44 8b 4c 24 28 b8 ca 00 00 00 0f 05 <89> 44 24 30 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[Sat May 31 06:26:50 2025] RSP: 002b:00007efb8d7f9868 EFLAGS: 00000202 ORIG_RAX: 00000000000000ca
[Sat May 31 06:26:50 2025] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 000000000048cc43
[Sat May 31 06:26:50 2025] RDX: 0000000000000001 RSI: 0000000000000081 RDI: 000000c000e01528
[Sat May 31 06:26:50 2025] RBP: 00007efb8d7f98b8 R08: 0000000000000000 R09: 0000000000000000
[Sat May 31 06:26:50 2025] R10: 0000000000000000 R11: 0000000000000202 R12: 00007efb8d7f98a8
[Sat May 31 06:26:50 2025] R13: 0000000000000001 R14: 000000c000e03880 R15: 0000000000000013
[Sat May 31 06:26:50 2025] </TASK>
[Sat May 31 06:26:50 2025] ---[ end trace 0000000000000000 ]---
I’ve upgraded to Ubuntu 25.04 and incus 6.12. Will upgrade to 6.13.
stgraber
(Stéphane Graber)
May 31, 2025, 10:23pm
5
[Sat May 31 06:26:50 2025] WARNING: CPU: 27 PID: 5873 at set_task_cpu+0x197/0x1d0
This implies that the hang is happening within the CPU scheduler logic.
[Sat May 31 06:26:50 2025] Sched_ext: bpfland (enabled+all), task: runnable_at=-4ms
Suggests that your system is using a userspace scheduler through schedext.
Please turn that off and try to reproduce the issue with the standard Linux scheduler.
1 Like
nils
(Nils Meyer)
June 29, 2025, 9:13am
6
This seems to indeed have been the problem, it also explains why even the go debugger doesn’t work (incus never gets scheduled on a CPU). I don’t know why this kernel ships with an external scheduler but disabling it works.