Why is fuse suspected to be the culprit?
What does dmesg
show?
Why is fuse suspected to be the culprit?
What does dmesg
show?
This happened on 5.3.0-46
this morning I restarted the node to fix the issue and booted into 5.4.0-58
.
I’m asking about dmesg
again because sometimes you’ll get stalling warnings which can be helpful.
FUSE is suspected because of processes stuck in D state and the shutdown log of the container showing a failure to unmount a mountpoint that was FUSE backed.
I actually went through the logs again looking for fuse
mentions and found something.
Dec 16 01:36:36 procyon kernel: [1351281.744215] INFO: task rclone:53642 blocked for more than 120 seconds.
Dec 16 01:36:36 procyon kernel: [1351281.744284] Tainted: P W O 5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:36:36 procyon kernel: [1351281.744340] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:36:36 procyon kernel: [1351281.744398] rclone D 0 53642 73478 0x00000120
Dec 16 01:36:36 procyon kernel: [1351281.744403] Call Trace:
Dec 16 01:36:36 procyon kernel: [1351281.744417] __schedule+0x2a8/0x670
Dec 16 01:36:36 procyon kernel: [1351281.744420] schedule+0x33/0xa0
Dec 16 01:36:36 procyon kernel: [1351281.744423] schedule_preempt_disabled+0xe/0x10
Dec 16 01:36:36 procyon kernel: [1351281.744427] __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:36:36 procyon kernel: [1351281.744431] __mutex_lock_slowpath+0x13/0x20
Dec 16 01:36:36 procyon kernel: [1351281.744434] ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:36:36 procyon kernel: [1351281.744437] mutex_lock+0x2f/0x40
Dec 16 01:36:36 procyon kernel: [1351281.744442] fuse_lock_inode+0x29/0x40
Dec 16 01:36:36 procyon kernel: [1351281.744446] fuse_readdir+0x243/0xf30
Dec 16 01:36:36 procyon kernel: [1351281.744452] iterate_dir+0x9a/0x1b0
Dec 16 01:36:36 procyon kernel: [1351281.744456] ksys_getdents64+0x9d/0x130
Dec 16 01:36:36 procyon kernel: [1351281.744462] ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:36:36 procyon kernel: [1351281.744465] ? filldir+0x190/0x190
Dec 16 01:36:36 procyon kernel: [1351281.744468] __x64_sys_getdents64+0x1a/0x20
Dec 16 01:36:36 procyon kernel: [1351281.744471] ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:36:36 procyon kernel: [1351281.744475] do_syscall_64+0x5a/0x130
Dec 16 01:36:36 procyon kernel: [1351281.744479] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:36:36 procyon kernel: [1351281.744483] RIP: 0033:0x47c584
Dec 16 01:36:36 procyon kernel: [1351281.744491] Code: Bad RIP value.
Dec 16 01:36:36 procyon kernel: [1351281.744493] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:36:36 procyon kernel: [1351281.744497] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:36:36 procyon kernel: [1351281.744499] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:36:36 procyon kernel: [1351281.744501] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:36:36 procyon kernel: [1351281.744502] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:36:36 procyon kernel: [1351281.744504] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:38:37 procyon kernel: [1351402.577993] INFO: task rclone:53642 blocked for more than 241 seconds.
Dec 16 01:38:37 procyon kernel: [1351402.578078] Tainted: P W O 5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:38:37 procyon kernel: [1351402.578147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:38:37 procyon kernel: [1351402.578218] rclone D 0 53642 73478 0x00000120
Dec 16 01:38:37 procyon kernel: [1351402.578224] Call Trace:
Dec 16 01:38:37 procyon kernel: [1351402.578240] __schedule+0x2a8/0x670
Dec 16 01:38:37 procyon kernel: [1351402.578244] schedule+0x33/0xa0
Dec 16 01:38:37 procyon kernel: [1351402.578248] schedule_preempt_disabled+0xe/0x10
Dec 16 01:38:37 procyon kernel: [1351402.578251] __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:38:37 procyon kernel: [1351402.578257] __mutex_lock_slowpath+0x13/0x20
Dec 16 01:38:37 procyon kernel: [1351402.578260] ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:38:37 procyon kernel: [1351402.578263] mutex_lock+0x2f/0x40
Dec 16 01:38:37 procyon kernel: [1351402.578269] fuse_lock_inode+0x29/0x40
Dec 16 01:38:37 procyon kernel: [1351402.578274] fuse_readdir+0x243/0xf30
Dec 16 01:38:37 procyon kernel: [1351402.578281] iterate_dir+0x9a/0x1b0
Dec 16 01:38:37 procyon kernel: [1351402.578285] ksys_getdents64+0x9d/0x130
Dec 16 01:38:37 procyon kernel: [1351402.578293] ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:38:37 procyon kernel: [1351402.578296] ? filldir+0x190/0x190
Dec 16 01:38:37 procyon kernel: [1351402.578301] __x64_sys_getdents64+0x1a/0x20
Dec 16 01:38:37 procyon kernel: [1351402.578303] ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:38:37 procyon kernel: [1351402.578308] do_syscall_64+0x5a/0x130
Dec 16 01:38:37 procyon kernel: [1351402.578314] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:38:37 procyon kernel: [1351402.578319] RIP: 0033:0x47c584
Dec 16 01:38:37 procyon kernel: [1351402.578328] Code: Bad RIP value.
Dec 16 01:38:37 procyon kernel: [1351402.578330] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:38:37 procyon kernel: [1351402.578334] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:38:37 procyon kernel: [1351402.578337] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:38:37 procyon kernel: [1351402.578339] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:38:37 procyon kernel: [1351402.578342] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:38:37 procyon kernel: [1351402.578343] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:40:38 procyon kernel: [1351523.410858] INFO: task rclone:53642 blocked for more than 362 seconds.
Dec 16 01:40:38 procyon kernel: [1351523.410934] Tainted: P W O 5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:40:38 procyon kernel: [1351523.411003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:40:38 procyon kernel: [1351523.411075] rclone D 0 53642 73478 0x00000120
Dec 16 01:40:38 procyon kernel: [1351523.411081] Call Trace:
Dec 16 01:40:38 procyon kernel: [1351523.411096] __schedule+0x2a8/0x670
Dec 16 01:40:38 procyon kernel: [1351523.411100] schedule+0x33/0xa0
Dec 16 01:40:38 procyon kernel: [1351523.411104] schedule_preempt_disabled+0xe/0x10
Dec 16 01:40:38 procyon kernel: [1351523.411108] __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:40:38 procyon kernel: [1351523.411113] __mutex_lock_slowpath+0x13/0x20
Dec 16 01:40:38 procyon kernel: [1351523.411117] ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:40:38 procyon kernel: [1351523.411120] mutex_lock+0x2f/0x40
Dec 16 01:40:38 procyon kernel: [1351523.411126] fuse_lock_inode+0x29/0x40
Dec 16 01:40:38 procyon kernel: [1351523.411130] fuse_readdir+0x243/0xf30
Dec 16 01:40:38 procyon kernel: [1351523.411138] iterate_dir+0x9a/0x1b0
Dec 16 01:40:38 procyon kernel: [1351523.411143] ksys_getdents64+0x9d/0x130
Dec 16 01:40:38 procyon kernel: [1351523.411151] ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:40:38 procyon kernel: [1351523.411154] ? filldir+0x190/0x190
Dec 16 01:40:38 procyon kernel: [1351523.411159] __x64_sys_getdents64+0x1a/0x20
Dec 16 01:40:38 procyon kernel: [1351523.411162] ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:40:38 procyon kernel: [1351523.411167] do_syscall_64+0x5a/0x130
Dec 16 01:40:38 procyon kernel: [1351523.411173] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:40:38 procyon kernel: [1351523.411177] RIP: 0033:0x47c584
Dec 16 01:40:38 procyon kernel: [1351523.411187] Code: Bad RIP value.
Dec 16 01:40:38 procyon kernel: [1351523.411189] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:40:38 procyon kernel: [1351523.411194] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:40:38 procyon kernel: [1351523.411196] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:40:38 procyon kernel: [1351523.411198] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:40:38 procyon kernel: [1351523.411200] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:40:38 procyon kernel: [1351523.411202] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:42:39 procyon kernel: [1351644.242970] INFO: task rclone:53642 blocked for more than 483 seconds.
Dec 16 01:42:39 procyon kernel: [1351644.243081] Tainted: P W O 5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:42:39 procyon kernel: [1351644.243150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:42:39 procyon kernel: [1351644.243222] rclone D 0 53642 73478 0x00000120
Dec 16 01:42:39 procyon kernel: [1351644.243228] Call Trace:
Dec 16 01:42:39 procyon kernel: [1351644.243242] __schedule+0x2a8/0x670
Dec 16 01:42:39 procyon kernel: [1351644.243246] schedule+0x33/0xa0
Dec 16 01:42:39 procyon kernel: [1351644.243250] schedule_preempt_disabled+0xe/0x10
Dec 16 01:42:39 procyon kernel: [1351644.243254] __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:42:39 procyon kernel: [1351644.243259] __mutex_lock_slowpath+0x13/0x20
Dec 16 01:42:39 procyon kernel: [1351644.243262] ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:42:39 procyon kernel: [1351644.243266] mutex_lock+0x2f/0x40
Dec 16 01:42:39 procyon kernel: [1351644.243271] fuse_lock_inode+0x29/0x40
Dec 16 01:42:39 procyon kernel: [1351644.243275] fuse_readdir+0x243/0xf30
Dec 16 01:42:39 procyon kernel: [1351644.243282] iterate_dir+0x9a/0x1b0
Dec 16 01:42:39 procyon kernel: [1351644.243287] ksys_getdents64+0x9d/0x130
Dec 16 01:42:39 procyon kernel: [1351644.243294] ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:42:39 procyon kernel: [1351644.243297] ? filldir+0x190/0x190
Dec 16 01:42:39 procyon kernel: [1351644.243301] __x64_sys_getdents64+0x1a/0x20
Dec 16 01:42:39 procyon kernel: [1351644.243305] ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:42:39 procyon kernel: [1351644.243309] do_syscall_64+0x5a/0x130
Dec 16 01:42:39 procyon kernel: [1351644.243315] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:42:39 procyon kernel: [1351644.243319] RIP: 0033:0x47c584
Dec 16 01:42:39 procyon kernel: [1351644.243328] Code: Bad RIP value.
Dec 16 01:42:39 procyon kernel: [1351644.243330] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:42:39 procyon kernel: [1351644.243335] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:42:39 procyon kernel: [1351644.243337] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:42:39 procyon kernel: [1351644.243339] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:42:39 procyon kernel: [1351644.243341] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:42:39 procyon kernel: [1351644.243343] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:44:40 procyon kernel: [1351765.076445] INFO: task rclone:53642 blocked for more than 604 seconds.
Dec 16 01:44:40 procyon kernel: [1351765.076526] Tainted: P W O 5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:44:40 procyon kernel: [1351765.076595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:44:40 procyon kernel: [1351765.076668] rclone D 0 53642 73478 0x00000120
Dec 16 01:44:40 procyon kernel: [1351765.076674] Call Trace:
Dec 16 01:44:40 procyon kernel: [1351765.076689] __schedule+0x2a8/0x670
Dec 16 01:44:40 procyon kernel: [1351765.076693] schedule+0x33/0xa0
Dec 16 01:44:40 procyon kernel: [1351765.076697] schedule_preempt_disabled+0xe/0x10
Dec 16 01:44:40 procyon kernel: [1351765.076701] __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:44:40 procyon kernel: [1351765.076706] __mutex_lock_slowpath+0x13/0x20
Dec 16 01:44:40 procyon kernel: [1351765.076710] ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:44:40 procyon kernel: [1351765.076714] mutex_lock+0x2f/0x40
Dec 16 01:44:40 procyon kernel: [1351765.076719] fuse_lock_inode+0x29/0x40
Dec 16 01:44:40 procyon kernel: [1351765.076724] fuse_readdir+0x243/0xf30
Dec 16 01:44:40 procyon kernel: [1351765.076731] iterate_dir+0x9a/0x1b0
Dec 16 01:44:40 procyon kernel: [1351765.076736] ksys_getdents64+0x9d/0x130
Dec 16 01:44:40 procyon kernel: [1351765.076744] ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:44:40 procyon kernel: [1351765.076747] ? filldir+0x190/0x190
Dec 16 01:44:40 procyon kernel: [1351765.076751] __x64_sys_getdents64+0x1a/0x20
Dec 16 01:44:40 procyon kernel: [1351765.076754] ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:44:40 procyon kernel: [1351765.076759] do_syscall_64+0x5a/0x130
Dec 16 01:44:40 procyon kernel: [1351765.076765] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:44:40 procyon kernel: [1351765.076770] RIP: 0033:0x47c584
Dec 16 01:44:40 procyon kernel: [1351765.076779] Code: Bad RIP value.
Dec 16 01:44:40 procyon kernel: [1351765.076781] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:44:40 procyon kernel: [1351765.076786] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:44:40 procyon kernel: [1351765.076788] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:44:40 procyon kernel: [1351765.076790] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:44:40 procyon kernel: [1351765.076792] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:44:40 procyon kernel: [1351765.076794] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:46:41 procyon kernel: [1351885.909865] INFO: task rclone:53642 blocked for more than 724 seconds.
Dec 16 01:46:41 procyon kernel: [1351885.909947] Tainted: P W O 5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:46:41 procyon kernel: [1351885.910017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:46:41 procyon kernel: [1351885.910090] rclone D 0 53642 73478 0x00000120
Dec 16 01:46:41 procyon kernel: [1351885.910096] Call Trace:
Dec 16 01:46:41 procyon kernel: [1351885.910111] __schedule+0x2a8/0x670
Dec 16 01:46:41 procyon kernel: [1351885.910115] schedule+0x33/0xa0
Dec 16 01:46:41 procyon kernel: [1351885.910119] schedule_preempt_disabled+0xe/0x10
Dec 16 01:46:41 procyon kernel: [1351885.910123] __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:46:41 procyon kernel: [1351885.910128] __mutex_lock_slowpath+0x13/0x20
Dec 16 01:46:41 procyon kernel: [1351885.910131] ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:46:41 procyon kernel: [1351885.910134] mutex_lock+0x2f/0x40
Dec 16 01:46:41 procyon kernel: [1351885.910140] fuse_lock_inode+0x29/0x40
Dec 16 01:46:41 procyon kernel: [1351885.910144] fuse_readdir+0x243/0xf30
Dec 16 01:46:41 procyon kernel: [1351885.910152] iterate_dir+0x9a/0x1b0
Dec 16 01:46:41 procyon kernel: [1351885.910156] ksys_getdents64+0x9d/0x130
Dec 16 01:46:41 procyon kernel: [1351885.910164] ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:46:41 procyon kernel: [1351885.910167] ? filldir+0x190/0x190
Dec 16 01:46:41 procyon kernel: [1351885.910172] __x64_sys_getdents64+0x1a/0x20
Dec 16 01:46:41 procyon kernel: [1351885.910174] ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:46:41 procyon kernel: [1351885.910179] do_syscall_64+0x5a/0x130
Dec 16 01:46:41 procyon kernel: [1351885.910185] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:46:41 procyon kernel: [1351885.910190] RIP: 0033:0x47c584
Dec 16 01:46:41 procyon kernel: [1351885.910199] Code: Bad RIP value.
Dec 16 01:46:41 procyon kernel: [1351885.910201] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:46:41 procyon kernel: [1351885.910206] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:46:41 procyon kernel: [1351885.910208] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:46:41 procyon kernel: [1351885.910210] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:46:41 procyon kernel: [1351885.910212] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:46:41 procyon kernel: [1351885.910214] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:46:41 procyon kernel: [1351885.910221] INFO: task ls:53864 blocked for more than 120 seconds.
Dec 16 01:46:41 procyon kernel: [1351885.910281] Tainted: P W O 5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:46:41 procyon kernel: [1351885.910349] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:46:41 procyon kernel: [1351885.910421] ls D 0 53864 73478 0x00004124
Dec 16 01:46:41 procyon kernel: [1351885.910425] Call Trace:
Dec 16 01:46:41 procyon kernel: [1351885.910432] __schedule+0x2a8/0x670
Dec 16 01:46:41 procyon kernel: [1351885.910436] schedule+0x33/0xa0
Dec 16 01:46:41 procyon kernel: [1351885.910441] request_wait_answer+0x137/0x210
Dec 16 01:46:41 procyon kernel: [1351885.910446] ? wait_woken+0x80/0x80
Dec 16 01:46:41 procyon kernel: [1351885.910451] __fuse_request_send+0x86/0x90
Dec 16 01:46:41 procyon kernel: [1351885.910455] fuse_request_send+0x29/0x30
Dec 16 01:46:41 procyon kernel: [1351885.910458] fuse_readdir+0x251/0xf30
Dec 16 01:46:41 procyon kernel: [1351885.910464] iterate_dir+0x9a/0x1b0
Dec 16 01:46:41 procyon kernel: [1351885.910467] __x64_sys_getdents+0xab/0x140
Dec 16 01:46:41 procyon kernel: [1351885.910470] ? fillonedir+0x120/0x120
Dec 16 01:46:41 procyon kernel: [1351885.910477] do_syscall_64+0x5a/0x130
Dec 16 01:46:41 procyon kernel: [1351885.910481] ? do_syscall_64+0x5a/0x130
Dec 16 01:46:41 procyon kernel: [1351885.910486] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:46:41 procyon kernel: [1351885.910489] RIP: 0033:0x7f6532d28e78
Dec 16 01:46:41 procyon kernel: [1351885.910494] Code: Bad RIP value.
Dec 16 01:46:41 procyon kernel: [1351885.910496] RSP: 002b:00007ffc79a9d0c0 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
Dec 16 01:46:41 procyon kernel: [1351885.910500] RAX: ffffffffffffffda RBX: 0000564fff9d34f0 RCX: 00007f6532d28e78
Dec 16 01:46:41 procyon kernel: [1351885.910502] RDX: 0000000000008000 RSI: 0000564fff9d34f0 RDI: 0000000000000003
Dec 16 01:46:41 procyon kernel: [1351885.910504] RBP: 0000564fff9d34f0 R08: 0000000000000078 R09: 0000000000000000
Dec 16 01:46:41 procyon kernel: [1351885.910506] R10: 0000564fff9cb010 R11: 0000000000000206 R12: fffffffffffffe98
Dec 16 01:46:41 procyon kernel: [1351885.910508] R13: 0000000000000000 R14: 0000564fff9d34a0 R15: 0000564ffe620c4a
Dec 16 01:48:41 procyon kernel: [1352006.741839] INFO: task rclone:53642 blocked for more than 845 seconds.
Dec 16 01:48:41 procyon kernel: [1352006.741920] Tainted: P W O 5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:48:41 procyon kernel: [1352006.741988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:48:41 procyon kernel: [1352006.742061] rclone D 0 53642 73478 0x00000120
Dec 16 01:48:41 procyon kernel: [1352006.742066] Call Trace:
Dec 16 01:48:41 procyon kernel: [1352006.742080] __schedule+0x2a8/0x670
Dec 16 01:48:41 procyon kernel: [1352006.742085] schedule+0x33/0xa0
Dec 16 01:48:41 procyon kernel: [1352006.742089] schedule_preempt_disabled+0xe/0x10
Dec 16 01:48:41 procyon kernel: [1352006.742092] __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:48:41 procyon kernel: [1352006.742097] __mutex_lock_slowpath+0x13/0x20
Dec 16 01:48:41 procyon kernel: [1352006.742101] ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:48:41 procyon kernel: [1352006.742104] mutex_lock+0x2f/0x40
Dec 16 01:48:41 procyon kernel: [1352006.742109] fuse_lock_inode+0x29/0x40
Dec 16 01:48:41 procyon kernel: [1352006.742113] fuse_readdir+0x243/0xf30
Dec 16 01:48:41 procyon kernel: [1352006.742120] iterate_dir+0x9a/0x1b0
Dec 16 01:48:41 procyon kernel: [1352006.742124] ksys_getdents64+0x9d/0x130
Dec 16 01:48:41 procyon kernel: [1352006.742131] ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:48:41 procyon kernel: [1352006.742134] ? filldir+0x190/0x190
Dec 16 01:48:41 procyon kernel: [1352006.742139] __x64_sys_getdents64+0x1a/0x20
Dec 16 01:48:41 procyon kernel: [1352006.742142] ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:48:41 procyon kernel: [1352006.742147] do_syscall_64+0x5a/0x130
Dec 16 01:48:41 procyon kernel: [1352006.742152] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:48:41 procyon kernel: [1352006.742157] RIP: 0033:0x47c584
Dec 16 01:48:41 procyon kernel: [1352006.742165] Code: Bad RIP value.
Dec 16 01:48:41 procyon kernel: [1352006.742168] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:48:41 procyon kernel: [1352006.742172] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:48:41 procyon kernel: [1352006.742175] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:48:41 procyon kernel: [1352006.742177] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:48:41 procyon kernel: [1352006.742179] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:48:41 procyon kernel: [1352006.742181] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:48:41 procyon kernel: [1352006.742188] INFO: task ls:53864 blocked for more than 241 seconds.
Dec 16 01:48:41 procyon kernel: [1352006.742248] Tainted: P W O 5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:48:41 procyon kernel: [1352006.742315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:48:41 procyon kernel: [1352006.742385] ls D 0 53864 73478 0x00004124
Dec 16 01:48:41 procyon kernel: [1352006.742389] Call Trace:
Dec 16 01:48:41 procyon kernel: [1352006.742397] __schedule+0x2a8/0x670
Dec 16 01:48:41 procyon kernel: [1352006.742401] schedule+0x33/0xa0
Dec 16 01:48:41 procyon kernel: [1352006.742405] request_wait_answer+0x137/0x210
Dec 16 01:48:41 procyon kernel: [1352006.742411] ? wait_woken+0x80/0x80
Dec 16 01:48:41 procyon kernel: [1352006.742415] __fuse_request_send+0x86/0x90
Dec 16 01:48:41 procyon kernel: [1352006.742420] fuse_request_send+0x29/0x30
Dec 16 01:48:41 procyon kernel: [1352006.742423] fuse_readdir+0x251/0xf30
Dec 16 01:48:41 procyon kernel: [1352006.742429] iterate_dir+0x9a/0x1b0
Dec 16 01:48:41 procyon kernel: [1352006.742433] __x64_sys_getdents+0xab/0x140
Dec 16 01:48:41 procyon kernel: [1352006.742436] ? fillonedir+0x120/0x120
Dec 16 01:48:41 procyon kernel: [1352006.742442] do_syscall_64+0x5a/0x130
Dec 16 01:48:41 procyon kernel: [1352006.742446] ? do_syscall_64+0x5a/0x130
Dec 16 01:48:41 procyon kernel: [1352006.742451] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:48:41 procyon kernel: [1352006.742454] RIP: 0033:0x7f6532d28e78
Dec 16 01:48:41 procyon kernel: [1352006.742459] Code: Bad RIP value.
Dec 16 01:48:41 procyon kernel: [1352006.742461] RSP: 002b:00007ffc79a9d0c0 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
Dec 16 01:48:41 procyon kernel: [1352006.742464] RAX: ffffffffffffffda RBX: 0000564fff9d34f0 RCX: 00007f6532d28e78
Dec 16 01:48:41 procyon kernel: [1352006.742467] RDX: 0000000000008000 RSI: 0000564fff9d34f0 RDI: 0000000000000003
Dec 16 01:48:41 procyon kernel: [1352006.742468] RBP: 0000564fff9d34f0 R08: 0000000000000078 R09: 0000000000000000
Dec 16 01:48:41 procyon kernel: [1352006.742471] R10: 0000564fff9cb010 R11: 0000000000000206 R12: fffffffffffffe98
Dec 16 01:48:41 procyon kernel: [1352006.742472] R13: 0000000000000000 R14: 0000564fff9d34a0 R15: 0000564ffe620c4a
Though looking at cat /proc/PID/stack
for the leftover processes stuck in D
state would be interesting and would confirm the FUSE hypothesis (though by the sound of it, the system was rebooted so this isn’t an option anymore :)).
I’m sure this will happen again so it’s good to have other things to explore when it does
The zombie process is more interesting to me currently.
What’s the fuse version you were using?
Yeah but you can’t dump the stack of a zombie process unfortunately
I guess you can dump the stack of its parent to try to see why it wouldn’t have handled SIGCHILD.
# dpkg -l | grep fuse
ii fuse 2.9.7-1ubuntu1 amd64 Filesystem in Userspace
ii libfuse2:amd64 2.9.7-1ubuntu1 amd64 Filesystem in Userspace (library)
I guess 2.9.7 ? (This is on the host but since it’s a kernel mod this would be shared into the container, correct?)
I wonder if you can reproduce with a newer fuse, i.e. fuse 3.+.
Looking at:
1000000 68729 0.0 0.0 0 0 ? Ss Dec16 0:03 [systemd-shutdow]
1001001 53637 0.0 0.0 0 0 ? Zl Dec16 0:00 \_ [rclone] <defunct>
1001001 53864 0.0 0.0 15240 888 ? D Dec16 0:00 \_ ls --color=auto
1001001 71767 0.0 0.0 15240 916 ? D Dec16 0:00 \_ ls --color=auto
1000000 22579 0.0 0.0 8204 2024 ? D Dec16 1:15 \_ /usr/bin/updatedb.mlocate
This suggests rclone
was the FUSE server with both ls
and updatedb.mlocate
being users of the filesystem it provides. Based on the systemd output, I suspect rclone
got killed prior to the mount being unmounted, then for some reason FUSE didn’t forcibly yank out the mount and cause I/O error for the clients but instead got stuck into some kind of deadlock?
Sounds reasonable also notice all the deadlock scenarios listed here:
Specifically
* Abort filesystem through the FUSE control filesystem. Most powerful method, always works.
This sounds amazing, at least it would give me a way to fix this manually. I am trying to see if I can replicate the behaviour somehow with a SSHFS mount under my control.
@brauner do you think we should have liblxc clear any FUSE filesystem in the namespace through the fuse control API on shutdown or something similar?
Hm, not sure. Would that go through
/dev/fuse?
If so that might be an attack vector if the container can somehow trick
us into either making /dev/fuse something else or shutting down
something we weren’t supposed to shut down?
One idea would be to stash an fd to /dev/fuse during container startup
if fuse has been requested.
But I’m not super confident that this is a great idea.
Thoughts?
We should be able to grab the host’s /dev/fuse, attach to userns+mntns and clean things up, no?
Yeah, that’s what I hinted at in my other comment but I honestly forgot
that /dev/fuse is a single device on the system.
I honestly can’t confidently say something about this before looking at
fuse once more a little closer. It sounds ok but I want to check!
Christian