Stuck container on 4.9

Why is fuse suspected to be the culprit?

What does dmesg show?

This happened on 5.3.0-46 this morning I restarted the node to fix the issue and booted into 5.4.0-58 .

I’m asking about dmesg again because sometimes you’ll get stalling warnings which can be helpful.

FUSE is suspected because of processes stuck in D state and the shutdown log of the container showing a failure to unmount a mountpoint that was FUSE backed.

I actually went through the logs again looking for fuse mentions and found something.

Dec 16 01:36:36 procyon kernel: [1351281.744215] INFO: task rclone:53642 blocked for more than 120 seconds.
Dec 16 01:36:36 procyon kernel: [1351281.744284]       Tainted: P        W  O      5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:36:36 procyon kernel: [1351281.744340] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:36:36 procyon kernel: [1351281.744398] rclone          D    0 53642  73478 0x00000120
Dec 16 01:36:36 procyon kernel: [1351281.744403] Call Trace:
Dec 16 01:36:36 procyon kernel: [1351281.744417]  __schedule+0x2a8/0x670
Dec 16 01:36:36 procyon kernel: [1351281.744420]  schedule+0x33/0xa0
Dec 16 01:36:36 procyon kernel: [1351281.744423]  schedule_preempt_disabled+0xe/0x10
Dec 16 01:36:36 procyon kernel: [1351281.744427]  __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:36:36 procyon kernel: [1351281.744431]  __mutex_lock_slowpath+0x13/0x20
Dec 16 01:36:36 procyon kernel: [1351281.744434]  ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:36:36 procyon kernel: [1351281.744437]  mutex_lock+0x2f/0x40
Dec 16 01:36:36 procyon kernel: [1351281.744442]  fuse_lock_inode+0x29/0x40
Dec 16 01:36:36 procyon kernel: [1351281.744446]  fuse_readdir+0x243/0xf30
Dec 16 01:36:36 procyon kernel: [1351281.744452]  iterate_dir+0x9a/0x1b0
Dec 16 01:36:36 procyon kernel: [1351281.744456]  ksys_getdents64+0x9d/0x130
Dec 16 01:36:36 procyon kernel: [1351281.744462]  ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:36:36 procyon kernel: [1351281.744465]  ? filldir+0x190/0x190
Dec 16 01:36:36 procyon kernel: [1351281.744468]  __x64_sys_getdents64+0x1a/0x20
Dec 16 01:36:36 procyon kernel: [1351281.744471]  ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:36:36 procyon kernel: [1351281.744475]  do_syscall_64+0x5a/0x130
Dec 16 01:36:36 procyon kernel: [1351281.744479]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:36:36 procyon kernel: [1351281.744483] RIP: 0033:0x47c584
Dec 16 01:36:36 procyon kernel: [1351281.744491] Code: Bad RIP value.
Dec 16 01:36:36 procyon kernel: [1351281.744493] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:36:36 procyon kernel: [1351281.744497] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:36:36 procyon kernel: [1351281.744499] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:36:36 procyon kernel: [1351281.744501] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:36:36 procyon kernel: [1351281.744502] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:36:36 procyon kernel: [1351281.744504] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:38:37 procyon kernel: [1351402.577993] INFO: task rclone:53642 blocked for more than 241 seconds.
Dec 16 01:38:37 procyon kernel: [1351402.578078]       Tainted: P        W  O      5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:38:37 procyon kernel: [1351402.578147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:38:37 procyon kernel: [1351402.578218] rclone          D    0 53642  73478 0x00000120
Dec 16 01:38:37 procyon kernel: [1351402.578224] Call Trace:
Dec 16 01:38:37 procyon kernel: [1351402.578240]  __schedule+0x2a8/0x670
Dec 16 01:38:37 procyon kernel: [1351402.578244]  schedule+0x33/0xa0
Dec 16 01:38:37 procyon kernel: [1351402.578248]  schedule_preempt_disabled+0xe/0x10
Dec 16 01:38:37 procyon kernel: [1351402.578251]  __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:38:37 procyon kernel: [1351402.578257]  __mutex_lock_slowpath+0x13/0x20
Dec 16 01:38:37 procyon kernel: [1351402.578260]  ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:38:37 procyon kernel: [1351402.578263]  mutex_lock+0x2f/0x40
Dec 16 01:38:37 procyon kernel: [1351402.578269]  fuse_lock_inode+0x29/0x40
Dec 16 01:38:37 procyon kernel: [1351402.578274]  fuse_readdir+0x243/0xf30
Dec 16 01:38:37 procyon kernel: [1351402.578281]  iterate_dir+0x9a/0x1b0
Dec 16 01:38:37 procyon kernel: [1351402.578285]  ksys_getdents64+0x9d/0x130
Dec 16 01:38:37 procyon kernel: [1351402.578293]  ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:38:37 procyon kernel: [1351402.578296]  ? filldir+0x190/0x190
Dec 16 01:38:37 procyon kernel: [1351402.578301]  __x64_sys_getdents64+0x1a/0x20
Dec 16 01:38:37 procyon kernel: [1351402.578303]  ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:38:37 procyon kernel: [1351402.578308]  do_syscall_64+0x5a/0x130
Dec 16 01:38:37 procyon kernel: [1351402.578314]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:38:37 procyon kernel: [1351402.578319] RIP: 0033:0x47c584
Dec 16 01:38:37 procyon kernel: [1351402.578328] Code: Bad RIP value.
Dec 16 01:38:37 procyon kernel: [1351402.578330] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:38:37 procyon kernel: [1351402.578334] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:38:37 procyon kernel: [1351402.578337] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:38:37 procyon kernel: [1351402.578339] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:38:37 procyon kernel: [1351402.578342] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:38:37 procyon kernel: [1351402.578343] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:40:38 procyon kernel: [1351523.410858] INFO: task rclone:53642 blocked for more than 362 seconds.
Dec 16 01:40:38 procyon kernel: [1351523.410934]       Tainted: P        W  O      5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:40:38 procyon kernel: [1351523.411003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:40:38 procyon kernel: [1351523.411075] rclone          D    0 53642  73478 0x00000120
Dec 16 01:40:38 procyon kernel: [1351523.411081] Call Trace:
Dec 16 01:40:38 procyon kernel: [1351523.411096]  __schedule+0x2a8/0x670
Dec 16 01:40:38 procyon kernel: [1351523.411100]  schedule+0x33/0xa0
Dec 16 01:40:38 procyon kernel: [1351523.411104]  schedule_preempt_disabled+0xe/0x10
Dec 16 01:40:38 procyon kernel: [1351523.411108]  __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:40:38 procyon kernel: [1351523.411113]  __mutex_lock_slowpath+0x13/0x20
Dec 16 01:40:38 procyon kernel: [1351523.411117]  ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:40:38 procyon kernel: [1351523.411120]  mutex_lock+0x2f/0x40
Dec 16 01:40:38 procyon kernel: [1351523.411126]  fuse_lock_inode+0x29/0x40
Dec 16 01:40:38 procyon kernel: [1351523.411130]  fuse_readdir+0x243/0xf30
Dec 16 01:40:38 procyon kernel: [1351523.411138]  iterate_dir+0x9a/0x1b0
Dec 16 01:40:38 procyon kernel: [1351523.411143]  ksys_getdents64+0x9d/0x130
Dec 16 01:40:38 procyon kernel: [1351523.411151]  ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:40:38 procyon kernel: [1351523.411154]  ? filldir+0x190/0x190
Dec 16 01:40:38 procyon kernel: [1351523.411159]  __x64_sys_getdents64+0x1a/0x20
Dec 16 01:40:38 procyon kernel: [1351523.411162]  ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:40:38 procyon kernel: [1351523.411167]  do_syscall_64+0x5a/0x130
Dec 16 01:40:38 procyon kernel: [1351523.411173]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:40:38 procyon kernel: [1351523.411177] RIP: 0033:0x47c584
Dec 16 01:40:38 procyon kernel: [1351523.411187] Code: Bad RIP value.
Dec 16 01:40:38 procyon kernel: [1351523.411189] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:40:38 procyon kernel: [1351523.411194] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:40:38 procyon kernel: [1351523.411196] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:40:38 procyon kernel: [1351523.411198] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:40:38 procyon kernel: [1351523.411200] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:40:38 procyon kernel: [1351523.411202] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:42:39 procyon kernel: [1351644.242970] INFO: task rclone:53642 blocked for more than 483 seconds.
Dec 16 01:42:39 procyon kernel: [1351644.243081]       Tainted: P        W  O      5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:42:39 procyon kernel: [1351644.243150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:42:39 procyon kernel: [1351644.243222] rclone          D    0 53642  73478 0x00000120
Dec 16 01:42:39 procyon kernel: [1351644.243228] Call Trace:
Dec 16 01:42:39 procyon kernel: [1351644.243242]  __schedule+0x2a8/0x670
Dec 16 01:42:39 procyon kernel: [1351644.243246]  schedule+0x33/0xa0
Dec 16 01:42:39 procyon kernel: [1351644.243250]  schedule_preempt_disabled+0xe/0x10
Dec 16 01:42:39 procyon kernel: [1351644.243254]  __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:42:39 procyon kernel: [1351644.243259]  __mutex_lock_slowpath+0x13/0x20
Dec 16 01:42:39 procyon kernel: [1351644.243262]  ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:42:39 procyon kernel: [1351644.243266]  mutex_lock+0x2f/0x40
Dec 16 01:42:39 procyon kernel: [1351644.243271]  fuse_lock_inode+0x29/0x40
Dec 16 01:42:39 procyon kernel: [1351644.243275]  fuse_readdir+0x243/0xf30
Dec 16 01:42:39 procyon kernel: [1351644.243282]  iterate_dir+0x9a/0x1b0
Dec 16 01:42:39 procyon kernel: [1351644.243287]  ksys_getdents64+0x9d/0x130
Dec 16 01:42:39 procyon kernel: [1351644.243294]  ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:42:39 procyon kernel: [1351644.243297]  ? filldir+0x190/0x190
Dec 16 01:42:39 procyon kernel: [1351644.243301]  __x64_sys_getdents64+0x1a/0x20
Dec 16 01:42:39 procyon kernel: [1351644.243305]  ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:42:39 procyon kernel: [1351644.243309]  do_syscall_64+0x5a/0x130
Dec 16 01:42:39 procyon kernel: [1351644.243315]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:42:39 procyon kernel: [1351644.243319] RIP: 0033:0x47c584
Dec 16 01:42:39 procyon kernel: [1351644.243328] Code: Bad RIP value.
Dec 16 01:42:39 procyon kernel: [1351644.243330] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:42:39 procyon kernel: [1351644.243335] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:42:39 procyon kernel: [1351644.243337] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:42:39 procyon kernel: [1351644.243339] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:42:39 procyon kernel: [1351644.243341] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:42:39 procyon kernel: [1351644.243343] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:44:40 procyon kernel: [1351765.076445] INFO: task rclone:53642 blocked for more than 604 seconds.
Dec 16 01:44:40 procyon kernel: [1351765.076526]       Tainted: P        W  O      5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:44:40 procyon kernel: [1351765.076595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:44:40 procyon kernel: [1351765.076668] rclone          D    0 53642  73478 0x00000120
Dec 16 01:44:40 procyon kernel: [1351765.076674] Call Trace:
Dec 16 01:44:40 procyon kernel: [1351765.076689]  __schedule+0x2a8/0x670
Dec 16 01:44:40 procyon kernel: [1351765.076693]  schedule+0x33/0xa0
Dec 16 01:44:40 procyon kernel: [1351765.076697]  schedule_preempt_disabled+0xe/0x10
Dec 16 01:44:40 procyon kernel: [1351765.076701]  __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:44:40 procyon kernel: [1351765.076706]  __mutex_lock_slowpath+0x13/0x20
Dec 16 01:44:40 procyon kernel: [1351765.076710]  ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:44:40 procyon kernel: [1351765.076714]  mutex_lock+0x2f/0x40
Dec 16 01:44:40 procyon kernel: [1351765.076719]  fuse_lock_inode+0x29/0x40
Dec 16 01:44:40 procyon kernel: [1351765.076724]  fuse_readdir+0x243/0xf30
Dec 16 01:44:40 procyon kernel: [1351765.076731]  iterate_dir+0x9a/0x1b0
Dec 16 01:44:40 procyon kernel: [1351765.076736]  ksys_getdents64+0x9d/0x130
Dec 16 01:44:40 procyon kernel: [1351765.076744]  ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:44:40 procyon kernel: [1351765.076747]  ? filldir+0x190/0x190
Dec 16 01:44:40 procyon kernel: [1351765.076751]  __x64_sys_getdents64+0x1a/0x20
Dec 16 01:44:40 procyon kernel: [1351765.076754]  ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:44:40 procyon kernel: [1351765.076759]  do_syscall_64+0x5a/0x130
Dec 16 01:44:40 procyon kernel: [1351765.076765]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:44:40 procyon kernel: [1351765.076770] RIP: 0033:0x47c584
Dec 16 01:44:40 procyon kernel: [1351765.076779] Code: Bad RIP value.
Dec 16 01:44:40 procyon kernel: [1351765.076781] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:44:40 procyon kernel: [1351765.076786] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:44:40 procyon kernel: [1351765.076788] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:44:40 procyon kernel: [1351765.076790] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:44:40 procyon kernel: [1351765.076792] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:44:40 procyon kernel: [1351765.076794] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:46:41 procyon kernel: [1351885.909865] INFO: task rclone:53642 blocked for more than 724 seconds.
Dec 16 01:46:41 procyon kernel: [1351885.909947]       Tainted: P        W  O      5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:46:41 procyon kernel: [1351885.910017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:46:41 procyon kernel: [1351885.910090] rclone          D    0 53642  73478 0x00000120
Dec 16 01:46:41 procyon kernel: [1351885.910096] Call Trace:
Dec 16 01:46:41 procyon kernel: [1351885.910111]  __schedule+0x2a8/0x670
Dec 16 01:46:41 procyon kernel: [1351885.910115]  schedule+0x33/0xa0
Dec 16 01:46:41 procyon kernel: [1351885.910119]  schedule_preempt_disabled+0xe/0x10
Dec 16 01:46:41 procyon kernel: [1351885.910123]  __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:46:41 procyon kernel: [1351885.910128]  __mutex_lock_slowpath+0x13/0x20
Dec 16 01:46:41 procyon kernel: [1351885.910131]  ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:46:41 procyon kernel: [1351885.910134]  mutex_lock+0x2f/0x40
Dec 16 01:46:41 procyon kernel: [1351885.910140]  fuse_lock_inode+0x29/0x40
Dec 16 01:46:41 procyon kernel: [1351885.910144]  fuse_readdir+0x243/0xf30
Dec 16 01:46:41 procyon kernel: [1351885.910152]  iterate_dir+0x9a/0x1b0
Dec 16 01:46:41 procyon kernel: [1351885.910156]  ksys_getdents64+0x9d/0x130
Dec 16 01:46:41 procyon kernel: [1351885.910164]  ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:46:41 procyon kernel: [1351885.910167]  ? filldir+0x190/0x190
Dec 16 01:46:41 procyon kernel: [1351885.910172]  __x64_sys_getdents64+0x1a/0x20
Dec 16 01:46:41 procyon kernel: [1351885.910174]  ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:46:41 procyon kernel: [1351885.910179]  do_syscall_64+0x5a/0x130
Dec 16 01:46:41 procyon kernel: [1351885.910185]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:46:41 procyon kernel: [1351885.910190] RIP: 0033:0x47c584
Dec 16 01:46:41 procyon kernel: [1351885.910199] Code: Bad RIP value.
Dec 16 01:46:41 procyon kernel: [1351885.910201] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:46:41 procyon kernel: [1351885.910206] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:46:41 procyon kernel: [1351885.910208] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:46:41 procyon kernel: [1351885.910210] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:46:41 procyon kernel: [1351885.910212] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:46:41 procyon kernel: [1351885.910214] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:46:41 procyon kernel: [1351885.910221] INFO: task ls:53864 blocked for more than 120 seconds.
Dec 16 01:46:41 procyon kernel: [1351885.910281]       Tainted: P        W  O      5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:46:41 procyon kernel: [1351885.910349] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:46:41 procyon kernel: [1351885.910421] ls              D    0 53864  73478 0x00004124
Dec 16 01:46:41 procyon kernel: [1351885.910425] Call Trace:
Dec 16 01:46:41 procyon kernel: [1351885.910432]  __schedule+0x2a8/0x670
Dec 16 01:46:41 procyon kernel: [1351885.910436]  schedule+0x33/0xa0
Dec 16 01:46:41 procyon kernel: [1351885.910441]  request_wait_answer+0x137/0x210
Dec 16 01:46:41 procyon kernel: [1351885.910446]  ? wait_woken+0x80/0x80
Dec 16 01:46:41 procyon kernel: [1351885.910451]  __fuse_request_send+0x86/0x90
Dec 16 01:46:41 procyon kernel: [1351885.910455]  fuse_request_send+0x29/0x30
Dec 16 01:46:41 procyon kernel: [1351885.910458]  fuse_readdir+0x251/0xf30
Dec 16 01:46:41 procyon kernel: [1351885.910464]  iterate_dir+0x9a/0x1b0
Dec 16 01:46:41 procyon kernel: [1351885.910467]  __x64_sys_getdents+0xab/0x140
Dec 16 01:46:41 procyon kernel: [1351885.910470]  ? fillonedir+0x120/0x120
Dec 16 01:46:41 procyon kernel: [1351885.910477]  do_syscall_64+0x5a/0x130
Dec 16 01:46:41 procyon kernel: [1351885.910481]  ? do_syscall_64+0x5a/0x130
Dec 16 01:46:41 procyon kernel: [1351885.910486]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:46:41 procyon kernel: [1351885.910489] RIP: 0033:0x7f6532d28e78
Dec 16 01:46:41 procyon kernel: [1351885.910494] Code: Bad RIP value.
Dec 16 01:46:41 procyon kernel: [1351885.910496] RSP: 002b:00007ffc79a9d0c0 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
Dec 16 01:46:41 procyon kernel: [1351885.910500] RAX: ffffffffffffffda RBX: 0000564fff9d34f0 RCX: 00007f6532d28e78
Dec 16 01:46:41 procyon kernel: [1351885.910502] RDX: 0000000000008000 RSI: 0000564fff9d34f0 RDI: 0000000000000003
Dec 16 01:46:41 procyon kernel: [1351885.910504] RBP: 0000564fff9d34f0 R08: 0000000000000078 R09: 0000000000000000
Dec 16 01:46:41 procyon kernel: [1351885.910506] R10: 0000564fff9cb010 R11: 0000000000000206 R12: fffffffffffffe98
Dec 16 01:46:41 procyon kernel: [1351885.910508] R13: 0000000000000000 R14: 0000564fff9d34a0 R15: 0000564ffe620c4a
Dec 16 01:48:41 procyon kernel: [1352006.741839] INFO: task rclone:53642 blocked for more than 845 seconds.
Dec 16 01:48:41 procyon kernel: [1352006.741920]       Tainted: P        W  O      5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:48:41 procyon kernel: [1352006.741988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:48:41 procyon kernel: [1352006.742061] rclone          D    0 53642  73478 0x00000120
Dec 16 01:48:41 procyon kernel: [1352006.742066] Call Trace:
Dec 16 01:48:41 procyon kernel: [1352006.742080]  __schedule+0x2a8/0x670
Dec 16 01:48:41 procyon kernel: [1352006.742085]  schedule+0x33/0xa0
Dec 16 01:48:41 procyon kernel: [1352006.742089]  schedule_preempt_disabled+0xe/0x10
Dec 16 01:48:41 procyon kernel: [1352006.742092]  __mutex_lock.isra.9+0x26d/0x4e0
Dec 16 01:48:41 procyon kernel: [1352006.742097]  __mutex_lock_slowpath+0x13/0x20
Dec 16 01:48:41 procyon kernel: [1352006.742101]  ? __mutex_lock_slowpath+0x13/0x20
Dec 16 01:48:41 procyon kernel: [1352006.742104]  mutex_lock+0x2f/0x40
Dec 16 01:48:41 procyon kernel: [1352006.742109]  fuse_lock_inode+0x29/0x40
Dec 16 01:48:41 procyon kernel: [1352006.742113]  fuse_readdir+0x243/0xf30
Dec 16 01:48:41 procyon kernel: [1352006.742120]  iterate_dir+0x9a/0x1b0
Dec 16 01:48:41 procyon kernel: [1352006.742124]  ksys_getdents64+0x9d/0x130
Dec 16 01:48:41 procyon kernel: [1352006.742131]  ? syscall_trace_enter+0xc3/0x2a0
Dec 16 01:48:41 procyon kernel: [1352006.742134]  ? filldir+0x190/0x190
Dec 16 01:48:41 procyon kernel: [1352006.742139]  __x64_sys_getdents64+0x1a/0x20
Dec 16 01:48:41 procyon kernel: [1352006.742142]  ? __x64_sys_getdents64+0x1a/0x20
Dec 16 01:48:41 procyon kernel: [1352006.742147]  do_syscall_64+0x5a/0x130
Dec 16 01:48:41 procyon kernel: [1352006.742152]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:48:41 procyon kernel: [1352006.742157] RIP: 0033:0x47c584
Dec 16 01:48:41 procyon kernel: [1352006.742165] Code: Bad RIP value.
Dec 16 01:48:41 procyon kernel: [1352006.742168] RSP: 002b:000000c4202aea48 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Dec 16 01:48:41 procyon kernel: [1352006.742172] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000047c584
Dec 16 01:48:41 procyon kernel: [1352006.742175] RDX: 0000000000001000 RSI: 000000c4202b1000 RDI: 0000000000000003
Dec 16 01:48:41 procyon kernel: [1352006.742177] RBP: 000000c4202aea98 R08: 0000000000000000 R09: 0000000000000000
Dec 16 01:48:41 procyon kernel: [1352006.742179] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000038
Dec 16 01:48:41 procyon kernel: [1352006.742181] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000000000f5
Dec 16 01:48:41 procyon kernel: [1352006.742188] INFO: task ls:53864 blocked for more than 241 seconds.
Dec 16 01:48:41 procyon kernel: [1352006.742248]       Tainted: P        W  O      5.3.0-46-generic #38~18.04.1-Ubuntu
Dec 16 01:48:41 procyon kernel: [1352006.742315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 16 01:48:41 procyon kernel: [1352006.742385] ls              D    0 53864  73478 0x00004124
Dec 16 01:48:41 procyon kernel: [1352006.742389] Call Trace:
Dec 16 01:48:41 procyon kernel: [1352006.742397]  __schedule+0x2a8/0x670
Dec 16 01:48:41 procyon kernel: [1352006.742401]  schedule+0x33/0xa0
Dec 16 01:48:41 procyon kernel: [1352006.742405]  request_wait_answer+0x137/0x210
Dec 16 01:48:41 procyon kernel: [1352006.742411]  ? wait_woken+0x80/0x80
Dec 16 01:48:41 procyon kernel: [1352006.742415]  __fuse_request_send+0x86/0x90
Dec 16 01:48:41 procyon kernel: [1352006.742420]  fuse_request_send+0x29/0x30
Dec 16 01:48:41 procyon kernel: [1352006.742423]  fuse_readdir+0x251/0xf30
Dec 16 01:48:41 procyon kernel: [1352006.742429]  iterate_dir+0x9a/0x1b0
Dec 16 01:48:41 procyon kernel: [1352006.742433]  __x64_sys_getdents+0xab/0x140
Dec 16 01:48:41 procyon kernel: [1352006.742436]  ? fillonedir+0x120/0x120
Dec 16 01:48:41 procyon kernel: [1352006.742442]  do_syscall_64+0x5a/0x130
Dec 16 01:48:41 procyon kernel: [1352006.742446]  ? do_syscall_64+0x5a/0x130
Dec 16 01:48:41 procyon kernel: [1352006.742451]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 01:48:41 procyon kernel: [1352006.742454] RIP: 0033:0x7f6532d28e78
Dec 16 01:48:41 procyon kernel: [1352006.742459] Code: Bad RIP value.
Dec 16 01:48:41 procyon kernel: [1352006.742461] RSP: 002b:00007ffc79a9d0c0 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
Dec 16 01:48:41 procyon kernel: [1352006.742464] RAX: ffffffffffffffda RBX: 0000564fff9d34f0 RCX: 00007f6532d28e78
Dec 16 01:48:41 procyon kernel: [1352006.742467] RDX: 0000000000008000 RSI: 0000564fff9d34f0 RDI: 0000000000000003
Dec 16 01:48:41 procyon kernel: [1352006.742468] RBP: 0000564fff9d34f0 R08: 0000000000000078 R09: 0000000000000000
Dec 16 01:48:41 procyon kernel: [1352006.742471] R10: 0000564fff9cb010 R11: 0000000000000206 R12: fffffffffffffe98
Dec 16 01:48:41 procyon kernel: [1352006.742472] R13: 0000000000000000 R14: 0000564fff9d34a0 R15: 0000564ffe620c4a

Though looking at cat /proc/PID/stack for the leftover processes stuck in D state would be interesting and would confirm the FUSE hypothesis (though by the sound of it, the system was rebooted so this isn’t an option anymore :)).

I’m sure this will happen again so it’s good to have other things to explore when it does :slight_smile:

The zombie process is more interesting to me currently. :slight_smile:

What’s the fuse version you were using?

Yeah but you can’t dump the stack of a zombie process unfortunately :wink:
I guess you can dump the stack of its parent to try to see why it wouldn’t have handled SIGCHILD.

# dpkg -l | grep fuse
ii  fuse                                   2.9.7-1ubuntu1                                  amd64        Filesystem in Userspace
ii  libfuse2:amd64                         2.9.7-1ubuntu1                                  amd64        Filesystem in Userspace (library)

I guess 2.9.7 ? (This is on the host but since it’s a kernel mod this would be shared into the container, correct?)

I wonder if you can reproduce with a newer fuse, i.e. fuse 3.+.

Looking at:

1000000  68729  0.0  0.0      0     0 ?        Ss   Dec16   0:03 [systemd-shutdow]
1001001  53637  0.0  0.0      0     0 ?        Zl   Dec16   0:00  \_ [rclone] <defunct>
1001001  53864  0.0  0.0  15240   888 ?        D    Dec16   0:00  \_ ls --color=auto
1001001  71767  0.0  0.0  15240   916 ?        D    Dec16   0:00  \_ ls --color=auto
1000000  22579  0.0  0.0   8204  2024 ?        D    Dec16   1:15  \_ /usr/bin/updatedb.mlocate

This suggests rclone was the FUSE server with both ls and updatedb.mlocate being users of the filesystem it provides. Based on the systemd output, I suspect rclone got killed prior to the mount being unmounted, then for some reason FUSE didn’t forcibly yank out the mount and cause I/O error for the clients but instead got stuck into some kind of deadlock?

Sounds reasonable also notice all the deadlock scenarios listed here:

Specifically

* Abort filesystem through the FUSE control filesystem. Most powerful method, always works.

This sounds amazing, at least it would give me a way to fix this manually. I am trying to see if I can replicate the behaviour somehow with a SSHFS mount under my control.

@brauner do you think we should have liblxc clear any FUSE filesystem in the namespace through the fuse control API on shutdown or something similar?

Hm, not sure. Would that go through
/dev/fuse?
If so that might be an attack vector if the container can somehow trick
us into either making /dev/fuse something else or shutting down
something we weren’t supposed to shut down?

One idea would be to stash an fd to /dev/fuse during container startup
if fuse has been requested.
But I’m not super confident that this is a great idea.

Thoughts?

We should be able to grab the host’s /dev/fuse, attach to userns+mntns and clean things up, no?

Yeah, that’s what I hinted at in my other comment but I honestly forgot
that /dev/fuse is a single device on the system.
I honestly can’t confidently say something about this before looking at
fuse once more a little closer. It sounds ok but I want to check!

Christian