Stuck container on 4.9

I have a container that won’t stop or be deleted and seems to be permanently stuck in shutting-down mode. lxc stop (with -f) does not work, and neither does lxc delete -f.

It seems some FUSE magic was happening inside the container, and now it can’t be shut down gracefully.
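For reference, these are roughly the commands I have been running; nothing gets the container stopped or removed:

lxc stop anon --force
lxc delete anon --force

The console output from the container’s shutdown looks like this: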

[FAILED] Failed unmounting /home/anon/mnt/drive.
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped target Network (Pre).
[  OK  ] Stopped Initial cloud-init job (pre-networking).
[  OK  ] Unmounted /var/lib/lxcfs.
[  OK  ] Unmounted /run/user/1001.
[  OK  ] Stopped target Swap.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Reached target Shutdown.
[  OK  ] Reached target Final Step.
         Starting Halt...

It appears to be stopped, but the container is still listed as running, and I see these processes when running lsof on the host.

updatedb. 22579                1000000  cwd   unknown                                            /home/anon/mnt/drive (stat: Permission denied)
updatedb. 22579                1000000   10r      DIR             252,48         4096     528936 /home/anon
updatedb. 22579                1000000   11r      DIR             252,48         4096     658512 /home/anon/mnt
updatedb. 22579                1000000   12r  unknown                                            /home/anon/mnt/drive (stat: Permission denied)
rclone    53637 53642          1001001  cwd       DIR             252,48         4096     658512 /home/anon/mnt
rclone    53637 53642          1001001    3r  unknown                                            /home/anon/mnt/drive (stat: Permission denied)
ls        53864                1001001  cwd   unknown                                            /home/anon/mnt/drive (stat: Permission denied)
ls        53864                1001001    3r  unknown                                            /home/anon/mnt/drive (stat: Permission denied)
ls        71767                1001001  cwd   unknown                                            /home/anon/mnt/drive (stat: Permission denied)
ls        71767                1001001    3r  unknown                                            /home/anon/mnt/drive (stat: Permission denied)

The owner of most of these processes is

ps -f 68729
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
1000000  68729 68707  0 00:54 ?        Ss     0:03 [systemd-shutdow]

Its parent is

ps -f 68707
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
root     68707     1  0 00:54 ?        Ss     0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers anon

Does anybody have any idea how to delete this container and/or get it unstuck, besides restarting the whole node?

Can you show the output of lxc info <instance> --show-log?

Also is there anything in /var/snap/lxd/common/lxd/logs/lxd.log that indicates an issue shutting down the container?
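Something like this should pull out the relevant lines for that instance (container name taken from your ps output above):

grep -n 'instance=anon' /var/snap/lxd/common/lxd/logs/lxd.log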

The whole instance log is 23 MB; I can upload it somewhere.

I looked closer at lxd.log and I do see messages like these, which are new to me.

109426:t=2020-12-16T16:40:23+0100 lvl=eror msg="Failed to retrieve network information via netlink" instance=anon instanceType=container pid=68729 project=default
109427:t=2020-12-16T16:40:23+0100 lvl=eror msg="Error calling 'lxd forknet" err="Failed to run: /snap/lxd/current/bin/lxd forknet info -- 68729 -1: Failed setns to container network namespace: No such file or directory" instance=anon instanceType=container pid=68729 project=default

Looking further for more.

Have you tried killing the processes holding the container shutdown process open?
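For example, using the PIDs from your lsof output above, something like:

kill 22579 53637 53864 71767        # escalate to kill -9 if they ignore SIGTERM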

Yup, all of them, even with -9.

Anything in dmesg? It suggests a mount is hanging (specifically /home/anon/mnt/drive - what is that?)
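Something like this should show any hung-task or FUSE errors around it (the filter terms are just a guess):

dmesg -T | grep -iE 'hung|fuse|apparmor'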

I think one of my customers is running Rclone, which can be used to mount Google Drive as a local mount point.

There are some AppArmor messages in dmesg about the container, but nothing about this path.

I think this is where the issues started.

85681:t=2020-12-16T13:32:03+0100 lvl=info msg="Shutting down container" action=shutdown created=2020-12-07T15:26:13+0100 ephemeral=false instance=anon instanceType=container project=default timeout=-1s used=2020-12-16T00:54:31+0100
85682:t=2020-12-16T13:32:03+0100 lvl=dbug msg="\n\t{\n\t\t\"type\": \"async\",\n\t\t\"status\": \"Operation created\",\n\t\t\"status_code\": 100,\n\t\t\"operation\": \"/1.0/operations/9c2620dc-de79-4a08-a1e7-b748ed642124\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": {\n\t\t\t\"id\": \"9c2620dc-de79-4a08-a1e7-b748ed642124\",\n\t\t\t\"class\": \"task\",\n\t\t\t\"description\": \"Stopping container\",\n\t\t\t\"created_at\": \"2020-12-16T13:32:03.369306263+01:00\",\n\t\t\t\"updated_at\": \"2020-12-16T13:32:03.369306263+01:00\",\n\t\t\t\"status\": \"Running\",\n\t\t\t\"status_code\": 103,\n\t\t\t\"resources\": {\n\t\t\t\t\"containers\": [\n\t\t\t\t\t\"/1.0/containers/anon\"\n\t\t\t\t]\n\t\t\t},\n\t\t\t\"metadata\": null,\n\t\t\t\"may_cancel\": false,\n\t\t\t\"err\": \"\",\n\t\t\t\"location\": \"procyon\"\n\t\t}\n\t}"
85804:t=2020-12-16T13:35:04+0100 lvl=dbug msg="Instance process stopped" PID=68285 instance=anon
85805:t=2020-12-16T13:35:04+0100 lvl=dbug msg="Got error getting next reader" PID=68285 err="read unix /var/snap/lxd/common/lxd/unix.socket->@: use of closed network connection" instance=anon
85806:t=2020-12-16T13:35:04+0100 lvl=dbug msg="Interactive child process handler finished" PID=68285 instance=anon
85810:t=2020-12-16T13:35:04+0100 lvl=dbug msg="Finished mirroring websocket" PID=68285 instance=anon

Is that using FUSE? If so, can you kill the FUSE process? That may release things.
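For example, something like this to find and kill just the FUSE client (rclone, judging by your lsof output):

ps aux | grep '[r]clone'      # the bracket avoids matching the grep itself
kill -9 53637                 # rclone PID from that output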

Those are normal; they just suggest that an interactive shell was disconnected as the container was shut down.

Yeah, it uses FUSE; sadly that process is in zombie state, so it can’t be killed any further.

UID        PID  PPID  C STIME TTY      STAT   TIME CMD
1001001  53637 68729  0 01:34 ?        Zl     0:00 [rclone] <defunct>

Also killing the parent does nothing.

@stgraber @brauner any ideas on releasing a hung fuse mount?

Can you show a full ps fauxww?

I’d rather not; there might be a lot of customer information in there. If it’s really needed I will first edit out all the customer info, but that will be a bit of a job.

root     68707  0.0  0.0 1605888 16360 ?       Ss   00:54   0:00 [lxc monitor] /var/snap/lxd/common/lxd/containers anon
1000000  68729  0.0  0.0      0     0 ?        Ss   00:54   0:03  \_ [systemd-shutdow]
1001001  53637  0.0  0.0      0     0 ?        Zl   01:34   0:00      \_ [rclone] <defunct>
1001001  53864  0.0  0.0  15240   888 ?        D    01:34   0:00      \_ ls --color=auto
1001001  71767  0.0  0.0  15240   916 ?        D    01:45   0:00      \_ ls --color=auto
1000000  22579  0.1  0.0   8204  2024 ?        D    06:25   1:15      \_ /usr/bin/updatedb.mlocate

This is the tree for the “stuck” part.

nsenter -t 68729 -m should let you get into the mount namespace. I’m not sure it will be particularly helpful, though, as the fuse filesystem looks gone already.
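If you do get in, something like this could be tried to lazily detach the dead FUSE mount from inside that namespace (mountpoint taken from the lsof output; just a sketch):

nsenter -t 68729 -m -- umount -l /home/anon/mnt/drive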

Can you maybe check /sys/fs/cgroup/freezer/lxc.payload.anon/freezer.state to see if part of the killing logic may have frozen the container, causing part of this issue?

root@procyon:~# nsenter -t 68729 -m
nsenter: cannot open /proc/68729/ns/mnt: No such file or directory

Not sure that was expected?

 cat /sys/fs/cgroup/freezer/lxc.payload.anon/freezer.state
THAWED

I’m unsure what this means; I’m unfamiliar with the freezer cgroup.

It means it’s not frozen, so that’s not why things aren’t getting unblocked…

This is starting to feel more and more like a kernel bug unfortunately.

@brauner thoughts?

So I’m unsure what to blame here. My hope was that regardless of what a customer did in their container, it wouldn’t affect other containers. However, it now seems that a messed-up FUSE mount can do just that, since it looks like I will have to reboot the node.

After killing the lxc monitor process I could at least start the container up again, but I’m unsure whether that is a good idea since the processes in D or Z state still exist and hold open file descriptors. I also still can’t unmount the rbd device, and thus still can’t delete the container.
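For completeness, this is roughly what I tried on the storage side (the pool name and mountpoint are from memory and may not be exact):

grep anon /proc/mounts                                                   # find the container's rbd mount
umount /var/snap/lxd/common/lxd/storage-pools/default/containers/anon    # still refuses to unmount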

In the end it seems like systemd might be to blame for waiting indefinitely; should I raise this issue with them?

1000000  68729  0.0  0.0      0     0 ?        Ss   Dec16   0:03 [systemd-shutdow]
1001001  53637  0.0  0.0      0     0 ?        Zl   Dec16   0:00  \_ [rclone] <defunct>
1001001  53864  0.0  0.0  15240   888 ?        D    Dec16   0:00  \_ ls --color=auto
1001001  71767  0.0  0.0  15240   916 ?        D    Dec16   0:00  \_ ls --color=auto
1000000  22579  0.0  0.0   8204  2024 ?        D    Dec16   1:15  \_ /usr/bin/updatedb.mlocate

What kernel are you using?

@brauner any idea? I know there are fuse and freezer interactions that can lead to deadlocks but this isn’t it and feels like a potential kernel bug which could maybe turn into a security bug.