Namespace symlinks gone

An LXC container (under 2.1) that was initially running fine has ended up with some namespace problems.

‘lxc-ls -f’ shows:

# lxc-ls -f
The configuration file contains legacy configuration keys.
Please update your configuration file!
lxc-ls: utils.c: switch_to_ns: 1119 No such file or directory - failed to open /proc/128562/ns/net
lxc-ls: lxccontainer.c: do_lxcapi_get_interfaces: 2039 No such file or directory - failed to enter namespace
lxc-ls: utils.c: switch_to_ns: 1119 No such file or directory - failed to open /proc/128562/ns/net
lxc-ls: lxccontainer.c: do_lxcapi_get_ips: 2131 No such file or directory - failed to enter namespace
lxc-ls: utils.c: switch_to_ns: 1119 No such file or directory - failed to open /proc/128562/ns/net
lxc-ls: lxccontainer.c: do_lxcapi_get_ips: 2131 No such file or directory - failed to enter namespace
NAME       STATE   AUTOSTART GROUPS IPV4 IPV6 
2619278-12 RUNNING 0         -      -    -    

Listing the related ns directory shows:

# ls -l /proc/128562/ns
ls: cannot read symbolic link '/proc/128562/ns/net': No such file or directory
ls: cannot read symbolic link '/proc/128562/ns/uts': No such file or directory
ls: cannot read symbolic link '/proc/128562/ns/ipc': No such file or directory
ls: cannot read symbolic link '/proc/128562/ns/mnt': No such file or directory
ls: cannot read symbolic link '/proc/128562/ns/cgroup': No such file or directory
total 0
lrwxrwxrwx 1 root root 0 Apr 24 13:32 cgroup
lrwxrwxrwx 1 root root 0 Apr 24 13:32 ipc
lrwxrwxrwx 1 root root 0 Apr 24 13:32 mnt
lrwxrwxrwx 1 root root 0 Apr 24 13:32 net
lrwxrwxrwx 1 root root 0 Apr 24 13:32 pid -> pid:[4026535758]
lrwxrwxrwx 1 root root 0 Apr 26 15:35 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Apr 24 13:32 uts

lxc-stop has no effect, and I suspect this situation occurred because of an attempted lxc-stop call.

It would be helpful to know why this is happening, but more importantly, what options do I have to clean this up? Am I forced to reboot?

Thanks

Assuming that 128562 is the PID of the container, you should simply be able to get away with lxc-stop -k.
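
For reference, that would be something along the lines of the following (container name taken from the lxc-ls output above; -k kills the container instead of requesting a clean shutdown):

lxc-stop -n 2619278-12 -k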

strace of lxc-stop -k hangs at:

stat("/var/lib/lxc/2619276-11/partial", 0x7ffccbdfa840) = -1 ENOENT (No such file or directory)
socket(PF_LOCAL, SOCK_STREAM, 0)        = 5
connect(5, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/2619276-11/command"}, 34) = 0
getuid()                                = 0
getgid()                                = 0
sendmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16}], msg_controllen=32, [{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, {pid=45867, uid=0, gid=0}}], msg_flags=0}, MSG_NOSIGNAL) = 16
recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\0\0\0\0\0\313\316\1\0\0\0\0\0", 16}], msg_controllen=0, msg_flags=0}, 0) = 16
close(5)                                = 0
socket(PF_LOCAL, SOCK_STREAM, 0)        = 5
connect(5, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/2619276-11/command"}, 34) = 0
getuid()                                = 0
getgid()                                = 0
sendmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\6\0\0\0\10\0\0\0p\250\337\313\374\177\0\0", 16}], msg_controllen=32, [{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, {pid=45867, uid=0, gid=0}}], msg_flags=0}, MSG_NOSIGNAL) = 16
sendto(5, "freezer\0", 8, MSG_NOSIGNAL, NULL, 0) = 8
recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\0\35\0\0\0\324\304\340\327\23V\0\0", 16}], msg_controllen=0, msg_flags=0}, 0) = 16
recvfrom(5, "//jobs/2619276-11/2619276-11\0", 29, 0, NULL, NULL) = 29
close(5)                                = 0
open("/sys/fs/cgroup/blkio//jobs/2619276-11/2619276-11/freezer.state", O_RDONLY|O_CLOEXEC) = 5
read(5, "THAWED\n", 100)                = 7
close(5)                                = 0
socket(PF_LOCAL, SOCK_STREAM, 0)        = 5
connect(5, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/2619276-11/command"}, 34) = 0
getuid()                                = 0
getgid()                                = 0
sendmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16}], msg_controllen=32, [{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, {pid=45867, uid=0, gid=0}}], msg_flags=0}, MSG_NOSIGNAL) = 16
recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0", 16}], msg_controllen=0, msg_flags=0}, 0) = 16
close(5)                                = 0
socket(PF_LOCAL, SOCK_STREAM, 0)        = 5
connect(5, {sa_family=AF_LOCAL, sun_path=@"/var/lib/lxc/2619276-11/command"}, 34) = 0
getuid()                                = 0
getgid()                                = 0
sendmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16}], msg_controllen=32, [{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, {pid=45867, uid=0, gid=0}}], msg_flags=0}, MSG_NOSIGNAL) = 16
recvmsg(5, 

Some of the processes in the container cgroup are in D state, so they cannot be killed. But the ns symlinks of those processes are broken/unreadable, as shown above.
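
(For what it's worth, a generic way to list D-state processes, not specific to LXC, looks like this:)

ps -eo pid,stat,wchan:30,cmd | awk 'NR==1 || $2 ~ /^D/'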

To clean up (somewhat), I did the following (see the shell sketch after the list):

  1. moved those processes to the parent cgroup, leaving the cgroup used by the container empty of processes
  2. killed the lxc_monitor process (this seems to be required for ‘lxc-stop -k’ to proceed)
  3. ran ‘lxc-stop -k’
  4. rmdir’d the container cgroup
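
Roughly, those steps amount to the following shell sketch (the cgroup path is taken from the strace output above and the exact monitor process name may differ; treat this as illustrative, not a recipe):

CG=/sys/fs/cgroup/blkio/jobs/2619276-11/2619276-11

# 1. move the remaining tasks up into the parent cgroup
for pid in $(cat "$CG/tasks"); do echo "$pid" > "$CG/../tasks"; done

# 2. kill the container's monitor process so lxc-stop can proceed
pkill -f 'lxc monitor.*2619276-11'

# 3. force-stop the container
lxc-stop -n 2619276-11 -k

# 4. remove the now-empty container cgroup
rmdir "$CG"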

I am left with the processes in D state, which I guess is my real problem and related to something else.

Oh, ok. Can you please provide the following information?:

uname -a
cat /proc/<container-init-pid>/stack
ps auxf
dmesg
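
If it is easier, all of that can be collected into a single file for sending, e.g. (the init PID and output file name are placeholders):

{
  uname -a
  cat /proc/<container-init-pid>/stack
  ps auxf
  dmesg
} > lxc-debug.txt 2>&1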

Is there somewhere I can send this other than posting all in the forum?

You could private message @brauner or e-mail to christian.brauner at ubuntu dot com

Sent by email.

Thanks

So we’ve had this issue recur a bunch of times (affecting > 100 nodes).

It seems that as a result of lxc-stop, the top-level process for the container (rc.local in my case) ends up with some of the symlinks under /proc/<pid>/ns being unset, and a process under it spinning. I’m not able to strace it, kill it, or freeze the cgroup containing it. I can only move it out of the cgroup, but that doesn’t really resolve anything.
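
For reference, checking which namespace links are broken for the tasks still in the container’s cgroup looks something like this (the cgroup path is a placeholder based on the layout shown in the earlier strace):

CG=/sys/fs/cgroup/blkio/jobs/2619276-11/2619276-11   # placeholder, adjust per container
for pid in $(cat "$CG/tasks"); do
    echo "== $pid =="
    ls -l /proc/"$pid"/ns
done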

Is there something you can suggest I do? Is it possible that there is a race in how things are being shut down (at some level), so that a process gets orphaned and becomes inaccessible?

I think the broken symlinks are just a red herring. Is it possible for you to upgrade to a HWE kernel and see if you can reproduce the issue? This smells like a regression in 4.4-119.

Is this located on NFS?

Actually, don’t upgrade to a HWE kernel, just try to get your hands on the new stable kernel for Xenial. Should be something like 4.4.0.121.127.

We will upgrade to the latest in the repo 4.4.0-122-generic.

The images are on NFS with a local overlay, if that is what you are asking.

This very much looks like either an NFS bug or a generic VFS bug that might have already been fixed. Let’s see what your kernel update says. :)

Here’s a bit of an update.

After upgrading to:
Linux ib9-bc1oo31-be05 4.4.0-122-generic #146-Ubuntu SMP Mon Apr 23 15:34:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

we are still having the same problem with the main process (pid 1) in the container having partial namespace entries (i.e., some are set, others are empty, as shown in the initial post).

From “ps auxf”:

root      33125  0.0  0.0      0     0 ?        Ss   11:23   0:00 [rc.local]
root      33187 99.0  0.0  90460     4 ?        RNs  11:23 597:23  \_ /usr/sbin/nscd

the rc.local is/was the main process of the container and has the partial namespace settings. The nscd is still running in the cgroup (the only pid in the tasks file). It is always in R (running) state and does not seem to be stuck in the kernel, but I am not able to kill it. I am, however, able to fiddle with cpu.cfs_* settings and cpuset.cpus to effectively eliminate any cpu time for the process.
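
For reference, that throttling workaround boils down to something like this (paths are placeholders; the quota/period pair caps the task at roughly 1% of one CPU):

CG=/sys/fs/cgroup/cpu/jobs/<job>/<container>      # placeholder path
echo 100000 > "$CG/cpu.cfs_period_us"             # 100 ms scheduling period
echo 1000   > "$CG/cpu.cfs_quota_us"              # 1 ms of CPU time per period (~1%)
# and/or restrict the cpuset to CPU 0 only
echo 0 > /sys/fs/cgroup/cpuset/jobs/<job>/<container>/cpuset.cpus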

As a shot in the dark, we mount all our cgroup controllers under a single mount point. Could this be related in any way?
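
That is, something along these lines rather than one mount per controller (illustrative only; with cgroup v1, omitting the controller list comounts all available controllers in one hierarchy):

mount -t cgroup cgroup /sys/fs/cgroup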

I may try with lxc 3 to see if I get any different results. But if it is indeed unrelated to LXC, I am not hopeful.

So, we’ve decided to go back to v2.0.8.

Different kernels had no impact.