lxc-attach for unprivileged containers not working after 24-48 hours of uptime on Ubuntu 18.04

I am running multiple unprivileged LXCs, and after they’ve been up for roughly 24-48 hours lxc-attach simply stops working. There’s no error unless I add -o to write a log; then I get:

lxc-attach: vpn: cgroups/cgfsng.c: cgfsng_attach: 2004 No such file or directory - Failed to attach 29830 to /sys/fs/cgroup/memory/user/lxcuser/0/lxc/vpn/cgroup.procs
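
For reference, the invocation is roughly this (vpn is the container name from the error; the log file path and log level are just what I happen to use, -o writes the log to a file and -l raises the verbosity):

$ lxc-attach -n vpn -o /tmp/lxc-attach.log -l DEBUG
$ tail /tmp/lxc-attach.log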

March 2019, and this is still happening. Any suggestions out there?

lxc-attach: vpn: cgroups/cgfsng.c: cgfsng_attach: 1991 No such file or directory - Failed to attach 24257 to /sys/fs/cgroup/memory/user/lxcuser/0/lxc/vpn/cgroup.procs

Hello,

I run into the same issue; it happens from time to time on my setup, I would say at random.

I checked the cgroup tree, and in my case the whole /sys/fs/cgroup/memory/user/<username>/0/lxc directory is missing.
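
Roughly how I checked it on the host (replace <username> with the unprivileged user that owns the containers):

$ ls -d /sys/fs/cgroup/memory/user/<username>/0/lxc 2>/dev/null || echo "memory cgroup path is gone"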

Any suggestions how to investigate the issue?

Thanks.

$ uname -a
Linux r98-u1-web 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ lxc-checkconfig
Kernel configuration not found at /proc/config.gz; searching…
Kernel configuration found at /boot/config-4.15.0-47-generic
— Namespaces —
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled

— Control groups —
Cgroups: enabled

Cgroup v1 mount points:
/sys/fs/cgroup/systemd
/sys/fs/cgroup/perf_event
/sys/fs/cgroup/rdma
/sys/fs/cgroup/memory
/sys/fs/cgroup/hugetlb
/sys/fs/cgroup/freezer
/sys/fs/cgroup/net_cls,net_prio
/sys/fs/cgroup/blkio
/sys/fs/cgroup/cpu,cpuacct
/sys/fs/cgroup/cpuset
/sys/fs/cgroup/pids
/sys/fs/cgroup/devices

Cgroup v2 mount points:
/sys/fs/cgroup/unified

Cgroup v1 clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled

— Misc —
Veth pair device: enabled, loaded
Macvlan: enabled, not loaded
Vlan: enabled, loaded
Bridges: enabled, loaded
Advanced netfilter: enabled, not loaded
CONFIG_NF_NAT_IPV4: enabled, loaded
CONFIG_NF_NAT_IPV6: enabled, not loaded
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, not loaded
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, loaded
FUSE (for use with lxcfs): enabled, not loaded

— Checkpoint/Restore —
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities:

Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig

$ cat /proc/self/cgroup
12:devices:/user.slice
11:pids:/user.slice/user-0.slice/session-37308.scope
10:cpuset:/
9:cpu,cpuacct:/user.slice
8:blkio:/user.slice
7:net_cls,net_prio:/
6:freezer:/user/root/0
5:hugetlb:/
4:memory:/user/root/0
3:rdma:/
2:perf_event:/
1:name=systemd:/user.slice/user-0.slice/session-37308.scope
0::/user.slice/user-0.slice/session-37308.scope
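
In case it helps the investigation, the path from the error can be cross-checked against the cgroup file of the container’s init process. This is only a sketch, using the vpn container from the first post; if I am not mistaken, lxc-info -p prints the init PID and -H leaves just the raw number:

$ pid=$(lxc-info -n vpn -pH)
$ cat /proc/$pid/cgroup
$ ls -d /sys/fs/cgroup/memory/user/*/0/lxc/vpn 2>/dev/null || echo "memory cgroup for vpn is gone"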

I have exactly the same issue on Debian Buster with unprivileged LXCs.

$ uname -a
Linux <Hostname> 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u1 (2019-09-20) x86_64 GNU/Linux

$ apt show lxc
Package: lxc
Version: 1:3.1.0+really3.0.3-8

Hello,

Any news about this issue? Does anyone have ideas for further investigation?

I was thinking of just using ssh to connect to my containers instead of lxc-attach, but I have now noticed that when this happens, the container’s memory limit stops being enforced as well.
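
A quick way to see that from the host is to query the value through lxc-cgroup (the vpn name is just the example from the first post; run it as the container’s owner). While the memory cgroup still exists this prints the configured limit; once the cgroup is gone it errors out instead:

$ lxc-cgroup -n vpn memory.limit_in_bytes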

I noticed that it seems to happen around PID cycling: if my LXC runs a lot of tasks, it exhausts the PID range for the container and then, boom, I’m locked out. I minimized the number of new processes it spawns, but that only means I have to restart my LXC every seven days instead of every two. I’m fairly sure it’s something about PIDs wrapping back around after hitting some maximum. Not sure who to report it to, though.

If I understand correctly what you said, I tried a script that spawns a lot of processes. I can see PIDs cycling up to around 32768 and then wrapping back to low numbers, but it does not trigger the issue (even after 10 hours of cycling).
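
The test script was roughly this (just a sketch, nothing container-specific; each background job takes a fresh PID, so the counter climbs towards pid_max, 32768 by default, and then wraps around):

#!/bin/sh
echo "pid_max: $(cat /proc/sys/kernel/pid_max)"
while true; do
    true &     # consume one PID
    wait $!    # reap it so zombies do not pile up
done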

What I see inside the container (because I can connect to it via ssh) is that /sys/fs/cgroup/memory is empty.

ls -l /sys/fs/cgroup/memory/
total 0

For comparison, on another unprivileged LXC that is still running fine for the moment, this directory is not empty:

ls -l /sys/fs/cgroup/memory/
total 0
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 cgroup.clone_children
--w--w--w- 1 nobody nogroup 0 Mar 12 15:50 cgroup.event_control
-rw-rw-r-- 1 nobody root    0 Mar 15 22:51 cgroup.procs
drwxr-xr-x 2 root   root    0 Mar 12 15:50 init.scope
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.failcnt
--w------- 1 nobody nogroup 0 Mar 12 15:50 memory.force_empty
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.kmem.failcnt
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.kmem.limit_in_bytes
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.kmem.max_usage_in_bytes
-r--r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.kmem.slabinfo
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.kmem.tcp.failcnt
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.kmem.tcp.limit_in_bytes
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.kmem.tcp.max_usage_in_bytes
-r--r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.kmem.tcp.usage_in_bytes
-r--r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.kmem.usage_in_bytes
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.limit_in_bytes
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.max_usage_in_bytes
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.move_charge_at_immigrate
-r--r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.numa_stat
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.oom_control
---------- 1 nobody nogroup 0 Mar 12 15:50 memory.pressure_level
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.soft_limit_in_bytes
-r--r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.stat
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.swappiness
-r--r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.usage_in_bytes
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 memory.use_hierarchy
-rw-r--r-- 1 nobody nogroup 0 Mar 12 15:50 notify_on_release
drwxr-xr-x 8 root   root    0 Mar 16 06:47 system.slice
-rw-rw-r-- 1 nobody root    0 Mar 12 15:50 tasks
drwxr-xr-x 3 root   root    0 Mar 16 09:18 user.slice

Also, my setup was working fine with the previous version of Debian (now oldstable), and I only began to see this issue with Debian Buster (but that is a lot of changes at once, i.e. kernel version, LXC version, and so on).

It is really hard to investigate.

Yes it is. Let me look into it a little more again too. My workaround was to make my LXCs autonomous, so I never have to log in and everything resets cleanly if I force a reboot. But that took me some time and won’t work for everyone. If I think of anything else I’ll post back.

I’m affected by exactly the same issue in Debian GNU/Linux 10 (buster):
Linux 4.19.0-8-amd64
LXC 1:3.1.0+really3.0.3-8

And also in Ubuntu GNU/Linux 18.04 (bionic):
Linux 4.15.0-106-generic
LXC 3.0.3-0ubuntu1~18.04.1

With this command:
$ lxc-ls -F NAME,RAM
Affected containers appear to use zero RAM
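
If it helps anyone narrow down when exactly a container breaks, simply polling that command should catch the moment the RAM column drops to zero (the interval is arbitrary):

$ watch -n 300 'lxc-ls -F NAME,STATE,RAM'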

This seems like a serious bug, even more so since it affects LTS versions of major distros.


Hi,

I am still working around this issue…
Has nobody found a solution, or a clue about what is happening here?

Thanks,