We are running a high-concurrency optimized webserver setup using apache with mpm_event in lxd containers. After upgrading to lxd 4.0.1 we have started to experience issues when reloading or restarting apache: lxcfs CPU usage goes to ~200% for between 2 and 30 seconds, and in rare cases for up to 120 seconds. During this time several services, including apache, are unavailable, and /proc-related commands such as top, ps and uptime stall until lxcfs is back to normal.
We straced lxcfs while reloading apache (full output: http://sprunge.us/ZxV8up) and found a lot of lines like this:
<... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) lines.
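To put a number on "a lot", a saved strace log can be counted with something like the following. This is only a rough sketch: the function name and the sample lines are made up for illustration, and the regex simply assumes the failing calls look like the line quoted above.

```python
import re

# Matches futex calls (resumed or not) that returned -1 EAGAIN,
# in the shape of the strace line quoted above.
FUTEX_EAGAIN = re.compile(r"futex.*=\s*-1\s+EAGAIN")


def count_futex_eagain(lines):
    """Return (matching_lines, total_lines) for an iterable of strace output lines."""
    hits = 0
    total = 0
    for line in lines:
        total += 1
        if FUTEX_EAGAIN.search(line):
            hits += 1
    return hits, total


# Illustrative sample only; the real input would be the saved strace log.
sample = [
    "<... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)",
    "<... futex resumed> ) = 0",
    "read(7, \"...\", 4096) = 128",
]
hits, total = count_futex_eagain(sample)
print(f"{hits} of {total} lines are futex EAGAIN returns")
```

Against a real log, pass the open file object instead of `sample`. A high ratio would point towards lock contention inside lxcfs, though that is an interpretation and not something the count alone proves.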
Sometimes, but not consistently, we also see the following in the host's syslog:
cgroup: fork rejected by pids controller in /lxc.payload.phct-030/system.slice/apache2.service
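The "fork rejected" message means the cgroup hit its pids limit. A small helper along these lines can show how close the apache2.service cgroup is to that limit at any moment. It is a sketch under assumptions: the helper name is hypothetical, and the cgroup directory must be supplied by the caller because the mount point depends on the host's cgroup layout.

```python
from pathlib import Path


def read_pids_usage(cgroup_dir):
    """Return (current, limit) for a cgroup directory.

    limit is None when pids.max contains the literal string "max" (no limit).
    """
    base = Path(cgroup_dir)
    current = int((base / "pids.current").read_text().strip())
    raw_max = (base / "pids.max").read_text().strip()
    limit = None if raw_max == "max" else int(raw_max)
    return current, limit


def describe(current, limit):
    """Format the usage as a short human-readable string."""
    if limit is None:
        return f"{current} tasks, no limit"
    return f"{current} of {limit} tasks ({100 * current / limit:.0f}%)"
```

On a cgroup v1 host the directory would presumably be something like `/sys/fs/cgroup/pids/lxc.payload.phct-030/system.slice/apache2.service` (path assumed from the syslog line, not verified). Running `describe(*read_pids_usage(path))` in a loop during a reload should show whether the count spikes towards the limit.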
It seems we can mostly replicate this with a rather aggressive mpm_event.conf:
StartServers 8
MinSpareThreads 100
MaxSpareThreads 300
ServerLimit 2000
ThreadLimit 256
ThreadsPerChild 100
MaxRequestWorkers 2000
MaxConnectionsPerChild 9999
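For context on how many tasks this config can produce, here is a back-of-the-envelope calculation. It is a sketch, not an exact model: the function name is made up, and it only counts worker threads, ignoring the extra per-child threads and the parent process.

```python
import math


def mpm_event_ceiling(max_request_workers, threads_per_child, server_limit):
    """Steady-state upper bound on child processes and worker threads."""
    # Children needed to provide MaxRequestWorkers threads, capped by ServerLimit.
    children = min(math.ceil(max_request_workers / threads_per_child), server_limit)
    worker_threads = children * threads_per_child
    return children, worker_threads


# Values taken from the mpm_event.conf above.
children, workers = mpm_event_ceiling(2000, 100, 2000)
print(children, workers)  # 20 2000
```

Since the pids controller counts threads as well as processes, a fully loaded apache would sit at roughly 2000 tasks in its cgroup. During a graceful reload old and new children can coexist for a while, so the transient count may be higher, which might be what trips the limit.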
root@HOSTNAME:~# lxd --version
4.0.1
root@HOSTNAME:~# /snap/lxd/current/bin/lxcfs --version
4.0.3
root@HOSTNAME:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
root@HOSTNAME:~# uname -a
Linux HOSTNAME 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux