We are running a webserver setup optimized for high concurrency, using apache with mpm_event in lxd containers. After upgrading to lxd 4.0.1 we started to experience issues when reloading and restarting apache. lxcfs cpu usage goes to ~200% for between 2 and 30 seconds, and in rare cases 120 seconds. During this time several services, including apache, are unavailable, and /proc related commands such as top, ps and uptime stall until lxcfs has returned to normal.
We straced lxcfs while reloading apache: http://sprunge.us/ZxV8up
And found a lot of <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) lines.
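To put a number on "a lot", the strace output can be tallied by syscall and errno. The following is a minimal sketch, not the tool we used; the function name `count_errors` and the sample lines are illustrative assumptions, and only the first sample line mirrors the format quoted above.

```python
import re
from collections import Counter

# Matches strace lines where a syscall returned -1 with an errno name, e.g.
# "<... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)".
ERRNO_RE = re.compile(r"(?:<\.\.\. (\w+) resumed>|\b(\w+)\().*= -1 (E[A-Z]+)")

def count_errors(lines):
    """Count failed syscalls per (syscall, errno) pair in strace output."""
    counts = Counter()
    for line in lines:
        m = ERRNO_RE.search(line)
        if m:
            syscall = m.group(1) or m.group(2)
            counts[(syscall, m.group(3))] += 1
    return counts

# Illustrative lines only; these are not taken from the linked trace.
sample = [
    "<... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)",
    "futex(0x7f00, FUTEX_WAKE_PRIVATE, 1) = 1",
    "[pid 42] futex(0x7f00, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)",
]
for (syscall, errno_name), n in count_errors(sample).items():
    print(syscall, errno_name, n)
```

Feeding it the lines of a saved trace file gives a quick view of whether futex EAGAIN really dominates the failures.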
Sometimes, but not consistently, we also see the following in the host's syslog:
cgroup: fork rejected by pids controller in /lxc.payload.phct-030/system.slice/apache2.service
It seems we can mostly replicate this with a rather aggressive mpm_event.conf:
I can’t reproduce based on your instructions. Can you give a more detailed reproducer, please?
The “cgroup: fork rejected by pids controller” is the kernel telling you that you’ve exceeded the number of processes your cgroup is allowed. What limit have you set for the container? Could just be that your container simply isn’t allowed to spawn any more processes.
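A quick way to check how close a cgroup is to its process limit is to compare `pids.current` with `pids.max`. The sketch below assumes a cgroup v2 host; the helper names are made up for illustration, and the directory passed to `read_headroom` would be something like the container's cgroup under /sys/fs/cgroup.

```python
from pathlib import Path

def parse_pids_max(text):
    """Return the limit as an int, or None when the file says 'max' (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)

def headroom(current, limit):
    """Tasks that can still be created; None means no limit at this level."""
    return None if limit is None else max(limit - current, 0)

def read_headroom(cgroup_dir):
    """Read pids.current and pids.max from one cgroup v2 directory."""
    base = Path(cgroup_dir)
    current = int((base / "pids.current").read_text())
    limit = parse_pids_max((base / "pids.max").read_text())
    return headroom(current, limit)
```

A headroom of 0 (or close to it) at the moment of the reload would support the "simply not allowed to spawn more" explanation; a large headroom would point elsewhere.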
root@f3:~# cat /etc/apache2/mods-enabled/mpm_event.conf
# event MPM
# StartServers: initial number of server processes to start
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestWorkers: maximum number of worker threads
# MaxConnectionsPerChild: maximum number of requests a server process serves
<IfModule mpm_event_module>
StartServers 8
MinSpareThreads 100
MaxSpareThreads 300
ServerLimit 2000
ThreadLimit 256
ThreadsPerChild 100
MaxRequestWorkers 2000
MaxConnectionsPerChild 9999
</IfModule>
# vim: syntax=apache ts=4 sw=4 sts=4 sr noet
root@f3:~# apache2ctl graceful && time uptime
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
19:25:41 up 1 min, 0 users, load average: 1.71, 1.68, 1.50
real 0m0.260s
user 0m0.002s
sys 0m0.000s
root@f3:~# apache2ctl graceful && time uptime
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
19:25:42 up 1 min, 0 users, load average: 1.71, 1.68, 1.50
real 0m0.840s
user 0m0.002s
sys 0m0.000s
root@f3:~# apache2ctl graceful && time uptime
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
19:25:44 up 1 min, 0 users, load average: 2.13, 1.77, 1.53
real 0m0.921s
user 0m0.000s
sys 0m0.002s
root@f3:~# apache2ctl graceful && time uptime
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
19:25:46 up 1 min, 0 users, load average: 2.13, 1.77, 1.53
real 0m2.653s
user 0m0.002s
sys 0m0.000s
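For a sense of scale, the worker-thread counts implied by the config above can be worked out from the directives. This is a rough sketch under stated assumptions: it counts worker threads only (each child also runs a few extra threads), and the function names are illustrative. During a graceful reload the old and new generations can briefly coexist, so the real task count may be higher than these figures.

```python
# Directives copied from the mpm_event.conf shown above.
CONF = """
<IfModule mpm_event_module>
StartServers 8
MinSpareThreads 100
MaxSpareThreads 300
ServerLimit 2000
ThreadLimit 256
ThreadsPerChild 100
MaxRequestWorkers 2000
MaxConnectionsPerChild 9999
</IfModule>
"""

def parse_directives(conf_text):
    """Collect 'Name value' directives, skipping comments and <IfModule> tags."""
    directives = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("<"):
            continue
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            directives[parts[0]] = int(parts[1])
    return directives

def estimate_tasks(d):
    """Worker-thread estimates; a lower bound on tasks the pids controller sees."""
    per_child = d["ThreadsPerChild"]
    max_children = d["MaxRequestWorkers"] // per_child
    return {
        "threads_at_start": d["StartServers"] * per_child,
        "max_children": max_children,
        "max_worker_threads": max_children * per_child,
    }

print(estimate_tasks(parse_directives(CONF)))
```

With these settings each reload has to create at least 800 threads up front, which is the kind of burst that makes any per-task bookkeeping on the host visible.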
Lately we’ve had containers with service disruptions at midnight for 2-10 minutes, probably due to all containers on the same host running logrotate (and thereby apache reload) all at once causing serious load on lxcfs.
No - I did not have issues with this prior to the update. I can’t say whether this can be replicated in 3.x, but we did not experience it before upgrading.
FWIW, I’m still seeing this issue when running a Debian 13 container on a Debian 13 host (Debian incus packages, but I assume it would be the same on latest - some sort of cgroup related issue). This looks like a kernel cgroup issue rather than incus I think?
To reproduce:
Launch a Debian 13 container, e.g. incus launch images:debian/13 limitcheck
In the container: apt install apache2
In the container, verify that systemctl restart apache2 works as expected (restarts cleanly, no kernel messages on host)
incus config set limitcheck limits.processes 200
On the host, the command
# for i in /sys/fs/cgroup/lxc.payload.limitcheck/pids.* ; do echo -n "$i: " && cat $i ; done
gives output like:
/sys/fs/cgroup/lxc.payload.limitcheck/pids.current: 64
/sys/fs/cgroup/lxc.payload.limitcheck/pids.events: max 0
/sys/fs/cgroup/lxc.payload.limitcheck/pids.events.local: max 0
/sys/fs/cgroup/lxc.payload.limitcheck/pids.max: 200
/sys/fs/cgroup/lxc.payload.limitcheck/pids.peak: 68
Re-run systemctl restart apache2
In the container verify messages like: (11)Resource temporarily unavailable: AH00480: ap_thread_create: unable to create worker thread
and: AH02324: A resource shortage or other unrecoverable failure was encountered before any child process initialized successfully... httpd is exiting!
are seen in /var/log/apache2/error.log.
On the host the kernel logs: cgroup: fork rejected by pids controller in /lxc.payload.limitcheck/system.slice/apache2.service
On the host: /sys/fs/cgroup/lxc.payload.limitcheck/pids.peak still reads 68
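The comparison in the steps above (peak versus limit, and the `max` event counter) can be automated by parsing the output of the host-side loop. A minimal sketch follows, using the sample values from the step above; the function names are illustrative, and the reading of the `max` field in pids.events as a "limit was hit" counter follows my understanding of the kernel cgroup v2 documentation.

```python
# Output of the host-side loop from the steps above.
SAMPLE = """\
/sys/fs/cgroup/lxc.payload.limitcheck/pids.current: 64
/sys/fs/cgroup/lxc.payload.limitcheck/pids.events: max 0
/sys/fs/cgroup/lxc.payload.limitcheck/pids.events.local: max 0
/sys/fs/cgroup/lxc.payload.limitcheck/pids.max: 200
/sys/fs/cgroup/lxc.payload.limitcheck/pids.peak: 68
"""

def parse_pids_dump(text):
    """Map file name (e.g. 'pids.max') to its raw value from 'path: value' lines."""
    out = {}
    for line in text.splitlines():
        path, sep, value = line.partition(": ")
        if sep:
            out[path.rsplit("/", 1)[-1]] = value.strip()
    return out

def summarize(dump):
    """Compare the peak with the limit and extract the 'max' event counter."""
    limit = None if dump["pids.max"] == "max" else int(dump["pids.max"])
    peak = int(dump["pids.peak"])
    _key, count = dump["pids.events"].split()
    return {
        "limit": limit,
        "peak": peak,
        "max_events": int(count),
        "peak_below_limit": limit is None or peak < limit,
    }

print(summarize(parse_pids_dump(SAMPLE)))
```

Running the same loop again right after the failed restart and diffing the two summaries shows whether anything at the container level actually moved.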
This is with host kernel: 6.12.90+deb13.1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.90-2 (2026-05-27) x86_64 GNU/Linux. I haven’t tried any newer host kernel versions.
I’m not sure how to explain this behaviour. pids appears to be measuring the total number of threads. The peak value never goes anywhere near the limit, yet apache is reporting failed forks. FWIW, the default apache settings limit total threads to 150, also well below the limit: pids.current with apache not running is 8, so with this config I’d expect apache to be unable to exceed 158 threads.
Because the test is of a restart (not a reload), the original apache process should have fully exited before the replacement is started (so there shouldn’t be the possibility of two copies running at the same time), and the peak value agrees with this too.
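One thing that may be worth checking: the pids controller is hierarchical, so a fork is rejected if any cgroup along the path is at its own pids.max, and the numbers above were read at the container level only. The sketch below lists pids.max at every level of the path from the kernel message and picks the lowest one. The helper names and the example value of 30 are hypothetical; whether a nested limit (for instance one applied to the service by the container's service manager) is actually in play here is an assumption to verify, not something established above.

```python
from pathlib import Path

def cgroup_levels(kernel_path):
    """'/a/b/c' -> ['a', 'a/b', 'a/b/c'], outermost cgroup first."""
    parts = [p for p in kernel_path.split("/") if p]
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]

def tightest_limit(limits):
    """limits: list of (cgroup, pids.max text). Return (limit, cgroup) or None."""
    numeric = [(int(v), cg) for cg, v in limits if v.strip() != "max"]
    return min(numeric) if numeric else None

def read_limits(kernel_path, root="/sys/fs/cgroup"):
    """Read pids.max at every level of the path on a cgroup v2 host."""
    result = []
    for cg in cgroup_levels(kernel_path):
        f = Path(root) / cg / "pids.max"
        if f.exists():
            result.append((cg, f.read_text()))
    return result

# Path taken from the kernel message; the limits below are hypothetical.
path = "/lxc.payload.limitcheck/system.slice/apache2.service"
levels = cgroup_levels(path)
example = list(zip(levels, ["200\n", "max\n", "30\n"]))
print(tightest_limit(example))
```

On the host, `tightest_limit(read_limits(path))` would show whether the effective limit for apache2.service is really the 200 set on the container or something lower further down.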