Number of CPUs reported by /proc/stat fluctuates, causing issues

LXD 5.8 running on Ubuntu Jammy causes the number of CPUs reported by /proc/stat inside containers to fluctuate, which breaks applications that expect the value to stay constant. The issue occurs regardless of whether limits.cpu is set.

This appeared for us after a server reboot following the automatic install of 5.8. The previous reboot was still on 5.6, so we assume this broke in the LXCFS bundled with either 5.7 or 5.8. Sadly I was unable to downgrade because the DB schema had already been upgraded in 5.8 (is there a downgrade path?).

The issue can be observed by running the following in the container (adjust the number to the container's CPU count + 1, to account for the aggregate "cpu" line):

# Snapshot /proc/stat, then loop for as long as the CPU line count stays at the
# expected value (here 29 = 28 container CPUs + the aggregate "cpu" line).
cat /proc/stat > temp.txt

while [ "$(grep -c 'cpu' temp.txt)" -eq 29 ]
do
echo "29"
cat /proc/stat > temp.txt
done

Because of this, we are experiencing frequent issues with node.js/libuv, which for whatever reason checks the number of CPUs multiple times:

[INFO] ng build --base-href ./: ../deps/uv/src/unix/linux-core.c:615: read_times: Assertion `num == numcpus' failed.
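For what it's worth, a minimal sketch of how to watch the value from Node's point of view (assuming node is available inside the container; os.cpus() goes through the same libuv CPU-info path that is asserting here):

# Poll the CPU count as Node/libuv sees it; uniq -c collapses repeated values,
# so any fluctuation shows up as an extra output line.
while true
do
node -e 'console.log(require("os").cpus().length)'
done | uniq -c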

One for @amikhalitsyn

Hi @zrav!

Thanks for your detailed report and the version details.
I can't see any changes in lxc or lxcfs that could cause this problem between LXD 5.6 and LXD 5.8. It may be a kernel problem too. Could you check which kernel version you had before?

Link to a pretty similar issue on GitHub:

How many containers do you have (on the physical node)?

We went from kernel 5.15.0-50 to 5.15.0-53.
There are currently 60 containers on this physical machine, an EPYC with 128 threads, and I'd call the load medium.


I can't see any suspicious commits between the -50 and -53 Ubuntu 22.04 (jammy) kernels, except one:
# proc: Fix a dentry lock race between release_task and lookup

I suggest rebooting into the older kernel and checking whether that helps. It would be good for us to confirm that this is not kernel-related.

On the lxc/lxcfs side there were no suspicious changes at all between LXD 5.6 and 5.8.

I rebooted with the -50 kernel, however the issue reappeared within a matter of seconds, both in our build process and with the reproducer script. Interestingly, an older machine with the same setup is not affected.
Does the data in the container's /proc/stat come straight from the host kernel, or does lxcfs massage it in some way?


I rebooted with the -50 kernel, however the issue reappeared within a matter of seconds, both in our build process and with the reproducer script.

Okay, so that's not related to recent kernel changes. Good news for us.

Interestingly, an older machine with the same setup is not affected.

Do you have the same processor (128 threads) on it, or one with fewer threads?

Does the data in the container's /proc/stat come straight from the host kernel, or does lxcfs massage it in some way?

No, it comes from the lxcfs FUSE filesystem, because we hook the CPU count and related values.
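As a quick sanity check (just a sketch, output format may vary by setup), you can list the lxcfs FUSE mounts from inside the container to see which /proc files are served by lxcfs rather than the host kernel:

# Inside the container: show which proc files are overmounted by lxcfs
grep lxcfs /proc/self/mounts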

Thanks a lot for your test with the older kernel, it's really helpful. I'll try to work out what is happening here. On my 6-core / 12-thread machine, it's not reproducible )-:

Can you confirm that the issue appeared after a software upgrade on your host? That is, the hardware, the number of containers on the node, and other things were not changed?

The other machines I tested had 64 and 16 threads. While testing these I oversubscribed the CPUs and generated load with the "stress" tool to see whether it is load-related (see the example below).
We did add Mellanox NICs to the machines and installed their DKMS driver. I can't exclude that being related, but only the 128-thread machine is affected. The number of containers and the types of load did not change significantly, if at all.
If you have any checks you'd like me to run on the machine, let me know. The help is appreciated!
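For reference, an illustrative stress invocation of the kind used for the oversubscription test (the worker count and duration are just examples, not our exact setup):

# Spawn 256 busy-loop workers on the 128-thread host for 10 minutes
stress --cpu 256 --timeout 600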


You can try putting some threads of your 128-thread EPYC into offline mode using the CPU hotplug feature, like this: echo 0 > /sys/devices/system/cpu/cpu65/online (then turn it back on after the experiment by writing 1 to the same sysfs file). You could disable the upper half of the threads (cpu64 through cpu127, as sketched below) and check whether the issue is still reproducible (or even disable all threads except 32). That may give us a hint.
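A minimal sketch of that experiment, assuming the standard sysfs CPU hotplug interface and a root shell on the host:

# Offline the upper half of the threads (cpu64..cpu127; cpu0 usually cannot be offlined)
for n in $(seq 64 127); do echo 0 > /sys/devices/system/cpu/cpu$n/online; done
# ... run the reproducer inside the container ...
# Bring the threads back online afterwards
for n in $(seq 64 127); do echo 1 > /sys/devices/system/cpu/cpu$n/online; done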

So I was able to reproduce the issue on the 64-thread server too, it just took longer.
Looking at the temp.txt generated by the reproducer when the loop breaks, the pattern is that the number of CPUs reported by /proc/stat was either 4 or the total number of host CPUs. While looping, I also get an occasional "cat: /proc/stat: Invalid argument", which seems very wrong.
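For completeness, a hedged variant of the reproducer that logs every deviation and also captures read errors (adjust the expected count of 29 as before):

# Log every deviation from the expected CPU line count; if the read itself
# fails (e.g. "Invalid argument"), the error text is captured and printed too.
while true
do
n=$(grep -c 'cpu' /proc/stat 2>&1)
if [ "$n" != "29" ]; then
echo "$(date +%T) got: $n"
fi
done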


I also went ahead with the CPU-offlining test. The issue still occurs when offlining all but 32.


Huge thanks for playing with that. I'll take a careful look at the code tomorrow.

I've found something and posted a pull request.

It may be related to your problem, but I'm not sure. Let's wait for other developers' opinions.

@zrav LXD 5.9 was released yesterday; you can try updating your snap, as it contains this fix for LXCFS. Hopefully it helps in your case. If not, we'll continue the investigation.

I updated to 5.9 from the candidate channel and rebooted, but the issue is still reproducible at a similar frequency.


@zrav okay, I have an idea how we will catch this: a special build of lxcfs with ASAN and TSAN :)
I'll reach out to you.

Libfuse3 direct io by mihalicyn · Pull Request #571 · lxc/lxcfs · GitHub should help

@zrav this change was picked up in the latest build. Please run snap refresh lxd and check which revision you get (it should be higher than 24164). And yes, you'll need to reboot.
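For example (standard snap commands; the revision threshold is per the note above):

snap refresh lxd
snap list lxd    # the Rev column should show a revision higher than 24164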