LXCFS-5.0 /sys/devices/system/cpu/cpu{coreid}/ is empty

I installed LXD 5.0, but my application cannot start in LXD 5.0 containers because the /sys/devices/system/cpu/cpu{coreid} directories are empty:


root@ops-physical-100-117:~# lxc list
+----------------------------+---------+-----------------------+------+-----------------+-----------+----------------------+
|            NAME            |  STATE  |         IPV4          | IPV6 |      TYPE       | SNAPSHOTS |       LOCATION       |
+----------------------------+---------+-----------------------+------+-----------------+-----------+----------------------+
| ops-host-100-14-standalone | RUNNING | 192.168.100.14 (eth0) |      | CONTAINER       | 0         | ops-physical-100-117 |
+----------------------------+---------+-----------------------+------+-----------------+-----------+----------------------+
| vm173                      | STOPPED |                       |      | VIRTUAL-MACHINE | 0         | ops-physical-100-117 |
+----------------------------+---------+-----------------------+------+-----------------+-----------+----------------------+
[root@ops-host-100-14-standalone ~]# ll /sys/devices/system/cpu/
total 0
drwxr-xr-x 2 root root   0 Apr 25 19:36 cpu0
drwxr-xr-x 2 root root   0 Apr 25 19:36 cpu25
drwxr-xr-x 2 root root   0 Apr 25 19:36 cpu26
drwxr-xr-x 2 root root   0 Apr 25 19:36 cpu29
drwxr-xr-x 2 root root   0 Apr 25 19:36 cpufreq
drwxr-xr-x 2 root root   0 Apr 25 19:36 cpuidle
drwxr-xr-x 2 root root   0 Apr 25 19:36 hotplug
drwxr-xr-x 2 root root   0 Apr 25 19:36 intel_pstate
-r--r--r-- 1 root root   1 Apr 25 19:36 isolated
-r--r--r-- 1 root root   5 Apr 25 19:36 kernel_max
drwxr-xr-x 2 root root   0 Apr 25 19:36 microcode
-r--r--r-- 1 root root 579 Apr 25 19:36 modalias
-r--r--r-- 1 root root   6 Apr 25 19:36 offline
-r--r--r-- 1 root root   5 Apr 25 19:36 online
-r--r--r-- 1 root root   5 Apr 25 19:36 possible
drwxr-xr-x 2 root root   0 Apr 25 19:36 power
-r--r--r-- 1 root root   5 Apr 25 19:36 present
drwxr-xr-x 2 root root   0 Apr 25 19:36 smt
-rw-r--r-- 1 root root   0 Apr 25 19:36 uevent
drwxr-xr-x 2 root root   0 Apr 25 19:36 vulnerabilities
[root@ops-host-100-14-standalone ~]# ll /sys/devices/system/cpu/cpu0
total 0

On LXD 4.24, the directory looks like this:

[root@dev-host-0-109-staging-enterprise-3 ~]# ll /sys/devices/system/cpu/cpu95/
total 0
drwxr-xr-x 6 root root    0 Apr 22 13:44 cache
drwxr-xr-x 6 root root    0 Apr 25 19:29 cpuidle
-r-------- 1 root root 4096 Apr 25 19:29 crash_notes
-r-------- 1 root root 4096 Apr 25 19:29 crash_notes_size
lrwxrwxrwx 1 root root    0 Apr 25 19:29 driver -> ../../../../bus/cpu/drivers/processor
lrwxrwxrwx 1 root root    0 Apr 25 19:29 firmware_node -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/ACPI0007:67
drwxr-xr-x 2 root root    0 Apr 25 19:29 hotplug
drwxr-xr-x 2 root root    0 Apr 25 19:29 microcode
lrwxrwxrwx 1 root root    0 Apr 25 19:29 node1 -> ../../node/node1
-rw-r--r-- 1 root root 4096 Apr 22 11:57 online
drwxr-xr-x 2 root root    0 Apr 25 19:29 power
lrwxrwxrwx 1 root root    0 Apr 25 19:29 subsystem -> ../../../../bus/cpu
drwxr-xr-x 2 root root    0 Apr 25 19:29 thermal_throttle
drwxr-xr-x 2 root root    0 Apr 22 13:44 topology
-rw-r--r-- 1 root root 4096 Apr 18 11:54 uevent
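
For anyone who wants to check their own containers for the same symptom, here is a minimal Python sketch. The function name `empty_cpu_dirs` is made up for illustration; it only uses standard-library calls and reports which cpuN directories have no entries at all, as cpu0 does in the LXD 5.0 listing above.

```python
import os
import re

# Matches per-core directories such as cpu0 or cpu25, but not cpufreq or cpuidle.
CPU_DIR_PATTERN = re.compile(r"^cpu\d+$")


def empty_cpu_dirs(base="/sys/devices/system/cpu"):
    """Return the names of cpuN directories under `base` that contain nothing."""
    empty = []
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        if CPU_DIR_PATTERN.match(name) and os.path.isdir(path):
            # An unaffected sysfs shows entries like topology/ and cache/ here.
            if not os.listdir(path):
                empty.append(name)
    return empty
```

Calling `empty_cpu_dirs()` inside the container should return an empty list on a setup that behaves like the 4.24 one, and the affected core directories otherwise.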

I also noticed that LXCFS 5.0 shipped this feature:

/sys/devices/system/cpu/ support
In addition to the existing /sys/devices/system/cpu/online support, LXCFS will now virtualize the entire /sys/devices/system/cpu directory in order to hide CPUs which aren’t available to the container.

How can I fix it?

If building from source, there are a few follow-up commits in the lxcfs git repo that you probably want to include in your build as they are related to fixes with that particular endpoint.

So there is no configuration option or anything else that can fix it? Do I have to build LXD from source to disable this feature? :smiling_face_with_tear:

You can unmount /sys/devices/system/cpu/ inside of your container.
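
Before unmounting, it can help to confirm that the directory really is served by LXCFS. The sketch below parses text in the /proc/self/mountinfo format and lists mount points whose filesystem type is `fuse.lxcfs`. That type string is an assumption about how LXCFS mounts are reported on your system, and `lxcfs_mount_points` is a made-up helper name.

```python
def lxcfs_mount_points(mountinfo_text, fstype="fuse.lxcfs"):
    """Return mount points in mountinfo-formatted text that use `fstype`.

    Assumes the documented mountinfo layout: the mount point is the fifth
    field, and the filesystem type follows the "-" separator field.
    """
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or "-" not in fields:
            continue  # skip blank or malformed lines
        sep = fields.index("-")
        if sep + 1 < len(fields) and fields[sep + 1] == fstype:
            points.append(fields[4])
    return points
```

Inside the container you would feed it the real file, for example `lxcfs_mount_points(open("/proc/self/mountinfo").read())`, and look for /sys/devices/system/cpu in the result.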

Ok, I see what’s happening, thank you.

Sorry to chime in like this, but I also noticed 5.0.0 was somewhat “unfinished”. Is there a plan to do a quick 5.0.1 release?

Not super quick because of travel and time off, but yes, 5.0.1 is planned.

I’m hoping to get LXC 5.0.0 out next week which will then unblock a 5.0.1 of all 3 projects in about a month.

Just adding that this change unexpectedly broke a tonne of our systems. There is a bug in even the later JVM 8 builds (we cannot upgrade for legacy reasons) that relies on the assigned CPU numbers not being higher than the total number available.

Is there any other way to fix this, perhaps a custom LXCFS build? Unmounting would surely break the online file which would just create other problems. An alternative would be to build a custom version of the JVM which uses the method that lscpu uses to obtain the real max CPU count.

We’re using the snap package.

LXCFS 5.0 introduced those files, and unmounting shows you the original files, which was the old behavior, so I’m not sure why that wouldn’t be a suitable workaround?

Because unmounting it results in the online file showing the wrong number of available processors.

> cat /sys/devices/system/cpu/online
11,29

After unmounting

> cat /sys/devices/system/cpu/online
0-31
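
To make the difference concrete, here is a small Python sketch that expands the CPU list format used by the online file (comma-separated IDs and inclusive ranges). Both helper names are made up, and `ids_exceed_count` is only my reading of the JVM problem described above (a CPU ID that is not smaller than the number of CPUs), not something taken from the JVM source.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "0-2,5" into [0, 1, 2, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty file or a trailing comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))  # ranges are inclusive
        else:
            cpus.append(int(part))
    return cpus


def ids_exceed_count(cpus):
    """True if some CPU ID is not below the number of CPUs in the list."""
    return bool(cpus) and max(cpus) >= len(cpus)
```

With the values above, "11,29" expands to two CPUs whose IDs are both larger than the count, while "0-31" expands to 32 CPUs numbered contiguously from zero.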

The main problem is that the LXCFS emulation is incomplete. Because files are missing under /sys/devices/system/cpu/cpu{coreid}/, many Java applications running on JDK 1.8.x, such as Kafka, can’t start.

Could you please file a bug in the lxc/lxcfs issue tracker on GitHub?

Done. https://github.com/lxc/lxcfs/issues/548
