[LXD] Restrict CPU placement to NUMA nodes

gabrielmougard · June 6, 2023, 5:59pm


Project	LXD
Status	Implemented
Release	5.15
Internal ID	LX042

Abstract

This feature introduces a new limits.cpu.nodes config key that takes a range or comma-separated list of NUMA node IDs.

When it is set, any dynamic value of limits.cpu (a CPU count rather than a pinned list) causes our scheduler to place the instance CPUs only within the configured set of NUMA nodes.

Rationale

Memory access is fastest when a CPU accesses its local memory. A NUMA node pairs CPUs with memory: typically, a CPU socket and its closest memory banks form one NUMA node. When a CPU needs memory that belongs to another NUMA node, it cannot access it directly and must go through the CPU that owns that memory. The resulting performance penalty can be large and can slow the application down considerably.

To avoid such cases, we would like to let the user restrict instance CPUs to a specified set of NUMA nodes. By specifying NUMA nodes that are close to each other, an instance can then be optimized for memory locality or processor load.

Specification

Design

When limits.cpu is dynamic (i.e., a number of CPUs to use) rather than pinned (i.e., a list or range of CPU thread IDs), we read the value of limits.cpu.nodes, which is either a list or a range of NUMA node IDs.

For example, here is a NUMA configuration:

$ lscpu | grep -i numa

NUMA node(s):                    2
NUMA node0 CPU(s):               0-5,12-17
NUMA node1 CPU(s):               6-11,18-23

Here is an associated simplified image of the CPU topology:

With this example, setting limits.cpu.nodes=0 means load-balancing on CPUs 0-5 and 12-17. Setting limits.cpu.nodes=0,1 or limits.cpu.nodes=0-1 (more generally limits.cpu.nodes=0-x, where x is the highest NUMA node ID) means load-balancing on all CPUs.
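To make the mapping from a limits.cpu.nodes value to candidate CPUs concrete, here is a minimal sketch in Python (chosen for brevity; it is not LXD's Go implementation). The helper names parse_id_set and candidate_cpus are hypothetical, and the topology is hard-coded from the lscpu output above.

```python
def parse_id_set(spec: str) -> list[int]:
    """Expand a value such as "0", "0,1" or "0-5,12-17" into sorted unique IDs."""
    ids: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty element in {spec!r}")
        if "-" in part:
            low, high = part.split("-", 1)
            low_id, high_id = int(low), int(high)
            if low_id > high_id:
                raise ValueError(f"invalid range {part!r}")
            ids.update(range(low_id, high_id + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


# Assumption: topology copied by hand from the lscpu example above.
NUMA_TOPOLOGY = {0: "0-5,12-17", 1: "6-11,18-23"}


def candidate_cpus(nodes_spec: str, topology: dict[int, str] = NUMA_TOPOLOGY) -> list[int]:
    """Return the CPU thread IDs belonging to the requested NUMA nodes, node by node."""
    cpus: list[int] = []
    for node in parse_id_set(nodes_spec):
        if node not in topology:
            raise ValueError(f"unknown NUMA node {node}")
        cpus.extend(parse_id_set(topology[node]))
    return cpus
```

With this topology, candidate_cpus("0") yields CPUs 0-5 and 12-17, while candidate_cpus("0,1") and candidate_cpus("0-1") both yield all 24 CPUs.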

What happens if limits.cpu is a number (let’s call it n) greater than the number of CPUs in the configured set of NUMA nodes? We chose to pin all the CPUs contained in the NUMA node set, and the remainder is load-balanced as usual (not pinned).
If n is less than or equal to the number of CPUs in the configured set of NUMA nodes, we pin the NUMA CPUs in ascending order: (NUMA_Node_0.NUMA_CPU_0 -> ... -> NUMA_Node_0.NUMA_CPU_K) -> ... -> (NUMA_Node_L.NUMA_CPU_0 -> ... -> NUMA_Node_L.NUMA_CPU_M).
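The two cases above can be sketched as follows, again in Python and not taken from LXD's scheduler code. The function name select_cpus and the node_cpus mapping are hypothetical. The sketch follows the spec text as written: it returns the CPUs to pin plus the number of CPUs left over for ordinary load-balancing. The reply further down the thread proposes dropping that remainder instead.

```python
def select_cpus(n: int, node_cpus: dict[int, list[int]]) -> tuple[list[int], int]:
    """Pick CPUs for a dynamic limits.cpu value of n.

    node_cpus maps each configured NUMA node ID to its CPU thread IDs.
    Returns (pinned CPUs, count of CPUs that did not fit in the NUMA set).
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    # Ascending order: all CPUs of the lowest node first, then the next node.
    ordered = [cpu for node in sorted(node_cpus) for cpu in sorted(node_cpus[node])]

    pinned = ordered[:n]                  # case n <= len(ordered): first n CPUs
    remainder = max(0, n - len(ordered))  # case n > len(ordered): leftover count
    return pinned, remainder


# Topology from the lscpu example, expanded by hand.
node0 = [0, 1, 2, 3, 4, 5, 12, 13, 14, 15, 16, 17]
node1 = [6, 7, 8, 9, 10, 11, 18, 19, 20, 21, 22, 23]

pinned, remainder = select_cpus(14, {0: node0})
```

In the usage example, node 0 only has 12 CPUs, so all 12 are pinned and the remaining 2 are left to the usual load-balancing.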

API changes

New limits.cpu.nodes config key, which is a range or comma-separated list of integers (each integer being a NUMA node identifier).

CLI changes

No CLI changes.

Database changes

No database changes.

stgraber · June 7, 2023, 4:32pm

I think in that scenario we should still limit to CPUs in the NUMA set.
That would result in an instance with fewer CPUs than requested.
It’s the same behavior you get if you request more CPUs than you physically have without the NUMA set in play.

tomp · June 9, 2023, 7:40am

Thanks for this spec, @gabrielmougard. I appreciated the clear description of the problem space.