I’m looking for a way to prevent, or at least limit, the amount of memory that gets swapped out of an Incus-managed VM (qemu).
I have a server which (in short) does two heavy tasks:
- Runs a single VM in Incus
- Serves a lot of files, fast (constant 50+ MB/s disk reads), 24/7 (lowest priority task)
The latter causes the kernel to prefer swapping out the VM's memory: the page cache is large and hot, while the VM's anonymous pages are comparatively few and rarely touched, so they get evicted even with `vm.swappiness=1`.
Out of the 8 GiB of RAM allocated to the VM (`limits.memory: 8GiB`), about 4 GiB is usually sitting in swap.
This creates huge latency when using the services in the VM: the first access to a rarely used feature touches 1-2 GB of rarely used (and therefore swapped-out) memory, which triggers a storm of swap-ins and makes those pages load slowly the first time.
I could not find any way to prevent the VM's memory from being swapped out:

- Incus doesn't run qemu in its own cgroup, so there is nowhere to set `memory.low` (the closest manual workaround I can think of is sketched right after this list).
- Incus doesn't provide any way to `mlockall()` the qemu process.
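For reference, that manual `memory.low` workaround would look roughly like the sketch below. It's only a sketch, assuming cgroup v2 is mounted at `/sys/fs/cgroup` with the memory controller enabled at the top level, with `<vm-name>` as a placeholder for my VM's name, and with the 8G value matching the VM's allocation; systemd and Incus know nothing about this cgroup, so it's fragile:

```
# Hypothetical manual workaround: give the running qemu process its own cgroup
# and protect roughly the whole guest RAM from reclaim via memory.low.
QEMU_PID=$(pgrep -f "qemu-system.*<vm-name>")   # <vm-name> is a placeholder
mkdir -p /sys/fs/cgroup/qemu-vm
echo 8G > /sys/fs/cgroup/qemu-vm/memory.low     # best-effort protection from reclaim
echo "$QEMU_PID" > /sys/fs/cgroup/qemu-vm/cgroup.procs
```

And it only lasts until the VM is restarted, since Incus will recreate the qemu process in its original cgroup.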
Is there a less ugly way to solve this than forcing `mlockall()` on the qemu process via LD_PRELOAD or gdb, or wrapping qemu in a bash script that runs it in its own systemd scope?
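For concreteness, the gdb variant I have in mind is roughly the following; a sketch only, assuming `$QEMU_PID` is the VM's qemu process, that `RLIMIT_MEMLOCK` allows locking the full 8 GiB, and using the Linux values `MCL_CURRENT=1` and `MCL_FUTURE=2` (so the argument `3` locks current and future mappings):

```
# Hypothetical gdb hack: force mlockall(MCL_CURRENT | MCL_FUTURE) inside the
# already-running qemu process so its guest RAM can no longer be swapped out.
gdb --batch -p "$QEMU_PID" \
    -ex 'call (int) mlockall(3)' \
    -ex 'detach'
```

It works, but it briefly pauses the guest while gdb is attached and has to be repeated on every VM start, hence the question.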
I have 32 GB of RAM on the server: 8 GB is occupied by the VM, 4-5 GB by the file-sharing services on the host, and the rest is page cache.
The file-sharing service already runs in its own systemd slice with memory limits, but that doesn't help much, probably because the filesystem the files are served from is a FUSE proxy and the proxy itself isn't limited yet.
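The next thing I plan to try is putting the FUSE proxy under the same memory pressure; a sketch of what I mean, with `fuse-proxy.service` and `filesharing.slice` as placeholders for my actual unit names:

```
# Hypothetical next step: cap the FUSE proxy's memory (and thus the page cache
# charged to its cgroup) so its reads stop pushing the VM's memory out to swap.
systemctl set-property fuse-proxy.service MemoryHigh=4G
# Keep the whole file-sharing slice under a combined ceiling as well.
systemctl set-property filesharing.slice MemoryHigh=8G
```

But even with that in place, I'd still prefer an Incus-side way to protect the VM's memory directly.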