Understanding disk device limits for VMs

I know that for containers, cgroups are used for disk rate limiting, but what's the mechanism for VMs? Should limits.max work, for instance, when a VM uses a disk-based storage pool? I tried to look at the documentation but couldn't find the relevant parts.

I don’t believe that those limits are currently enforced by anything on the VM side.
Not that they’re typically enforced by anything on the container side either these days.

The main issue is that the traditional mechanism for this, the blkio cgroup, is only supported by a few I/O schedulers, and these days that means it will not work on any system using NVMe or scsi-mq, which are the two most common setups. Both effectively perform I/O queuing and scheduling as close to the drive as possible (or, in the case of NVMe, in the drive itself), leaving no real control to the OS.
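A quick way to see whether those blkio controls can do anything on a given host is to check which scheduler each block device is using (a diagnostic sketch; device names vary per system, and the proportional blkio weights only take effect with schedulers such as CFQ or BFQ):

```shell
# Print the active I/O scheduler for every block device; the name in
# [brackets] is the one in use. "none" or "mq-deadline" means the
# blkio proportional weights will have no effect on that device.
for f in /sys/block/*/queue/scheduler; do
    echo "$f: $(cat "$f")"
done
```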

Having some kind of control over I/O limits per instance is a must-have for my use case. We only use spinning HDDs in our cluster. We currently use Ceph RBD with containers in production, but sadly it's too slow and expensive to meet our requirements.

I'm currently testing BeeGFS with VMs (using a directory storage pool), but I need a way to rate-limit the VMs. A quick Google search suggests QEMU has something like iotune to tweak rate limits; is this something that could be exposed in LXD?
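What those search results refer to is QEMU's built-in block throttling, which can be set directly on a drive; a hypothetical invocation with illustrative paths and values (the rest of the VM command line is omitted):

```
# Cap the drive at ~10 MB/s and 200 IOPS total (illustrative values);
# these throttling.* drive options are the same knobs that libvirt's
# <iotune> element configures under the hood.
qemu-system-x86_64 \
    -drive file=/mnt/beegfs/lxd/qemu/testfile.bin,format=raw,if=virtio,throttling.bps-total=10485760,throttling.iops-total=200
```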

Another thing I noticed:

The root disk for a QEMU VM seems to be a root.img file, which shows up as a /dev/root device inside the VM itself, even resized to the correct size.

Is it possible to add other .img files as devices? I tried adding another volume on the same pool, but sadly it was just exposed as a folder; no new image was created. Is this possible at all?


I confirmed that using virsh/libvirt I can achieve everything I require, which makes me hopeful the same functionality can be added to LXD. I am willing to contribute it if nothing exists for it yet, but I have no clue about the bigger picture here.

Basically what I’m talking about is:

  • The ability to create raw QEMU image files to expose as disk devices (not sure how this would work with storage pools; for dirs we could place the files in the containing folder?)
  • The ability to set iotune limits on disk devices.

i.e. part of a virsh XML file:

<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/mnt/beegfs/lxd/qemu/testfile.bin'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</disk>
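For reference, the rate limits in question map onto libvirt's <iotune> element inside the same <disk> definition; a sketch with illustrative values:

```
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/mnt/beegfs/lxd/qemu/testfile.bin'/>
  <target dev='vda' bus='virtio'/>
  <iotune>
    <total_bytes_sec>10485760</total_bytes_sec>
    <total_iops_sec>200</total_iops_sec>
  </iotune>
</disk>
```

libvirt also exposes the same knobs at runtime via virsh blkdeviotune.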

You can create a block type custom volume and attach it to a VM using:

lxc storage volume create <pool> <volume> size=x --type=block
lxc storage volume attach <pool> <volume> <instance>
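For completeness, this is how disk I/O limits are expressed on the LXD side (a sketch; the instance and device names are placeholders, and as discussed above, whether the keys are enforced for VMs is the open question here):

```
# limits.read / limits.write / limits.max are the documented LXD disk
# device keys; values take byte-rate (e.g. 10MB) or IOPS (e.g. 20iops)
# forms. "vm1" and "root" are illustrative names.
lxc config device set vm1 root limits.read 10MB
lxc config device set vm1 root limits.write 5MB
```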

I can confirm this works.

That just leaves the iotune part: is there support for iotune yet, or if not, can I contribute it?

I've not tried it, so I don't know what's involved; you're welcome to open a PR though, if you feel able to.

Great, I will check it out and see what I can come up with.