Looking for clarity on disk I/O limits with ZFS

It’s my understanding that ZFS doesn’t fully work with disk I/O limits in LXC, because ZFS handles its own I/O path rather than going through the usual kernel one.

From incus/doc/storage.md (lxc/incus on GitHub):

I/O quotas (IOps/MBs) are unlikely to affect ZFS filesystems very much. That’s because of ZFS being a port of a Solaris module (using SPL) and not a native Linux filesystem using the Linux VFS API which is where I/O limits are applied.

My goal is to make sure a single container can’t consume all of the disk I/O bandwidth and prevent other containers from operating correctly.
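Something like this is what I have in mind, applied to the root disk device (a sketch; the container name `c1` and the values are placeholders):

```
# Override the container's root disk device with bandwidth caps
lxc config device override c1 root limits.read=50MB limits.write=20MB

# IOPS-based limits can be set the same way
lxc config device set c1 root limits.max 1000iops
```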

Questions:

  1. Does anyone have any information on whether I/O limits have any effect at all on ZFS?
  2. We have a production environment where we would like a ZFS pool backed by 8 SSD drives, with the entire pool dedicated to LXD. Is there any way to limit disk I/O reliably in that setup?
  3. If that isn’t possible with ZFS, what is the next recommended storage backend for handling disk I/O limits with the hardware described above?

If anyone has any insight into the best way to achieve this, I’d appreciate any help!

Thank you in advance!

Side note: I tried to limit disk writes (writing random data via the dd command) in my dev environment, which is an Ubuntu host running Ubuntu containers, but the host itself runs inside VirtualBox and the limits did not seem to apply. Is that because I’m running it through a virtualization layer under the host?
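For context, the write test was along these lines (a sketch; the path and size are placeholders):

```
# Inside the container: write random data to see whether the write limit kicks in
# oflag=direct bypasses the page cache so the throttling is easier to observe
dd if=/dev/urandom of=/root/ddtest bs=1M count=1024 oflag=direct
```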

I’m not sure which code paths can be properly restricted for ZFS; it’s always a bit unclear which parts of the normal kernel infrastructure are used by ZFS/SPL and which parts use their own implementation.

btrfs is usually our second recommendation (and in fact the most used backend), though stability in RAID setups has been an issue in the past and its disk quotas are effectively useless, so it may not be suitable for everyone.

Using LVM would give each container a dedicated block device and filesystem, which should avoid all of those issues, though at the cost of snapshot reliability and lengthy exports/moves.
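As a rough sketch of what that setup looks like (the pool name, device path and image alias are placeholders):

```
# Create an LVM-backed storage pool on a dedicated block device
lxc storage create ssd-pool lvm source=/dev/sdb

# Launch a container on that pool; each container gets its own logical volume
lxc launch ubuntu:18.04 c1 -s ssd-pool
```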

Another thing of note: if your SSDs are NVMe, then none of that matters, as Linux just plain doesn’t support I/O restrictions on those these days. They don’t go through the I/O scheduler; instead the queuing happens on the drives themselves, outside of OS control.

Another reason you may not be able to control block I/O in your environment is your choice of I/O scheduler: not all of them support limits, and last I checked only cfq did a good job of enforcing them.
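You can check which scheduler a given drive is using and switch it if needed (the device names are placeholders; cfq is only available on kernels that still ship the legacy block layer):

```
# Show the available schedulers for the drive; the active one is in brackets
cat /sys/block/sda/queue/scheduler

# On NVMe drives this typically shows [none], matching the point above
cat /sys/block/nvme0n1/queue/scheduler

# Switch to cfq if your kernel still provides it
echo cfq | sudo tee /sys/block/sda/queue/scheduler
```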
