LXD VM I/O bottleneck on ZFS (with a solution)

For the first time here, I have a topic with a solution instead of a question.

We have Jenkins slaves that use LXD VMs to build OEM images/ISOs. Compared with equivalent builds on Vagrant VMs, the build time under the same conditions was much longer. Investigation showed the bottleneck was I/O while fetching packages for installation.

It turns out that ZFS has a “sync” property, which defaults to standard.

$zfs get sync
......
local/virtual-machines/oem-iot-focal-2                                                        sync      standard  default
local/virtual-machines/oem-iot-jammy-1                                                        sync      standard  default
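
For anyone checking their own setup, assuming the same pool layout (where local is the pool name), these commands list the datasets backing the VMs and their current sync value:

$zfs list -r local/virtual-machines      # datasets/zvols backing each VM
$zfs get -r sync local/virtual-machines  # sync value for each of them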

Setting the following property resolved the issue.

$sudo zfs set sync=disabled local/virtual-machines
$zfs get sync
.......
local/virtual-machines/oem-iot-focal-2                                                        sync      disabled  inherited from local/virtual-machines
local/virtual-machines/oem-iot-jammy-1                                                        sync      disabled  inherited from local/virtual-machines
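
If you ever need to limit the change to a single VM, or undo it, the same property can be set per dataset or reverted to the inherited default (the dataset names below are the ones from the output above):

$sudo zfs set sync=disabled local/virtual-machines/oem-iot-jammy-1   # disable sync for just one VM
$sudo zfs inherit sync local/virtual-machines                        # revert the whole tree to the default (standard)
$zfs get sync local/virtual-machines                                 # confirm the value and where it comes from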

The total build time was originally 9 hours; in my case it went down to 50 minutes. It is now even comparable to the Vagrant VM.

Hope this helps someone who hits similar issues.


I guess this is unsafe and must not be used for every use case, right?

Right, the VM will have the impression that its writes have been fully flushed when they haven’t been. Should the host lose power, you’ll lose data.


Yes, but since we use the LXD VM as a build slave, nothing on that VM matters once we have the artifacts (ISOs/images).
To be safe, we destroy the VM and create a new one right after every build.
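
For reference, the throwaway workflow looks roughly like this (the VM name, image and artifact path are just placeholders):

$lxc launch ubuntu:22.04 builder --vm -s local     # fresh build VM on the ZFS pool
# ... run the Jenkins job inside the VM ...
$lxc file pull builder/home/ubuntu/output.iso .    # copy the built ISO/image back to the host
$lxc delete --force builder                        # destroy the VM once the artifacts are out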

So, in conclusion, I don’t even need ZFS in this scenario, right? I could just use the local file system.

In my case, is it also fine for me to use dir as the LXD backend, like the “Directory backend for long-term containers?” topic that has been mentioned?

I was thinking that maybe ZFS is not the right option in my scenario. I tried dir as the backend, but then I found things not working as I expected.

I/O was still the bottleneck, then I searched and found this paragraph:

While this backend is fully functional, it’s also much slower than all the others due to it having to unpack images or do instant copies of instances, snapshots and images.

storage-dir backend

Now maybe I have to switch back to ZFS with sync disabled to keep the flexibility? From the results, this is the closest to my benchmark using a Vagrant VM on bare metal.
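
In case it is useful, switching back looked roughly like this for me (the pool name local and the source device are specific to my setup, adjust as needed):

$lxc storage create local zfs source=/dev/nvme0n1    # ZFS-backed LXD pool on a dedicated disk
$lxc launch ubuntu:22.04 builder --vm -s local       # place build VMs on that pool
$sudo zfs set sync=disabled local/virtual-machines   # then trade write safety for build speed, as above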

Yeah, unfortunately there isn’t any magic storage backend that’s good at everything…

For VMs stored on dir or btrfs, they’re stored in a loop file on disk, so performance isn’t going to be particularly great. ZFS is a bit better as its zvols at least offer a bit more flexibility in configuration.

The fastest for VMs I suspect would be LVM in a non-thinpool configuration, but that’s very costly in space and isn’t a great backend to run containers on, so if running both containers and VMs, that wouldn’t be so great.
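
If anyone wants to try that, the non-thinpool mode is just a pool config key; a rough sketch (pool name and size are only examples):

$lxc storage create lvmpool lvm lvm.use_thinpool=false size=100GiB   # plain LVs instead of a thin pool
$lxc launch ubuntu:22.04 builder --vm -s lvmpool                     # each VM then gets a full-size LV, hence the space cost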

Most often, I find ZFS to be the best compromise. It’s not the best at anything really, but it offers enough knobs that you can get close and still use a single backend for everything.


Was this command run on the host or in the VM?
sudo zfs set sync=disabled local/virtual-machines

On the host

Hi stgraber,

I have a question about this. If I use SSDs with a power-loss protection option, can I then safely disable the sync option on the ZFS pool?

Let’s say I buy some DC500M SSD drives?

PS: I also use a UPS. On a power outage my system will shut down gracefully.

What are my options here?

Having drives with capacitors/batteries and/or having the system on a UPS should make it safe for the power-loss situation.

Note that this will not save you from system crashes though. If your kernel panics, whatever is left in memory will be lost.

Thank you,

I will leave it on. I am trying to create a new zpool and restore everything; maybe the slowness will then be over.