ZFS vs EXT4 Performance on High Load with LXD-Benchmark

Yes. Expecting the same results without using the exact same hardware, software versions, etc. doesn't seem like a reasonable goal.

My equipment and some information

OS: Alpine Linux v3.17 x86_64
Host: ProLiant DL380p Gen8
Kernel: 5.15.79-0-lts
CPU: Intel Xeon E5-2667 v2 (32) @ 4.000GHz
Memory: 2129MiB / 257902MiB
Raid Mode: HBA
Disk: 12 × 6 TB SAS HDDs

Is this different from the setup in the video?

Certainly

I am testing on server hardware
But @stgraber is testing on his PC, if I'm not mistaken

If the hardware is different, do you still expect similar synthetic performance?

The point is that I am comparing the speed of ZFS and EXT4 on the same server with identical settings

Your results do not seem unexpected. You may be trying to find an issue where none exists.

So ZFS is expected to be slower than EXT4?

I would expect that running a synthetic benchmarking utility on different hardware and software versions than the ones in the demo video would give you this type of result. The benchmark utility doesn't demonstrate real-world performance differences between these filesystems.

I tried running ZFS and EXT4 in the following configurations:

- on a single disk
- on the RAID array
- in RAM (see the sketch below)

And in every case ZFS is slower than EXT4 with standard LXD settings
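For the "in RAM" case, one common way to get a throwaway pool is a file-backed vdev on tmpfs. This is an illustrative sketch, not necessarily how the test here was set up:

truncate -s 8G /dev/shm/ztest.img        # sparse file in tmpfs acts as the vdev
zpool create -f ztest /dev/shm/ztest.img # the pool then lives entirely in RAM
zpool destroy ztest                      # tear down after benchmarking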

I checked with both fio and lxd-benchmark
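Invocations along these lines would exercise both tools (illustrative only; the exact commands and parameters used in this thread aren't shown):

# fio: 8K random writes, roughly matching a database-style workload
fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
    --bs=8k --size=2G --numjobs=4 --runtime=60 --time_based --group_reporting
# lxd-benchmark: batch container creation against the current storage pool
lxd-benchmark launch --count 20 images:debian/11
lxd-benchmark delete   # clean up the test containers afterwards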

I can re-run the tests right now with any other utility and provide any information you need, no problem

I really want to use ZFS but don’t know how to make it faster than EXT4

Regards.

Yes. You are clearly hoping that the synthetic benchmarks will show a better performance result for ZFS than for EXT4.

Do you know how to achieve faster performance?

For context: I will be running PostgreSQL in my containers on ZFS storage under very high load

Regards.

PS: Thanks for still helping me

Why not stage the system as you intend and run some real-world tests against it rather than synthetic benchmarks?

It’s not good that container creation is slower on ZFS

Okay, then I'll run the tests using the pgbench utility

Is this a good idea?

No. I am recommending that you configure the system as you intend to use it and run real-world tests against it to determine real-world performance.

Okay, I'm now setting up the system the way I would set it up in production, and then I'll measure how fast it works

I will use Debian 12 on the host, Debian 11 in the container, and PostgreSQL 15

I'll set it up now and report the results


There were problems installing Debian 12 on the server because, unfortunately, it has no native ZFS support

But I decided to run Alpine Linux from RAM and install LXD on it with standard settings
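On Alpine, a minimal LXD-with-ZFS setup looks roughly like this (a sketch assuming the stock apk packages; not the exact commands from this thread):

apk add zfs zfs-lts lxd   # zfs-lts provides the module for the lts kernel
modprobe zfs
rc-update add lxd
rc-service lxd start
lxd init --auto           # accept the standard settings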

I created a Debian 11 container for pgbench

I launched my ERP system and it works exactly the same as it did on EXT4, though with the caveat that this is not a high-load scenario, since I am the only user

To check high loads, I still had to run pgbench
Here are my stats on ZFS

root@pgbench:~# pgbench -h localhost -p 5432 -U postgres -c 50 -j 2 -P 60 -T 600 benchmark

Password:
pgbench (15.3)
starting vacuum...end.
progress: 60.0 s, 17607.9 tps, lat 2.744 ms stddev 1.433, 0 failed
progress: 120.0 s, 16685.8 tps, lat 2.911 ms stddev 1.139, 0 failed
progress: 180.0 s, 16754.9 tps, lat 2.899 ms stddev 1.611, 0 failed
progress: 240.0 s, 16348.3 tps, lat 2.972 ms stddev 1.150, 0 failed
progress: 300.0 s, 16373.4 tps, lat 2.968 ms stddev 1.666, 0 failed
progress: 360.0 s, 16450.3 tps, lat 2.954 ms stddev 1.257, 0 failed
progress: 420.0 s, 16272.8 tps, lat 2.986 ms stddev 1.737, 0 failed
progress: 480.0 s, 16109.5 tps, lat 3.018 ms stddev 1.167, 0 failed
progress: 540.0 s, 15348.4 tps, lat 3.171 ms stddev 2.173, 0 failed
progress: 600.0 s, 15340.2 tps, lat 3.173 ms stddev 1.875, 0 failed
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 50
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 9797529
number of failed transactions: 0 (0.000%)
latency average = 2.975 ms
latency stddev = 1.554 ms
initial connection time = 100.341 ms
tps = 16331.009234 (without initial connection time)

Are you unable to configure the system as you intend and run real-world tests against it?

Sorry, it turns out I was testing in RAM again :)
OK, I have now created a pool on my disks like this:

bench:~# modprobe zfs
bench:~#    zpool create -f -o ashift=12 \
>        -O acltype=posixacl -O canmount=off -O compression=lz4 \
>        -O dnodesize=auto -O normalization=formD -O relatime=on -O xattr=sa \
>        -O recordsize=8K -O atime=off -O logbias=throughput \
data mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf mirror /dev/sdg /dev/sdh mirror /dev/sdi /dev/sdj mirror /dev/sdk /dev/sdl
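Handing an existing pool to LXD typically looks something like this (a sketch; the storage pool name "pgpool" and the exact commands are assumptions, since this step isn't shown in the thread):

lxc storage create pgpool zfs source=data        # attach the existing zpool "data"
lxc profile device set default root pool=pgpool  # route new containers to it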

System set up successfully
Here are the pgbench results when using the disks:

root@pgbench:~# pgbench -h localhost -p 5432 -U postgres -i -s 150 benchmark

Password:
dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
15000000 of 15000000 tuples (100%) done (elapsed 29.26 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 41.73 s (drop tables 0.00 s, create tables 0.14 s, client-side generate 30.05 s, vacuum 0.77 s, primary keys 10.76 s).

root@pgbench:~# pgbench -h localhost -p 5432 -U postgres -c 50 -j 2 -P 60 -T 600 benchmark
Password:
pgbench (15.3)
starting vacuum...end.
progress: 60.0 s, 2156.3 tps, lat 23.016 ms stddev 9.696, 0 failed
progress: 120.0 s, 2125.8 tps, lat 23.379 ms stddev 10.477, 0 failed
progress: 180.0 s, 2082.9 tps, lat 23.866 ms stddev 10.236, 0 failed
progress: 240.0 s, 2067.1 tps, lat 24.054 ms stddev 11.318, 0 failed
progress: 300.0 s, 1526.7 tps, lat 32.653 ms stddev 24.898, 0 failed
progress: 360.0 s, 2005.5 tps, lat 24.821 ms stddev 11.331, 0 failed
progress: 420.0 s, 1956.7 tps, lat 25.460 ms stddev 10.749, 0 failed
progress: 480.0 s, 1698.5 tps, lat 29.348 ms stddev 11.020, 0 failed
progress: 540.0 s, 1842.1 tps, lat 27.051 ms stddev 12.325, 0 failed
progress: 600.0 s, 1888.4 tps, lat 26.363 ms stddev 14.288, 0 failed
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 50
number of threads: 2
maximum number of tries: 1
duration: 600 s
number of transactions actually processed: 1161060
number of failed transactions: 0 (0.000%)
latency average = 25.720 ms
latency stddev = 13.184 ms
initial connection time = 89.406 ms
tps = 1935.208875 (without initial connection time)

The result is of course worse (because this test ran on the disks rather than in RAM), but this is the configuration I would use in a production environment
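For what it's worth, here are a few commonly cited tunings for PostgreSQL on ZFS that may narrow the gap; these are starting points to verify with pgbench, not guaranteed wins on this hardware:

# recordsize=8K (already set above) matches PostgreSQL's 8 kB pages
# in postgresql.conf, on a copy-on-write filesystem:
#   full_page_writes = off   # ZFS record writes are atomic, so torn-page protection is redundant
#   wal_recycle = off        # WAL segment recycling is counterproductive on CoW
#   wal_init_zero = off      # likewise for zero-filling new WAL segments
# a dedicated dataset keeps the tunings scoped to the database:
zfs create -o recordsize=8K -o logbias=throughput data/postgres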

Regards.