ZFS deduplication storage pool

Tl;dr - zfs deduplication doesn’t seem to work well with Incus 6.8, but only because Incus incorrectly sees the disk as being “full” when, according to zfs, it is not - it just has a high deduplication factor. Is that fixable in Incus (or, more likely, have I just not configured something in Incus correctly)? Or is dedup simply not usable under Incus in this manner?

==================== Details ========================

I have been experimenting with zfs deduplication on some of my backup pools, to make recent archived copies of large blocks of data that don’t change much from backup to backup more convenient to access. I don’t use this on my daily driver, but I wanted to check out for myself how well it works. I know I could use snapshots, but I wanted to try out dedup as a different strategy for my use-case.

I created a dataset on a separate 2TB disk. Something like:

andrew@lando:~$ sudo zfs create nvme2/dedup
andrew@lando:~$ sudo zfs set dedup=yes nvme2/dedup
andrew@lando:~$ sudo zfs get dedup nvme2/dedup
NAME         PROPERTY  VALUE          SOURCE
nvme2/dedup  dedup     on             local

I then created an Incus storage pool called dedup, with that zfs dataset as its source backing. And it works as advertised - at least at the zfs level: I throw a ~300GiB archive (four containers per project) at it, and it shows ~300G used. I throw another copy at it in a conveniently named per-archive project, and it shows a little more storage used, but not much, since almost all of the data deduplicates under zfs. The deduplication process is a little slower, but I have substantial server resources in several EPYC CPU systems, so they handle the crunching without breaking a sweat. Here’s the pool after four “copies”:

andrew@lando:~$ zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
nvme   5.45T  1.64T  3.81T        -         -    14%    30%  1.00x    ONLINE  -
nvme2  1.81T   302G  1.52T        -         -     4%    16%  4.17x    ONLINE  -

Very happy with zfs. It works as I expected.
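For completeness, the Incus side of the setup was roughly this (a sketch from memory - I may have passed a couple of extra options at the time, but essentially source just points at the existing dataset):

andrew@lando:~$ incus storage create dedup zfs source=nvme2/dedup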

Incus is happy in most ways too: it sees my four projects separately, and it correctly shows the containers inside each. Here’s one of the projects, called ‘Friday’:

andrew@lando:~$ incus list -c nbDs4Sle  
+-----------------------+--------------+------------+---------+------+-----------+----------------------+---------+
|         NAME          | STORAGE POOL | DISK USAGE |  STATE  | IPV4 | SNAPSHOTS |     LAST USED AT     | PROJECT |
+-----------------------+--------------+------------+---------+------+-----------+----------------------+---------+
| Fastcloud-Friday      | dedup        | 300.65GiB  | STOPPED |      | 3         | 1970/01/01 01:00 BST | Friday  |
+-----------------------+--------------+------------+---------+------+-----------+----------------------+---------+
| OGSelfHosting-Friday  | dedup        | 5.37GiB    | STOPPED |      | 8         | 1970/01/01 01:00 BST | Friday  |
+-----------------------+--------------+------------+---------+------+-----------+----------------------+---------+
| SysAdmin-22-04-Friday | dedup        | 4.66GiB    | STOPPED |      | 10        | 1970/01/01 01:00 BST | Friday  |
+-----------------------+--------------+------------+---------+------+-----------+----------------------+---------+
| c1-Friday             | dedup        | 17.00KiB   | STOPPED |      | 0         | 1970/01/01 01:00 BST | Friday  |
+-----------------------+--------------+------------+---------+------+-----------+----------------------+---------+

I have a bit of an issue with the “last used” date, but technically these containers have not been used, and the date updates as soon as I run any container - so that’s fine and not my concern. The containers can be started, stopped and accessed so I can retrieve or inspect data. That part is all great, as usual.

The zpool itself still shows only around 300G used according to zfs, so there is LOTS of free usable space on it, even though there are four sets of (roughly the same) data on it. The dedup factor is high (4.17x, per the above), showing significant deduplication, as expected.
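For reference, these are the two views I keep comparing when I talk about space - the pool-level numbers and the dataset-level numbers for the backing dataset:

andrew@lando:~$ zpool list nvme2
andrew@lando:~$ zfs list nvme2/dedup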

But from a storage perspective, Incus sees this very differently, and this is where it breaks:

andrew@lando:~$ incus storage info dedup
info:
  description: Dedup storage pool
  driver: zfs
  name: dedup
  space used: 1.22TiB
  total space: 2.68TiB
used by:
  images:
  - 1f684cd29012a832262b5ba5f6d72060f4c20975ba571ab78c60331e99daa9db (project "reserve")
  - d57ccafc3f99e243aaadcbb7dbeea22af4ecd15e4a5df5957bff5af5837245bc (project "reserve")
  instances:
  - Fastcloud-Friday (project "Friday")
  - Fastcloud-Saturday (project "Saturday")
  - Fastcloud-Week-00 (project "Week-00")
  - Fastcloud (project "reserve")
  - OGSelfHosting-Friday (project "Friday")
  - OGSelfHosting-Saturday (project "Saturday")
  - OGSelfHosting-Week-00 (project "Week-00")
  - OGSelfHosting (project "reserve")
  - SysAdmin-22-04-Friday (project "Friday")
  - SysAdmin-22-04-Saturday (project "Saturday")
  - SysAdmin-22-04-Week-00 (project "Week-00")
  - SysAdmin-22-04 (project "reserve")
  - c1-Friday (project "Friday")
  - c1-Week-00 (project "Week-00")
  - c1 (project "reserve")
  profiles:
  - br0 (project "Saturday")
  - br0 (project "Week-00")
  - br0 (project "reserve")
  - default (project "Friday")
  - default (project "Saturday")
  - default (project "Week-00")
  - default (project "reserve")

Note the SIZE: Incus is picking up the cumulative size (1.22TiB), and it stops copying when it senses “disk full”, while zfs says “lots of space left”.

Is there a config I am missing?

I tried this on two different systems (I had time, and it was fun experimenting) - same result. I could maybe work around it by creating a separate zfs dataset for each deduplicated storage pool - basically tricking Incus into thinking it has a series of disks by creating pools dedup1, dedup2, … dedupn, all backed by the same actual disk. That might work, but it’s a little clumsy, so before I do it I wanted to check whether there’s a better fix - a missing config option, or some other operator error as usual?
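Roughly what I have in mind for that workaround, for what it’s worth (names purely illustrative, and I haven’t actually built it yet):

andrew@lando:~$ sudo zfs create -o dedup=on nvme2/dedup1
andrew@lando:~$ incus storage create dedup1 zfs source=nvme2/dedup1
andrew@lando:~$ sudo zfs create -o dedup=on nvme2/dedup2
andrew@lando:~$ incus storage create dedup2 zfs source=nvme2/dedup2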

Happy Saturday and even Happier New Year!

Andrew

What does zfs get available nvme2/dedup and zfs get used nvme2/dedup show you?

andrew@lando:~$ sudo zfs get available nvme2/dedup
NAME         PROPERTY   VALUE  SOURCE
nvme2/dedup  available  1.46T  -
andrew@lando:~$ sudo zfs get used nvme2/dedup
NAME         PROPERTY  VALUE  SOURCE
nvme2/dedup  used      1.22T  -

Thank you!

Andrew

So it sounds like the used value from ZFS is incorrect - what’s the better metric to use for this case?

The ‘available’ metric is the one that reflects the pool’s actual residual capacity, and if the data being added is deduplicatable (which in my case it is), then ‘available’ won’t change much even after adding another ~300GB set of containers (I can create and show that if it’s useful).

I believe the ‘used’ metric shows what the total used space would be if it were NOT deduplicated, and that is the one Incus (in my setup, de facto) seems to be using to decide whether it can copy a file to the pool. Once that ‘used’ value approaches the disk capacity, Incus sensibly stops, because it is not taking deduplication into account. ‘Available’, on the other hand, shows a lot more free space, because zfs is doing amazing work shrinking the new additions considerably (even before zfs compression does its job).

If Incus could optionally be configured to look only at ‘available’ space when copying a file/project, that would take deduplication out of the decision entirely, and it could merrily copy as much as zfs says it can.

Would a configurable setting along these lines be workable? Something like:

incus config set storage <pool> zfs.freecapacity=available

Unless this is set, Incus would continue doing what it does today (making decisions based on ‘used’), but setting it might allow dedup storage to work as zfs intended. I don’t trust my logic too much, so I apologize now if I have this wrong - I can run all kinds of tests if that’s useful, and this is not a show-stopper, of course.

THANK YOU!
Andrew

Incus uses available for the available capacity already. Total for the dataset is calculated as used+available.


Ok, so that makes complete sense - used (1.22T) plus available (1.46T) is exactly the 2.68TiB total that incus storage info reports.

I clearly have another issue going on that is confusing me. My bad, as usual. Thanks again for your patience!

Andrew