SOLVED: How does the ZFS storage driver work, anyway?

This is really a question about this video on the LXD channel:

which I commented on, but it’s sufficiently old that I’m not sure anyone is still watching the comments. In the video Stéphane demonstrates that, when using the ZFS storage backend, spinning up 100 identical Ubuntu containers initially consumes no more storage than a single Ubuntu image, and is orders of magnitude faster than with the dir backend. I just want to make sure I understand how this works. I also think there is a missing level of more detailed documentation on how things work under the hood, which would be very helpful to admins who don’t want to just push buttons and hope for the best. You will also get more intelligent bug reports and feature requests when things are documented at this level. So, reposting my video question/comment here:

With regard to using ZFS storage, I’m not following how COW (technically, redirect-on-write) either reduces the storage requirements or speeds up the creation of multiple containers based on the same image. COW only comes into play on ZFS in three cases: updating existing files in a dataset, creating snapshots, and creating clones. Since this obviously works, I’m guessing that when the storage pool is ZFS, LXD clones a snapshot of the original image in order to spin up new containers based on that image? I guess this is OK, except that you can then never delete the original image or the snapshot the clones are based on. And I can see potential disasters for users who don’t understand ZFS if they try zfs destroy -R to get rid of the original image (see ZFS clones: Probably not what you really want – JRS Systems: the blog). In any case, the containers will only take up very little space until the first time you run apt update; apt dist-upgrade in all of them, at which point they will diverge fairly dramatically if there are a lot of updates. That is also where you will pay the time penalty: in addition to allocating new blocks for each individual container, the storage system will be updating a ton of block pointers up the tree. Am I missing something?
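For anyone following along, here is roughly what the snapshot/clone technique looks like at the raw ZFS level. This is only a sketch; the pool and dataset names (tank/lxd/images/ubuntu, tank/lxd/containers/c1, etc.) are made up for illustration and are not LXD’s actual naming scheme:

# the downloaded image lives in its own dataset with a read-only snapshot
zfs snapshot tank/lxd/images/ubuntu@readonly
# each new container is a clone of that snapshot: near-instant and ~0 bytes used at creation
zfs clone tank/lxd/images/ubuntu@readonly tank/lxd/containers/c1
zfs clone tank/lxd/images/ubuntu@readonly tank/lxd/containers/c2

Because the clones share all of the image’s blocks until they are modified, both the time and the space cost of creating a container are essentially constant, regardless of the image size.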

Yes, this is exactly right.

But you should never perform manual operations on LXD-managed volumes using the underlying storage tooling (unless something has gone wrong); otherwise, as you rightly describe, you’re in for a bad day. LXD maintains metadata about the volumes in its database, and if that diverges from reality then unexpected (and potentially bad) things can happen.

This same technique is used for the LVM thin, BTRFS and Ceph storage drivers too.
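If you are curious, you can safely inspect the clone relationship with read-only commands; the dataset names below are just examples:

# shows which image snapshot the container dataset was cloned from
zfs get origin tank/lxd/containers/c1
# lists snapshots; the image snapshot must remain as long as any clone depends on it
zfs list -t snapshot -o name,used,refer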


If you want to use full copies with LXD ZFS pools, you can set zfs.clone_copy=false on the pool, e.g.:

lxc storage set mypool zfs.clone_copy=false

See ZFS - zfs - LXD documentation
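To confirm the setting took effect, something like the following should work (the pool name mypool is just an example):

lxc storage get mypool zfs.clone_copy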


Thanks, Tom. This might be worth documenting somewhere, as I can see someone who has spun up a lot of containers based on the same image (and who doesn’t understand how ZFS works) being lulled into a false sense of unlimited storage; that space then disappears very quickly the first time they run updates in the containers. I will mark this as SOLVED in case it’s useful to someone else.
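If anyone wants to watch this happen, something like the following (dataset names again illustrative) shows how far each container has diverged from the shared image:

# 'used' grows per container as upgraded packages replace blocks shared with the image snapshot
zfs list -o name,used,refer,origin -r tank/lxd/containers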


Would you be able/willing to open a pull request on our docs?

Yes, I can do this.


Ugh, I only noticed several typos after submitting the pull request, and now I can’t fix them.

I can’t see the pull request, so I think you’ve not submitted it yet.

I did the editing online using the handy “Edit this file” pencil, but it looks like it didn’t save the changes to my branch as promised when I submitted them. I’ll clone my branch repo and redo it from the command line.