SOLVED: How does the ZFS storage driver work, anyway?

This is really a question about this video on the LXD channel:

which I commented on, but it’s sufficiently old that I’m not sure anyone is still watching the comments. In the video Stéphane demonstrates that, when using the ZFS storage backend, spinning up 100 identical Ubuntu containers initially consumes no more storage than a single Ubuntu image, and is orders of magnitude faster than with the dir backend. I just want to make sure I understand how this works. I also think there is a missing level of more detailed documentation on how things work under the hood, which would be very helpful to admins who don’t want to just push buttons and hope for the best. You will also get more intelligent bug reports and feature requests when things are documented at this level. So, reposting my video question/comment here:

With regard to using ZFS storage, I’m not following how COW (technically, redirect-on-write) either reduces the storage requirements or speeds up the creation of multiple containers based on the same image. COW only comes into play on ZFS in three cases: updating existing files in a dataset, creating snapshots, and creating clones. Since this obviously works, I’m guessing that when the storage pool is ZFS, LXD clones a snapshot of the original image in order to spin up new containers based on that image? I guess this is OK, except that you can then never delete the original image or the snapshot the clones are based on. And I can see potential disasters for users who don’t understand ZFS if they try zfs destroy -R to get rid of the original image (see ZFS clones: Probably not what you really want – JRS Systems: the blog). In any case, the containers will only take up very little space until the first time you run apt update; apt dist-upgrade in all of them, at which point they will diverge fairly dramatically if there are a lot of updates. That is also where you will pay the time penalty: in addition to allocating new blocks for each individual container, the storage system will be updating a ton of block pointers up the tree. Am I missing something?
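For anyone following along, here is roughly what the snapshot/clone technique looks like at the raw ZFS level. This is only a sketch; the pool and dataset names (tank/lxd/images/ubuntu, tank/lxd/containers/c1, etc.) are made up for illustration and are not LXD’s actual naming scheme:

# the downloaded image lives in its own dataset with a read-only snapshot
zfs snapshot tank/lxd/images/ubuntu@readonly
# each new container is a clone of that snapshot: near-instant and ~0 bytes used at creation
zfs clone tank/lxd/images/ubuntu@readonly tank/lxd/containers/c1
zfs clone tank/lxd/images/ubuntu@readonly tank/lxd/containers/c2

Because the clones share all of the image’s blocks until they are modified, both the time and the space cost of creating a container are essentially constant, regardless of the image size.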

Yes, this is exactly right.

But you should never perform manual operations on LXD-managed volumes using the underlying storage tooling (unless something has gone wrong); otherwise, as you rightly describe, you’re in for a bad day. LXD maintains metadata about the volumes in its database, and if that diverges from reality then unexpected (and potentially bad) things can happen.

This same technique is used for the LVM thin, BTRFS and Ceph storage drivers too.
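If you are curious, you can safely inspect the clone relationship with read-only commands; the dataset names below are just examples:

# shows which image snapshot the container dataset was cloned from
zfs get origin tank/lxd/containers/c1
# lists snapshots; the image snapshot must remain as long as any clone depends on it
zfs list -t snapshot -o name,used,refer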


If you want to use full copies with LXD ZFS pools, you can set zfs.clone_copy=false on the pool, e.g.:

lxc storage set mypool zfs.clone_copy=false

See ZFS - zfs - LXD documentation
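To confirm the setting took effect, something like the following should work (the pool name mypool is just an example):

lxc storage get mypool zfs.clone_copy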


Thanks, Tom. This might be worth documenting somewhere, as I can see someone who has spun up a lot of containers based on the same image (and who doesn’t understand how ZFS works) being lulled into a false sense of unlimited storage; that space then disappears very quickly the first time they run updates in the containers. I will mark this as SOLVED in case it’s useful to someone else.
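If anyone wants to watch this happen, something like the following (dataset names again illustrative) shows how far each container has diverged from the shared image:

# 'used' grows per container as upgraded packages replace blocks shared with the image snapshot
zfs list -o name,used,refer,origin -r tank/lxd/containers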


Would you be able/willing to open a pull request on our docs?

Yes, I can do this.


Ugh, I only noticed several typos after submitting the pull request, and now I can’t fix them.

I can’t see the pull request, so I think you’ve not submitted it yet.

I did the editing online using the handy “Edit this file” pencil, but it looks like it didn’t save the changes to my branch as promised when I submitted them. I’ll clone my branch repo and redo it from the command line.