"nodatacow" on BTRFS storage breaks compression

tarruda · November 6, 2023, 9:32am

I recently imported an old VM image from LXD into Incus (exported with lxc export VM).

The VM disk image size was 256GB, but thanks to compression (I use the compress=zstd mount option for btrfs) the space actually used on the host was ~90GB. The VM also had a “clean install” snapshot, which I took during the initial setup; its disk image used about 8GB of host space.

When I ran incus import VM.tar.gz, I noticed something very strange: the snapshot + VM were taking the full 512GB of disk space, as if the compression option I had set globally on btrfs were being ignored. I confirmed this by copying the VM disk image to another BTRFS subvolume, where it went back to using only the original ~90GB.

I tracked the issue down to the “nodatacow” attribute on the VM subvolume (lsattr -d was showing C) and found that it was added in December 2021, after I had created the VM (which is probably why my VM and snapshot never had this attribute on LXD): lxd/storage/drivers/driver/btrfs/volumes: Enable nodatacow on subvolu… · lxc/incus@2fee704 · GitHub. I would have reported this as a bug, but since it seems intentional I'd rather discuss it here.
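A minimal Python sketch of the check described above, assuming e2fsprogs' lsattr is installed: it looks for the capital C (No_COW) letter in the attribute field that lsattr -d prints before the path. Lowercase c is a different attribute (compress), so the test is case-sensitive. has_nocow_flag and path_is_nocow are hypothetical helper names, not part of Incus.

```python
import subprocess


def has_nocow_flag(lsattr_line: str) -> bool:
    """Return True if one line of `lsattr -d` output shows the 'C' (No_COW) flag.

    lsattr prints the attribute field first and the path second, e.g.
    "---------------C------ /some/dir". Only the first field is inspected,
    so a capital C inside the path itself is ignored.
    """
    flags = lsattr_line.split(maxsplit=1)[0]
    return "C" in flags  # uppercase C = No_COW; lowercase c = compress


def path_is_nocow(path: str) -> bool:
    """Run `lsattr -d` on a directory or file and check for No_COW."""
    out = subprocess.run(
        ["lsattr", "-d", path], capture_output=True, text=True, check=True
    ).stdout
    return has_nocow_flag(out)
```

Calling path_is_nocow on the VM's subvolume directory should reproduce what lsattr -d shows by hand.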

I don’t know about other BTRFS users, but for me the “nodatacow” option makes no noticeable difference in performance since I use NVMe SSDs. In the age of spinning disks it might have made a big difference, but nowadays random I/O is very fast and I’d rather have the advantages of compression.

To show how much difference this makes in my case, here’s the output of df before disabling nodatacow on my imported VM:

Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  1.4T  264G  1.2T  19% /var/lib/incus

Note that I had already removed the snapshot, so the VM was using 256GB instead of the 512GB it used right after import. Here’s the output after disabling “nodatacow”, moving the disk into another subvolume and then moving it back (to force re-compression):

Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  1.4T   62G  1.4T   5% /var/lib/incus

About 200GB less, a reduction of roughly 75%!
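For anyone who wants to check the arithmetic, here is a small Python sketch that parses the two df -h outputs quoted above and computes the savings. It assumes df -h's binary units (1T = 1024G), and since df rounds its figures and reports the whole filesystem rather than just this VM, the result is only an estimate. The helper names are hypothetical.

```python
# Before/after df -h output quoted in the post above.
BEFORE = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  1.4T  264G  1.2T  19% /var/lib/incus
"""
AFTER = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  1.4T   62G  1.4T   5% /var/lib/incus
"""

# df -h uses powers of 1024, so express every size in "G".
UNITS_IN_G = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def parse_size(text: str) -> float:
    """Convert a df -h size such as '264G' or '1.4T' to G."""
    return float(text[:-1]) * UNITS_IN_G[text[-1]]


def used_g(df_output: str) -> float:
    """Return the Used column of the first data row of df -h output."""
    header, row = df_output.strip().splitlines()[:2]
    return parse_size(row.split()[header.split().index("Used")])


def savings(before: str, after: str) -> tuple[float, float]:
    """Return (space saved in G, reduction as a percentage of 'before')."""
    b, a = used_g(before), used_g(after)
    return b - a, (b - a) / b * 100


if __name__ == "__main__":
    saved, pct = savings(BEFORE, AFTER)
    print(f"{saved:.0f}G saved ({pct:.1f}% reduction)")  # 202G saved (76.5% reduction)
```

So the "75%" is, if anything, slightly conservative.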

This can make a big difference to sysadmins who use LXD/Incus to implement their clouds, allowing them to overcommit VM storage space to users.

Can we consider removing “nodatacow”, or at least adding a configuration option for it? It is possible to fix this manually by going into each VM subvolume and running chattr -C . (as I have already done), but it would be much more convenient if this were a user preference applied automatically by Incus.
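A hypothetical Python sketch of automating the manual fix for one disk image. It only plans the commands and does not run them. Two deliberate differences from what I did by hand: instead of moving the disk into another subvolume and back, it rewrites the image in place with cp --reflink=never (a real GNU coreutils option) so btrfs writes fresh extents that compress=zstd can compress; and the image path and the ".recompress" temp suffix are assumptions, not Incus conventions. Stop the VM first and make sure there is room for a full temporary copy.

```python
import shlex
from pathlib import PurePosixPath


def plan_recompression(disk_image: str) -> list[list[str]]:
    """Return (without running) commands that clear No_COW on the image's
    directory and rewrite the image so btrfs can compress the new copy."""
    image = PurePosixPath(disk_image)
    tmp = image.with_name(image.name + ".recompress")  # hypothetical temp name
    return [
        # Files created in this directory afterwards no longer inherit No_COW.
        ["chattr", "-C", str(image.parent)],
        # A real (non-reflinked) copy writes new extents, which the
        # compress=zstd mount option can then compress.
        ["cp", "--reflink=never", str(image), str(tmp)],
        # Same-directory rename replaces the old No_COW file.
        ["mv", str(tmp), str(image)],
    ]


if __name__ == "__main__":
    # Hypothetical pool path and image name; adjust for your setup.
    example = "/var/lib/incus/storage-pools/default/virtual-machines/vm1/root.img"
    for argv in plan_recompression(example):
        print(shlex.join(argv))
```

Printing the plan first makes it easy to review before pasting the commands into a shell.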

cuphi · November 8, 2023, 12:31pm

See Swapfile — BTRFS documentation

The swap partition in a VM disk image is really just a swap file, no matter what FS the image is stored on. In order for swap partitions in VM images to function correctly, NODATACOW must be set.

tarruda · November 8, 2023, 1:05pm

The VM kernel doesn’t know its virtual device is backed by a file on the host, nor that this file is stored in a CoW filesystem. I do have VMs with swap “partitions” and never had any issues.

In any case, the issue was fixed in Incus: Detect btrfs compression by stgraber · Pull Request #225 · lxc/incus · GitHub
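I haven't reproduced the PR's actual logic here. As a hypothetical illustration of the idea, this Python sketch checks whether a btrfs mount point has a compress or compress-force option, reading text in the /proc/mounts format ("device mountpoint fstype options dump pass"). It does not decode the \040-style escapes /proc/mounts uses for spaces in paths, and btrfs_compression is a made-up name.

```python
from typing import Optional

COMPRESS_KEYS = ("compress", "compress-force")


def btrfs_compression(mounts_text: str, mountpoint: str) -> Optional[str]:
    """Return the compress/compress-force option of a btrfs mount, or None."""
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mountpoint or fields[2] != "btrfs":
            continue
        result = None  # a later mount on the same path hides earlier ones
        for option in fields[3].split(","):
            # Options look like "compress=zstd:3" or a bare "compress".
            if option.split("=", 1)[0] in COMPRESS_KEYS:
                result = option
    return result
```

On a host you would pass it the contents of /proc/mounts and the pool's mount point (/var/lib/incus in my case).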