Trying out `shiftfs`

Hm, not really atm.

How exactly are the zfs tools calculating disk usage? du seems to be working just fine, so this doesn’t look like a bug in shiftfs.

As far as I know, it’s kernel-side tracking, as the output is near instant and matches that of df.
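To put the two views side by side, something like the following could be used. It is only a sketch: the dataset name and directory are placeholders supplied by the caller, and it assumes GNU du plus the `-H`/`-p` flags of `zfs get` for parsable byte counts.

```shell
#!/bin/sh
# Sketch: print what ZFS accounts for a dataset next to what du sees
# under a directory. Both arguments are placeholders supplied by the caller.
compare_usage() {
    dataset=$1   # e.g. pool/lxd/containers/disco-shiftfs
    path=$2      # directory where that dataset is mounted (assumed known)

    # -H: no header, -p: exact bytes, -o value: print only the value column
    zfs_used=$(zfs get -Hp -o value used "$dataset")

    # GNU du: -s summarizes, --block-size=1 reports allocated bytes
    du_used=$(du -s --block-size=1 "$path" | cut -f1)

    echo "zfs=$zfs_used du=$du_used diff=$((zfs_used - du_used))"
}
```

Run as root on the host, e.g. `compare_usage pool/lxd/containers/disco-shiftfs <mountpoint>`. The two numbers will not match exactly (compression, metadata, snapshots), but a gap of several gigabytes would stand out.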

@brauner Can you try to reproduce this issue with the instructions above and see if anything jumps out as far as behavior?

If things are somehow being counted twice, a simple dd of 100MB should show 200MB used on zfs, which should make this easier to track down.

I actually tried that on my virtual setup and the results looked correct. I then ran it again, overwriting the same file, to see if freeing up space was the issue, but that also worked correctly.

The command I used was:

lxc exec disco -- dd if=/dev/urandom of=/swapfile bs=1M count=1K

First run:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          1.01G  1.37G  1.47G
pool/lxd/containers/disco-shiftfs  1006M  1.37G  1.45G

Second run:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          1.01G  1.39G  1.47G
pool/lxd/containers/disco-shiftfs   994M  1.39G  1.44G


Here’s some more oddness. I just ran poweroff inside each container in preparation for rebooting my physical host.

Before poweroff:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco           195M  23.2G   651M
pool/lxd/containers/disco-shiftfs  3.15G  23.2G  3.60G

After poweroff:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco           195M  26.1G   648M
pool/lxd/containers/disco-shiftfs   194M  26.1G   649M

Well, that’s special: on stop it suddenly decides to sync back to reality.

@brauner any luck playing with this in a test VM to see what may tickle zfs into counting writes multiple times?

Not yet, but will soon! :)

I finally figured out a way to demonstrate the issue more easily. This is on a newly installed virtual test system.

Initial state:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          8.08M  9.14G   484M
pool/lxd/containers/disco-shiftfs  5.82M  9.14G   487M

Create a 1GB file with dd:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          1.01G  7.14G  1.47G
pool/lxd/containers/disco-shiftfs  1.01G  7.14G  1.48G

Remove the file:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          8.09M  8.14G   484M
pool/lxd/containers/disco-shiftfs  1.01G  8.14G  1.48G

Stop the containers:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          8.14M  9.14G   484M
pool/lxd/containers/disco-shiftfs  3.99M  9.14G   484M
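
For anyone wanting to replay the four steps above, a rough sketch follows. The container names and the dd invocation are the ones from this thread; the `zfs list` column selection, the parent dataset and removing `/swapfile` specifically are assumptions made to match the tables shown.

```shell
#!/bin/sh
# Sketch of the four steps above. Container names match this thread;
# the zfs column list is an assumption chosen to match the tables shown.
CONTAINERS="disco disco-shiftfs"

show_usage() {
    echo "== $1"
    zfs list -r -o name,used,avail,refer pool/lxd/containers
}

repro() {
    show_usage "initial state"

    for c in $CONTAINERS; do
        lxc exec "$c" -- dd if=/dev/urandom of=/swapfile bs=1M count=1K
    done
    show_usage "after creating the 1GB file"

    for c in $CONTAINERS; do
        lxc exec "$c" -- rm /swapfile
    done
    show_usage "after removing the file"

    for c in $CONTAINERS; do
        lxc stop "$c"
    done
    show_usage "after stopping the containers"
}
```

Calling `repro` on the host prints one table per step, so the point where disco-shiftfs stops tracking disco is easy to spot.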

Could this be related to the ZFS issue people still experience occasionally, where deleting files doesn’t free space until the filesystem is unmounted?

@brauner reminder to look into this soon please, as this may also explain the cache corruption issue I’ve noticed on my servers, which would be a bit of a problem in production.

Just to clarify, is shiftfs intended for bind-mounts the user may set up, or does it somehow affect “everything” related to filesystem operations in unprivileged containers?

I was interested in using it to share local data from my host with a process inside a container (without having to worry about ownership). But then you talk about container creation and startup time. This makes it sound like a global change under LXD’s hood, inviting caution.

The initial implementation in LXD was only for the container’s root filesystem and couldn’t be used to shift additional mounts into the containers as you’re describing.

This changed with LXD 3.16, released last month, which added a shift property on disk devices. It tells LXD to use shiftfs to shift the filesystem when it appears in the container, making it easy to share data with the host.
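
As a rough illustration of that property, assuming LXD 3.16 or later on a host where shiftfs is available, a disk device could be added along these lines. The function name and all four arguments are placeholders, not anything LXD defines.

```shell
#!/bin/sh
# Sketch: attach a host directory to a container as a shifted disk device.
# All four arguments are placeholders; shift=true is the disk device
# property described above.
add_shifted_disk() {
    container=$1   # e.g. disco
    device=$2      # arbitrary device name, e.g. shared
    host_dir=$3    # directory on the host
    path=$4        # where it should appear inside the container

    lxc config device add "$container" "$device" disk \
        source="$host_dir" path="$path" shift=true
}
```

For example, `add_shifted_disk disco shared /srv/data /mnt/data` would expose the host’s `/srv/data` at `/mnt/data` in the container, with ownership shifted to match the container’s id map.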