Trying out `shiftfs`

brauner · August 13, 2019, 12:08pm

Hm, not really atm.

brauner · August 13, 2019, 12:14pm

How exactly are the zfs tools calculating disk usage? du seems to be working just fine, so this doesn’t look like a bug in shiftfs.

stgraber · August 13, 2019, 2:11pm

As far as I know, it’s kernel side tracking as the output is near instant and matches that of df.

@brauner Can you try to reproduce this issue with the instructions above and see if anything jumps out as far as behavior?

If things are counting twice somehow, a simple dd of 100MB should show 200MB used on zfs, hopefully making things easier to track down.
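A minimal sketch of that check, assuming a ZFS-backed container and its dataset are passed in by name. The helper names `growth_ratio` and `check_double_count`, the test file `/ddtest`, and the 10-second wait for ZFS accounting to settle are all assumptions made for illustration. Random data is used so that compression does not skew the numbers.

```shell
# Print how much the dataset grew relative to what was written.
# 1.0 means normal accounting, 2.0 would mean every write was counted twice.
growth_ratio() {
    awk -v b="$1" -v a="$2" -v w="$3" 'BEGIN { printf "%.1f\n", (a - b) / w }'
}

# Hypothetical wrapper: write 100 MiB inside a container and compare
# the dataset's "used" property (in bytes) before and after.
check_double_count() {
    container=$1
    dataset=$2
    before=$(zfs get -Hp -o value used "$dataset")
    lxc exec "$container" -- dd if=/dev/urandom of=/ddtest bs=1M count=100
    lxc exec "$container" -- sync
    sleep 10   # assumed to be long enough for the accounting to catch up
    after=$(zfs get -Hp -o value used "$dataset")
    growth_ratio "$before" "$after" $((100 * 1024 * 1024))
}
```

For example, `check_double_count disco-shiftfs pool/lxd/containers/disco-shiftfs` should print something close to 1.0 if writes are counted once.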

davidemyers · August 13, 2019, 4:27pm

I actually tried that on my virtual setup and the results looked correct. Then I ran it again overwriting the same file to see if freeing up space was the issue but that also worked correctly.

The command I used was:

lxc exec disco -- dd if=/dev/urandom of=/swapfile bs=1M count=1K

First run:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          1.01G  1.37G  1.47G
pool/lxd/containers/disco-shiftfs  1006M  1.37G  1.45G

Second run:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          1.01G  1.39G  1.47G
pool/lxd/containers/disco-shiftfs   994M  1.39G  1.44G

davidemyers · August 14, 2019, 1:02pm

Here’s some more oddness. I just ran poweroff inside of each container in preparation for rebooting my physical host.

Running:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco           195M  23.2G   651M
pool/lxd/containers/disco-shiftfs  3.15G  23.2G  3.60G

Stopped:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco           195M  26.1G   648M
pool/lxd/containers/disco-shiftfs   194M  26.1G   649M

stgraber · August 14, 2019, 1:12pm

Well, that’s special, so on stop it suddenly decides to sync back to reality.

@brauner any luck playing with this in a test VM to see what may tickle zfs into counting writes multiple times?

brauner · August 14, 2019, 2:37pm

Not yet, but will soon!

davidemyers · August 14, 2019, 4:21pm

I finally figured out a way to more easily demonstrate the issue. This is on a newly installed virtual test system.

Launch:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          8.08M  9.14G   484M
pool/lxd/containers/disco-shiftfs  5.82M  9.14G   487M

Create 1GB file with dd:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          1.01G  7.14G  1.47G
pool/lxd/containers/disco-shiftfs  1.01G  7.14G  1.48G

Remove the file:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          8.09M  8.14G   484M
pool/lxd/containers/disco-shiftfs  1.01G  8.14G  1.48G

Stop the containers:

NAME                                USED  AVAIL  REFER
pool/lxd/containers/disco          8.14M  9.14G   484M
pool/lxd/containers/disco-shiftfs  3.99M  9.14G   484M
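The four steps above could be scripted roughly as follows. This is a sketch that assumes two already launched containers named `disco` and `disco-shiftfs` (matching the dataset names in the tables) and reuses the dd command quoted earlier in the thread. The `run`, `usage` and `reproduce` helpers and the `DRY_RUN` switch are hypothetical conveniences, not part of LXD or ZFS.

```shell
# Run a command, or only print it when DRY_RUN=1 (for hosts without LXD/ZFS).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show space usage for all container datasets.
usage() {
    run zfs list -r -o name,used,avail,refer pool/lxd/containers
}

reproduce() {
    usage                                   # 1. baseline after launch
    for c in disco disco-shiftfs; do
        run lxc exec "$c" -- dd if=/dev/urandom of=/swapfile bs=1M count=1K
    done
    usage                                   # 2. after creating the 1GB file
    for c in disco disco-shiftfs; do
        run lxc exec "$c" -- rm /swapfile
    done
    usage                                   # 3. after removing the file
    run lxc stop disco disco-shiftfs
    usage                                   # 4. after stopping the containers
}
```

Running `DRY_RUN=1 reproduce` prints the commands without executing them, which is a cheap way to review the sequence first.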

zrav · August 28, 2019, 9:46am

Could this be related to the ZFS issue people still experience occasionally, where deleting files doesn’t free space until the filesystem is unmounted? https://github.com/zfsonlinux/zfs/issues/1548

stgraber · August 28, 2019, 6:14pm

@brauner reminder to look into this soon please as this may also explain the cache corruption issue I’ve noticed on my servers and would be a bit of a problem in production.

Adrian · September 7, 2019, 11:04pm

Just to clarify, is shiftfs intended for bind-mounts the user may set up, or does it somehow affect “everything” related to filesystem operations in unprivileged containers?

I was interested in using it to share local data from my host with a process inside a container (without having to worry about ownership). But then you talk about container creation and startup time. This makes it sound like a global change under LXD’s hood, inviting caution.

stgraber · September 7, 2019, 11:24pm

The initial implementation in LXD was only for the container’s root filesystem and couldn’t be used to shift additional mounts into the containers as you’re describing.

This changed with LXD 3.16, released last month, which added a shift property on disk devices. It tells LXD to use shiftfs to shift the filesystem when it appears in the container, making it easy to share data with the host.
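As a rough illustration of that property, a host directory could be attached like this. The function name, the device name `shared-data` and the `LXC` override variable are hypothetical; only `lxc config device add` with a disk device and `shift=true` is taken from LXD itself.

```shell
# Attach a host directory to a container as a shifted disk device.
# Set LXC=echo to print the command instead of running it.
share_with_container() {
    container=$1
    host_dir=$2
    container_dir=$3
    ${LXC:-lxc} config device add "$container" shared-data disk \
        source="$host_dir" path="$container_dir" shift=true
}
```

For example, `share_with_container disco /srv/data /mnt/data` would make `/srv/data` from the host appear at `/mnt/data` in the container.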

davidemyers · October 21, 2019, 2:29pm

I re-ran the dd and rm test from my most recent post above while running Linux bionic 5.0.0-32-generic #34~18.04.2-Ubuntu SMP Thu Oct 10 10:36:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux and the problem did not occur.

toby63 · March 2, 2020, 9:11pm

I would like to use shiftfs on Debian Testing (especially for mounting host directories into unprivileged containers).

Unfortunately (as far as I can see) the Debian kernel does not include shiftfs, nor does Debian provide a dkms package.
And other distros I checked do not either.

So I wanted to ask for help on this.

As @stgraber mentioned, it should be possible to create a dkms package for this, but I don’t know how to do it.
So far I have only found tutorials that describe how to pack an already finished module (source code) into a dkms package,
or how to patch an already existing module.
To me this seems to be a different case (am I wrong?).

As a second option I only see the patching and rebuilding of the kernel.

Or are there any other options I don’t see?

For the second option a few questions:

Is this the right repo: https://github.com/brauner/linux/tree/shiftfs ?
And are the commits mentioned there (three?), the complete (up to date) patches?
Can I just apply those patches, or are there other things I have to consider?
The kernel version I am running is 5.3.0-1-amd64 (apt-get upgrade is blocking some packages, including a newer kernel).

stgraber · March 3, 2020, 8:29am

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/tree/fs/shiftfs.c is the latest version of it from the Ubuntu kernel.

You should be able to package just that one file as a dkms module as it’s a standalone filesystem.
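A minimal sketch of what such a dkms source tree might look like, written as a shell function that generates the two files next to `shiftfs.c`. The function name, the version string `1.0` and the install location `/updates/dkms` are assumptions chosen for illustration; a real package may need more settings, and `shiftfs.c` may still need adjusting for the target kernel.

```shell
# Write a minimal DKMS source tree for shiftfs into the directory given as $1.
write_dkms_tree() {
    tree=$1
    mkdir -p "$tree"

    # Quoted heredoc delimiter: ${...} is left for dkms to expand, not this shell.
    cat > "$tree/dkms.conf" <<'EOF'
PACKAGE_NAME="shiftfs"
PACKAGE_VERSION="1.0"
BUILT_MODULE_NAME[0]="shiftfs"
DEST_MODULE_LOCATION[0]="/updates/dkms"
AUTOINSTALL="yes"
MAKE[0]="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build modules"
CLEAN="make -C ${kernel_source_dir} M=${dkms_tree}/${PACKAGE_NAME}/${PACKAGE_VERSION}/build clean"
EOF

    # Kbuild makefile: build shiftfs.c into shiftfs.ko.
    echo 'obj-m := shiftfs.o' > "$tree/Makefile"
}

# Typical use (as root), after copying shiftfs.c into the tree:
#   write_dkms_tree /usr/src/shiftfs-1.0
#   cp shiftfs.c /usr/src/shiftfs-1.0/
#   dkms add -m shiftfs -v 1.0
#   dkms build -m shiftfs -v 1.0
#   dkms install -m shiftfs -v 1.0
```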

toby63 · March 4, 2020, 10:54pm

First of all thank you for the answer, but it does not seem to work that way.

I get an error that says (just an excerpt):
/var/lib/dkms/shiftfs/1.0/build/shiftfs.c:2003:28: error: ‘SHIFTFS_MAGIC’ undeclared (first use in this function); did you mean ‘SYSFS_MAGIC’?
 2003 |  if (lower_sb->s_magic == SHIFTFS_MAGIC) {
      |                           ^~~~~~~~~~~~~
      |                           SYSFS_MAGIC

When I looked at @brauner’s repo again, I found that he changed the “include/uapi/linux/magic.h” file, which defines the SHIFTFS_MAGIC constant.
commit that changed magic.h: https://github.com/brauner/linux/commit/0773ba9f1439a2430dd987a5f6211b47f68b17a7

The solution to this might be obvious to you, but I don’t know if I can simply change that file or treat it like a module?

stgraber · March 4, 2020, 11:34pm

@brauner any input on dkms for shiftfs?

Sounds like this case may be as simple as adding that define in the copied shiftfs.c, or maybe even passing it through the compiler so you don’t need to alter shiftfs.c at all.

brauner · March 5, 2020, 10:48am

That would be a patch for magic.h but you can probably just define this locally in shiftfs.c and call it a day.
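One way to do that local define without hand-editing is to prepend a guarded definition to the copied file. This is only a sketch: the function name is hypothetical, and the value `0x6a656a62` is an assumption based on the Ubuntu kernel's `include/uapi/linux/magic.h`, so it should be checked against the magic.h commit linked above before use.

```shell
# Prepend a guarded SHIFTFS_MAGIC definition to a copy of shiftfs.c.
# Does nothing if the file already defines the constant.
add_local_magic() {
    src=$1
    if grep -q '^#define SHIFTFS_MAGIC' "$src"; then
        return 0
    fi
    tmp=$(mktemp)
    {
        # Value assumed from Ubuntu's magic.h; verify before relying on it.
        printf '#ifndef SHIFTFS_MAGIC\n#define SHIFTFS_MAGIC 0x6a656a62\n#endif\n\n'
        cat "$src"
    } > "$tmp"
    cat "$tmp" > "$src"   # overwrite in place so file permissions are kept
    rm -f "$tmp"
}
```

The `#ifndef` guard keeps the file building on kernels whose magic.h already provides the constant.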

toby63 · March 6, 2020, 11:07pm

@brauner @stgraber:
Thanks to both of you.
It is now working (I also tested it within containers).

Some additional questions:

Would you mind if I set up a GitHub repo for shiftfs-dkms?
Even though the setup is fairly easy (just as you said), for users like me, it would probably be a good help.

I would not necessarily include shiftfs.c; I would just put a link into the Readme.
So people would always use the most recent version.

I am also working on a Debian package, but I guess more experienced packagers should do that for the official Debian repos.

1.a) I guess the trick with putting the SHIFTFS_MAGIC parameter into shiftfs.c is not so good for packaging?

Am I right to assume that the disk device (with shift: true) always “mounts” the folder for/to the standard user (in my case: uid 1000)?
And is it possible to change that?

stgraber · March 6, 2020, 11:16pm

Having a GitHub repo with the needed dkms bits sounds good, go ahead!
You can include all that’s needed and maybe just put a script to re-sync things.

The SHIFTFS_MAGIC bit could probably be passed as a define directly to the compiler to avoid needing any code change at all.
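A sketch of the compiler-define approach, using Kbuild's `ccflags-y` in the module's Makefile so that shiftfs.c stays untouched. The function name is hypothetical, and the value `0x6a656a62` is an assumption based on the Ubuntu kernel's magic.h that should be verified against the kernel source.

```shell
# Write a Kbuild makefile that passes SHIFTFS_MAGIC to the compiler.
write_kbuild_makefile() {
    tree=$1
    cat > "$tree/Makefile" <<'EOF'
obj-m := shiftfs.o
# Value assumed from Ubuntu's include/uapi/linux/magic.h; verify before use.
ccflags-y += -DSHIFTFS_MAGIC=0x6a656a62
EOF
}
```

With this in place, re-syncing to a newer shiftfs.c is a plain file copy with no patching step.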

As for how shiftfs works: it doesn’t shift to a particular user. Instead, it converts between user namespace ranges. So when you pass a mount with shift=true, the uid/gid you see in the container is identical to what you would see outside of the container. It’s not tied to a single uid/gid.
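The difference can be illustrated with a deliberately simplified model of the id arithmetic. It assumes an unprivileged container whose ids 0..65535 map to host ids 100000..165535 (an example mapping, not necessarily yours), and both function names are hypothetical. It does not reproduce the kernel's actual logic.

```shell
# Assumed example id map: container ids 0..RANGE-1 <-> host ids HOST_START..
HOST_START=${HOST_START:-100000}
RANGE=${RANGE:-65536}
OVERFLOW_ID=65534   # what unmapped ids show up as (nobody/nogroup)

# Plain bind mount: a host id is only visible if it lies inside the mapped range.
view_without_shift() {
    id=$1
    if [ "$id" -ge "$HOST_START" ] && [ "$id" -lt $((HOST_START + RANGE)) ]; then
        echo $((id - HOST_START))
    else
        echo "$OVERFLOW_ID"
    fi
}

# shift=true: the id stored on the host is shown unchanged inside the
# container, as long as the container has that id at all.
view_with_shift() {
    id=$1
    if [ "$id" -ge 0 ] && [ "$id" -lt "$RANGE" ]; then
        echo "$id"
    else
        echo "$OVERFLOW_ID"
    fi
}
```

Under this model a file owned by host uid 1000 appears as 65534 through a plain mount but as 1000 through a shifted one, and the same holds for every other uid/gid in the range, which is why the result is not tied to one user.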