"Not supported" permission error when using Fuse+Docker inside unprivileged container

cdauth · January 21, 2022, 4:46pm

I am new to LXD and I am experiencing a problem that is too complex for my level of knowledge to even know where to start looking for a solution.

On a Fedora Server host, I run an unprivileged Fedora Server LXD container. Inside that container, I have Docker running. The Docker storage (/var/lib/docker) is encrypted using gocryptfs. The problem is that inside Docker containers, root fails with a “Not supported” error when accessing files that are owned by another user.

On the host system I have root:1000000:1001000000 in both /etc/subuid and /etc/subgid, in case this has anything to do with it.

To reproduce the problem, create an LXD container that runs a Docker container with encrypted storage:

lxc launch images:fedora/35 test
lxc exec test bash

Then inside the container:

# Install docker and gocryptfs
dnf -y install dnf-plugins-core
dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
dnf -y install docker-ce docker-ce-cli containerd.io gocryptfs

# Set up encrypted docker storage
mkdir -p /var/lib/docker.enc /var/lib/docker
gocryptfs -init /var/lib/docker.enc
gocryptfs -allow_other /var/lib/docker.enc /var/lib/docker

# Run docker container
systemctl start docker
docker run --rm -ti alpine sh

Then inside the Docker container:

touch test
cat test # Works
chown nobody:nobody test
cat test # Fails

There are some things that I find curious about this:

Doing the same thing outside of the Docker container, but on the encrypted file system, does not yield the error.
Doing the same thing inside the Docker container, but with the Docker storage not being encrypted, does not yield the error.
Doing the same thing inside the Docker container, but with the LXD container running in privileged mode, does not yield the error.
Using encfs instead of gocryptfs yields the same error.

I suspect that this might have something to do with the UID mapping, but I don’t know where to dig further. gocryptfs itself does not show any error messages.
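For reference, the UID mapping in effect can be read from /proc/self/uid_map, where each line has the form `inside outside count`. The following is a minimal POSIX sh sketch, assuming that format, that translates a UID as seen inside the current user namespace into the UID it corresponds to in the parent namespace (for the LXD container, the host). `map_uid` is a hypothetical helper name, not an existing tool.

```sh
# Hypothetical helper: print the UID that UID $1 maps to in the parent
# user namespace, given a uid_map ("inside outside count" per line) on
# stdin. Returns 1 if the UID is not covered by any line of the map.
map_uid() {
    uid=$1
    while read -r inside outside count; do
        [ -n "$inside" ] || continue          # skip blank lines
        if [ "$uid" -ge "$inside" ] && [ "$uid" -lt $((inside + count)) ]; then
            echo $((outside + uid - inside))
            return 0
        fi
    done
    return 1
}

# Usage inside the LXD (or Docker) container:
#   map_uid 0 < /proc/self/uid_map
```

In an unprivileged container this should print the host UID that the container's root maps to; a privileged container shares the host's user namespace, whose map is the identity line `0 0 4294967295`.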

My questions are:

What is causing this error and how can I fix it?
Are there any alternative encryption methods that I could use? My requirement is that the password needs to be entered inside the LXD container, rather than on the host system.

stgraber · January 21, 2022, 4:54pm

Might be worth having some files with different owners in there that you can ls -l from the Docker container to see if it’s just a writing issue or a reading one too.

Also knowing the exact error for the chown may help to see if it thinks it’s a permission error or an overflow one.

cdauth · January 21, 2022, 5:03pm

Sorry for not being clear in the description. The chown actually works fine. It is the cat command that fails. I can chown the file back to root and then cat it again without problem. ls -l also works without problems.

The exact output of the cat command is:

/ # cat test
cat: can't open 'test': Not supported

This is the output of strace cat test:

/ # strace cat test
execve("/bin/cat", ["cat", "test"], 0x7fff37128398 /* 6 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7f32540fdb48) = 0
set_tid_address(0x7f32540fdf90)         = 43
brk(NULL)                               = 0x55d6e36f4000
brk(0x55d6e36f6000)                     = 0x55d6e36f6000
mmap(0x55d6e36f4000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x55d6e36f4000
mprotect(0x7f32540fa000, 4096, PROT_READ) = 0
mprotect(0x55d6e204a000, 16384, PROT_READ) = 0
getuid()                                = 0
open("test", O_RDONLY|O_LARGEFILE)      = -1 EOPNOTSUPP (Not supported)
write(2, "cat: can't open 'test': Not supp"..., 38cat: can't open 'test': Not supported
) = 38
exit_group(1)                           = ?
+++ exited with 1 +++

stgraber · January 21, 2022, 5:06pm

Hmm, that’s interesting, can you maybe strace the gocryptfs process at the time of the cat? EOPNOTSUPP is not a very common response for the kernel.

stgraber · January 21, 2022, 5:09pm

The few EOPNOTSUPP errors coming directly from FUSE (there may be more cases caused by the VFS itself) are centered around filesystem ACLs, xattrs, fallocate and copy_file_range, none of which should really be at play here…

cdauth · January 21, 2022, 5:21pm

I ran gocryptfs with strace and it didn’t output anything during the cat command.

When I run chacl -l test on the encrypted file system outside of a Docker container, I get an “Operation not supported” error. That seems unusual to me; I thought that even on file systems that don’t support ACLs, you can at least list the current permissions like that. Interestingly, when running the LXD container in privileged mode, chacl works and can even set ACLs.
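To tell “this file has no ACL” apart from “this file system does not support ACLs at all”, one can read the `system.posix_acl_access` extended attribute directly (the same attribute that shows up in the fuse-overlayfs debug output further down). For getxattr, the kernel reports a missing attribute as ENODATA but an unsupported one as EOPNOTSUPP. A minimal sketch, assuming the `getfattr` tool from the attr package is installed; `classify_acl_probe` and `probe_acl` are hypothetical names.

```sh
# Hypothetical helper: turn the exit status ($1) and error text ($2) of a
# getfattr call on system.posix_acl_access into one of:
#   has-acl      the attribute exists (an access ACL is set)
#   no-acl       ENODATA: ACLs are supported, the file just has none
#   unsupported  EOPNOTSUPP: the file system rejects ACL xattrs outright
#   unknown      anything else (e.g. getfattr not installed)
classify_acl_probe() {
    if [ "$1" -eq 0 ]; then
        echo has-acl
        return 0
    fi
    case $2 in
        *"Operation not supported"*) echo unsupported ;;
        *"No such attribute"*|*"No data available"*) echo no-acl ;;
        *) echo unknown ;;
    esac
}

# Probe a single file. LC_ALL=C keeps the error text in English so the
# patterns above can match it; stdout is discarded, stderr is captured.
probe_acl() {
    msg=$(LC_ALL=C getfattr --absolute-names -n system.posix_acl_access -- "$1" 2>&1 >/dev/null)
    classify_acl_probe "$?" "$msg"
}
```

Given the chacl result above, `probe_acl test` on the gocryptfs mount in the unprivileged container would presumably print `unsupported`, while an ordinary ext4 file without an ACL would print `no-acl`.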

It seems that Docker is using a fuse-overlayfs file system for its storage. I suspect that this file system may be trying to do something that the underlying gocryptfs does not support. I have not found a way to debug fuse-overlayfs yet; I will investigate further.
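To confirm which storage driver Docker picked, and which file system sits underneath its data directory, something like the following minimal sketch can help. It assumes the Docker CLI and util-linux `findmnt` are available; `docker_storage_driver` and `backing_fstype` are hypothetical helper names.

```sh
# Hypothetical helper: print the storage driver the Docker daemon chose
# (e.g. overlay2 or fuse-overlayfs), or "unknown" if it cannot be queried.
docker_storage_driver() {
    docker info --format '{{.Driver}}' 2>/dev/null || echo unknown
}

# Hypothetical helper: print the type of the file system that contains a
# path (default /var/lib/docker). findmnt -T walks up to the enclosing
# mount; a gocryptfs mount is typically listed as fuse.gocryptfs.
backing_fstype() {
    findmnt -n -o FSTYPE -T "${1:-/var/lib/docker}" 2>/dev/null || echo unknown
}
```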

cdauth · January 21, 2022, 5:37pm

I managed to run fuse-overlayfs (which is the Docker storage driver) manually in debug mode. This is the error that it prints during the cat:

unique: 20, opcode: GETXATTR (22), nodeid: 36608464, insize: 72, pid: 751
ovl_getxattr(ino=36608464, name=system.posix_acl_access, size=4096)
   unique: 20, error: -95 (Operation not supported), outsize: 16

With strace, this is the output:

"H\0\0\0\26\0\0\0\22\0\0\0\0\0\0\0\320I\33\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 1052672) = 72
write(2, "unique: 18, opcode: GETXATTR (22"..., 74unique: 18, opcode: GETXATTR (22), nodeid: 18565584, insize: 72, pid: 765
) = 74
write(2, "ovl_getxattr(ino=18565584, name="..., 68ovl_getxattr(ino=18565584, name=system.posix_acl_access, size=4096)
) = 68
lgetxattr("/srv/docker/fuse-overlayfs/73581a072630508eb487fa9d47071d1a312aa1611c71cc5bcdbce904ecf6be02/diff/test", "system.posix_acl_access", 0x11c0fc0, 4096) = -1 EOPNOTSUPP (Operation not supported)
write(2, "   unique: 18, error: -95 (Opera"..., 65   unique: 18, error: -95 (Operation not supported), outsize: 16
) = 65
writev(7, [{iov_base="\20\0\0\0\241\377\377\377\22\0\0\0\0\0\0\0", iov_len=16}], 1) = 16
read(7,

So the question to solve now is: Why does gocryptfs support ACL/XATTR when running in a privileged LXD container, but not in an unprivileged one?
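When a trace gets long, it can help to reduce it to just the calls that failed with EOPNOTSUPP, such as the lgetxattr call above. A minimal sh/awk sketch, assuming strace's usual one-call-per-line output (optionally prefixed with `[pid N]` when tracing with -f); `eopnotsupp_summary` is a hypothetical name, and it does not stitch together `unfinished`/`resumed` pairs.

```sh
# Hypothetical helper: read strace output on stdin and print each syscall
# name that returned EOPNOTSUPP, followed by how often it did so.
eopnotsupp_summary() {
    awk '
        /= -1 EOPNOTSUPP/ {
            line = $0
            sub(/^\[pid +[0-9]+] /, "", line)   # drop the strace -f prefix
            name = line
            sub(/\(.*/, "", name)                # keep only the syscall name
            count[name]++
        }
        END { for (n in count) print n, count[n] }
    '
}
```

For example, trace the fuse-overlayfs process with `strace -f -o trace.txt -p <PID>` and then run `eopnotsupp_summary < trace.txt`. For the excerpt above, this would report `lgetxattr 1`.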

stgraber · January 21, 2022, 6:06pm

Ok, so the issue is actually with fuse-overlayfs then, though I don’t understand why you’re using fuse-overlayfs when overlay2 works unprivileged these days.

What kernel is this all running on?

cdauth · January 21, 2022, 6:14pm

I didn’t choose fuse-overlayfs, Docker chose it for me. When I manually try to use the overlay2 driver, it fails with the error “failed to mount overlay: invalid argument”. The kernel says:

Jan 21 19:09:43 friedhelm.rankenste.in kernel: overlayfs: upper fs does not support tmpfile.
Jan 21 19:09:43 friedhelm.rankenste.in kernel: overlayfs: upper fs does not support RENAME_WHITEOUT.
Jan 21 19:09:43 friedhelm.rankenste.in kernel: overlayfs: upper fs missing required features.

The kernel that I’m running is 5.15.14-200.fc35.x86_64.
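The kernel reports each missing feature of the upper file system on its own log line. A minimal sketch, assuming the message wording shown above, that pulls those feature names out of the kernel log; `missing_overlay_features` is a hypothetical helper, and `journalctl -k` (or `dmesg`) is just one way to feed it.

```sh
# Hypothetical helper: read kernel log lines on stdin and print each
# feature that overlayfs says the upper file system does not support.
missing_overlay_features() {
    sed -n 's/.*overlayfs: upper fs does not support \([^.]*\)\..*/\1/p'
}

# Usage:
#   journalctl -k | missing_overlay_features
```

For the three log lines above, this would print `tmpfile` and `RENAME_WHITEOUT`; the final “missing required features” line has no feature name and is skipped.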

stgraber · January 21, 2022, 7:18pm

Ah, what’s your underlying filesystem for the container?

cdauth · January 21, 2022, 7:21pm

I have found out one more detail: When I use gocryptfs inside a Docker Rootless container (on the host system), I can reproduce the same problem. I suppose this means that the problem has nothing to do with LXD at all, but is rather a bug in gocryptfs or in FUSE when UID mapping is in use.

Thanks for your help investigating this. If I find out any more details, I will document them here. I have also created bug reports for gocryptfs and encfs.

cdauth · January 23, 2022, 7:59am

I have investigated more and have found a solution. Here is a summary of all the findings:

When encrypting the Docker storage with gocryptfs, Docker uses the fuse-overlayfs driver instead of overlay2, because gocryptfs lacks features that overlay2 requires from the upper file system (tmpfile and RENAME_WHITEOUT support). I have reported these missing features.

When accessing a file owned by a different user, fuse-overlayfs tries to read the ACL of the file. This fails because gocryptfs does not support ACLs (unless running as non-UID-mapped root, it seems) and therefore returns an “Operation not supported” error. gocryptfs’s behaviour is consistent with other file systems that do not support ACLs (for example ext4 mounted with -o noacl). I have thus reported the issue to fuse-overlayfs.

The solution has been posted in the gocryptfs bug report: gocryptfs has to be called with the -acl flag. This flag was introduced in gocryptfs 2.0 but had been missing from the documentation until now. gocryptfs has had basic ACL support since version 1.8.0; however, it only supported getting and setting ACLs and did not actually enforce them. The -acl flag changes that and enables proper ACL handling. The catch is that most distributions ship a version of gocryptfs older than 2.0.
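Since the fix depends on the gocryptfs version, a mount script may want to check it before passing -acl. A minimal sketch, assuming GNU coreutils `sort` (for `-V` version ordering and the quiet order check `-C`) and the paths from the original reproduction steps; `version_ge` and `mount_docker_storage` are hypothetical names, and the installed version can be looked up with `gocryptfs -version`.

```sh
# Hypothetical helper: succeed if version $1 >= version $2. The two
# versions are already in order exactly when $2 sorts before (or equal
# to) $1 under GNU sort's version ordering.
version_ge() {
    printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

# Hypothetical wrapper around the mount command from the original post:
# pass -acl only when the installed gocryptfs ($1) is new enough for it.
mount_docker_storage() {
    if version_ge "$1" 2.0; then
        gocryptfs -acl -allow_other /var/lib/docker.enc /var/lib/docker
    else
        echo "gocryptfs $1 predates the -acl flag (added in 2.0)" >&2
        return 1
    fi
}
```

With a distribution package older than 2.0, the wrapper refuses to mount rather than silently falling back to a mount that reproduces the “Not supported” error.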

stgraber · January 23, 2022, 5:24pm

Ah, nice to hear that you sorted it out and that newer gocryptfs will handle it fine!