Difference between security.syscalls.intercept.mount.* config options?

amcduffee · April 17, 2020, 5:45pm

From doc/instances.md:

| Key | Type | Default | Live update | Condition | Description |
| --- | --- | --- | --- | --- | --- |
| security.syscalls.intercept.mount | boolean | false | no | container | Handles the mount system call |
| security.syscalls.intercept.mount.allowed | string | - | yes | container | Specify a comma-separated list of filesystems that are safe to mount for processes inside the instance |
| security.syscalls.intercept.mount.fuse | string | - | yes | container | Whether to mount shiftfs on top of filesystems handled through mount syscall interception |
| security.syscalls.intercept.mount.shift | boolean | false | yes | container | Whether to redirect mounts of a given filesystem to their fuse implementation (e.g. ext4=fuse2fs) |

I have a couple questions:

1. Are the ‘mount.allowed’ and ‘mount.fuse’ options mutually exclusive? In other words, if ‘ext4’ is in the ‘mount.allowed’ list then ‘ext4=fuse2fs’ can’t be in ‘mount.fuse’ also, right?
2. Are the comments for ‘mount.fuse’ and ‘mount.shift’ swapped? The one for shift makes mention of fuse and the one for fuse makes mention of shiftfs.

stgraber · April 17, 2020, 6:31pm

Correct, you can’t have a filesystem in both allowed and fuse.
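
A quick way to catch that mistake before setting both keys might look like the sketch below. It is only an illustration: `find_overlap` is a made-up helper name, and it assumes `mount.fuse` takes comma-separated `fs=helper` pairs in the style of the `ext4=fuse2fs` example from the docs.

```shell
# Print every filesystem named in both lists; return 1 if any overlap exists.
# $1: value intended for security.syscalls.intercept.mount.allowed (e.g. "ext4,xfs")
# $2: value intended for security.syscalls.intercept.mount.fuse (e.g. "ext4=fuse2fs")
#     (comma-separated fs=helper pairs is an assumption, not confirmed in this thread)
find_overlap() {
    allowed=$1
    fuse=$2
    status=0
    old_ifs=$IFS
    IFS=,                          # split both lists on commas
    for entry in $fuse; do
        fs=${entry%%=*}            # "ext4=fuse2fs" -> "ext4"
        for a in $allowed; do
            if [ "$a" = "$fs" ]; then
                echo "$fs"
                status=1
            fi
        done
    done
    IFS=$old_ifs
    return "$status"
}

# Example: find_overlap "ext4,xfs" "ext4=fuse2fs"
```

If the function prints nothing and returns 0, the two values do not name the same filesystem.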

stgraber · April 17, 2020, 6:32pm

Yes, looks like the comments have been swapped

Sending a branch to fix that now.

amcduffee · April 17, 2020, 6:55pm

Thanks for the confirmation.

Is there any extra documentation in regards to when/why to use allowed vs fuse? I presume fuse is a bit safer but at the cost of performance?

Also, I set security.syscalls.intercept.mount=true on a container and now it won’t start:

$ lxc start test-mount
Error: Common start logic: System doesn’t support syscall interception

I can see easily from ‘lxc info’ that it is probably due to the kernel version not supporting the seccomp_listener feature:

kernel_version: 4.15.0-66-generic
kernel_features:
  seccomp_listener: "false"
  seccomp_listener_continue: "false"
lxc_features:
  seccomp_notify: "true"

Is there an easy way to identify which Ubuntu kernel versions (HWE included) have support for seccomp_listener? I have been sifting through changelogs, but it isn’t very clear.

I know that 5.3.0-40-generic through HWE does support it, but only because I have a running machine I could check it on. What about if I need to answer this question for a kernel version that isn’t actively running somewhere that I can do ‘lxc info’ to see?

amcduffee · April 17, 2020, 8:11pm

I don’t know if I am on a fool’s errand trying to get an NFS mount to work using:

security.syscalls.intercept.mount: "true"
security.syscalls.intercept.mount.allowed: nfs
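
For reference, those two keys can be applied from the host with `lxc config set`. The sketch below only prints the commands so they can be reviewed before running them; `intercept_mount_cmds` is a made-up helper name. A restart is included because the table above lists `security.syscalls.intercept.mount` as not live-updatable.

```shell
# Print (do not run) the commands that enable mount interception on a container.
# $1: container name
# $2: comma-separated list of filesystems to allow (e.g. "nfs")
intercept_mount_cmds() {
    printf 'lxc config set %s security.syscalls.intercept.mount true\n' "$1"
    printf 'lxc config set %s security.syscalls.intercept.mount.allowed %s\n' "$1" "$2"
    # intercept.mount itself is not applied live, so the container is restarted.
    printf 'lxc restart %s\n' "$1"
}

# Example: intercept_mount_cmds test-intercept-mount nfs
```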

I am on the machine with the 5.3.0-40-generic kernel and the container starts. However, when I run the following mount command in the container it just hangs:

mount nfs-server:/data /data

The container log shows the following:

lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:522 - Failed to resolve syscall "fsinfo"
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:523 - This syscall will NOT be handled by seccomp
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:522 - Failed to resolve syscall "fsinfo"
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:523 - This syscall will NOT be handled by seccomp
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:522 - Failed to resolve syscall "fsinfo"
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:523 - This syscall will NOT be handled by seccomp
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:522 - Failed to resolve syscall "fsinfo"
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:523 - This syscall will NOT be handled by seccomp
lxc test-intercept-mount 20200417200313.518 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1143 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.monitor.test-intercept-mount"
lxc test-intercept-mount 20200417200313.519 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1143 - File exists - Failed to create directory "/sys/fs/cgroup/cpuset//lxc.payload.test-intercept-mount"
lxc test-intercept-mount 20200417200313.520 ERROR utils - utils.c:lxc_can_use_pidfd:1855 - Invalid argument - Kernel does not support waiting on processes through pidfds
lxc test-intercept-mount 20200417200313.522 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1455 - No such file or directory - Failed to fchownat(17, memory.oom.group, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )

stgraber · April 17, 2020, 8:25pm

There’s no easy way to tell. That support was backported to some but not all kernels, so we can’t even tell you which version would have it.
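
Since the version number alone isn’t a reliable signal, checking on a running host remains the dependable route. A minimal sketch for pulling one feature flag out of `lxc info` output follows; `kernel_feature` is a made-up helper name, and it assumes the `key: "value"` layout shown earlier in this thread.

```shell
# Read `lxc info` output on stdin and print the value of one feature flag.
# $1: feature name, e.g. seccomp_listener
kernel_feature() {
    awk -v key="$1" '
        $1 == (key ":") {
            gsub(/"/, "", $2)    # drop the quotes around "true"/"false"
            print $2
            exit                 # only the first match is reported
        }'
}

# Example, on a host running LXD:
#   lxc info | kernel_feature seccomp_listener
```

The trailing colon in the match keeps `seccomp_listener` from also matching `seccomp_listener_continue`.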

stgraber · April 17, 2020, 8:27pm

So NFS is probably one of the most annoying filesystems to run through interception because of its network component and built-in uid/gid concept, so hmm, that may be a fun one.

I’m not sure why things would hang though, I’d just expect outright failure, or success with a potentially weird mount point.

@brauner

amcduffee · April 17, 2020, 8:28pm

I suspected that was the case, thanks for confirming it.

brauner · April 17, 2020, 8:38pm

Can I see the lxd debug log? That would log exactly what was executed and how…

amcduffee · April 17, 2020, 8:40pm

Sure, is that somewhere under /var/snap/lxd?

brauner · April 17, 2020, 8:42pm

@stgraber, what are the commands to switch LXD into debug logging mode with the snap?

brauner · April 17, 2020, 8:50pm

Ok, so you’d need to:

snap set lxd daemon.debug=true
systemctl reload snap.lxd.daemon

Then do the mount interception and give me:

/var/snap/lxd/common/lxd/logs/lxd.log

amcduffee · April 17, 2020, 8:50pm

I think the issue was actually a DNS resolution issue:

$ lxc exec test-intercept-mount -- mount nfs-server:/data /data
mount.nfs: Connection timed out
$ lxc exec test-intercept-mount -- ping nfs-server
ping: nfs-server: Temporary failure in name resolution

Sorry for the confusion!

amcduffee · April 17, 2020, 8:50pm

I replaced the hostname with the IP address and everything is working as expected.
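
For anyone hitting the same hang, a name-resolution check before the mount makes this failure obvious. A rough sketch, with made-up helper names, that assumes `getent` is available inside the container and only handles the plain `host:/path` form (not bracketed IPv6 addresses):

```shell
# Print the host part of an NFS source such as "nfs-server:/data".
nfs_host() {
    printf '%s\n' "${1%%:*}"
}

# Succeed only if the host part of the NFS source resolves.
# Assumes getent exists in the container; an IP address resolves trivially.
nfs_host_resolves() {
    getent hosts "$(nfs_host "$1")" >/dev/null
}

# Example, inside the container:
#   nfs_host_resolves nfs-server:/data && mount nfs-server:/data /data
```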

stgraber · April 17, 2020, 8:53pm

That’s, hmm, surprising but nice I guess

Aren’t all the uids/gids looking wonky though, or are you using shiftfs on top of that?

amcduffee · April 17, 2020, 9:10pm

It seems to work fine with nobody:nogroup ownership and the ‘all_squash’ option set on the server side export.

I disabled ‘all_squash’ on the server side and did chmod o+w on the exported path temporarily. Touching a file from within the container (client side) worked and shows as follows:

# touch test
# ls -la
total 3
drwxr-xrwx 2 nobody nogroup 3 Apr 17 21:00 .
drwxr-xr-x 22 root root 22 Apr 7 15:52 ..
-rw-rw-rw- 1 root root 0 Apr 17 21:04 test

On the server side the directory shows:

$ ls -la
total 2
drwxr-xrwx 2 nobody nogroup 3 Apr 17 14:00 .
drwxr-xr-x 3 root root 3 Apr 17 10:11 ..
-rw-rw-rw- 1 100000 100000 0 Apr 17 14:00 test

So, nothing appears abnormal as far as I can see.
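
The 100000 owner on the server side is consistent with the container’s id map: root inside the container corresponds to the first host id of its range. A tiny sketch of that arithmetic, where `host_id` is a made-up helper and the base of 100000 is inferred from the listing above rather than read from the container’s config:

```shell
# Map an id seen inside the container to the id the host (and NFS server) sees.
# $1: first host id of the container's id map (100000 is assumed here)
# $2: uid or gid as seen inside the container
host_id() {
    echo $(( $1 + $2 ))
}

# Example: host_id 100000 0    (container root)
```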

stgraber · April 17, 2020, 10:55pm

Ok, that worked better than I expected then

amcduffee · April 17, 2020, 11:17pm

It largely seems to be working well.

Here is one oddity: the umask doesn’t appear to be honored. All commands below were executed from within the container and under the NFS-mounted path:

# umask
0022
# touch test
# ls -la test
-rw-rw-rw- 1 root root 0 Apr 17 23:11 test
# rm test
# umask 0027
# umask
0027
# touch test
# ls -la test
-rw-rw-rw- 1 root root 0 Apr 17 23:11 test

Am I just misunderstanding a behavior of LXD and/or NFS or is the above not correct?
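
For comparison, the mode normally expected from `touch` is 0666 with the umask bits cleared. A small sketch of that calculation (`expected_file_mode` is just an illustrative name):

```shell
# Print the mode a newly created regular file is normally expected to get:
# 0666 with the umask bits cleared.
# $1: umask in octal, e.g. 0022
expected_file_mode() {
    # The extra leading 0 forces octal interpretation of the argument.
    printf '%03o\n' $(( 0666 & ~0$1 ))
}

# Examples: expected_file_mode 0022 ; expected_file_mode 0027
```

With the two umask values used above this gives 644 and 640, whereas the listing shows rw-rw-rw- (666) both times.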

stgraber · April 17, 2020, 11:45pm

I’m not sure why the umask wouldn’t be respected; it may be an NFS thing.
