Difference between security.syscalls.intercept.mount.* config options?

From doc/instances.md:

| Key | Type | Default | Live update | Condition | Description |
| --- | --- | --- | --- | --- | --- |
| security.syscalls.intercept.mount | boolean | false | no | container | Handles the mount system call |
| security.syscalls.intercept.mount.allowed | string | - | yes | container | Specify a comma-separated list of filesystems that are safe to mount for processes inside the instance |
| security.syscalls.intercept.mount.fuse | string | - | yes | container | Whether to mount shiftfs on top of filesystems handled through mount syscall interception |
| security.syscalls.intercept.mount.shift | boolean | false | yes | container | Whether to redirect mounts of a given filesystem to their fuse implementation (e.g. ext4=fuse2fs) |

I have a couple questions:

  1. Are the ‘mount.allowed’ and ‘mount.fuse’ options mutually exclusive? In other words, if ‘ext4’ is in the ‘mount.allowed’ list then ‘ext4=fuse2fs’ can’t be in ‘mount.fuse’ also, right?
  2. Are the comments for ‘mount.fuse’ and ‘mount.shift’ swapped? The one for shift makes mention of fuse and the one for fuse makes mention of shiftfs.

Correct, you can’t have a filesystem in both allowed and fuse.

Yes, looks like the comments have been swapped :slight_smile:

Sending a branch to fix that now.
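Since a filesystem can't be listed in both allowed and fuse, a quick sanity check before setting the two keys might look like the sketch below. The function name `overlapping_filesystems` is made up for illustration; it assumes both values are comma-separated with no whitespace, and that fuse entries have the `fs=handler` form shown in the doc (e.g. ext4=fuse2fs).

```shell
# Hypothetical helper, not part of LXD: print every filesystem that appears
# in both the "allowed" list and the "fuse" list.
# Assumes comma-separated values without whitespace.
overlapping_filesystems() {
    allowed=$1
    fuse=$2
    for entry in $(printf '%s' "$fuse" | tr ',' ' '); do
        # Drop the "=handler" part of a fuse entry (ext4=fuse2fs -> ext4).
        fs=${entry%%=*}
        case ",$allowed," in
            *",$fs,"*) printf '%s\n' "$fs" ;;
        esac
    done
}

# Example: overlapping_filesystems "ext4,nfs" "ext4=fuse2fs"
```

An empty result means the two lists don't conflict.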

Thanks for the confirmation.

Is there any extra documentation in regards to when/why to use allowed vs fuse? I presume fuse is a bit safer but at the cost of performance?

Also, I set security.syscalls.intercept.mount=true on a container and now it won’t start:

$ lxc start test-mount
Error: Common start logic: System doesn’t support syscall interception

I can see easily from ‘lxc info’ that it is probably due to the kernel version not supporting the seccomp_listener feature:

  kernel_version: 4.15.0-66-generic
  kernel_features:
    seccomp_listener: "false"
    seccomp_listener_continue: "false"
  lxc_features:
    seccomp_notify: "true"

Is there an easy way to identify which Ubuntu kernel versions (HWE included) have support for seccomp_listener? I have been sifting through changelogs, but it isn’t very clear.

I know that 5.3.0-40-generic through HWE does support it, but only because I have a running machine I could check it on. What about if I need to answer this question for a kernel version that isn’t actively running somewhere that I can do ‘lxc info’ to see?
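For machines where LXD is running, the check I did by eye can at least be scripted. This is only a sketch: `seccomp_listener_value` is a name I made up, and it assumes the `seccomp_listener: "..."` line appears in the `lxc info` output in the same form as shown above. It doesn't help for kernels that aren't running anywhere.

```shell
# Hypothetical helper: read `lxc info` output on stdin and print the value
# of the seccomp_listener key with the surrounding quotes removed.
# The exact match on the key avoids picking up seccomp_listener_continue.
seccomp_listener_value() {
    awk '$1 == "seccomp_listener:" { gsub(/"/, "", $2); print $2; exit }'
}

# Intended use on a host with LXD installed:
#   lxc info | seccomp_listener_value
```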

I don’t know if I am on a fool’s errand trying to get an NFS mount to work using:

security.syscalls.intercept.mount: "true"
security.syscalls.intercept.mount.allowed: nfs

I am on the machine with the 5.3.0-40-generic kernel and the container starts. However, when I run the following mount command in the container it just hangs:

mount nfs-server:/data /data

The container log shows the following:

lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:522 - Failed to resolve syscall “fsinfo”
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:523 - This syscall will NOT be handled by seccomp
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:522 - Failed to resolve syscall “fsinfo”
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:523 - This syscall will NOT be handled by seccomp
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:522 - Failed to resolve syscall “fsinfo”
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:523 - This syscall will NOT be handled by seccomp
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:522 - Failed to resolve syscall “fsinfo”
lxc test-intercept-mount 20200417200313.518 WARN seccomp - seccomp.c:do_resolve_add_rule:523 - This syscall will NOT be handled by seccomp
lxc test-intercept-mount 20200417200313.518 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1143 - File exists - Failed to create directory “/sys/fs/cgroup/cpuset//lxc.monitor.test-intercept-mount”
lxc test-intercept-mount 20200417200313.519 ERROR cgfsng - cgroups/cgfsng.c:mkdir_eexist_on_last:1143 - File exists - Failed to create directory “/sys/fs/cgroup/cpuset//lxc.payload.test-intercept-mount”
lxc test-intercept-mount 20200417200313.520 ERROR utils - utils.c:lxc_can_use_pidfd:1855 - Invalid argument - Kernel does not support waiting on processes through pidfds
lxc test-intercept-mount 20200417200313.522 WARN cgfsng - cgroups/cgfsng.c:fchowmodat:1455 - No such file or directory - Failed to fchownat(17, memory.oom.group, 65536, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )

There’s no easy way to tell. That support was backported to some but not all kernels, so we can’t even tell you which version would have it :slight_smile:

So NFS is probably one of the most annoying filesystems to run through interception because of its network component and built-in uid/gid concept, so hmm, that may be a fun one.

I’m not sure why things would hang though, I’d just expect outright failure, or success with a potentially weird mount point.

@brauner

I suspected that was the case, thanks for confirming it.

Can I see the lxd debug log? That would log exactly what was executed and how…

Sure, is that somewhere under /var/snap/lxd?

@stgraber, what is the command to switch LXD into debug logging mode in the snap?

Ok, so you’d need to:

snap set lxd daemon.debug=true
systemctl reload snap.lxd.daemon

Then do the mount interception and give me:

/var/snap/lxd/common/lxd/logs/lxd.log

I think the issue was actually a DNS resolution issue:

$ lxc exec test-intercept-mount -- mount nfs-server:/data /data
mount.nfs: Connection timed out
$ lxc exec test-intercept-mount -- ping nfs-server
ping: nfs-server: Temporary failure in name resolution

Sorry for the confusion!

I replaced the hostname with the IP address and everything is working as expected.
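To rule out this kind of problem before blaming interception, the host part of the NFS source can be checked for resolution first. A rough sketch, run inside the container; `nfs_host` and `host_resolves` are names invented here, and the parsing assumes a plain `host:/path` source (no bracketed IPv6 addresses).

```shell
# Hypothetical helpers for a pre-mount sanity check.

# Print the host part of an NFS source such as nfs-server:/data.
nfs_host() {
    printf '%s\n' "${1%%:*}"
}

# Succeed if the name resolves through the system resolver (getent).
host_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Example:
#   host_resolves "$(nfs_host nfs-server:/data)" || echo "name does not resolve"
```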

That’s, hmm, surprising but nice I guess :slight_smile:

Aren’t all the uid/gid looking wonky though or are you using shiftfs on top of that?

It seems to work fine with nobody:nogroup ownership and the ‘all_squash’ option set on the server side export.

I disabled ‘all_squash’ on the server side and did chmod o+w on the exported path temporarily. Touching a file from within the container (client side) worked and shows as follows:

# touch test
# ls -la
total 3
drwxr-xrwx 2 nobody nogroup 3 Apr 17 21:00 .
drwxr-xr-x 22 root root 22 Apr 7 15:52 ..
-rw-rw-rw- 1 root root 0 Apr 17 21:04 test

On the server side the directory shows:

$ ls -la
total 2
drwxr-xrwx 2 nobody nogroup 3 Apr 17 14:00 .
drwxr-xr-x 3 root root 3 Apr 17 10:11 ..
-rw-rw-rw- 1 100000 100000 0 Apr 17 14:00 test

So, nothing appears abnormal as far as I can see.
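The 100000:100000 owner on the server is what I'd expect if the container's id map starts at host id 100000, so that container root (0) becomes 100000 on the wire. That base is inferred from the listing above, not read from the container's config; the real map can be read from /proc/self/uid_map inside the container. As plain arithmetic, with `host_id` being a made-up helper name:

```shell
# Hypothetical helper: translate a container uid/gid to the host id,
# assuming a single contiguous map that starts at the given base.
host_id() {
    base=$1
    container_id=$2
    printf '%s\n' $((base + container_id))
}

# Example: host_id 100000 0   (container root with an assumed base of 100000)
```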

Ok, that worked better than I expected then :slight_smile:

It largely seems to be working well.

Here is one oddity: the umask doesn’t appear to be honored. All commands below were executed from within the container, under the NFS-mounted path:

# umask
0022
# touch test
# ls -la test
-rw-rw-rw- 1 root root 0 Apr 17 23:11 test
# rm test
# umask 0027
# umask
0027
# touch test
# ls -la test
-rw-rw-rw- 1 root root 0 Apr 17 23:11 test

Am I just misunderstanding a behavior of LXD and/or NFS or is the above not correct?

I’m not sure why the umask wouldn’t be respected; it may be an NFS thing.
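For comparison, on a local filesystem a new file from `touch` is requested with mode 0666 and the umask bits are cleared from it, so the listings above would normally show different modes than -rw-rw-rw-. A small sketch of that arithmetic; `expected_file_mode` is a name made up for illustration, and it only covers the usual local-filesystem rule, not whatever the NFS server or export options do.

```shell
# Hypothetical helper: print the mode a newly created regular file would
# normally get for a given umask (requested mode 0666 with umask bits cleared).
# The leading 0 makes the shell treat the values as octal.
expected_file_mode() {
    printf '%04o\n' $((0666 & ~$1))
}

# Example: expected_file_mode 0022
#          expected_file_mode 0027
```

With the two umask values used above, this gives 0644 and 0640 respectively, rather than the 0666 that the NFS mount shows.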