Is it possible to mount selinuxfs in an unprivileged container?

Hello

I’m wondering whether it is possible to mount selinuxfs or read PCI configuration space from an unprivileged container.

I’ve been trying to run some low-level applications in an unprivileged container.
(They run fine in a privileged container.)

When I started, I assumed that even in an unprivileged container, applications could gain elevated privileges by using capabilities.

The concept was simple:
if the user can do something, an unprivileged container mapped to that user’s subuid/subgid range can do it too.

But after reading the article below and the kernel code, I’ve realized that an unprivileged container cannot be used for applications that require privilege escalation.
Whatever capabilities the user has, a process in a new user namespace gets a complete set of capabilities; however, those capabilities do not grant any additional privileges from the host’s point of view.

article

The kernel code shows that capable() checks capabilities only against init_user_ns:

bool capable(int cap)
{
    return ns_capable(&init_user_ns, cap);
}

So when a code path calls capable() directly instead of ns_capable(), the capabilities held inside a user namespace seem to be useless.

For example, it seems impossible for an unprivileged container to use the pciconfig_read or pciconfig_write system calls, because they check capable(CAP_SYS_ADMIN).

Would applying the namespaced file capabilities mentioned above be enough to mount selinuxfs or run mknod?

The current kernel version is 4.9.x, which does not include namespaced file capabilities.
The test results in the unprivileged container are as follows.

lxc-u0@arm:~$  lxc-create --version
3.0.3
lxc-u0@arm:~$  id
uid=10000(lxc-u0) gid=10000(lxc-u0) groups=10000(lxc-u0),20000(lxc)
lxc-u0@arm:~$ grep cap /tmp/lxc/b1_lxc-u0/config
#lxc.cap.drop = sys_module mac_admin mac_override sys_time
lxc-u0@arm:~$  ps axf | grep -A 4 lxc-u[0]
 1431 ?        Ss     0:00 [lxc monitor] /tmp/lxc b1_lxc-u0
 1436 pts/1    Ss+    0:00  \_ /sbin/init
lxc-u0@arm:~$
lxc-u0@arm:~$ cat /proc/1436/status | grep -i cap
CapInh: 0000003fffffffff
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000003fffffffff
lxc-u0@arm:~$ cat /proc/1436/status | grep -i -E "(uid|cap)"
Uid:    165536  165536  165536  165536
CapInh: 0000003fffffffff
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000003fffffffff
lxc-u0@arm:~$ cat /proc/self/status  | grep -i -E "(uid|cap)"
Uid:    10000   10000   10000   10000
CapInh: 0000003fffe3ffff
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
lxc-u0@arm:~$ lxc-attach -nb1_lxc-u0 --clear-env -P/tmp/lxc -- /bin/cat /proc/self/status  | grep -i -E "(uid|cap)"
Uid:    0       0       0       0
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
lxc-u0@arm:~$   lxc-attach -nb1_lxc-u0 --clear-env -P/tmp/lxc
/ #
/ # id
uid=0(root) gid=0(root)
/ #
/ # mkdir aa
/ # mount -t tmpfs none aa
/ #  mount | grep aa
none on /aa type tmpfs (rw,relatime,uid=165536,gid=165536)
/ #
/ #
/ # mkdir bb
/ # mount -t selinuxfs none bb
mount: permission denied (are you root?)
/ #
/ # /usr/sbin/mount.util-linux -t selinuxfs none bb
mount.util-linux: permission denied
/ #
/ # /usr/sbin/getcap /usr/sbin/mount.util-linux
/ # /usr/sbin/getfattr -d -m . /usr/sbin/mount.util-linux
/ #
/ #
/ # echo "root do setcap"
root do setcap
/ # 
/ # /usr/sbin/getcap /usr/sbin/mount.util-linux
/usr/sbin/mount.util-linux = cap_sys_admin,cap_mac_override,cap_mac_admin+ei
/ # /usr/sbin/getfattr -d -m . /usr/sbin/mount.util-linux
getfattr: Removing leading '/' from absolute path names
# file: usr/sbin/mount.util-linux
security.capability=0sAQAAAgAAAAAAACAAAAAAAAMAAAA=

/ #
/ # /usr/sbin/mount.util-linux -t selinuxfs none bb
mount.util-linux: permission denied
/ #
/ #
/ # umount aa
/ #
/ #
/ #
/ #
/ # /usr/sbin/mount.util-linux -t devtmpfs none aa
mount.util-linux: permission denied
/ #
/ #
/ #
/ # /bin/busybox mknod vbb b 254 16
mknod: vbb: Operation not permitted
/ # /usr/sbin/getcap /bin/busybox
/ #
/ # echo "root do setcap"
root do setcap
/ # /usr/sbin/getcap /bin/busybox
/bin/busybox = cap_sys_admin,cap_mknod+eip
/ #
/ # /bin/busybox mknod vbb b 254 16
mknod: vbb: Operation not permitted
/ #
/ # uname -r
4.9.13
/ #

Happy New Year!

Thanks