Assuming that root in your unprivileged container maps to 1000000 outside, it is absolutely correct for nsowner to be 1000000 on that file. It being 0 would actually be a security issue.
Can you test that file capabilities actually function in your container?
# cp /bin/ping /tmp
# su - x -c "/tmp/ping google.com"
/tmp/ping: socket: Address family not supported by protocol
# setcap cap_net_raw+ep /tmp/ping
# su - x -c "/tmp/ping google.com"
PING google.com (142.250.74.206) 56(84) bytes of data.
Recent kernels allow capabilities to be used inside user namespaces. When that happens, the uid that root (0) inside the user namespace maps to is stored as part of the v3 capability format in the xattr.
That’s the nsowner you see in your output which is 1000000 in your case, indicating that uid 0 in your container is real user 1000000 outside of it.
The setcap/getcap kernel calls behave as expected. In this case it looks like libcap is going one step further and validating the on-disk xattr, which does indeed include that id.
The kernel doesn’t mangle xattrs when they are read from within the container: so long as you’re allowed to read the xattr, you see its raw, unmodified value.
I suspect libcap will need to learn that and, if it sees an nsowner that’s not 0, check whether the nsowner matches root in the current user namespace (which it can do by parsing /proc/self/uid_map).
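A rough Python sketch of that check, to make the idea concrete. This is not libcap code, and the function names are made up for illustration. It assumes the container's user namespace sits directly below the initial one, so that the second column of /proc/self/uid_map is the host-side uid; with nested namespaces that column is relative to the parent namespace and the comparison would need more work.

```python
def parse_uid_map(text):
    """Parse uid_map content into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(field) for field in fields))
    return entries


def map_to_parent(uid, entries):
    """Translate a uid inside the namespace to the uid seen by the parent namespace."""
    for inside_start, outside_start, length in entries:
        if inside_start <= uid < inside_start + length:
            return outside_start + (uid - inside_start)
    return None  # uid is not mapped in this namespace


def nsowner_matches_ns_root(nsowner, uid_map_text):
    """Hypothetical helper: is nsowner the outside uid of this namespace's root?

    Assumption: the parent namespace is the initial one, so the mapped
    value is directly comparable to the rootid stored in the xattr.
    """
    return map_to_parent(0, parse_uid_map(uid_map_text)) == nsowner
```

Inside the container this would be called with the contents of /proc/self/uid_map, for example `nsowner_matches_ns_root(1000000, open("/proc/self/uid_map").read())`.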
Serge Hallyn says that, on his machine, the host finds v3 capabilities on a container file, while the container sees v2 capabilities only. That may well be why it works as expected for him.
From capabilities(7):
“Correspondingly, when a version 3 security.capability attribute is retrieved (getxattr(2)) by a process that resides inside a user namespace that was created by the root user ID (or a descendant of that user namespace), the returned attribute is (automatically) simplified to appear as a version 2 attribute (i.e., the returned value is the size of a version 2 attribute and does not include the root user ID). These automatic translations mean that no changes are required to user-space tools (e.g., setcap(1) and getcap(1)) in order for those tools to be used to create and retrieve version 3 security.capability attributes.”
Yeah, the kernel handles the normal getcap case. There are ways to get to the low-level v3 cap struct, but judging from an strace of getcap, it seems to be doing the right thing here at least.
root@shell01:~# setcap cap_net_raw=pe a
root@shell01:~# setcap -v cap_net_raw=pe a
a: OK
root@shell01:~# uname -a
Linux shell01 5.4.0-40-generic #44~18.04.1-Ubuntu SMP Wed Jun 24 23:13:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
OK, here I’m on 2.25 and 2.32. I’ve got a Gentoo container running emerge now to get it installed, so I can see whether the newer versions are what’s getting confused somehow or whether it’s a kernel issue.
Have a look at the linked bugzilla report: Serge Hallyn posted a C program that tells which cap version we see, and in my case I see v3 within the container. If I understand the manpage correctly, I should see v2.
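For anyone who wants to run the same kind of check without compiling anything, here is a minimal Python sketch. It is not the C program from the bugzilla report. It only decodes the bytes of a security.capability xattr, using the revision constants and struct layout from linux/capability.h (little-endian 32-bit fields); the function name and the returned dict are my own choices.

```python
import struct

# Constants from include/uapi/linux/capability.h
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_1 = 0x01000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001

_VERSIONS = {VFS_CAP_REVISION_1: 1, VFS_CAP_REVISION_2: 2, VFS_CAP_REVISION_3: 3}
_SIZES = {1: 12, 2: 20, 3: 24}  # total xattr size in bytes per version


def decode_security_capability(raw):
    """Decode a raw security.capability xattr value (illustrative helper)."""
    if len(raw) < 4:
        raise ValueError("xattr too short")
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    version = _VERSIONS.get(magic_etc & VFS_CAP_REVISION_MASK)
    if version is None or len(raw) != _SIZES[version]:
        raise ValueError("unknown revision or unexpected size")

    # Each data word pair is (permitted, inheritable); v2/v3 carry two pairs.
    permitted = struct.unpack_from("<I", raw, 4)[0]
    if version >= 2:
        permitted |= struct.unpack_from("<I", raw, 12)[0] << 32

    # Only v3 carries the namespace root uid (the "nsowner").
    rootid = struct.unpack_from("<I", raw, 20)[0] if version == 3 else None

    return {
        "version": version,
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
        "rootid": rootid,
    }
```

On Linux the input can be fetched with `os.getxattr("/tmp/ping", "security.capability")`. Whatever version this reports is what the kernel chose to return to the calling namespace, which is exactly the point in question here.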