Ok, so this was fun to track down again. The problem is that by default RHEL7 and friends disallow a) the creation of user namespaces and b) the creation of mount namespaces inside user namespaces. Both are problematic for unprivileged containers. To solve a) you have to boot your kernel with user_namespace.enable=1 on the kernel command line, unless there’s a sysctl I’m unaware of.
Now, about b). I took a closer look at the RHEL7 kernel sources. (Getting at the kernel sources for RHEL7 is… interesting. I think the only solid way is to use the CentOS 7 kernel sources. That’s what I did.) If you look at:
linux-3.10.0-693.2.2.el7/fs/namespace.c
which is where new mount namespaces are created/copied etc. you’ll see:
/* namespace.unpriv_enable = 1 */
static bool enable_unpriv_mnt_ns_creation;
module_param_named(unpriv_enable, enable_unpriv_mnt_ns_creation, bool, 0444);
MODULE_PARM_DESC(unpriv_enable, "Enable unprivileged creation of mount namespaces");
struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
		struct user_namespace *user_ns, struct fs_struct *new_fs)
{
	struct mnt_namespace *new_ns;
	struct vfsmount *rootmnt = NULL, *pwdmnt = NULL;
	struct mount *p, *q;
	struct mount *old;
	struct mount *new;
	int copy_flags;
	BUG_ON(!ns);
	if (likely(!(flags & CLONE_NEWNS))) {
		get_mnt_ns(ns);
		return ns;
	}
	/* Unprivileged creation currently tech preview in RHEL7 */
	if (user_ns != &init_user_ns) {
		static int __read_mostly called_mark_tech_preview = 0;
		if (!enable_unpriv_mnt_ns_creation) {
			return ERR_PTR(-EPERM);
		}
		if (!called_mark_tech_preview &&
		    !xchg(&called_mark_tech_preview, 1))
			mark_tech_preview("unpriv mount namespace", NULL);
	}
	old = ns->root;
	new_ns = alloc_mnt_ns(user_ns);
	if (IS_ERR(new_ns))
		return new_ns;
	namespace_lock();
	/* First pass: copy the tree topology */
	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
	if (user_ns != ns->user_ns)
		copy_flags |= CL_SHARED_TO_SLAVE | CL_UNPRIVILEGED;
	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
	if (IS_ERR(new)) {
		namespace_unlock();
		free_mnt_ns(new_ns);
		return ERR_CAST(new);
	}
	new_ns->root = new;
	list_add_tail(&new_ns->list, &new->mnt_list);
	/*
	 * Second pass: switch the tsk->fs->* elements and mark new vfsmounts
	 * as belonging to new namespace. We have already acquired a private
	 * fs_struct, so tsk->fs->lock is not needed.
	 */
	p = old;
	q = new;
	while (p) {
		q->mnt_ns = new_ns;
		if (new_fs) {
			if (&p->mnt == new_fs->root.mnt) {
				new_fs->root.mnt = mntget(&q->mnt);
				rootmnt = &p->mnt;
			}
			if (&p->mnt == new_fs->pwd.mnt) {
				new_fs->pwd.mnt = mntget(&q->mnt);
				pwdmnt = &p->mnt;
			}
		}
		p = next_mnt(p, old);
		q = next_mnt(q, new);
		if (!q)
			break;
		while (p->mnt.mnt_root != q->mnt.mnt_root)
			p = next_mnt(p, old);
	}
	namespace_unlock();
	if (rootmnt)
		mntput(rootmnt);
	if (pwdmnt)
		mntput(pwdmnt);
	return new_ns;
}
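which means that RHEL7 is indeed blocking the creation of mount namespaces in user namespaces: copy_mnt_ns() returns -EPERM whenever the target user namespace is not init_user_ns and enable_unpriv_mnt_ns_creation is false. But, similar to the user namespace command line option, there’s a command line option to enable the creation of mount namespaces in user namespaces: namespace.unpriv_enable=1. Note that on the kernel command line there must be no spaces around the =, and since the parameter is declared with mode 0444 it is read-only at runtime, so it has to be set at boot.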
TL;DR: boot your kernel with user_namespace.enable=1 namespace.unpriv_enable=1 and you should be good to go.