Hello, I recently started having a problem with unprivileged containers on a bare-metal Proxmox host (Debian Bullseye).
I’ve had LXCs running for more than a year without issue. Then, about 10 days ago, they stopped starting. Creating new privileged containers works fine, and VMs run fine. I see no other system issues. The underlying filesystems are ZFS and scrub clean. Nothing stands out in journalctl or dmesg apart from messages directly related to the problem (included below).
I opened a thread on the Proxmox forums; it has a lot of detail about the system configuration and errors, plus follow-up from support staff, but no solution yet.
I can’t really correlate the problem with anything I’ve done on the system other than some hardware changes (swapping PCIe cards to test a U.2 SSD and an old Radeon GPU, enabling a second NIC on the motherboard, adding another spinning HDD, etc.).
At the time, the system was mostly up to date with PVE/Debian packages (within a couple of weeks), and it is fully up to date now. Updating made no difference.
Today I asked on IRC, and someone there very helpfully walked me through a series of troubleshooting steps and asked me to follow up here.
I will try to summarize, but a lot of the walkthrough was new territory for me, so please bear with me. Most of the log below is copied verbatim; some of it is edited for brevity.
<gl0woo> error on startup >> lxc_spawn: 1734 Operation not permitted - Failed to clone a new set of namespaces
<gl0woo> also cannot create new unprivileged lxc >> ../src/lxc/cmd/lxc_usernsexec.c: 407: main - Operation not permitted - Failed to unshare mount and user namespace
<gl0woo> & >> ../src/lxc/cmd/lxc_usernsexec.c: 452: main - Inappropriate ioctl for device - Failed to read from pipe file descriptor 3
<gl0woo> lxc-checkconfig >> LXC version 5.0.0
<gl0woo> the issues that program finds are as follows (seems the rest is ok)
<gl0woo> Cgroup v1 systemd controller: missing
<gl0woo> Cgroup v1 freezer controller: missing
<gl0woo> Cgroup ns_cgroup: required
<amikhalitsyn> gl0woo: did you perform any update of your system recently (kernel, userspace)? Please check cat /proc/sys/kernel/unprivileged_userns_clone
<gl0woo> i believe the problems started after a reboot, having made some hardware changes, swapping pcie cards, enabling a 2nd nic on the m/b
<gl0woo> but no system package updates
<gl0woo> i have since updated packages, debian bullseye
<gl0woo> 'uname -a' >> Linux host 5.15.74-1-pve #1 SMP PVE 5.15.74-1 (Mon, 14 Nov 2022 20:17:15 +0100) x86_64 GNU/Linux
<gl0woo> the contents of that file is '1'
<gl0woo> >> root@host:~# ls -laFtrh /proc/sys/kernel/unprivileged_userns_clone
<gl0woo> >> -rw-r--r-- 1 root root 0 Dec 1 07:02 /proc/sys/kernel/unprivileged_userns_clone
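For anyone following along, here is a small shell sketch (my own, not from the IRC session) that prints the sysctls I understand to gate user-namespace creation. The function name show_userns_knobs is hypothetical. kernel.unprivileged_userns_clone only exists on kernels carrying the Debian/Ubuntu patch; the user.max_*_namespaces limits are mainline, and according to the unshare(2) man page exhausting them yields ENOSPC rather than EPERM, so here they mainly help rule things out.

```shell
# Hypothetical helper: print each sysctl that can gate creation of a
# user namespace, or note that the file is absent on this kernel.
# kernel.unprivileged_userns_clone: Debian/Ubuntu kernel patch only.
# user.max_*_namespaces: mainline limits (Linux 4.9 and later).
show_userns_knobs() {
  for knob in /proc/sys/kernel/unprivileged_userns_clone \
              /proc/sys/user/max_user_namespaces \
              /proc/sys/user/max_mnt_namespaces \
              /proc/sys/user/max_net_namespaces; do
    if [ -r "$knob" ]; then
      printf '%s = %s\n' "$knob" "$(cat "$knob")"
    else
      printf '%s = (not present)\n' "$knob"
    fi
  done
}
```

Run it as root with show_userns_knobs; given the check on IRC, the first line should report 1 on this host.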
<gl0woo> there are some related messages seen with journalctl, these are the last four...
<gl0woo> Dec 01 02:42:07 host systemd: [unit name garbled]: Main process exited, code=exited, status=1/FAILURE
<gl0woo> Dec 01 02:42:07 host systemd: [unit name garbled]: Failed with result 'exit-code'.
<gl0woo> Dec 01 02:42:15 host pvestatd: modified cpu set for lxc/123: 0-3
<gl0woo> Dec 01 02:42:15 host pvestatd: failed to open '/sys/fs/cgroup/lxc/123/cpuset.cpus' - Permission denied
<amikhalitsyn> gl0woo: so, probably you had kernel package update long time before reboot. So, after reboot you've got into the new kernel.
<gl0woo> that's possible
<gl0woo> but of all the people running proxmox i'm seemingly the only one with the issue. i've had a thread over there open for a week.
<amikhalitsyn> please, recheck and confirm that you have "1" in /proc/sys/kernel/unprivileged_userns_clone
<amikhalitsyn> EPERM from unshare most probably comes from the check where this sysctl knob is involved
<snip>
<gl0woo> cat shows it contains '1'
<amikhalitsyn> okay, "1" is good. Strange. Then please check dmesg | grep audit, possibly you will notice some denials
<gl0woo> [ 1652.595589] audit: type=1400 audit(1669903901.683:21): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-123_</var/lib/lxc>" pid=32783 comm="apparmor_parser"
<gl0woo> [ 1652.745389] audit: type=1400 audit(1669903901.835:22): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-123_</var/lib/lxc>" pid=32785 comm="apparmor_parser"
<amikhalitsyn> try dmesg | grep DENIED
<gl0woo> nothing for that
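dmesg only shows what is still in the kernel ring buffer, so earlier denials could have scrolled out of it. As a sketch (the name apparmor_denials is hypothetical, and it assumes the key=value layout of the audit lines quoted above, with no spaces inside those fields), this filter keeps only AppArmor DENIED records from kernel log text on stdin and reduces each to its operation, profile and name fields.

```shell
# Hypothetical filter: read kernel log text on stdin, keep only AppArmor
# DENIED records, and print their operation=, profile= and name= fields.
# Assumes those fields contain no embedded spaces.
apparmor_denials() {
  grep 'apparmor="DENIED"' | awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i ~ /^(operation|profile|name)=/) out = out " " $i
    print substr(out, 2)
  }'
}
```

For example, dmesg | apparmor_denials, or journalctl -k | apparmor_denials to search the kernel messages the journal kept for the current boot.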
<amikhalitsyn> okay, then we can just perform tracing of your kernel to understand what happens (-:
<gl0woo> ok
<amikhalitsyn> https://gist.github.com/mihalicyn/586a8650ca4ca782cf09a23f19cb0db2
<gl0woo> https://pastebin.com/ifBGq0px
<amikhalitsyn> This trace describes successfull unshare() call.
<amikhalitsyn> Have you reproduced EPERM failure during tracing?
<gl0woo> this is two different lxc's both failed >> https://pastebin.com/RYbXFG4k
<amikhalitsyn> perf probe 'unshare_userns%return $retval'
<amikhalitsyn> perf probe 'unshare_nsproxy_namespaces%return $retval'
<amikhalitsyn> perf probe 'ksys_unshare%return $retval'
<amikhalitsyn> You need to execute this 3 commands, then run gftrace (as before) and reproduce the problem
<amikhalitsyn> From your trace, I can't see that userns was allocated which is really strange.
<amikhalitsyn> you can also run unshare -Um if it fails. It should be sufficient for our needs.
<amikhalitsyn> Because unshare -Um should work.
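The three perf probe commands above can be wrapped around a single reproduction of the failure. This is only a sketch: the function names, the use of perf record and perf script in place of the gftrace script from the gist, and the /tmp output path are assumptions of mine. It needs root and a perf build that matches the running kernel; setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical wrapper: print a command when DRY_RUN is set, else run it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add the three return probes from the IRC session, record them while
# "unshare -Um true" runs, dump the samples, then delete the probes.
# Assumes perf record/perf script in place of the gftrace helper.
trace_unshare() {
  data=/tmp/unshare-probes.data
  run perf probe 'unshare_userns%return $retval'
  run perf probe 'unshare_nsproxy_namespaces%return $retval'
  run perf probe 'ksys_unshare%return $retval'
  run perf record -e 'probe:*' -aR -o "$data" -- unshare -Um true
  run perf script -i "$data"
  run perf probe -d 'probe:*'
}
```

Run it as root with trace_unshare, or with DRY_RUN=1 first to review the commands.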
<gl0woo> here you go >> https://pastebin.com/T9PYUBdQ
<amikhalitsyn> /* ksys_unshare__return: (__x64_sys_unshare+0x12/0x20 <- ksys_unshare) arg1=0x0 */
<amikhalitsyn> so, unshare returned 0. it's successful run
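When a trace is long, a filter like this can pick out the return probes that did not return 0. It is a sketch with a hypothetical name, nonzero_returns, and it assumes the __return lines carry the value as arg1, as in the ksys_unshare line quoted above. A failure such as -EPERM could show up as 0xffffffffffffffff or as -1 depending on how perf types the return value, so the filter only treats 0x0 and 0 as success.

```shell
# Hypothetical filter: from perf/ftrace text on stdin, keep only the
# "__return:" records whose arg1 (the probed return value) is non-zero.
nonzero_returns() {
  grep -E '__return:' | grep -Ev 'arg1=(0x0|0)([[:space:]]|$)'
}
```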
<amikhalitsyn> okay. let's try from the other side. Run gftrace and in parallel strace -e unshare,setns -f unshare -mnU true and show output of strace+gftrace
<amikhalitsyn> strace -e unshare,setns -f unshare -mnU true
<amikhalitsyn> that's one command.
<amikhalitsyn> I think you need to fill topic on our forum https://discuss.linuxcontainers.org/
<amikhalitsyn> And we'll continue investigation of your problem.
<gl0woo> here's the gftrace >> https://pastebin.com/dQ7GsWUY
<gl0woo> root@host:~# strace -e unshare,setns -f unshare -mnU true
<gl0woo> unshare(CLONE_NEWNS|CLONE_NEWUSER|CLONE_NEWNET) = -1 EPERM (Operation not permitted)
<gl0woo> unshare: unshare failed: Operation not permitted
<gl0woo> +++ exited with 1 +++
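Since the combined unshare(CLONE_NEWNS|CLONE_NEWUSER|CLONE_NEWNET) call fails with EPERM, a next step might be to try each namespace on its own, plus the user+mount pair that lxc-usernsexec complains about. This sketch assumes util-linux unshare(1) and uses the hypothetical name probe_unshare_flags. It should be run as root, because --mount or --net without a new user namespace needs CAP_SYS_ADMIN. If only the cases involving --user fail, that would point at user-namespace creation itself; the unshare(2) man page lists, among other EPERM causes for CLONE_NEWUSER, a caller in a chroot and an effective UID or GID with no mapping in the parent namespace.

```shell
# Hypothetical helper: try each namespace type from the failing call on
# its own, plus the user+mount pair, and report ok/failed for each.
probe_unshare_flags() {
  for flags in "--user" "--mount" "--net" "--user --mount"; do
    # $flags is deliberately unquoted so "--user --mount" splits in two.
    if unshare $flags true 2>/dev/null; then
      echo "$flags: ok"
    else
      echo "$flags: failed"
    fi
  done
}
```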
Here is the full output of ‘lxc-checkconfig’:
root@host:~# lxc-checkconfig
LXC version 5.0.0
Kernel configuration not found at /proc/config.gz; searching...
Kernel configuration found at /boot/config-5.15.74-1-pve
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled
--- Control groups ---
Cgroups: enabled
Cgroup namespace: enabled
Cgroup v1 mount points:
Cgroup v2 mount points:
/sys/fs/cgroup
Cgroup v1 systemd controller: missing
Cgroup v1 freezer controller: missing
Cgroup ns_cgroup: required
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled
--- Misc ---
Veth pair device: enabled, not loaded
Macvlan: enabled, not loaded
Vlan: enabled, not loaded
Bridges: enabled, not loaded
Advanced netfilter: enabled, not loaded
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, not loaded
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, loaded
FUSE (for use with lxcfs): enabled, not loaded
--- Checkpoint/Restore ---
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities:
Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig
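The "Cgroup v1 ... controller: missing" lines may not be the cause: the output lists no cgroup v1 mount points and only /sys/fs/cgroup as a v2 mount point, which suggests the host runs the unified (v2-only) hierarchy, where there are no v1 controllers to find. One way to confirm the layout is the filesystem type of the cgroup mount. This is a sketch with the hypothetical name cgroup_layout, assuming GNU coreutils stat, which reports cgroup2fs for a cgroup v2 mount and tmpfs for the legacy and hybrid layouts.

```shell
# Hypothetical helper: classify the cgroup layout from the filesystem
# type of the cgroup mount point (default /sys/fs/cgroup).
cgroup_layout() {
  fstype=$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null)
  case "$fstype" in
    cgroup2fs) echo "unified (cgroup v2 only)" ;;
    tmpfs)     echo "legacy or hybrid (cgroup v1 hierarchies present)" ;;
    *)         echo "unknown (${fstype:-stat failed})" ;;
  esac
}
```

If the reading above is right, cgroup_layout on this host would print the unified line.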
Again, the Proxmox thread has more detail on the system configuration, package versions, etc. Sorry for the very long-winded first post.