Losing capabilities after resetting uid/gid

Environment: Linux buildroot 5.10.131
LXC version: lxc-4.0.12

I tried to run an unprivileged container started by a non-root user and hit some permission issues. While the network was being set up, I saw these error messages:

lxc-start test_unpri 19700101000048.176 WARN     start - start.c:lxc_spawn:1835 - Operation not permitted - Failed to allocate new network namespace id
lxc-start test_unpri 19700101000048.179 INFO     network - network.c:lxc_create_network_unpriv_exec:2949 - Execing lxc-user-nic create /usr/var/lib/lxc_unpri/default/.local/share/lxc test_unpri 6855 veth lxcbr0 (null)
lxc-start test_unpri 19700101000048.276 ERROR    network - network.c:lxc_create_network_unpriv_exec:2977 - lxc-user-nic failed to configure requested network: cmd/lxc_user_nic.c: 474: instantiate_veth - Operation not permitted - Failed to create veth1000_MD05-veth1000_MD05p

This is a missing-capability issue: the operation requires cap_net_admin, cap_sys_admin, … , etc.
After I added the missing capabilities, most of the permission issues went away, but one permission error remained:

lxc-start test_unpri 19700101000138.372 INFO     network - network.c:lxc_create_network_unpriv_exec:2949 - Execing lxc-user-nic create /usr/var/lib/lxc_unpri/default/.local/share/lxc test_unpri 6849 veth lxcbr0 (null)
lxc-start test_unpri 19700101000138.591 ERROR    network - network.c:lxc_create_network_unpriv_exec:2977 - lxc-user-nic failed to configure requested network: cmd/lxc_user_nic.c: 886: lxc_secure_rename_in_ns - Operation not permitted - Failed to setns() to original network namespace of PID 3

This looked really weird to me: setns() requires cap_sys_admin, and I had already added cap_sys_admin, so I didn’t understand why the capability was gone.
I investigated and finally found that the capabilities are cleared when an API that changes the uid/gid is called. In my case it’s setresuid.

Refer to lxc-4.0.12/src/lxc/cmd/lxc_user_nic.c

static char *lxc_secure_rename_in_ns(int pid, char *oldname, char *newname,
				     int *container_veth_ifidx)
{
...

	ret = setresuid(ruid, ruid, 0);   // ++TY: capabilities are cleared when setresuid is called
	if (ret < 0) {
		CMD_SYSERROR("Failed to drop privilege by setting effective user id and real user id to %d, and saved user ID to 0\n", ruid);
		/*
		 * It's ok to jump to do_full_cleanup here since setresuid()
		 * will succeed when trying to set real, effective, and saved
		 * to values they currently have.
		 */
		goto out_setns;
	}

...

out_setns:
	ret = setns(ofd, CLONE_NEWNET);    // ++TY: and it finally fails here, for lack of cap_sys_admin
	if (ret < 0)
		return cmd_error_errno(NULL, errno, "Failed to setns() to original network ...
...
}
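
One way to confirm where the capabilities disappear is to print the effective capability set before and after each setresuid() call. As far as capabilities(7) describes it, the kernel clears the effective set when the effective UID goes from 0 to nonzero, and clears the permitted and effective sets when all of the real, effective and saved UIDs become nonzero after at least one of them was 0. The following is only a minimal sketch, assuming /proc is mounted; read_cap_eff, has_cap and report_caps are hypothetical helper names, not part of LXC.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <linux/capability.h>	/* CAP_SYS_ADMIN, CAP_NET_ADMIN */

/*
 * Read the caller's effective capability mask from the "CapEff:" line
 * of /proc/self/status. Returns 0 on success, -1 on failure.
 */
int read_cap_eff(uint64_t *mask)
{
	char line[256];
	int ret = -1;
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "CapEff: %" SCNx64, mask) == 1) {
			ret = 0;
			break;
		}
	}

	fclose(f);
	return ret;
}

/* Return 1 if capability number 'cap' is set in 'mask', else 0. */
int has_cap(uint64_t mask, int cap)
{
	return (int)((mask >> cap) & 1);
}

/* Hypothetical debug helper: print the effective set with a label. */
void report_caps(const char *when)
{
	uint64_t eff = 0;

	if (read_cap_eff(&eff) < 0) {
		fprintf(stderr, "%s: cannot read CapEff\n", when);
		return;
	}

	fprintf(stderr, "%s: CapEff=%016" PRIx64 " sys_admin=%d net_admin=%d\n",
		when, eff, has_cap(eff, CAP_SYS_ADMIN), has_cap(eff, CAP_NET_ADMIN));
}
```

Calling report_caps() with a different label before and after each setresuid() in lxc_secure_rename_in_ns should show at which call cap_sys_admin drops out of the effective set.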

If I add prctl(PR_SET_KEEPCAPS, prctl_arg(1)) before the setresuid calls, I keep all capabilities and can launch my unprivileged container successfully, but I am not sure whether this is a proper fix.
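
For reference, a hedged sketch of the keep-caps pattern outside of LXC. According to prctl(2), PR_SET_KEEPCAPS only preserves the permitted set across a UID change; the effective set is still cleared, so a program normally has to raise the capabilities into the effective set again afterwards. The function names enable_keep_caps and raise_permitted_to_effective are hypothetical, and the raw capget/capset syscalls are used here only to avoid assuming libcap is available. This is an illustration of the mechanism, not a suggested patch for lxc-user-nic.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/capability.h>	/* capability header/data structs */
#include <sys/prctl.h>		/* prctl(), PR_SET_KEEPCAPS */
#include <sys/syscall.h>	/* SYS_capget, SYS_capset */
#include <unistd.h>		/* syscall() */

/*
 * Ask the kernel to keep the permitted set when all UIDs become
 * nonzero. The flag is reset on execve(). Returns 0 on success.
 */
int enable_keep_caps(void)
{
	return prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0) < 0 ? -1 : 0;
}

/*
 * Copy every permitted capability back into the effective set.
 * Meant to be called after the UID change, because keep-caps does
 * not protect the effective set. Returns 0 on success.
 */
int raise_permitted_to_effective(void)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (syscall(SYS_capget, &hdr, data) < 0)
		return -1;

	for (int i = 0; i < _LINUX_CAPABILITY_U32S_3; i++)
		data[i].effective = data[i].permitted;

	return syscall(SYS_capset, &hdr, data) < 0 ? -1 : 0;
}
```

The intended order would be enable_keep_caps(), then the setresuid() call, then raise_permitted_to_effective(). Keeping capabilities across a privilege drop weakens the drop itself, so whether this is acceptable depends on why the code changes UID in the first place.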

Could anyone give me any thoughts or suggestions?

Thanks in advance,
TengYang

I finally found the root cause: src/lxc/Makefile already sets the suid bit on lxc-user-nic, so there is no need to add any capabilities, because a suid-root binary runs with superuser permissions.

  chmod u+s $(DESTDIR)$(libexecdir)/lxc/lxc-user-nic

In my case, buildroot had removed the suid bit, which is why I hit so many permission-denied errors. So this isn’t actually an LXC issue but a buildroot issue.
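
A quick way to catch this on the target is to check whether the installed helper still has the set-user-ID bit and is owned by root. Below is a minimal sketch using stat(2); mode_has_setuid and is_setuid_root are hypothetical helper names, and the path to pass in depends on how libexecdir was configured for the build.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Return 1 if the mode bits include the set-user-ID bit, else 0. */
int mode_has_setuid(mode_t mode)
{
	return (mode & S_ISUID) != 0;
}

/*
 * Return 1 if 'path' is owned by root and has the set-user-ID bit,
 * 0 if it does not, and -1 if the file cannot be examined.
 */
int is_setuid_root(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;

	return mode_has_setuid(st.st_mode) && st.st_uid == 0;
}
```

If is_setuid_root() returns 0 for the installed lxc-user-nic, the bit was probably lost somewhere between "make install" and the final root filesystem image.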