Container fails to start: newuidmap failed to write mapping "uid range [0-1000000000) -> [1000000-1001000000) not allowed"

Last time I had this problem it was solved by creating /etc/subuid and /etc/subgid files with an appropriate root entry. No such luck this time:

OS: Arch Linux
LXD: 4.19

I created /etc/subuid and /etc/subgid files with this content:

root:100000:65536
lxd:100000:65536

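It's worth spelling out what such an entry grants: an /etc/subuid line of the form user:first:count allows that user to map count ids starting at host id first. The error above shows newuidmap being asked for a billion uids starting at host uid 1000000, which a 65536-id grant starting at 100000 cannot cover. A minimal sketch of the arithmetic (the entry string is the one from the file above):

```shell
# Sketch: work out the host uid range a subuid-style entry grants.
# Format is user:first_subuid:count; the granted range is
# [first_subuid, first_subuid + count - 1].
entry="root:100000:65536"
IFS=: read -r user start count <<EOF
$entry
EOF
end=$((start + count - 1))
echo "$user: host uids $start-$end ($count ids)"
```

Going by the ranges in the error, an entry large enough for the requested map would have to look more like root:1000000:1000000000 (an assumption from the log, not from documentation).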
I also added the following lines to /etc/lxc/default.conf:

lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536

which I didn’t do last time. I tried starting the container both with and without these lines in /etc/lxc/default.conf, and also tried placing them in /etc/default/lxc (I’m not sure why I have both of these files on my system). I did remember to restart lxd each time I changed the configuration.

However, with or without the lxc.idmap entries, when I try to start the container, the following errors ensue:

lxc samba-dc 20211020015746.465 ERROR    conf - conf.c:lxc_map_ids:3471 - newuidmap failed to write mapping "newuidmap: uid range [0-1000000000) -> [1000000-1001000000) not allowed": newuidmap 4259 0 1000000 1000000000
lxc samba-dc 20211020015746.465 ERROR    start - start.c:lxc_spawn:1774 - Failed to set up id mapping.
lxc samba-dc 20211020015746.465 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:868 - Received container state "ABORTING" instead of "RUNNING"
lxc samba-dc 20211020015746.465 ERROR    start - start.c:__lxc_start:2053 - Failed to spawn container "samba-dc"
lxc samba-dc 20211020015746.465 WARN     start - start.c:lxc_abort:1050 - No such process - Failed to send SIGKILL via pidfd 20 for process 4259
lxc 20211020015751.527 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:220 - Connection reset by peer - Failed to receive response
lxc 20211020015751.527 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:129 - Failed to receive file descriptors

So, it looks like it’s trying to use the default uid mapping range despite the /etc/subuid, /etc/subgid, and /etc/lxc/default.conf entries? I must have the lxc.idmap syntax wrong, or I’m putting the instructions in the wrong location. Should I be using raw.idmap in the profile instead?

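On the raw.idmap question: that key lives in LXD instance or profile config (set via lxc config set or lxc profile set), not in /etc/lxc/default.conf. If a per-container mapping were wanted, a hypothetical fragment would look roughly like this (the ids are placeholders, not a recommendation):

```
raw.idmap: |-
  uid 1000 1000
  gid 1000 1000
```

Each line is kind, host id, container id. Note that raw.idmap only layers extra mappings on top of the base map, so it wouldn't work around a subuid allocation that is too small in the first place.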
Also, systemctl restart lxd is extremely slow. I tried strace’ing the PID to see where it is getting stuck, but the strace shows nothing illuminating:

ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8

One detail I omitted from the description above is that I’m initializing/launching the container as an unprivileged user in the lxd group:

[pgoetz@gecko ~]$ whoami
pgoetz
$ lxc launch images:ubuntu/20.04 samba-dc

The system is configured to allow unprivileged users to create user namespaces:

[root@gecko default]# sysctl kernel.unprivileged_userns_clone
kernel.unprivileged_userns_clone = 1
[root@gecko default]# sysctl user.max_user_namespaces
user.max_user_namespaces = 127583

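For anyone comparing on another distro: kernel.unprivileged_userns_clone is a distro kernel patch (Arch and Debian kernels carry it) and doesn't exist on mainline kernels, while user.max_user_namespaces is upstream. A small sketch that checks both knobs via /proc and tolerates the missing one:

```shell
# Sketch: read the userns knobs straight from /proc so this works
# non-interactively. kernel.unprivileged_userns_clone may be absent
# on non-patched kernels; user.max_user_namespaces is mainline.
checked=0
for key in kernel.unprivileged_userns_clone user.max_user_namespaces; do
    f="/proc/sys/$(echo "$key" | tr . /)"
    if [ -r "$f" ]; then
        printf '%s = %s\n' "$key" "$(cat "$f")"
    else
        printf '%s: not present on this kernel\n' "$key"
    fi
    checked=$((checked + 1))
done
```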
However, my guess, based on the information provided here, is that I’m unable to delegate a necessary cgroup as an unprivileged user. I’m not sure exactly what libpam-cgfs does, but the package doesn’t appear to be available on Arch, and this instruction from that page:

systemd-run --unit=myshell --user --scope -p "Delegate=yes" lxc-start <container-name>

isn’t sufficiently explained for me to feel comfortable using it. Given the complications, I can’t think of any really good reason to be launching containers as myself. Since the root user can also launch unprivileged containers, the path of least resistance (without, I believe, compromising security) is to launch the container as root:

[root@gecko ~]# lxc launch images:ubuntu/20.04 samba-dc

That appears to just work:

[root@gecko ~]# lxc list
+----------+---------+----------------------+------+-----------+-----------+
|   NAME   |  STATE  |         IPV4         | IPV6 |   TYPE    | SNAPSHOTS |
+----------+---------+----------------------+------+-----------+-----------+
| samba-dc | RUNNING | 192.168.1.170 (eth0) |      | CONTAINER | 0         |
+----------+---------+----------------------+------+-----------+-----------+

Rather than use lxdbr0, I initialized lxd to use an existing bridge (with a NIC bound to it) because I want/need this container to be public facing. This is clearly working, as the IP address above was assigned by my DHCP server.

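For reference, pointing LXD at an existing bridge boils down to a profile NIC device roughly like the following (br0 is a stand-in for whatever the existing bridge is called; this is a sketch, not my exact config):

```
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
```

With this in the default profile, containers get their addresses from whatever serves DHCP on that bridge's network rather than from lxdbr0's built-in dnsmasq.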
While not technically solved, I’m going to mark this topic as solved. There are way too many moving parts to getting this working with an unprivileged user launching containers, and I’m not seeing a single benefit of doing so. I’m going to repeat this installation in a production environment using the pre-installed 4.0/stable snap on Ubuntu 20.04. Presumably I’ll run into the same issues there and will need to similarly launch the container as root.

I wonder if these issues would be less of a headache with better use of Linux capabilities? Anyway, not a problem I can solve right now.

Let me know if I should start a new thread, but I’m having the same problem as shown here, and I haven’t been able to resolve it the way you did.

OS: Arch Linux, uname -a:
Linux mongoes 5.18.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 30 May 2022 17:53:11 +0000 x86_64 GNU/Linux
LXD: 5.2-1

I’ve created /etc/subuid and /etc/subgid with the following. This differs from what you did, but I don’t have an lxd user on my machine (is that a problem?), so I don’t see why it should match your entries:

root:100000:65536

My /etc/lxc/default.conf is also:

lxc.net.0.type = empty
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536

And in /etc/default/lxc I have the same two lines, placed after the rest of the autogenerated config file, as shown:

# LXC_AUTO - whether or not to start containers at boot
LXC_AUTO="true"

# BOOTGROUPS - What groups should start on bootup?
#	Comma separated list of groups.
#	Leading comma, trailing comma or embedded double
#	comma indicates when the NULL group should be run.
# Example (default): boot the onboot group first then the NULL group
BOOTGROUPS="onboot,"

# SHUTDOWNDELAY - Wait time for a container to shut down.
#	Container shutdown can result in lengthy system
#	shutdown times.  Even 5 seconds per container can be
#	too long.
SHUTDOWNDELAY=5

# OPTIONS can be used for anything else.
#	If you want to boot everything then
#	options can be "-a" or "-a -A".
OPTIONS=

# STOPOPTS are stop options.  The can be used for anything else to stop.
#	If you want to kill containers fast, use -k
STOPOPTS="-a -A -s"

USE_LXC_BRIDGE="false"  # overridden in lxc-net

[ ! -f /etc/default/lxc-net ] || . /etc/default/lxc-net

lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536

I have the exact same error as shown above when I try to launch it:

lxc u1 20220608195820.638 ERROR    conf - conf.c:lxc_map_ids:3668 - newuidmap failed to write mapping "newuidmap: uid range [0-1000000000) -> [1000000-1001000000) not allowed": newuidmap 1848 0 1000000 1000000000
lxc u1 20220608195820.638 ERROR    start - start.c:lxc_spawn:1791 - Failed to set up id mapping.
lxc u1 20220608195820.638 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc u1 20220608195820.639 ERROR    start - start.c:__lxc_start:2074 - Failed to spawn container "u1"
lxc u1 20220608195820.639 WARN     start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 17 for process 1848
lxc 20220608195825.695 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220608195825.695 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

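Decoding the trailing arguments in that first log line is instructive: newuidmap takes a pid followed by (container-uid, host-uid, count) triples, so "newuidmap 1848 0 1000000 1000000000" is a request to map a billion host uids starting at 1000000, which root:100000:65536 cannot satisfy. A sketch of the arithmetic:

```shell
# Sketch: decode the newuidmap arguments from the log line above.
# newuidmap <pid> <container-uid> <host-uid> <count> maps container
# uids [container-uid, container-uid+count) onto host uids
# [host-uid, host-uid+count).
set -- 1848 0 1000000 1000000000
pid=$1; cstart=$2; hstart=$3; count=$4
echo "pid $pid: container uids [$cstart-$((cstart + count))) -> host uids [$hstart-$((hstart + count)))"
```

The printed ranges match the ones in the error message exactly, which points at the /etc/subuid grant being far smaller than what LXD is requesting.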
My system is also configured to allow unprivileged users to create user namespaces:

[root@mongoes sean]# sysctl kernel.unprivileged_userns_clone
kernel.unprivileged_userns_clone = 1
[root@mongoes sean]# sysctl user.max_user_namespaces
user.max_user_namespaces = 256331

And what’s more, as shown here, I’ve delegated controllers to my unprivileged user by creating a systemd drop-in:

[root@mongoes sean]# cat /etc/systemd/system/user@1000.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids

And yet after all this, the error remains.

I can’t even launch privileged containers; they fail with the exact same error output:

[root@mongoes sean]# lxc start u1
Error: Failed to run: /usr/bin/lxd forkstart u1 /var/lib/lxd/containers /var/log/lxd/u1/lxc.conf: 
Try `lxc info --show-log u1` for more info
[root@mongoes sean]# lxc info --show-log u1
Name: u1
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/06/07 20:29 EDT
Last Used: 2022/06/08 16:09 EDT

Log:

lxc u1 20220608200932.469 ERROR    conf - conf.c:lxc_map_ids:3668 - newuidmap failed to write mapping "newuidmap: uid range [0-1000000000) -> [1000000-1001000000) not allowed": newuidmap 2361 0 1000000 1000000000
lxc u1 20220608200932.469 ERROR    start - start.c:lxc_spawn:1791 - Failed to set up id mapping.
lxc u1 20220608200932.469 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:877 - Received container state "ABORTING" instead of "RUNNING"
lxc u1 20220608200932.470 ERROR    start - start.c:__lxc_start:2074 - Failed to spawn container "u1"
lxc u1 20220608200932.470 WARN     start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 17 for process 2361
lxc 20220608200937.502 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20220608200937.502 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors for command "get_state"

Any idea what should be done?

Did you restart LXD after updating /etc/subuid and /etc/subgid?

Yes, I’ve rebooted and restarted the daemon several times while trying different things (as well as right before collecting the information for this post).

Did you also try creating a new instance instead of starting an existing one?

Actually, that appears to have fixed it.

Thanks a lot for the help, I feel a bit embarrassed not having done that in the first place.

I can’t find how to mark the post as solved…

Good, that makes sense. LXD sets the map at instance creation time, so your instance already had its map stored in its volatile config (lxc config show would show you those keys), which then kept failing even after LXD itself was configured to use something different.
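To make that concrete, the stale map is persisted in volatile keys on the instance, roughly like the following (values are illustrative of the shape, not copied from a real instance):

```
volatile.idmap.base: "0"
volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
```

Deleting and recreating the instance regenerates these keys against the current subuid/subgid allocation, which is why launching a fresh container worked where starting the old one kept failing.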