Unprivileged container from template

Hi there,

I’m relatively new to unprivileged LXC containers (I’ve only set up privileged ones in the past) and have so far been following various how-tos, such as “Linux Containers - LXC - Getting started”.

I’m on …

Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye

lxc-checkconfig …

LXC version 4.0.6
Kernel configuration not found at /proc/config.gz; searching...
Kernel configuration found at /boot/config-5.10.0-27-amd64
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
Network namespace: enabled

--- Control groups ---
Cgroups: enabled

Cgroup v1 mount points: 


Cgroup v2 mount points: 
/sys/fs/cgroup

Cgroup v1 systemd controller: missing
Cgroup v1 freezer controller: missing
Cgroup namespace: required
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled, loaded
Macvlan: enabled, not loaded
Vlan: enabled, not loaded
Bridges: enabled, loaded
Advanced netfilter: enabled, loaded
CONFIG_NF_NAT_IPV4: missing
CONFIG_NF_NAT_IPV6: missing
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, loaded
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, not loaded
FUSE (for use with lxcfs): enabled, loaded

--- Checkpoint/Restore ---
checkpoint restore: enabled
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities: 

Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig

/etc/lxc/default.conf looks like this …

lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536

lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 00:16:3e:xx:xx:xx

/etc/sub{g,u}id

root:100000:65536
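
With a setup like this, it can be worth confirming that the range used in lxc.idmap is covered by what /etc/subuid and /etc/subgid delegate. Here is a minimal shell sketch, assuming the usual name:start:count line format. has_subid_range is a made-up helper name for illustration, not an LXC tool:

```shell
#!/bin/sh
# Sketch: succeed if FILE delegates to USER a range that fully covers
# START..START+COUNT-1. has_subid_range is a hypothetical helper name.
# Usage: has_subid_range FILE USER START COUNT
has_subid_range() {
    file=$1 user=$2 want_start=$3 want_count=$4
    # Each line has the form name:start:count
    awk -F: -v u="$user" -v s="$want_start" -v c="$want_count" '
        $1 == u && $2 <= s && ($2 + $3) >= (s + c) { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

For the values above, has_subid_range /etc/subuid root 100000 65536 && echo covered would be the matching call, and the same with /etc/subgid.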

But when I create an unprivileged container from the “download” template with something like the following command (inspired by the tutorial above) …

sudo lxc-create -n test2 -P /path/test2 --template download -- --dist debian --release bookworm --arch amd64

… it won’t get an IP after starting the container.

When I create a container from the debian template with sudo lxc-create -t debian -n test3 and start it, it obtains an IP without any further action, and lxc-ls --fancy lists it as UNPRIVILEGED. That seems to be exactly what I want: an unprivileged Debian container with an IP assigned.

But … I then checked the debian template (/usr/share/lxc/templates/lxc-debian) and found this:

# Detect use under userns (unsupported)
for arg in "$@"; do
    [ "$arg" = "--" ] && break
    if [ "$arg" = "--mapped-uid" -o "$arg" = "--mapped-gid" ]; then
        echo "This template can't be used for unprivileged containers." 1>&2
        echo "You may want to try the \"download\" template instead." 1>&2
        echo "You can also use mmdebstrap --mode=unshare, and an example is found at" 1>&2
        echo "https://wiki.debian.org/LXC#Unprivileged_Debian_container_by_mmdebstrap_--mode.3Dunshare " 1>&2
        echo "or in /usr/share/doc/lxc-templates/README.Debian." 1>&2
        exit 1
    fi
done
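
Read on its own, this guard only inspects the arguments handed to the template: it exits if --mapped-uid or --mapped-gid appears before a literal "--". It does not look at lxc.idmap in the container config. The sketch below wraps the same logic in a function so it can be tried in isolation. uses_mapped_ids is a hypothetical name and not part of lxc-debian:

```shell
#!/bin/sh
# Sketch of the template's guard as a standalone function.
# uses_mapped_ids is a hypothetical name, not part of lxc-debian.
# Returns 0 if --mapped-uid or --mapped-gid appears before a literal "--".
uses_mapped_ids() {
    for arg in "$@"; do
        # Everything after "--" is ignored, as in the template
        [ "$arg" = "--" ] && break
        case "$arg" in
            --mapped-uid|--mapped-gid) return 0 ;;
        esac
    done
    return 1
}
```

So uses_mapped_ids --path /x --mapped-uid 100000 succeeds, while uses_mapped_ids --path /x --name test3 does not. Presumably neither option was passed in the test3 run, since the template did not exit with this message.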

I’m confused now. Can I use that template to create my unprivileged container? Or is the container not really unprivileged when I use that template?

Thanks in advance

Can you enter the container with lxc-attach and then show:

  • ps fauxww
  • ip link

Hi, thanks for your answer :slight_smile:

Sure, here from the download template container …

root@test2:/# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0@if42: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:16:3e:63:6c:7b brd ff:ff:ff:ff:ff:ff link-netnsid 0
root@test2:/# ps fauxww
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         198  0.0  0.1   7196  3808 pts/4    Ss   19:44   0:00 /bin/bash
root         206  0.0  0.1  11040  4320 pts/4    R+   19:50   0:00  \_ ps fauxww
root           1  0.0  0.2 167380 10232 ?        Ss   16:01   0:00 /sbin/init
root         127  0.0  0.1  32964  6420 ?        Ss   16:01   0:00 /lib/systemd/systemd-journald
message+     159  0.0  0.0   9132  2128 ?        Ss   16:01   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         167  0.0  0.0   5496   464 pts/3    Ss+  16:01   0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud - 115200,38400,9600 linux
root         168  0.0  0.0   5496   444 pts/0    Ss+  16:01   0:00 /sbin/agetty -o -p -- \u --noclear - vt220
root         169  0.0  0.0   5496   472 pts/1    Ss+  16:01   0:00 /sbin/agetty -o -p -- \u --noclear - vt220
root         170  0.0  0.0   5496   492 pts/2    Ss+  16:01   0:00 /sbin/agetty -o -p -- \u --noclear - vt220
root         171  0.0  0.0   5496   504 pts/3    Ss+  16:01   0:00 /sbin/agetty -o -p -- \u --noclear - vt220

And here from the debian template container …

root@test1:/# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0@if43: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:16:3e:e9:4b:33 brd ff:ff:ff:ff:ff:ff link-netnsid 0
root@test1:/# ps fauxww
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         107  0.0  0.1   7196  3776 pts/4    Ss   19:52   0:00 /bin/bash
root         276  0.0  0.1  11040  4292 pts/4    R+   19:55   0:00  \_ ps fauxww
root           1  0.0  0.2  19572 11052 ?        Ss   19:50   0:00 /sbin/init
root          43  0.0  0.3  32832 11504 ?        Ss   19:50   0:00 /lib/systemd/systemd-journald
root          72  0.0  0.0   5740  3656 ?        Ss   19:50   0:00 dhclient -4 -v -i -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases -I -df /var/lib/dhcp/dhclient6.eth0.leases eth0
root         104  0.0  0.0   5496  1024 pts/4    Ss+  19:51   0:00 /sbin/agetty -o -p -- \u --noclear --keep-baud - 115200,38400,9600 vt220
root         105  0.0  0.2  15400  8328 ?        Ss   19:51   0:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups

Okay, I just noticed that dhclient isn’t running in the download template container.
I ran the dhclient command shown above inside that container and got an IP and internet access :see_no_evil:

I guess I can carry on from here (thanks for pointing me in the right direction; I wasn’t sure what to look for until I saw it) … but I’m still curious. What exactly is the check for unprivileged-ness in the debian template checking for? Even though it says it can’t be used for unprivileged containers, lxc-ls says the container is unprivileged.

Is the indication in lxc-ls proof that the container is actually unprivileged, or might it be misleading in some cases?
Or maybe the better question: how do I make sure the container really is unprivileged? (As you may have noticed, I created and started both containers as root.)

cat /proc/self/uid_map from inside the container is the best way to be sure.
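
As a rough guide to reading that file: each line is "inside-start outside-start count". In the host's initial user namespace, which a container without a user namespace shares, it shows the identity mapping 0 0 4294967295. A minimal sketch that classifies the first mapping line follows. classify_uid_map is a hypothetical helper name and it ignores any further mapping lines:

```shell
#!/bin/sh
# Sketch: classify the first line of a uid_map file.
# classify_uid_map is a hypothetical helper name.
# "0 0 4294967295" is the identity mapping of the initial user namespace;
# anything else means the IDs inside are shifted on the host.
classify_uid_map() {
    # $1: path to a uid_map file (defaults to /proc/self/uid_map)
    map=${1:-/proc/self/uid_map}
    # Fields: first ID inside, first ID outside, number of IDs
    read -r inside outside count < "$map" || return 2
    if [ "$inside" = 0 ] && [ "$outside" = 0 ] && [ "$count" = 4294967295 ]; then
        echo "privileged (identity mapping)"
    else
        echo "unprivileged (container uid $inside maps to host uid $outside, $count ids)"
    fi
}
```

Run inside the container as classify_uid_map with no argument. The same check applies to /proc/self/gid_map.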

The DHCP thing is a bit odd. The downloaded Debian image should have systemd-networkd installed and would normally come with configuration to run DHCP on eth0 (/etc/systemd/network/eth0.network).

It may be interesting to look at systemctl --failed and networkctl for reasons why this didn’t run.

root@test1:/# cat /proc/self/uid_map 
         0     100000      65536

root@test2:/# cat /proc/self/uid_map
         0     100000      65536

Guess both are unprivileged then.

root@test2:/# systemctl --failed
  UNIT                     LOAD   ACTIVE SUB    DESCRIPTION                   
● systemd-logind.service   loaded failed failed User Login Management
● systemd-networkd.service loaded failed failed Network Configuration
● systemd-resolved.service loaded failed failed Network Name Resolution       
● systemd-networkd.socket  loaded failed failed Network Service Netlink Socket

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
4 loaded units listed.
root@test2:/# networkctl
WARNING: systemd-networkd is not running, output will be incomplete.

IDX LINK TYPE     OPERATIONAL SETUP    
  1 lo   loopback -           unmanaged
  2 eth0 ether    -           unmanaged

2 links listed.

And root@test2:/# journalctl -xeu systemd-networkd.service returns the following information:

Jan 24 20:45:18 test2 (networkd)[170]: systemd-networkd.service: Failed to set up mount namespacing: Permission denied
Jan 24 20:45:18 test2 (networkd)[170]: systemd-networkd.service: Failed at step NAMESPACE spawning /lib/systemd/systemd-networkd: Permission denied                                                                                                                       
░░ Subject: Process /lib/systemd/systemd-networkd could not be executed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ The process /lib/systemd/systemd-networkd could not be executed and failed.
░░ 
░░ The error number returned by this process is ERRNO.
Jan 24 20:45:18 test2 systemd[1]: systemd-networkd.service: Main process exited, code=exited, status=226/NAMESPACE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ An ExecStart= process belonging to unit systemd-networkd.service has exited.
░░ 
░░ The process' exit code is 'exited' and its exit status is 226.
Jan 24 20:45:18 test2 systemd[1]: systemd-networkd.service: Failed with result 'exit-code'.

It seems the downloaded debian is too unprivileged?

Not sure about the namespace spawning/mounting, but systemd-networkd seems to have the same permissions in both containers:

-rwxr-xr-x 1 100000 100000 1.6M Nov 10 01:25 /Container/test2/test2/rootfs/usr/lib/systemd/systemd-networkd

-rwxr-xr-x 1 100000 100000 1.6M Nov 10 01:25 /var/lib/lxc/test1/rootfs/usr/lib/systemd/systemd-networkd

I’ll attach the configs of the two containers. I’d only expect the inclusion of userns.conf to make a real difference, but commenting it out doesn’t help with the problem either =/

$ sudo cat /Container/test2/test2/config
# Template used to create this container: /usr/share/lxc/templates/lxc-download
# Parameters passed to the template: --dist debian --release bookworm --arch amd64
# For additional config options, please look at lxc.container.conf(5)

# Uncomment the following line to support nesting containers:
#lxc.include = /usr/share/lxc/config/nesting.conf
# (Be aware this has security implications)


# Distribution configuration
lxc.include = /usr/share/lxc/config/debian.common.conf
lxc.include = /usr/share/lxc/config/userns.conf
lxc.arch = linux64

# Container specific configuration
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
lxc.rootfs.path = dir:/Container/test2/test2/rootfs
lxc.uts.name = test2

# Network configuration
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 00:16:3e:63:6c:7b

$ sudo cat /var/lib/lxc/test1/config
# Template used to create this container: /usr/share/lxc/templates/lxc-debian
# Parameters passed to the template:
# For additional config options, please look at lxc.container.conf(5)

# Uncomment the following line to support nesting containers:
#lxc.include = /usr/share/lxc/config/nesting.conf
# (Be aware this has security implications)

# Map user and group ids
lxc.include = /usr/share/lxc/config/debian.userns.conf
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536


lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 00:16:3e:e9:4b:33
lxc.rootfs.path = dir:/var/lib/lxc/test1/rootfs

# Common configuration
lxc.include = /usr/share/lxc/config/debian.common.conf

# Container specific configuration
lxc.tty.max = 4
lxc.uts.name = test1
lxc.arch = amd64
lxc.pty.max = 1024

Can you show systemctl cat systemd-networkd?

We have a bunch of extra logic and workarounds for things like that in Incus and have daily tests confirming those images work fine, so the same should be repeatable with LXC, just needs a bit more work :slight_smile:

I expect that in this case the issue is likely with AppArmor. You could try:

lxc.apparmor.profile = generated

If that alone doesn’t help, then also add:

lxc.apparmor.allow_nesting = 1

That latter one isn’t advisable for a privileged container but should be perfectly fine for unprivileged and actually pretty close to Incus’ default behavior.
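
These keys go into the container's own config file (for test2 above, /Container/test2/test2/config), and the container needs a restart afterwards. One possible way to add a key only if it is not already set is sketched below. set_lxc_key is a hypothetical helper and it never changes an existing value:

```shell
#!/bin/sh
# Sketch: append "key = value" to an LXC config file unless a line
# already sets that key. set_lxc_key is a hypothetical helper name.
# Usage: set_lxc_key CONFIG KEY VALUE
set_lxc_key() {
    config=$1 key=$2 value=$3
    # Look for an uncommented line that starts with the key followed by "="
    if awk -v k="$key" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$config"; then
        return 0
    fi
    printf '%s = %s\n' "$key" "$value" >> "$config"
}
```

For example, set_lxc_key /Container/test2/test2/config lxc.apparmor.profile generated, run as root, followed by stopping and starting the container.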


Sure, but I guess that’s just FYI now as the lxc.apparmor.profile = generated did the trick :slight_smile:

root@test2:/# systemctl cat systemd-networkd
# /lib/systemd/system/systemd-networkd.service
#  SPDX-License-Identifier: LGPL-2.1-or-later
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Network Configuration
Documentation=man:systemd-networkd.service(8)
Documentation=man:org.freedesktop.network1(5)
ConditionCapability=CAP_NET_ADMIN
DefaultDependencies=no
# systemd-udevd.service can be dropped once tuntap is moved to netlink
After=systemd-networkd.socket systemd-udevd.service network-pre.target systemd-sysusers.service systemd-sysctl.service
Before=network.target multi-user.target shutdown.target initrd-switch-root.target
Conflicts=shutdown.target initrd-switch-root.target
Wants=systemd-networkd.socket network.target

[Service]
AmbientCapabilities=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW
BusName=org.freedesktop.network1
CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW
DeviceAllow=char-* rw
ExecStart=!!/lib/systemd/systemd-networkd
ExecReload=networkctl reload
FileDescriptorStoreMax=512
LockPersonality=yes
MemoryDenyWriteExecute=yes
NoNewPrivileges=yes
ProtectProc=invisible
ProtectClock=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectSystem=strict
Restart=on-failure
RestartKillSignal=SIGUSR2
RestartSec=0
RestrictAddressFamilies=AF_UNIX AF_NETLINK AF_INET AF_INET6 AF_PACKET
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
RuntimeDirectory=systemd/netif
RuntimeDirectoryPreserve=yes
SystemCallArchitectures=native
SystemCallErrorNumber=EPERM
SystemCallFilter=@system-service
Type=notify
User=systemd-network
WatchdogSec=3min

[Install]
WantedBy=multi-user.target
Also=systemd-networkd.socket
Alias=dbus-org.freedesktop.network1.service

# The output from this generator is used by udevd and networkd. Enable it by
# default when enabling systemd-networkd.service.
Also=systemd-network-generator.service

# We want to enable systemd-networkd-wait-online.service whenever this service
# is enabled. systemd-networkd-wait-online.service has
# WantedBy=network-online.target, so enabling it only has an effect if
# network-online.target itself is enabled or pulled in by some other unit.
Also=systemd-networkd-wait-online.service

# /run/systemd/system/service.d/zzz-lxc-service.conf
[Service]
ProcSubset=all
ProtectProc=default
ProtectControlGroups=no
ProtectKernelTunables=no
NoNewPrivileges=no
LoadCredential=

Does using “generated” have any downsides for the security of my container?

In the meantime, thank you very much for your time and help so far, I don’t think I would have figured this out anytime soon :sweat_smile:

By the way, this is the first time I’ve heard of Incus. It sounds interesting, so I guess I’ll have to take a look at it.