No network access in new Debian container

I’ve looked through several similar issues and have not managed to find a solution.

Steps I took so far:

sudo usermod --add-subuids 100000-165536 s
sudo usermod --add-subgids 100000-165536 s
echo 's veth lxcbr0 10' | sudo tee -a /etc/lxc/lxc-usernet

Created ~/.config/lxc/default.conf:

lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.net.0.hwaddr = 00:16:3e:xx:xx:xx
lxc.idmap = u 0 100000 1000
lxc.idmap = g 0 100000 1000
lxc.idmap = u 1000 1000 1
lxc.idmap = g 1000 1000 1
lxc.idmap = u 1001 101001 64535
lxc.idmap = g 1001 101001 64535

# GUI
lxc.mount.entry = /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry = /dev/snd dev/snd none bind,optional,create=dir
lxc.mount.entry = /tmp/.X11-unix tmp/.X11-unix none bind,optional,create=dir
lxc.mount.entry = /dev/video0 dev/video0 none bind,optional,create=file
lxc.mount.entry = /home/s/Hacking home/ubuntu/hacking none bind,create=dir
lxc.mount.entry = /home/s/Desktop home/ubuntu/desktop none bind,create=dir

Added s:1000:1 to /etc/subuid and /etc/subgid.
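
For anyone checking a similar config: the three lxc.idmap ranges above should tile container IDs 0 to 65535 with no gap or overlap (1000 + 1 + 64535 = 65536). Below is a minimal sketch of that check, assuming the config format shown above. check_idmap is a hypothetical helper name, not an LXC tool, and it only looks at the container-side IDs, not at what /etc/subuid allows on the host.

```shell
#!/bin/sh
# check_idmap KIND FILE: verify that the lxc.idmap lines of KIND (u or g)
# in FILE cover container IDs contiguously, starting from 0.
# Prints the number of mapped IDs on success, returns non-zero on a gap.
check_idmap() {
	kind="$1"
	file="$2"
	# Fields of "lxc.idmap = u 0 100000 1000":
	#   $4 = first container ID, $5 = first host ID, $6 = range length
	grep -E "^lxc\.idmap[[:space:]]*=[[:space:]]*${kind}[[:space:]]" "$file" |
		awk '{ print $4, $5, $6 }' |
		sort -n |
		awk '
			$1 != next_id { printf "gap or overlap at container id %d\n", next_id; bad = 1; exit }
			{ next_id = $1 + $3 }
			END { if (bad) exit 1; printf "%d ids mapped\n", next_id }
		'
}

# Example (path is the per-user default config from this thread):
#   check_idmap u ~/.config/lxc/default.conf
#   check_idmap g ~/.config/lxc/default.conf
```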

Then running:

lxc-create -t download -n mygui -- -d debian -r bookworm -a amd64
systemd-run --unit=myshell --user --scope -p "Delegate=yes" lxc-start -n mygui

After attaching, it seems the network is unreachable.
dnsmasq is running on the host. ip r returns nothing in the container.

Rereading the getting started guide, it says I should have used systemd-run --unit=myshell --user --scope -p "Delegate=yes" for lxc-create. I’ve recreated the container that way, but still the same result.

It has not received an IP address:

> lxc-info --name mygui
Name:           mygui
State:          RUNNING
PID:            57547
Link:           veth1000_mkeL
 TX bytes:      866 bytes
 RX bytes:      9.32 KiB
 Total bytes:   10.17 KiB
> lxc-ls --fancy
NAME  STATE   AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED 
mygui RUNNING 0         -      -    -    true

/var/lib/misc/dnsmasq.lxcbr0.leases is empty.

Can someone give some guidance on what to debug?
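
One host-side check that might narrow this down is whether the veth reported by lxc-info is actually attached to lxcbr0. A rough sketch follows; link_from_info and veth_on_bridge are hypothetical helper names, and the parsing assumes the lxc-info output format shown above.

```shell
#!/bin/sh
# link_from_info: read `lxc-info` output on stdin and print the host-side
# interface name from the "Link:" line.
link_from_info() {
	awk '$1 == "Link:" { print $2 }'
}

# veth_on_bridge VETH BRIDGE: succeed if VETH is listed among the
# interfaces enslaved to BRIDGE on the host.
veth_on_bridge() {
	ip -o link show master "$2" | grep -qwF -- "$1"
}

# Example, on the host while the container is running:
#   veth="$(lxc-info -n mygui | link_from_info)"
#   veth_on_bridge "$veth" lxcbr0 && echo "attached" || echo "NOT attached"
```

If the veth is attached and the lease file stays empty, the DHCP request is probably never leaving the container, which points at the container side rather than at dnsmasq.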

Tried an Ubuntu (oracular) image and the results are even worse: it won’t even start. I don’t understand why nothing works; this all worked fine on my previous laptop…

> systemd-run --unit=myshell --user --scope -p "Delegate=yes" -- lxc-start -n mygui
Running scope as unit: myshell.scope
> lxc-attach -n mygui
lxc-attach: mygui: attach.c: lxc_attach: 981 Failed to get init pid
> lxc-info --name mygui
Name:           mygui
State:          STOPPED

In the Ubuntu case that may be because systemd in Oracular doesn’t know how to do cgroup1 anymore, so if your host system isn’t on unified cgroup2 it’s likely to just fail to start.
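
A quick way to tell which cgroup layout the host uses is the filesystem type mounted at /sys/fs/cgroup. This is a small sketch; cgroup_mode is a hypothetical helper that only interprets the string printed by stat.

```shell
#!/bin/sh
# cgroup_mode FSTYPE: classify the filesystem type reported for
# /sys/fs/cgroup by `stat -fc %T`.
cgroup_mode() {
	case "$1" in
		cgroup2fs) echo "unified (cgroup2)" ;;
		tmpfs)     echo "legacy or hybrid (cgroup1)" ;;
		*)         echo "unknown: $1" ;;
	esac
}

# Example, on the host:
#   cgroup_mode "$(stat -fc %T /sys/fs/cgroup)"
```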

For the earlier problem with missing DHCP, you should look at the journal or similar log inside the container to try to see why it’s not performing the DHCP request, or manually run dhclient to see if the problem isn’t on the server end.

I remember doing dhclient a couple of times on my previous machine, so I had already thought of that. But:

# dhclient
bash: dhclient: command not found

And I don’t have a network connection to apt install anything…

Is there a particular unit or anything else I should look for in the journal? I can see a bunch of errors, but I’m not sure they are related.
For example, a whole bunch of errors along these lines (for dozens of different things):
Nov 15 01:24:31 mygui udevadm[106]: intel_pmc_bxt: Failed to write 'add' to '/sys/module/intel_pmc_bxt/uevent': Permission denied
A few systemd related failures:

Nov 15 01:24:31 mygui systemd[1]: Failed to start systemd-udevd.service - Rule-based Manager for Device Events and Files.
Nov 15 01:24:31 mygui systemd-tmpfiles[114]: rm_rf(/tmp/.X11-unix): Operation not permitted
Nov 15 01:24:31 mygui (md-udevd)[116]: systemd-udevd.service: Failed to set up mount namespacing: Permission denied
Nov 15 01:24:31 mygui (md-udevd)[116]: systemd-udevd.service: Failed at step NAMESPACE spawning /lib/systemd/systemd-udevd: Permission denied
Nov 15 01:24:31 mygui systemd-tmpfiles[114]: fchownat() of /tmp/.X11-unix failed: Operation not permitted
Nov 15 01:24:31 mygui systemd[1]: Failed to start systemd-resolved.service - Network Name Resolution.
Nov 15 01:24:31 mygui systemd[1]: Failed to start systemd-networkd.service - Network Configuration.

^ I suppose one of these two is probably the critical one for networking…

Nov 15 01:24:31 mygui (crub_all)[141]: e2scrub_reap.service: Failed to set up mount namespacing: Permission denied
Nov 15 01:24:31 mygui (crub_all)[141]: e2scrub_reap.service: Failed at step NAMESPACE spawning /sbin/e2scrub_all: Permission denied
Nov 15 01:24:31 mygui (d-logind)[158]: systemd-logind.service: Failed to set up mount namespacing: Permission denied
Nov 15 01:24:31 mygui (d-logind)[158]: systemd-logind.service: Failed at step NAMESPACE spawning /lib/systemd/systemd-logind: Permission denied
Nov 15 01:24:31 mygui systemd[1]: Failed to start e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots.
Nov 15 01:24:31 mygui systemd[1]: Failed to start systemd-logind.service - User Login Management.
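
To cut a journal like this down to the units that actually failed, something along these lines may help. It is only a sketch: failed_units is a hypothetical helper that matches the "Failed to start <unit>.service" wording seen in the lines above.

```shell
#!/bin/sh
# failed_units: read journal text on stdin and print the unique names of
# services that systemd reported as "Failed to start".
failed_units() {
	sed -n 's/.*Failed to start \([^ ]*\.service\).*/\1/p' | LC_ALL=C sort -u
}

# Example, inside the container:
#   journalctl -b --no-pager | failed_units
# systemd's own summary of failed units is also available:
#   systemctl --failed
```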

I just realised that dhclient is installed, it’s just not on the path.

Running /usr/sbin/dhclient takes about a minute to complete. Despite there being no errors (and exit code 0), there is still no internet connection afterwards. Confirmed the same on Ubuntu jammy too.
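
To separate "DHCP is broken" from "the bridge or NAT is broken", one option is to configure the interface by hand and ping the bridge. This is a sketch under assumptions: 10.0.3.1 is, as far as I know, the lxc-net default address for lxcbr0 (check /etc/default/lxc-net on the host), 10.0.3.50/24 is an arbitrary address picked for illustration, and static_net_test is a hypothetical helper. Setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# static_net_test IFACE ADDR/PREFIX GATEWAY: bring IFACE up with a static
# address, add a default route and ping the gateway.
# Set RUN=echo to print the commands instead of executing them.
static_net_test() {
	iface="$1"
	addr="$2"
	gw="$3"
	${RUN:-} ip link set "$iface" up &&
		${RUN:-} ip addr add "$addr" dev "$iface" &&
		${RUN:-} ip route add default via "$gw" &&
		${RUN:-} ping -c 3 "$gw"
}

# Example, as root inside the container (addresses are assumptions):
#   static_net_test eth0 10.0.3.50/24 10.0.3.1
```

If the ping to the bridge works with a static address, the link itself is fine and the problem is more likely in the DHCP exchange or in the failed network services inside the container.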

@stgraber Anything else I can look at to debug?

Those startup errors suggest that the container either doesn’t have a systemd generator or has an outdated one.

Our images usually include a systemd generator which, on startup, generates some unit overrides to avoid those kinds of issues.

The current version of it is distrobuilder/lxc.generator on the main branch of lxc/distrobuilder on GitHub. You’d want that as an executable script located at /etc/systemd/system-generators/lxc inside of the container.

It’s already there, is there something I need to do to execute it?

> cd .local/share/lxc/mygui/rootfs/etc/systemd/system-generators/
> ls -l
total 8
-rwxr-xr-x 1 100000 100000 7687 Dec  9 05:27 lxc*
> cat lxc 
#!/bin/sh
# NOTE: systemctl is not available for systemd-generators
set -eu

# disable localisation (faster grep)
export LC_ALL=C

## Helper functions
# is_lxc_container succeeds if we're running inside a LXC container
is_lxc_container() {
	grep -q --text container=lxc /proc/1/environ
}

is_lxc_privileged_container() {
	# The full positive 32-bit range is available
	grep -qw 4294967295$ /proc/self/uid_map
}

# is_incus_vm succeeds if we're running inside an Incus VM
is_incus_vm() {
	[ -e /dev/virtio-ports/org.linuxcontainers.incus ]
}

# is_in_path succeeds if the given file exists in on of the paths
is_in_path() {
	# Don't use $PATH as that may not include all relevant paths
	for path in /bin /sbin /usr/bin /usr/sbin /usr/local/bin /usr/local/sbin; do
		[ -e "${path}/$1" ] && return 0
	done

	return 1
}

## Fix functions
# fix_ro_paths avoids udevd issues with /sys and /proc being writable
fix_ro_paths() {
	mkdir -p "/run/systemd/system/$1.d"
	cat <<-EOF > "/run/systemd/system/$1.d/zzz-lxc-ropath.conf"
		# This file was created by distrobuilder
		[Service]
		BindReadOnlyPaths=/sys /proc
		EOF
}

# fix_nm_link_state forces the network interface to a DOWN state ahead of NetworkManager starting up
fix_nm_link_state() {
	[ -e "/sys/class/net/$1" ] || return 0
	ip_path=
	if [ -f /sbin/ip ]; then
		ip_path=/sbin/ip
	elif [ -f /bin/ip ]; then
		ip_path=/bin/ip
	else
		return 0
	fi
	cat <<-EOF > /run/systemd/system/network-device-down.service
		# This file was created by distrobuilder
		[Unit]
		Description=Turn off network device
		Before=NetworkManager.service
		Before=systemd-networkd.service

		[Service]
		# do not turn off if there is a default route to 169.254.0.1, i.e. the device is a routed nic
		ExecCondition=/bin/sh -c '! /usr/bin/grep -qs 00000000.0100FEA9 /proc/net/route'
		ExecStart=-${ip_path} link set $1 down
		Type=oneshot
		RemainAfterExit=true

		[Install]
		WantedBy=default.target
		EOF
	mkdir -p /run/systemd/system/default.target.wants
	ln -sf /run/systemd/system/network-device-down.service /run/systemd/system/default.target.wants/network-device-down.service
}

# fix_systemd_override_unit generates a unit specific override
fix_systemd_override_unit() {
	dropin_dir="/run/systemd/${1}.d"
	mkdir -p "${dropin_dir}"
	{
		echo "[Service]";
		[ "${systemd_version}" -ge 247 ] && echo "ProcSubset=all";
		[ "${systemd_version}" -ge 247 ] && echo "ProtectProc=default";
		[ "${systemd_version}" -ge 232 ] && echo "ProtectControlGroups=no";
		[ "${systemd_version}" -ge 232 ] && echo "ProtectKernelTunables=no";
		[ "${systemd_version}" -ge 239 ] && echo "NoNewPrivileges=no";
		[ "${systemd_version}" -ge 249 ] && echo "LoadCredential=";
		[ "${systemd_version}" -ge 254 ] && echo "PrivateNetwork=no";
		[ "${systemd_version}" -ge 256 ] && echo "ImportCredential=";

		# Additional settings for privileged containers
		if is_lxc_privileged_container; then
			echo "ProtectHome=no";
			echo "ProtectSystem=no";
			echo "PrivateDevices=no";
			echo "PrivateTmp=no";
			[ "${systemd_version}" -ge 244 ] && echo "ProtectKernelLogs=no";
			[ "${systemd_version}" -ge 232 ] && echo "ProtectKernelModules=no";
			[ "${systemd_version}" -ge 231 ] && echo "ReadWritePaths=";
			[ "${systemd_version}" -ge 254 ] && [ "${systemd_version}" -lt 256 ] && echo "ImportCredential=";
		fi

		true;
	} > "${dropin_dir}/zzz-lxc-service.conf"
}

# fix_systemd_mask masks the systemd unit
fix_systemd_mask() {
	ln -sf /dev/null "/run/systemd/system/$1"
}

# fix_systemd_udev_trigger overrides the systemd-udev-trigger.service to match the latest version
# of the file which uses "ExecStart=-" instead of "ExecStart=".
fix_systemd_udev_trigger() {
	cmd=
	if [ -f /usr/bin/udevadm ]; then
		cmd=/usr/bin/udevadm
	elif [ -f /sbin/udevadm ]; then
		cmd=/sbin/udevadm
	elif [ -f /bin/udevadm ]; then
		cmd=/bin/udevadm
	else
		return 0
	fi

	mkdir -p /run/systemd/system/systemd-udev-trigger.service.d
	cat <<-EOF > /run/systemd/system/systemd-udev-trigger.service.d/zzz-lxc-override.conf
		# This file was created by distrobuilder
		[Service]
		ExecStart=
		ExecStart=-${cmd} trigger --type=subsystems --action=add
		ExecStart=-${cmd} trigger --type=devices --action=add
		EOF
}

# fix_systemd_sysctl overrides the systemd-sysctl.service to use "ExecStart=-" instead of "ExecStart=".
fix_systemd_sysctl() {
	cmd=/usr/lib/systemd/systemd-sysctl
	! [ -e "${cmd}" ] && cmd=/lib/systemd/systemd-sysctl
	mkdir -p /run/systemd/system/systemd-sysctl.service.d
	cat <<-EOF > /run/systemd/system/systemd-sysctl.service.d/zzz-lxc-override.conf
		# This file was created by distrobuilder
		[Service]
		ExecStart=
		ExecStart=-${cmd}
		EOF
}

## Main logic
# Nothing to do in Incus VM but deployed in case it is later converted to a container
is_incus_vm && exit 0

# Exit immediately if not an Incus/LXC container
is_lxc_container || exit 0

# Check for NetworkManager
nm_exists=0

is_in_path NetworkManager && nm_exists=1

# Determine systemd version
for path in /usr/lib/systemd/systemd /lib/systemd/systemd; do
	[ -x "${path}" ] || continue

	systemd_version="$("${path}" --version | head -n1 | cut -d' ' -f2 | cut -d'~' -f1)"
	break
done

# Determine distro name and release
ID=""
if [ -e /etc/os-release ]; then
	# shellcheck disable=SC1091
	. /etc/os-release
fi

# Overriding some systemd features is only needed if security.nesting=false
# in which case, /dev/.lxc will be missing
if [ ! -d /dev/.lxc ]; then
	# Apply systemd overrides
	if [ "${systemd_version}" -ge 244 ]; then
		fix_systemd_override_unit system/service
	else
		# Setup per-unit overrides
		find /lib/systemd /etc/systemd /run/systemd /usr/lib/systemd -name "*.service" -type f | sed 's#/\(lib\|etc\|run\|usr/lib\)/systemd/##g'| while read -r service_file; do
			fix_systemd_override_unit "${service_file}"
		done
	fi

	# Workarounds for privileged containers.
	if { [ "${ID}" = "altlinux" ] || [ "${ID}" = "arch" ] || [ "${ID}" = "fedora" ]; } && ! is_lxc_privileged_container; then
		fix_ro_paths systemd-networkd.service
		fix_ro_paths systemd-resolved.service
	fi
fi

# Ignore failures on some units.
fix_systemd_udev_trigger
fix_systemd_sysctl

# Mask some units.
fix_systemd_mask dev-hugepages.mount
fix_systemd_mask run-ribchester-general.mount
fix_systemd_mask systemd-hwdb-update.service
fix_systemd_mask systemd-journald-audit.socket
fix_systemd_mask systemd-modules-load.service
fix_systemd_mask systemd-pstore.service
fix_systemd_mask ua-messaging.service
fix_systemd_mask systemd-firstboot.service
fix_systemd_mask systemd-binfmt.service
if [ ! -e /dev/tty1 ]; then
	fix_systemd_mask vconsole-setup-kludge@tty1.service
fi

if [ -d /etc/udev ]; then
	mkdir -p /run/udev/rules.d
	cat <<-EOF > /run/udev/rules.d/90-lxc-net.rules
		# This file was created by distrobuilder.
		#
		# Its purpose is to convince NetworkManager to treat the eth0 veth
		# interface like a regular Ethernet. NetworkManager ordinarily doesn't
		# like to manage the veth interfaces, because they are typically configured
		# by container management tooling for specialized purposes.

		ACTION=="add|change|move", ENV{ID_NET_DRIVER}=="veth", ENV{INTERFACE}=="eth[0-9]*", ENV{NM_UNMANAGED}="0"
		EOF
fi

# Workarounds for NetworkManager in containers
if [ "${nm_exists}" -eq 1 ]; then
	fix_nm_link_state eth0
fi

# Allow masking units created by the lxc system-generator.
for d in /etc/systemd/system /usr/lib/systemd/system /lib/systemd/system; do
	if ! [ -d "${d}" ]; then
		continue
	fi

	find "${d}" -maxdepth 1 -type l | while read -r f; do
		unit="$(basename "${f}")"

		if [ "${unit}" = "network-device-down.service" ] && [ "$(readlink "${f}")" = "/dev/null" ]; then
			fix_systemd_mask "${unit}"
		fi
	done
done

Inside container:

root@mygui:/# ls -l /etc/systemd/system-generators/lxc
-rwxr-xr-x 1 root root 7687 Dec  9 19:35 /etc/systemd/system-generators/lxc

find /run/systemd/ | grep zzz-lxc

I assume inside the container:

root@mygui:/# find /run/systemd/ | grep zzz-lxc
/run/systemd/system/systemd-sysctl.service.d/zzz-lxc-override.conf
/run/systemd/system/systemd-udev-trigger.service.d/zzz-lxc-override.conf
/run/systemd/system/service.d/zzz-lxc-service.conf

Okay, so the overrides appear to be in place.

Can you show systemctl cat systemd-logind?

root@mygui:/# systemctl cat systemd-logind
# /lib/systemd/system/systemd-logind.service
#  SPDX-License-Identifier: LGPL-2.1-or-later
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=User Login Management
Documentation=man:sd-login(3)
Documentation=man:systemd-logind.service(8)
Documentation=man:logind.conf(5)
Documentation=man:org.freedesktop.login1(5)

Wants=user.slice modprobe@drm.service
After=nss-user-lookup.target user.slice modprobe@drm.service
ConditionPathExists=|/lib/systemd/system/dbus.service
ConditionPathExists=|/lib/systemd/system/dbus-broker.service

# Ask for the dbus socket.
Wants=dbus.socket
After=dbus.socket

[Service]
BusName=org.freedesktop.login1
CapabilityBoundingSet=CAP_SYS_ADMIN CAP_MAC_ADMIN CAP_AUDIT_CONTROL CAP_CHOWN CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_SYS_TTY_CONFIG CAP_LINUX_IMMUTABLE
DeviceAllow=block-* r
DeviceAllow=char-/dev/console rw
DeviceAllow=char-drm rw
DeviceAllow=char-input rw
DeviceAllow=char-tty rw
DeviceAllow=char-vcs rw
ExecStart=/lib/systemd/systemd-logind
FileDescriptorStoreMax=512
IPAddressDeny=any
LockPersonality=yes
MemoryDenyWriteExecute=yes
NoNewPrivileges=yes
PrivateTmp=yes
# We don't use ProtectProc= since we need to look for usernames and tty for wall messages
ProtectClock=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectSystem=strict
ReadWritePaths=/etc /run
Restart=always
RestartSec=0
RestrictAddressFamilies=AF_UNIX AF_NETLINK
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
RuntimeDirectory=systemd/sessions systemd/seats systemd/users systemd/inhibit systemd/shutdown
RuntimeDirectoryPreserve=yes
StateDirectory=systemd/linger
SystemCallArchitectures=native
SystemCallErrorNumber=EPERM
SystemCallFilter=@system-service
WatchdogSec=3min

# Increase the default a bit in order to allow many simultaneous logins since
# we keep one fd open per session.
LimitNOFILE=524288

# /run/systemd/system/service.d/zzz-lxc-service.conf
[Service]
ProcSubset=all
ProtectProc=default
ProtectControlGroups=no
ProtectKernelTunables=no
NoNewPrivileges=no
LoadCredential=
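
One thing that stands out in this output: the drop-in only resets the directives it lists. PrivateTmp=yes, ProtectHome=yes and ProtectSystem=strict from the packaged unit are untouched, since the generator quoted earlier only relaxes those for privileged containers. As far as I understand, those settings need a mount namespace, so they may be related to the "Failed to set up mount namespacing" errors. The sketch below prints the last value assigned to selected directives in systemctl cat output. effective_settings is a hypothetical helper; it ignores list-valued directives and empty-assignment resets, so treat systemctl show as the authoritative source.

```shell
#!/bin/sh
# effective_settings NAMES: read `systemctl cat UNIT` output on stdin and
# print, for each directive in the comma-separated NAMES list, the value
# from its last assignment (later files override earlier ones).
effective_settings() {
	awk -F= -v want="$1" '
		BEGIN { n = split(want, names, ","); for (i = 1; i <= n; i++) wanted[names[i]] = 1 }
		/^[[:space:]]*#/ { next }
		($1 in wanted) { value[$1] = $2 }
		END {
			for (i = 1; i <= n; i++)
				if (names[i] in value) printf "%s=%s\n", names[i], value[names[i]]
		}
	'
}

# Example, inside the container:
#   systemctl cat systemd-logind | effective_settings NoNewPrivileges,PrivateTmp,ProtectSystem
# Cross-check against what systemd itself reports:
#   systemctl show systemd-logind -p NoNewPrivileges -p PrivateTmp -p ProtectSystem
```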

@stgraber Anything useful there?

It’s probably got something to do with the AppArmor profile; that’s the main difference I can think of between what you’re getting with LXC here compared to what we’re getting with Incus.

Maybe try lxc.apparmor.profile=unchanged to see if that does the trick?

In the config file (~/.local/share/lxc/mygui/config), right?

It just fails to start after adding that in:

> systemd-run --unit=myshell --user --scope -p "Delegate=yes" lxc-start -n mygui -F
Running scope as unit: myshell.scope
lxc-start: mygui: utils.c: safe_mount: 1204 No such file or directory - Failed to mount "/dev/video0" onto "/usr/lib/x86_64-linux-gnu/lxc/rootfs/dev/video0"
Failed to mount tmpfs at /dev/shm: Permission denied
Failed to mount tmpfs at /run: Permission denied
Failed to mount tmpfs at /run/lock: Permission denied
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...

Oh, because then it probably inherits the lxc-start profile…
Can you set lxc.apparmor.profile=unconfined instead? I don’t know if that’s allowed for unprivileged users though.

It starts up, but seems to be in the same state as before. Lots of errors and no networking.

Is there a way to verify it’s set correctly? The result is also the same if I just put something random as the profile.
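
One way to check this from the host is to read the AppArmor label the kernel reports for the container's init process. A minimal sketch: apparmor_label is a hypothetical helper, its second argument exists only so it can be pointed at a fake /proc for testing, and reading another user's /proc entry may need sudo.

```shell
#!/bin/sh
# apparmor_label PID [PROCROOT]: print the AppArmor label the kernel
# reports for PID. PROCROOT defaults to /proc.
apparmor_label() {
	cat "${2:-/proc}/$1/attr/current"
}

# Example, on the host while the container is running:
#   pid="$(lxc-info -n mygui -p -H)"
#   apparmor_label "$pid"
```

If lxc.apparmor.profile=unconfined took effect, this should print "unconfined"; a profile name followed by a mode such as "(enforce)" means the container is still confined.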