No IPv4 on Arch Linux containers

Thanks for reporting this. My host is based on Arch Linux. Do I need the older version of systemd on the host too?

@stgraber - You can add OpenSUSE Tumbleweed to the list.

It was fine as of yesterday’s test. Did they update systemd yesterday or today?

Host version shouldn’t matter, only container version should.

Can confirm, same issue on Chrome OS 79 / Crostini with Arch Linux guest and systemd version above systemd 244 (244-1-arch).

Apologies, I thought the host and container issues were starting to come together, but maybe not. My comment was strictly in reference to the host. Containers on my Tumbleweed host (systemd 243) on kernel 5.3.12-1 are failing to get an IPv4 address over a Linux bridge. There were a few other reports of similar behavior on the forum, in addition to the GitHub issue that I linked earlier in this thread: "LXD Container not getting ip address from DHCP using linux bridge" and "Container do not get IP addresses after a reboot - or internet connection".

@parm

I find it dubious that Fedora 31 could have the very same issue. Not getting an IP address can have dozens of causes, and Fedora 31 definitely does not run systemd 244.1, but rather 243 with Fedora sauce, like all ‘stable’ distros.

Can you try on your setup to edit the container config to add:

raw.lxc: lxc.mount.auto = proc:rw sys:ro

and then restart it, of course
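For anyone who prefers the command line over editing the config by hand, the suggestion above could be applied roughly like this. This is only a sketch: the `force_sys_readonly` helper and the `LXC` variable are made-up names for illustration, and note that `lxc config set` replaces any existing `raw.lxc` value on the container.

```shell
# Path to the lxc client; LXC is a hypothetical override knob, not an LXD setting.
LXC="${LXC:-lxc}"

# Set the raw.lxc override quoted above on one container, then restart it.
# Caution: this overwrites whatever raw.lxc the container already had.
force_sys_readonly() {
    container="$1"
    "$LXC" config set "$container" raw.lxc "lxc.mount.auto = proc:rw sys:ro" &&
        "$LXC" restart "$container"
}
```

Usage would be `force_sys_readonly c1` for a container named `c1`.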

@gpatel-fr

Adding that manual config override did the job, container has an IP now. Thanks!

Could there be any isolation issues with that override? Or should this override be part of default config?

While I’m not an LXD dev and you should take my advice with a pinch of salt, I’d say probably not. I don’t see how making something read-only instead of read-write could be a security risk. If anything, it could remove existing capabilities from the container. I only tested a dnf installation, so some things may still break.

Probably not. My impression is that the proper fix would be to make udev work correctly inside the container instead of forcing it to be ignored at link creation.
From what I have seen on the net, Docker mounts /sys read-only, and that’s why systemd made this hideous change that I have tried to work around, so the issue is muddy.

Well, now the mud is getting deeper: the origin is the same for Arch and Fedora (and probably other distros as well). The commit text is trying to say that the goal is to disable udev in containers. So making the lxc config conform to the systemd change has the unfortunate side effect of disabling udev; that is, if one plugs a USB device into the physical computer, a running container will never see it.

A guess could be that this is the very reason (enabling udev) for lxc mounting /sys read-write, contrary to the explicit instructions of the Book of SystemD, see Execution Environment …,

Just wanted to give an update in this thread as we now think we fully understand the problem.

The regression was indeed introduced through a bugfix in systemd 244.1 as linked in the post above.
The systemd developers refused to back out this change to fix things for our users, arguing that the new logic is correct and that the problem is that /sys is writable in our containers.

We can’t make /sys read-only as we specifically need it writable for a number of other network operations (bridges for libvirt and the like). We also care about having udev running in containers to handle our device hotplug logic for which we’ve done kernel work in the past few years.

So this leaves us a bit stuck as far as easy fixes are concerned. We have identified one kernel issue which prevents udev from behaving in the way networkd expects, and @brauner is working on fixing this upstream, though as with any kernel change, this will take time to roll out to all distros.

The issue can be worked around in a few other ways in the meantime:

  • Have individual distros revert the systemd change (we will push for Ubuntu to do that)
  • Use raw.lxc to force /sys to be read-only (as suggested above)
  • Use a systemd override on the systemd-networkd unit to give it a read-only /sys

It’s that last option we’re now investigating for our own images. The plan is to ship a very small systemd unit override in all affected images to make networkd behave as it did previously. Once our kernel change is widely available, this workaround can then be removed.

For those wanting to apply the resulting override on old containers, it looks like this:

/etc/systemd/system/systemd-networkd.service.d/lxc.conf

[Service]
BindReadOnlyPaths=/sys
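
As a convenience, the drop-in above could be created from a shell inside the container along these lines. This is a sketch, not what the images ship: `write_networkd_override` is a made-up helper name, and its optional argument is a hypothetical root prefix so the function can be tried in a scratch directory first.

```shell
# Write the drop-in shown above.
# $1: optional root prefix (hypothetical parameter); leave empty for the live system.
write_networkd_override() {
    root="${1:-}"
    dir="$root/etc/systemd/system/systemd-networkd.service.d"
    mkdir -p "$dir" || return 1
    # Same two lines as the override quoted in this post.
    printf '%s\n' '[Service]' 'BindReadOnlyPaths=/sys' > "$dir/lxc.conf"
}
```

Run as root in the container, something like `write_networkd_override && systemctl daemon-reload && systemctl restart systemd-networkd` should then pick up the override.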

Did this issue break again very recently in Fedora/Arch? I have been deploying containers on my test cluster with fan networking, and recently Fedora and Arch containers have not been getting IPv4 addresses, even if I launch with -c security.privileged=true

Ubuntu and voidlinux containers get an IP.

Can you check that you’ve not been affected by the issue in "LXD container stuck in "RUNNING" without IP address"?

I’ve looked through that forum post, and I think it’s something different going on for me.

I’m running lxd 4.0.3


I have just launched two new Arch containers, one with -c security.privileged=true and one without.

The unprivileged container gets an IP, and has internet access (I have configured NAT on the bridge with nftables).

The privileged container does not get an IP. Here are the logs from systemd-networkd:

[root@positive-pika ~]# systemctl status systemd-networkd
● systemd-networkd.service - Network Service
     Loaded: loaded (/usr/lib/systemd/system/systemd-networkd.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/systemd-networkd.service.d
             └─lxc.conf
     Active: failed (Result: exit-code) since Wed 2020-09-23 22:33:08 UTC; 5min ago
TriggeredBy: ● systemd-networkd.socket
       Docs: man:systemd-networkd.service(8)
    Process: 53 ExecStart=/usr/lib/systemd/systemd-networkd (code=exited, status=226/NAMESPACE)
   Main PID: 53 (code=exited, status=226/NAMESPACE)
Sep 23 22:41:54 positive-pika systemd[117]: systemd-networkd.service: Failed to set up mount namespacing: Permission denied
Sep 23 22:41:54 positive-pika systemd[117]: systemd-networkd.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-networkd: Permission denied

I have re-created this issue:

lxc launch images:archlinux c1
lxc launch images:archlinux c2 -c security.privileged=true
lxc ls
| NAME |  STATE  |         IPV4          |                     IPV6                      |      TYPE       | SNAPSHOTS |
+------+---------+-----------------------+-----------------------------------------------+-----------------+-----------+
| c1   | RUNNING | 10.205.185.120 (eth0) | fd42:3e5a:c8df:cf17:216:3eff:fe0d:7378 (eth0) | CONTAINER       | 0         |
+------+---------+-----------------------+-----------------------------------------------+-----------------+-----------+
| c2   | RUNNING |                       | fd42:3e5a:c8df:cf17:216:3eff:fe4e:c6ee (eth0) | CONTAINER       | 0         |
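
To spot affected containers without reading the table by eye, a filter like the following might help. It is a sketch that assumes the CSV output of `lxc list --format csv -c ns4` (name, state, IPv4 columns); `running_without_ipv4` is a made-up helper name.

```shell
# Read `lxc list --format csv -c ns4` output on stdin and print the names of
# RUNNING instances whose IPv4 column is empty.
running_without_ipv4() {
    awk -F, '$2 == "RUNNING" && $3 == "" { print $1 }'
}
```

With the two containers above, `lxc list --format csv -c ns4 | running_without_ipv4` would be expected to print only `c2`.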

In c2 the logs show:

Sep 25 15:12:03 c2 systemd[79]: systemd-networkd.service: Failed to set up mount namespacing: Permission denied
Sep 25 15:12:03 c2 systemd[79]: systemd-networkd.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-networkd: Permission denied

@brauner @stgraber I remember we saw an issue with systemd-networkd not starting on Arch before. Is this the same issue or another one?

Seems we have seen the same error on this thread: https://github.com/lxc/lxc/issues/2778

@tomp can you check dmesg | grep FAILED for an apparmor denial?

Nothing for FAILED, but DENIED is showing:

[ 8520.220751] audit: type=1400 audit(1601046945.020:9636): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-c2_</var/lib/lxd>" name="/run/systemd/unit-root/" pid=75596 comm="(networkd)" srcname="/" flags="rw, rbind"
[ 8520.474519] audit: type=1400 audit(1601046945.272:9637): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-c2_</var/lib/lxd>" name="/run/systemd/unit-root/" pid=75681 comm="(resolved)" srcname="/" flags="rw, rbind"
[ 8520.481921] audit: type=1400 audit(1601046945.280:9638): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-c2_</var/lib/lxd>" name="/run/systemd/unit-root/" pid=75685 comm="(resolved)" srcname="/" flags="rw, rbind"
[ 8520.487700] audit: type=1400 audit(1601046945.288:9639): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-c2_</var/lib/lxd>" name="/run/systemd/unit-root/" pid=75687 comm="(d-logind)" srcname="/" flags="rw, rbind"
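
These audit lines are long, so a small filter can make them easier to compare. The sketch below assumes the field order seen in the lines above (operation, profile, name, comm); `summarize_denials` is a made-up helper name.

```shell
# Read kernel log lines on stdin and print, for each AppArmor denial:
#   <comm> <operation> <name> <profile>
summarize_denials() {
    grep 'apparmor="DENIED"' |
        sed -n 's/.*operation="\([^"]*\)".* profile="\([^"]*\)".* name="\([^"]*\)".* comm="\([^"]*\)".*/\4 \1 \3 \2/p'
}
```

On the host this would be used as `dmesg | summarize_denials`.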

Is it related to https://github.com/lxc/lxd/issues/5439#issuecomment-461257784?

Ah yeah, I meant DENIED.

Okay, so this particular case cannot be allowed safely: adding an AppArmor rule to permit the mount above would make it possible to bypass confinement and escape to the host.

Your best bet I think is to add an override to the unit, disabling the namespacing used here.
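
Before writing such an override it helps to see which sandboxing settings the unit actually uses. The filter below is a rough sketch: the directive names are a non-exhaustive selection of systemd.exec settings that involve a mount namespace, chosen here for illustration, and which of them would need relaxing is not established in this thread. `list_mount_ns_directives` is a made-up helper name.

```shell
# Read unit file text on stdin and print the lines that set directives
# which (among others) cause systemd to set up a mount namespace.
list_mount_ns_directives() {
    grep -E '^(ProtectSystem|ProtectHome|ProtectKernelTunables|ProtectKernelModules|ProtectControlGroups|PrivateTmp|PrivateDevices|PrivateMounts|ReadOnlyPaths|ReadWritePaths|InaccessiblePaths|BindPaths|BindReadOnlyPaths|TemporaryFileSystem)='
}
```

Inside the container, `systemctl cat systemd-networkd.service | list_mount_ns_directives` shows the unit together with its drop-ins, so the lxc.conf override from earlier in the thread would appear in the output too.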

Hi,

Any progress on this? Strangely, I am getting the same problem on a Debian host with an Ubuntu 20.04 guest. The guest has an IPv6 address but no IPv4. I’ve got two profiles, a macvlan profile and a default profile. The default profile has not been able to allocate an IPv4 address to the guest.

Name: grfna
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/10/01 03:45 UTC
Status: Running
Type: container
Profiles: default
Pid: 27220
Ips:
  eth0: inet6 fd42:189b:283b:f067:216:3eff:fe96:f450 veth97687de6
  eth0: inet6 fe80::216:3eff:fe96:f450 veth97687de6
  lo: inet 127.0.0.1
  lo: inet6 ::1
Resources:
  Processes: 54
  CPU usage:
    CPU usage (in seconds): 4
  Memory usage:
    Memory (current): 456.63MB
    Memory (peak): 463.02MB
  Network usage:
    eth0:
      Bytes received: 1.71kB
      Bytes sent: 5.46kB
      Packets received: 14
      Packets sent: 38
    lo:
      Bytes received: 612B
      Bytes sent: 612B
      Packets received: 8
      Packets sent: 8

dmesg | grep DENIED

[35742.355421] audit: type=1400 audit(1601522910.735:126): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-hello_</var/snap/lxd/common/lxd>" name="/run/" pid=24251 comm="mount" flags="rw, nosuid, nodev, remount"