No IPv4 on Arch Linux containers

gpatel-fr · December 23, 2019, 4:42pm

While I’m not an LXD dev and you should take my advice with a pinch of salt, I’d say probably no. I don’t see how making something read-only instead of read-write could be a security risk. If anything, it could remove existing capabilities of the container. I only tested a dnf installation, so there may be some stuff that breaks.

Probably no. My impression is that the proper fix could be to make udev work correctly inside the container instead of forcing it to be ignored at link creation.
From what I have seen on the net, Docker mounts /sys read-only, and that’s why systemd made this hideous change that I have tried to work around, so the issue is muddy.

gpatel-fr · January 3, 2020, 9:16am

Well, now the mud is getting deeper: the origin is the same for Arch and Fedora (and probably other distros as well). The commit text is trying to say that the goal is to disable udev in containers. So making the lxc config conform to the systemd change has the unfortunate side effect of disabling udev; that is, if one plugs a USB device into the physical computer, a running container will never see it.

A guess could be that this is the very reason (enabling udev) why lxc mounts /sys read-write, contrary to the explicit instructions of the Book of SystemD, see Execution Environment …

stgraber · January 8, 2020, 1:31pm

Just wanted to give an update in this thread as we now think we fully understand the problem.

The regression was indeed introduced through a bugfix in systemd 244.1 as linked in the post above.
The systemd developers refused to back out this change to fix things for our users, arguing that the new logic is correct and that the problem is that /sys is writable in our containers.

We can’t make /sys read-only as we specifically need it writable for a number of other network operations (bridges for libvirt and the like). We also care about having udev running in containers to handle our device hotplug logic for which we’ve done kernel work in the past few years.

So this leaves us a bit stuck as far as easy fixes are concerned. We have identified one kernel issue which prevents udev from behaving in the way networkd expects, and @brauner is working on fixing this upstream, though as with any kernel change, this will take time to roll out to all distros.

The issue can be worked around in a few other ways in the meantime:

Have individual distros revert the systemd change (we will push for Ubuntu to do that)
Use raw.lxc to force /sys to be read-only (as suggested above)
Use a systemd override on the systemd-networkd unit to give it a read-only /sys

It’s that last option we’re now investigating for our own images. The plan is to ship a very small systemd unit override in all affected images to make networkd behave as it did previously. Once our kernel change is widely available, this workaround can then be removed.

C0rn3j · February 4, 2020, 12:21pm

For those wanting to apply the resulting override on old containers, it looks like this:

/etc/systemd/system/systemd-networkd.service.d/lxc.conf

[Service]
BindReadOnlyPaths=/sys
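
A minimal sketch of applying the override above to an existing container. The helper name `write_networkd_override` and the container name `c1` are hypothetical; only the drop-in path and its two lines come from the post. The function takes a root directory so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write the drop-in below a given root directory.
# Pass "/" (or nothing) when running inside the container itself.
write_networkd_override() {
    root="${1:-/}"
    dir="${root%/}/etc/systemd/system/systemd-networkd.service.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nBindReadOnlyPaths=/sys\n' > "$dir/lxc.conf"
}

# Alternative from the host, pushing a locally prepared lxc.conf
# (c1 is a placeholder container name):
#   lxc file push --create-dirs lxc.conf \
#       c1/etc/systemd/system/systemd-networkd.service.d/lxc.conf
#   lxc exec c1 -- systemctl daemon-reload
#   lxc exec c1 -- systemctl restart systemd-networkd
```

After the file is in place, systemd needs a `daemon-reload` before the restarted unit picks up the drop-in.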

dontlaugh · September 21, 2020, 10:43pm

Did this break again very recently in Fedora/Arch? I have been deploying containers on my test cluster with fan networking, and recently Fedora and Arch containers have not been getting IPv4 addresses, even if I launch with -c security.privileged=true

Ubuntu and voidlinux containers get an IP.

tomp · September 22, 2020, 8:04am

Can you check you’ve not been affected by LXD container stuck in "RUNNING" without IP address

dontlaugh · September 23, 2020, 10:42pm

I’ve looked through that forum post, and I think it’s something different going on for me.

I’m running lxd 4.0.3

I have just launched two new arch containers. One with -c security.privileged=true and one without.

The unprivileged container gets an IP, and has internet access (I have configured NAT on the bridge with nftables).

The privileged container does not get an IP. Here are the logs from systemd-networkd

[root@positive-pika ~]# systemctl status systemd-networkd
WARNING: terminal is not fully functional
-  (press RETURN)● systemd-networkd.service - Network Service
     Loaded: loaded (/usr/lib/systemd/system/systemd-networkd.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/systemd-networkd.service.d
             └─lxc.conf
     Active: failed (Result: exit-code) since Wed 2020-09-23 22:33:08 UTC; 5min ago
TriggeredBy: ● systemd-networkd.socket
       Docs: man:systemd-networkd.service(8)
    Process: 53 ExecStart=/usr/lib/systemd/systemd-networkd (code=exited, status=226/NAMESPACE)
   Main PID: 53 (code=exited, status=226/NAMESPACE)

Sep 23 22:41:54 positive-pika systemd[117]: systemd-networkd.service: Failed to set up mount namespacing: Permission denied
Sep 23 22:41:54 positive-pika systemd[117]: systemd-networkd.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-networkd: Permission denied

tomp · September 25, 2020, 3:11pm

I have re-created this issue:

lxc launch images:archlinux c1
lxc launch images:archlinux c2 -c security.privileged=true
lxc ls
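+------+---------+-----------------------+-----------------------------------------------+-----------------+-----------+
| NAME |  STATE  |         IPV4          |                     IPV6                      |      TYPE       | SNAPSHOTS |
+------+---------+-----------------------+-----------------------------------------------+-----------------+-----------+
| c1   | RUNNING | 10.205.185.120 (eth0) | fd42:3e5a:c8df:cf17:216:3eff:fe0d:7378 (eth0) | CONTAINER       | 0         |
+------+---------+-----------------------+-----------------------------------------------+-----------------+-----------+
| c2   | RUNNING |                       | fd42:3e5a:c8df:cf17:216:3eff:fe4e:c6ee (eth0) | CONTAINER       | 0         |
+------+---------+-----------------------+-----------------------------------------------+-----------------+-----------+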

In c2 the logs show:

Sep 25 15:12:03 c2 systemd[79]: systemd-networkd.service: Failed to set up mount namespacing: Permission denied
Sep 25 15:12:03 c2 systemd[79]: systemd-networkd.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-networkd: Permission denied

@brauner @stgraber I remember we saw an issue with systemd-networkd not starting on arch before, is this the same issue or another one?

Seems we have seen the same error on this thread: https://github.com/lxc/lxc/issues/2778

stgraber · September 25, 2020, 3:14pm

@tomp can you check dmesg | grep FAILED for an apparmor denial?

tomp · September 25, 2020, 3:17pm

Nothing for FAILED, but DENIED is showing:

[ 8520.220751] audit: type=1400 audit(1601046945.020:9636): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-c2_</var/lib/lxd>" name="/run/systemd/unit-root/" pid=75596 comm="(networkd)" srcname="/" flags="rw, rbind"
[ 8520.474519] audit: type=1400 audit(1601046945.272:9637): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-c2_</var/lib/lxd>" name="/run/systemd/unit-root/" pid=75681 comm="(resolved)" srcname="/" flags="rw, rbind"
[ 8520.481921] audit: type=1400 audit(1601046945.280:9638): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-c2_</var/lib/lxd>" name="/run/systemd/unit-root/" pid=75685 comm="(resolved)" srcname="/" flags="rw, rbind"
[ 8520.487700] audit: type=1400 audit(1601046945.288:9639): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-c2_</var/lib/lxd>" name="/run/systemd/unit-root/" pid=75687 comm="(d-logind)" srcname="/" flags="rw, rbind"

Is it related to https://github.com/lxc/lxd/issues/5439#issuecomment-461257784?
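
The check on the kernel log can be sketched as a small filter. The function names are hypothetical, and the `sed` expression assumes the field order seen in the audit lines above (`operation`, then `name`, then `comm`).

```shell
#!/bin/sh
# Keep only AppArmor denial records from kernel log text on stdin.
apparmor_denials() {
    grep -F 'apparmor="DENIED"'
}

# Print "comm operation name" for each denial, one per line.
# The leading space before name= avoids matching srcname=.
summarize_denials() {
    apparmor_denials |
        sed -n 's/.*operation="\([^"]*\)".* name="\([^"]*\)".*comm="\([^"]*\)".*/\3 \1 \2/p'
}

# Typical use on the host (reading the kernel log may need root):
#   dmesg | summarize_denials
```

Fed the first audit line above, this prints `(networkd) mount /run/systemd/unit-root/`, which makes it easier to see which services are hitting the denial.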

stgraber · September 25, 2020, 3:18pm

Ah yeah, I meant DENIED.

Okay, so this particular case cannot be allowed safely, as adding a rule to allow the above would allow bypassing confinement and escaping to the host.

Your best bet I think is to add an override to the unit, disabling the namespacing used here.
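
One possible reading of this advice, sketched under assumptions: the image’s lxc.conf drop-in sets BindReadOnlyPaths=/sys, which makes systemd set up a mount namespace for the service. systemd documents that assigning an empty string to BindReadOnlyPaths= resets the list, so a later drop-in can undo it. The file name `zz-no-bind.conf` and the helper name are hypothetical, and whether this alone is enough on a given image has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper: add a later-sorting drop-in that clears the bind list.
# Drop-ins are applied in lexicographic order, so "zz-" sorts after lxc.conf.
disable_networkd_bind() {
    root="${1:-/}"
    dir="${root%/}/etc/systemd/system/systemd-networkd.service.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nBindReadOnlyPaths=\n' > "$dir/zz-no-bind.conf"
}

# Inside the privileged container, followed by a reload and restart:
#   disable_networkd_bind /
#   systemctl daemon-reload
#   systemctl restart systemd-networkd
```

Deleting lxc.conf would have a similar effect, but a separate drop-in survives an image update that rewrites lxc.conf.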

eaojnr · October 1, 2020, 3:53am

Hi,

Any progress on this? Strangely, I am getting the same problem on a Debian host with an Ubuntu 20.04 guest. The guest has an IPv6 address but no IPv4. I’ve got two profiles, a macvlan profile and a default profile. The default profile has not been able to allocate an IPv4 address to the guest.

Name: grfna
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/10/01 03:45 UTC
Status: Running
Type: container
Profiles: default
Pid: 27220
Ips:
  eth0: inet6 fd42:189b:283b:f067:216:3eff:fe96:f450 veth97687de6
  eth0: inet6 fe80::216:3eff:fe96:f450 veth97687de6
  lo: inet 127.0.0.1
  lo: inet6 ::1
Resources:
  Processes: 54
  CPU usage:
    CPU usage (in seconds): 4
  Memory usage:
    Memory (current): 456.63MB
    Memory (peak): 463.02MB
  Network usage:
    eth0:
      Bytes received: 1.71kB
      Bytes sent: 5.46kB
      Packets received: 14
      Packets sent: 38
    lo:
      Bytes received: 612B
      Bytes sent: 612B
      Packets received: 8
      Packets sent: 8

dmesg | grep DENIED

[35742.355421] audit: type=1400 audit(1601522910.735:126): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-hello_</var/snap/lxd/common/lxd>" name="/run/" pid=24251 comm="mount" flags="rw, nosuid, nodev, remount"

tomp · October 1, 2020, 8:12am

Hi, I’ve not been able to reproduce the issue on Debian. Here’s my test:

Launch a Debian buster VM and log in to it:

lxc launch images:debian/buster vdebian --vm
lxc shell vdebian

Install LXD and launch an Ubuntu container:

apt install snapd -y
snap install core
snap install lxd
lxd init --auto
lxc launch images:ubuntu/focal c1
lxc ls
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| NAME |  STATE  |         IPV4          |                     IPV6                      |   TYPE    | SNAPSHOTS |
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+
| c1   | RUNNING | 10.137.173.103 (eth0) | fd42:a2bd:13e2:62f9:216:3eff:fee3:955b (eth0) | CONTAINER | 0         |
+------+---------+-----------------------+-----------------------------------------------+-----------+-----------+

Please can you provide the config of your container using lxc config show <container> --expanded as well as your kernel and distribution version.
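
A small sketch for collecting the requested details in one go. The function name `host_versions`, the container name `c1` and the output file name are hypothetical; `lxc config show <container> --expanded` is the command quoted above.

```shell
#!/bin/sh
# Print the kernel release and distribution name of the current machine.
host_versions() {
    printf 'kernel: %s\n' "$(uname -r)"
    if [ -r /etc/os-release ]; then
        # os-release is a shell-compatible KEY=value file; source it in a
        # subshell so its variables do not leak into the caller.
        ( . /etc/os-release && printf 'distro: %s\n' "${PRETTY_NAME:-unknown}" )
    else
        printf 'distro: unknown\n'
    fi
}

# Gather everything into one file to paste into a reply
# (c1 is a placeholder container name):
#   { host_versions; lxc config show c1 --expanded; } > lxd-report.txt
```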

Cacophony · December 7, 2020, 8:27pm

Hi,

Recently I have had this issue with Arch Linux containers.
Host: Manjaro x64

Container info

systemd-networkd error

This rule is present:
/etc/systemd/system/systemd-networkd.service.d/lxc.conf

[Service]
BindReadOnlyPaths=/sys

I found a workaround: enabling dhcpcd and adding nodev to dhcpcd.conf
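
A minimal sketch of that workaround, assuming the stock `/etc/dhcpcd.conf` location and the `dhcpcd` service name; the helper name `ensure_nodev` is hypothetical. The function is idempotent, so running it twice does not add a second `nodev` line.

```shell
#!/bin/sh
# Hypothetical helper: append "nodev" to a dhcpcd.conf unless already present.
ensure_nodev() {
    conf="${1:-/etc/dhcpcd.conf}"
    touch "$conf" || return 1
    # Nothing to do if a line consisting solely of "nodev" exists already.
    grep -qx 'nodev' "$conf" && return 0
    # If the file does not end in a newline, add one first so the
    # new option lands on its own line.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    printf 'nodev\n' >> "$conf"
}

# Inside the container:
#   ensure_nodev /etc/dhcpcd.conf
#   systemctl enable --now dhcpcd
```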

tomp · December 7, 2020, 8:43pm

Could it be related to Systemd 247 with LXD 4.04 breaks systemd-networkd?

chrisoldwood · January 28, 2021, 10:25am

This is cross-posted from here to make it clear the same workaround applies here too.

For anyone like myself stumbling across this topic after a similar issue cropped up back in early December 2020: I can confirm that the advice in this comment on another topic about setting security.nesting=true still applies today with LXD 4.x, unprivileged containers, and the latest images:archlinux image, e.g.

lxc launch images:archlinux -c security.nesting=true

or

lxc init images:archlinux "$container"
lxc config set "$container" security.nesting true
lxc start "$container"

Note: I have no idea why this works or what the security ramifications are but it suffices for the use case I have.