Systemd update (networkd restart) breaks routed containers

VinceHillier · January 19, 2021, 7:09am

Ubuntu: 18.04
LXD: Snap/4.3 (routed containers)

Ubuntu released an update for systemd (237-3ubuntu10.44), which triggered a restart of systemd-networkd.service.

Jan 19 06:02:16 core01 systemd[1]: Stopped Network Service.
Jan 19 06:02:16 core01 systemd[1]: Starting Network Service...
Jan 19 06:02:16 core01 systemd-timesyncd[620]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).
Jan 19 06:02:16 core01 systemd-networkd[2650]: ens4: Gained IPv6LL
Jan 19 06:02:16 core01 systemd-networkd[2650]: ens3: Gained IPv6LL
Jan 19 06:02:16 core01 systemd-timesyncd[620]: Network configuration changed, trying to establish connection.
Jan 19 06:02:16 core01 systemd-networkd[2650]: Enumeration completed
Jan 19 06:02:16 core01 systemd[1]: Started Network Service.
Jan 19 06:02:16 core01 systemd[1]: Starting Wait for Network to be Configured...
Jan 19 06:02:16 core01 systemd-networkd[2650]: ens4: IPv6 successfully enabled
Jan 19 06:02:16 core01 systemd-networkd[2650]: veth8222ae36: Link is not managed by us
Jan 19 06:02:16 core01 systemd-networkd[2650]: ens3: Link is not managed by us
Jan 19 06:02:16 core01 systemd-networkd[2650]: vethfdef6896: Link is not managed by us
Jan 19 06:02:16 core01 systemd-networkd[2650]: vethbbe48b6e: Link is not managed by us
Jan 19 06:02:16 core01 systemd-networkd[2650]: veth53f3592c: Link is not managed by us
Jan 19 06:02:16 core01 systemd-networkd[2650]: veth1ab77c6b: Link is not managed by us
Jan 19 06:02:16 core01 systemd-networkd[2650]: lo: Link is not managed by us
Jan 19 06:02:16 core01 systemd-networkd-wait-online[2651]: ignoring: lo
....

Restarting systemd-networkd.service effectively breaks all routed containers (I tested this and it is reproducible). The veth devices and the routing table were the same before and after the restart, but no traffic reaches the containers until they’re restarted.

This happened a few months ago as well, but we weren’t able to isolate the cause at that time.

Is this a systemd issue or an LXD issue?

tomp · January 19, 2021, 9:54am

Is this systemd-networkd restarting on the host or inside the containers?

I’ve just tried this on the host and in my setup restarting systemd-networkd doesn’t break routed NIC connectivity.

Can you also describe in what way it ‘breaks’? Do the containers lose their IPs, or do the static routes or IP neighbour proxy entries get removed?

You can see this by running ip r and ip neigh show proxy before and after restarting systemd-networkd and seeing what, if anything, changes.
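The before/after comparison described above can be sketched as a pair of small shell helpers. The function names `snapshot_net` and `compare_snapshots` and the `/tmp` paths are made up for illustration; only `ip r`, `ip neigh show proxy` and `diff` are real commands.

```shell
# Save the routing table and the neighbour proxy entries into a directory.
# (snapshot_net is a hypothetical helper name, not an LXD or iproute2 tool.)
snapshot_net() {
    dir="$1"
    mkdir -p "$dir"
    ip r > "$dir/routes.txt"
    ip neigh show proxy > "$dir/proxy.txt"
}

# Print a unified diff of two snapshots; prints nothing if they are identical.
compare_snapshots() {
    before="$1"
    after="$2"
    for f in routes.txt proxy.txt; do
        # diff exits non-zero when the files differ, which is expected here.
        diff -u "$before/$f" "$after/$f" || true
    done
}

# Intended use on the LXD host (run by hand, not executed by this snippet):
#   snapshot_net /tmp/net-before
#   sudo systemctl restart systemd-networkd
#   snapshot_net /tmp/net-after
#   compare_snapshots /tmp/net-before /tmp/net-after
```

Any line prefixed with `-` in the output is an entry that was present before the restart and is gone afterwards.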

VinceHillier · January 19, 2021, 1:54pm

Hi Tom,

Restarting systemd-networkd.service on the LXD host results in the logs above, and all traffic to the containers ceasing.

The containers do not lose their IPs; they are statically assigned via container config. However, they do stop receiving traffic until they are restarted. I will verify after hours whether the traffic makes it to the LXD host or not. It might be an ARP issue.

lxc config show c1

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20200519.1)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20200519.1"
  image.type: squashfs
  image.version: "18.04"
  volatile.base_image: 70d3dcaabcffb1aa1644d0ce866efcb141742179e94ad72aefb8d3502338a71f
  volatile.eth0.host_name: veth3e31990d
  volatile.eth0.hwaddr: 00:16:3e:d7:de:f0
  volatile.eth0.last_state.created: "false"
  volatile.eth0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    ipv4.address: xx.xx.xx.xx
    nictype: routed
    parent: ens3
    type: nic
  root:
    path: /
    pool: default
    size: 30GB
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

The routing table does not get modified. Admittedly, I didn’t check the neighbour entries; I’ll verify that after hours too.

tomp · January 19, 2021, 1:55pm

It would be useful to post your networkd config too. With a default networkd config on an Ubuntu Focal system, the IP neighbour entries are certainly not being cleaned on restart, but perhaps something in your config is causing that to happen.

VinceHillier · January 19, 2021, 2:26pm

Mostly default config here:

networkctl list
IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback carrier unmanaged
2 ens3 ether routable configured
3 ens4 ether routable configured
43 veth3e31990d ether routable unmanaged
44 veth2b484851 ether routable unmanaged
45 vethf1a93644 ether routable unmanaged
46 vethfadaf2b4 ether routable unmanaged
47 veth4cf95496 ether routable unmanaged

/etc/systemd/network is empty, /run/systemd/network contains 3 files:

10-netplan-ens3.link
[Match]
MACAddress=fa:16:3e:4c:85:47

[Link]
Name=ens3
WakeOnLan=off

10-netplan-ens3.network
[Match]
MACAddress=fa:16:3e:4c:85:47
Name=ens3

[Network]
DHCP=ipv4
LinkLocalAddressing=ipv6

[DHCP]
RouteMetric=100
UseMTU=true

10-netplan-ens4.network
[Match]
Name=ens4

[Network]
DHCP=ipv4
LinkLocalAddressing=ipv6

[DHCP]
RouteMetric=100
UseMTU=true

netplan configuration w/ cloud init disabled:

# This file is generated from information provided by
# the datasource.  Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    version: 2
    ethernets:
        ens3:
            dhcp4: true
            match:
                macaddress: fa:16:3e:4c:85:47
            set-name: ens3
        ens4:
            dhcp4: true

tomp · January 19, 2021, 2:27pm

Which interface is used as the routed NIC parent?

VinceHillier · January 19, 2021, 2:28pm

ens3

tomp · January 19, 2021, 5:43pm

I will try this on 18.04, but on 20.04 at least this is the behaviour I’m observing.

I’m not using netplan either, but direct config files in /etc/systemd/network:

/etc/systemd/network/enp1s0f0.network

[Match]
Name=enp1s0f0

[Network]
LinkLocalAddressing=ipv6
Address=192.168.1.1/24
Address=2a02:xxx:xxx:1::1/64
IPv6AcceptRA=false
IPv6ProxyNDP=true
IPForward=true
Gateway=192.168.1.2
Gateway=2a02:xxx:xxx:1::2

lxc init images:ubuntu/focal c1
lxc config device add c1 eth0 nic nictype=routed parent=enp1s0f0 ipv4.address=192.168.1.201
lxc file delete c1/etc/netplan/10-lxc.yaml
lxc start c1

ip neigh show proxy
169.254.0.1 dev vethea039440  proxy
192.168.1.201 dev enp1s0f0  proxy

ip r
default via 192.168.1.2 dev enp1s0f0 proto static 
192.168.1.0/24 dev enp1s0f0 proto kernel scope link src 192.168.1.1 
192.168.1.201 dev vethea039440 scope link

Now restart systemd-networkd:

sudo systemctl restart systemd-networkd

Check routes and neighbour proxy entries are present:

ip neigh show proxy
169.254.0.1 dev vethea039440  proxy
192.168.1.201 dev enp1s0f0  proxy

ip r
default via 192.168.1.2 dev enp1s0f0 proto static 
192.168.1.0/24 dev enp1s0f0 proto kernel scope link src 192.168.1.1 
192.168.1.201 dev vethea039440 scope link

tomp · January 19, 2021, 5:58pm

Ah, reproduced it. It happens on Ubuntu 18.04: the IP neighbour proxy entries are removed for the netplan-managed interface.
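A rough way to spot this condition on a host is to check whether each routed container IP still has a proxy entry on the parent interface. The sketch below assumes the `ip neigh show proxy` output format shown earlier in this thread; `missing_proxy_entries` is a hypothetical helper name, and the re-add command in the comment is an untested idea, not something verified in this thread.

```shell
# Read `ip neigh show proxy` output on stdin and print every IP from the
# argument list that has no proxy entry on the given device.
# (missing_proxy_entries is an illustrative name, not an existing tool.)
missing_proxy_entries() {
    dev="$1"
    shift
    entries=$(cat)
    for ip in "$@"; do
        # Match lines of the form "<ip> dev <dev> proxy".
        if ! printf '%s\n' "$entries" | awk -v ip="$ip" -v dev="$dev" \
            '$1 == ip && $2 == "dev" && $3 == dev { found = 1 } END { exit !found }'; then
            printf '%s\n' "$ip"
        fi
    done
}

# Intended use on the LXD host (run by hand):
#   ip neigh show proxy | missing_proxy_entries enp1s0f0 192.168.1.201
# Assumption, not tested here: a missing entry could be put back with
#   sudo ip neigh add proxy 192.168.1.201 dev enp1s0f0
# instead of restarting the container.
```

An empty result means every listed IP still has its proxy entry on that interface.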

tomp · January 19, 2021, 6:02pm

If you use a static config and not DHCP then systemd/netplan doesn’t appear to wipe out the IP neighbour entries:

E.g.

network:
  version: 2
  ethernets:
    enp5s0:
      addresses:
        - 10.102.242.9/24
      gateway4: 10.102.242.1
      nameservers:
        addresses: [10.102.242.1]
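To find which interfaces on a host are still DHCP-configured (the case where the entries were wiped in my test), something like the following could list the generated networkd files that enable DHCP. This is only a sketch: `dhcp_network_files` is a made-up helper name, and the pattern assumes the `DHCP=` line looks like the ones in the files posted above.

```shell
# List .network files in a directory that enable DHCP.
# Defaults to /run/systemd/network, where netplan writes its generated files.
# (dhcp_network_files is a hypothetical helper name.)
dhcp_network_files() {
    dir="${1:-/run/systemd/network}"
    # grep exits non-zero when nothing matches or no files exist; ignore that.
    grep -lE '^DHCP=(yes|true|ipv4|ipv6)$' "$dir"/*.network 2>/dev/null || true
}

# Intended use on the LXD host (run by hand):
#   dhcp_network_files
#   dhcp_network_files /etc/systemd/network
```

If the routed NIC parent shows up in that list, switching it to a static config like the one above is the change that avoided the problem for me.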

VinceHillier · January 20, 2021, 10:11am

Indeed, the neighbor proxy entries get zapped.

Thanks for your testing and solution.