systemd-networkd-wait-online timed out on LXD 5.11 when bridged to an IPv4-only network

hi-ko · March 2, 2023, 10:10am

We migrated our LXD host from Ubuntu 18 to Ubuntu 22.04 running LXD 5.11.

There is a strange issue with containers connected to a (non-managed) bridge on the local network: on the very first boot of a container on this host, it takes minutes until the container gets an IP via DHCP. Any subsequent DHCP request, and also a restart of the container, is as fast as expected.

In the journal I see:

Mar 02 09:40:23 test systemd[1]: sys-subsystem-net-devices-eth0.device: Job sys-subsystem-net-devices-eth0.device/start timed out.
Mar 02 09:40:23 test systemd[1]: Timed out waiting for device sys-subsystem-net-devices-eth0.device.
Mar 02 09:40:23 test systemd[1]: sys-subsystem-net-devices-eth0.device: Job sys-subsystem-net-devices-eth0.device/start failed with re
Mar 02 09:40:23 test systemd[1]: Startup finished in 1min 30.548s.

The containers have the following profile attached to use the bridge:

devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
name: bridged

My netplan config on the host looks like this:

network:
  ethernets:
    ens160:
      dhcp4: false
      dhcp6: false
  bridges:
    br0:
      dhcp4: yes
      dhcp-identifier: mac
      interfaces:
          - ens160
  version: 2

Does anybody have a hint or an idea how to analyze or fix this issue?

hi-ko · March 2, 2023, 10:42am

The issue seems to be related to copying the container using --refresh.

How I was able to reproduce the issue:

lxc launch ubuntu:20.04 test -p default -p bridged
(the container starts as expected and gets an IP from the bridged network)
lxc copy test lxd04: --refresh
lxc stop test

On lxd04:

lxc config show test

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20230209)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20230209"
  image.type: squashfs
  image.version: "20.04"
  volatile.base_image: e90761e627debd09431b580cb87d485c4ae1f657ecceaacda17ce729156e6f7e
  volatile.cloud-init.instance-id: 163ff071-433b-4182-a4da-141e99a52c2e
  volatile.eth0.host_name: veth00b80583
  volatile.eth0.hwaddr: 00:16:3e:2e:57:44
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: cad89d6f-45b2-452b-b835-4923ca4b2be7
devices: {}
ephemeral: false
profiles:
- default
- bridged
stateful: false
description: ""

lxc start test
In the journal of container test:

Mar 02 10:38:10 test systemd-networkd-wait-online[162]: Event loop failed: Connection timed out
Mar 02 10:38:10 test systemd-journald[59]: Forwarding to syslog missed 82 messages.
Mar 02 10:38:10 test systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Mar 02 10:38:10 test systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Mar 02 10:38:10 test systemd[1]: Failed to start Wait for Network to be Configured.
Mar 02 10:38:10 test systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
Mar 02 10:38:11 test cloud-init[182]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init' at Thu, 02 Mar 2023 10:38:10 +0000. Up 125.93 seco

hi-ko · March 2, 2023, 5:04pm

It seems the problem is systemd-networkd-wait-online waiting for an IPv6 address, but there is no IPv6 on that bridged network.
As a result, systemd ends up in State: degraded.

systemctl --failed
  UNIT                                 LOAD   ACTIVE SUB    DESCRIPTION
● systemd-networkd-wait-online.service loaded failed failed Wait for Network to be Configured

systemctl status systemd-networkd-wait-online.service
× systemd-networkd-wait-online.service - Wait for Network to be Configured
     Loaded: loaded (/lib/systemd/system/systemd-networkd-wait-online.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Thu 2023-03-02 16:56:39 UTC; 3min 44s ago
       Docs: man:systemd-networkd-wait-online.service(8)
    Process: 174 ExecStart=/lib/systemd/systemd-networkd-wait-online (code=exited, status=1/FAILURE)
   Main PID: 174 (code=exited, status=1/FAILURE)
        CPU: 14ms

Mar 02 16:54:39 test1 systemd[1]: Starting Wait for Network to be Configured...
Mar 02 16:56:39 test1 systemd-networkd-wait-online[174]: Timeout occurred while waiting for network connectivity.
Mar 02 16:56:39 test1 systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Mar 02 16:56:39 test1 systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Mar 02 16:56:39 test1 systemd[1]: Failed to start Wait for Network to be Configured.

I tried

lxc network set br0 ipv6.address none
Error: Only managed networks can be modified

Is there another way to stop the container from insisting on IPv6?

A workaround I have found so far is to add link-local: [ ] to the container’s netplan YAML to override the default IPv6 fallback (found in systemd issue 6441), e.g. in the container’s netplan YAML:

        eth0:
            dhcp4: true
            link-local: [ ]

Does anybody know a better way that avoids configuring every container’s network individually?
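
One possible direction, as an untested sketch: push the same netplan fragment through cloud-init from the LXD profile, so that it is not edited inside each container. This assumes the image ships cloud-init (as the ubuntu: and images:.../cloud images used in this thread do), that the profile is the bridged profile shown earlier, and that cloud-init hands link-local through to netplan unchanged, which I have not verified. The temporary file and the variable name cfg are only for illustration.

```shell
#!/bin/sh
# Write a cloud-init network-config (version 2, netplan-style) to a temp file.
# Assumption: the container's interface is eth0, as in the bridged profile.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
version: 2
ethernets:
  eth0:
    dhcp4: true
    link-local: []
EOF

# Attach the fragment to the profile. Only runs where the lxc client exists.
if command -v lxc >/dev/null 2>&1; then
    lxc profile set bridged cloud-init.network-config "$(cat "$cfg")"
    lxc profile show bridged
else
    echo "lxc client not found; network-config written to $cfg"
fi
```

On LXD releases that predate the cloud-init.* keys, the equivalent key was user.network-config.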

tomp · March 17, 2023, 9:16am

Does this happen for a newly launched instance of the same OS type/version?

hi-ko · March 17, 2023, 1:35pm

Hi @tomp,
Yes, this also happens on newly launched containers, e.g.:

lxc launch images:ubuntu/22.04/cloud test -p default -p bridged

Running lxc exec test -- journalctl -f, I then see:

Mar 17 13:32:18 test systemd-networkd-wait-online[139]: Timeout occurred while waiting for network connectivity.
Mar 17 13:32:18 test systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Mar 17 13:32:18 test systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Mar 17 13:32:18 test systemd[1]: Failed to start Wait for Network to be Configured.

tomp · April 27, 2023, 9:07am

I just tried this and didn’t observe it. Can you try the following?

lxc network create foo ipv6.address=none
lxc launch images:ubuntu/22.04/cloud cfoo -n foo

And then see if you get the same log errors.

hi-ko · April 27, 2023, 10:35am

Using the managed foo network works as expected. The issue only occurs when using the bridged profile mentioned at the beginning, which attaches the container to the bridge that is not managed by LXD and makes the container transparently visible in the local network. I prefer this way of running containers transparently in the same VLAN/network as other services.

To reproduce: in netplan, define a bridge on the LXD host that contains only the one network interface you want to connect the containers to:

  bridges:
    br0:
      dhcp4: yes
      dhcp-identifier: mac
      interfaces:
        - enp0s31f6.1234

Then launch a container with the bridged profile mentioned above attached:

lxc launch images:ubuntu/22.04/cloud cfoo2 -p default -p bridged

systemd will end up in the degraded state because systemd-networkd-wait-online.service fails (see also systemd issue 6441).

tomp · April 27, 2023, 12:26pm

Ah, so this sounds like a systemd/networkd bug when IPv6 is disabled entirely, rather than just not having a DHCPv6/SLAAC setup.
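
To check that hypothesis from inside a container, one could look at the kernel’s per-interface disable_ipv6 switch. A small sketch; the function name ipv6_state is made up for illustration, and eth0 is the interface name from the bridged profile:

```shell
#!/bin/sh
# Report whether the kernel has IPv6 disabled on an interface (default: eth0).
# Prints one of: enabled, disabled, unavailable.
ipv6_state() {
    iface="${1:-eth0}"
    f="/proc/sys/net/ipv6/conf/${iface}/disable_ipv6"
    if [ ! -r "$f" ]; then
        # No IPv6 support in the kernel, or no such interface.
        echo "unavailable"
    elif [ "$(cat "$f")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

ipv6_state eth0
```

This can be run in a shell inside the container, for example one opened with lxc exec cfoo2 -- sh. If it prints enabled, IPv6 is not switched off in the kernel for that interface, and the timeout would more likely come from networkd waiting for IPv6 configuration that never arrives on this network.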