DHCP addresses not re-used on Bionic+ containers

Hi folks,

I’m using LXD (v3, though I have tested v4 and other intermediate versions as well) with the default lxdbr0 networking and dnsmasq; nothing out of the ordinary as far as I can tell. I’m using Ubuntu “minimal” images (16.04, 18.04, and 20.04), though I’ve also tried the Ubuntu-provided “cloud” images. I originally ran into issues where, if the LXD host was down for say 30 minutes, some containers would arbitrarily “forget” their IPs and swap IPs with one another. This is new behaviour for me: the machine has been in service for a couple of years and only started doing it in the last few months, when I switched the container base from an older Ubuntu release to Focal.

This seemed obviously DHCP-related: LXD’s dnsmasq defaults to 1h leases and the renew time is 30 minutes, so a 30 minute outage is long enough for some containers’ leases to expire. After troubleshooting for some time, I bisected the change in behaviour to Xenial->Bionic, and I believe it’s caused by the switch to netplan/systemd-networkd; Xenial uses dhclient instead, and it behaves correctly.
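To make the timing concrete, here is a rough sketch of the lease arithmetic. It assumes the 3600 second lease and 1800 second renewal time mentioned above, and that a container always renews exactly at the renewal time; the function names are made up for illustration.

```python
# Timer values as described in this thread: 1 hour lease, renewal at 30 minutes.
LEASE_SECONDS = 3600
RENEW_SECONDS = 1800


def remaining_lease(seconds_since_renewal, lease_seconds=LEASE_SECONDS):
    """Seconds left on the lease, given the time since the last renewal."""
    return max(0, lease_seconds - seconds_since_renewal)


def expires_during_outage(seconds_since_renewal, outage_seconds,
                          lease_seconds=LEASE_SECONDS):
    """True if the lease runs out before the host is back up."""
    return outage_seconds >= remaining_lease(seconds_since_renewal, lease_seconds)


if __name__ == "__main__":
    outage = 30 * 60
    # A container renews at most RENEW_SECONDS after its previous renewal,
    # so the age of its lease when the host goes down is 0..30 minutes.
    for age_minutes in (1, 15, 29, 30):
        expired = expires_during_outage(age_minutes * 60, outage)
        print(f"renewed {age_minutes:2d} min before outage -> expired: {expired}")
```

Under these assumptions a 30 minute outage is the shortest one that can expire a lease at all, and it only catches containers that were just about to renew; longer outages catch progressively more of them.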

If the lease expires while the container is down, dhclient will issue a dhcp “discover” message with option 50 set, requesting the old IP address it formerly had:

Dynamic Host Configuration Protocol (Discover)
    Option: (53) DHCP Message Type (Discover)
        Length: 1
        DHCP: Discover (1)
    Option: (50) Requested IP Address (172.16.0.150)
        Length: 4
        Requested IP Address: 172.16.0.150
    Option: (12) Host Name
        Length: 13
        Host Name: ruling-collie

… the Bionic containers don’t do this: they send the same “discover” request without option 50 and are given the next available IP. If the containers start in a different order from the one they were created in, they occasionally end up swapping IP addresses with each other. I don’t know why, but it only ever seems to swap IPs that are adjacent to each other; that’s a head-scratcher for another day.
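For anyone unfamiliar with the wire format, the difference between the two clients comes down to one extra option in the Discover. The sketch below encodes just the options field (code, length, value triplets) with and without option 50. The option codes are the standard ones shown in the capture above; the function names are hypothetical, and this does not build a complete DHCP packet.

```python
import socket

OPT_HOST_NAME = 12
OPT_REQUESTED_IP = 50
OPT_MESSAGE_TYPE = 53
OPT_END = 255
DHCPDISCOVER = 1


def tlv(code, value):
    """Encode one DHCP option as code, length, value."""
    return bytes([code, len(value)]) + value


def discover_options(hostname, requested_ip=None):
    """Options field of a Discover; option 50 is added only if an old address is known."""
    out = tlv(OPT_MESSAGE_TYPE, bytes([DHCPDISCOVER]))
    if requested_ip is not None:
        out += tlv(OPT_REQUESTED_IP, socket.inet_aton(requested_ip))
    out += tlv(OPT_HOST_NAME, hostname.encode("ascii"))
    return out + bytes([OPT_END])


if __name__ == "__main__":
    # dhclient-style: asks for the address it had before.
    print(discover_options("ruling-collie", "172.16.0.150").hex(" "))
    # What the Bionic containers send: no requested address.
    print(discover_options("ruling-collie").hex(" "))
```

The two outputs differ by six bytes: the option code, the length 4, and the four address bytes.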

Ignoring for the time being the obvious solutions to the issue such as:

  • Increasing the DHCP lease time.
  • Not routing based on IP address (routing via DNS hostname instead, which dnsmasq will provide correctly).
  • Manually configuring IP addresses.
  • Ditching netplan and systemd-networkd completely.
  • Avoiding outages lasting more than 29 minutes.

… I’d really like to know how to get it to behave the same way as dhclient. It looks like systemd-networkd will not attempt to re-use the old address if Anonymize is set to true (the other options that are new in the DHCP discover seem to match those described in systemd-networkd’s documentation, emulating a Windows client for counter-fingerprinting purposes), but I can’t see where LXD would be configuring that (and it makes zero sense for containers unless you’re going to make the image public). Oh, and it’s worth noting that out of the box, a physical Bionic/Focal machine or a Bionic/Focal VM will not exhibit this behaviour: it requests the last IP address it had, even some weeks after the last time it was used.

I thought maybe it was cloud-init’s fault, so I disabled cloud-init’s networking and configured netplan manually, but it’s still doing it.

Any ideas where else to look? I’d considered posting this to a general Ubuntu forum, but inside LXD seems to be the only place I can reproduce this so I thought I’d start here.

I’m not sure if this is one of the options you’d already discounted, but if you need your containers to have a static IP then you can assign one in LXD, and it will create a static DHCP lease for the container’s MAC address so that it gets the same address every time.

E.g.

lxc config device override <container> eth0 ipv4.address=n.n.n.n

This avoids needing to set up the IPs manually inside the container.

Regarding your specific question about DHCP option 50 in Ubuntu Focal images, I’ve just tried this now with a fresh image on LXD 4.4:

On my lxdbr0 bridge, set up tcpdump:

sudo tcpdump -pvnl -i lxdbr0 port 67 and port 68

In a separate terminal, launch a new focal container:

lxc launch images:ubuntu/focal c1

This records:

09:26:30.235578 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 302)
    0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:16:3e:d1:3b:30, length 274, xid 0x19490c97, secs 1, Flags [none]
	  Client-Ethernet-Address 00:16:3e:d1:3b:30
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Discover
	    Client-ID Option 61, length 7: ether 00:16:3e:d1:3b:30
	    Parameter-Request Option 55, length 11: 
	      Subnet-Mask, Default-Gateway, Hostname, Domain-Name
	      Domain-Name-Server, MTU, Static-Route, Classless-Static-Route
	      Option 119, NTP, Option 120
	    MSZ Option 57, length 2: 576
	    Hostname Option 12, length 2: "c1"
09:26:30.235746 IP (tos 0xc0, ttl 64, id 15859, offset 0, flags [none], proto UDP (17), length 329)
    10.109.89.1.67 > 10.109.89.111.68: BOOTP/DHCP, Reply, length 301, xid 0x19490c97, secs 1, Flags [none]
	  Your-IP 10.109.89.111
	  Server-IP 10.109.89.1
	  Client-Ethernet-Address 00:16:3e:d1:3b:30
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Offer
	    Server-ID Option 54, length 4: 10.109.89.1
	    Lease-Time Option 51, length 4: 3600
	    RN Option 58, length 4: 1800
	    RB Option 59, length 4: 3150
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    BR Option 28, length 4: 10.109.89.255
	    Default-Gateway Option 3, length 4: 10.109.89.1
	    Domain-Name-Server Option 6, length 4: 10.109.89.1
	    Domain-Name Option 15, length 3: "lxd"
	    Hostname Option 12, length 2: "c1"
09:26:30.237641 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 314)
    0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:16:3e:d1:3b:30, length 286, xid 0x19490c97, secs 1, Flags [none]
	  Client-Ethernet-Address 00:16:3e:d1:3b:30
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Request
	    Client-ID Option 61, length 7: ether 00:16:3e:d1:3b:30
	    Parameter-Request Option 55, length 11: 
	      Subnet-Mask, Default-Gateway, Hostname, Domain-Name
	      Domain-Name-Server, MTU, Static-Route, Classless-Static-Route
	      Option 119, NTP, Option 120
	    MSZ Option 57, length 2: 576
	    Server-ID Option 54, length 4: 10.109.89.1
	    Requested-IP Option 50, length 4: 10.109.89.111
	    Hostname Option 12, length 2: "c1"
09:26:30.241105 IP (tos 0xc0, ttl 64, id 15860, offset 0, flags [none], proto UDP (17), length 329)
    10.109.89.1.67 > 10.109.89.111.68: BOOTP/DHCP, Reply, length 301, xid 0x19490c97, secs 1, Flags [none]
	  Your-IP 10.109.89.111
	  Server-IP 10.109.89.1
	  Client-Ethernet-Address 00:16:3e:d1:3b:30
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: ACK
	    Server-ID Option 54, length 4: 10.109.89.1
	    Lease-Time Option 51, length 4: 3600
	    RN Option 58, length 4: 1800
	    RB Option 59, length 4: 3150
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    BR Option 28, length 4: 10.109.89.255
	    Default-Gateway Option 3, length 4: 10.109.89.1
	    Domain-Name-Server Option 6, length 4: 10.109.89.1
	    Domain-Name Option 15, length 3: "lxd"
	    Hostname Option 12, length 2: "c1"

Then if I restart the container with lxc restart c1:

09:30:04.115671 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 302)
    0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:16:3e:d1:3b:30, length 274, xid 0xa0e6bef2, secs 1, Flags [none]
	  Client-Ethernet-Address 00:16:3e:d1:3b:30
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Discover
	    Client-ID Option 61, length 7: ether 00:16:3e:d1:3b:30
	    Parameter-Request Option 55, length 11: 
	      Subnet-Mask, Default-Gateway, Hostname, Domain-Name
	      Domain-Name-Server, MTU, Static-Route, Classless-Static-Route
	      Option 119, NTP, Option 120
	    MSZ Option 57, length 2: 576
	    Hostname Option 12, length 2: "c1"
09:30:04.115884 IP (tos 0xc0, ttl 64, id 19835, offset 0, flags [none], proto UDP (17), length 329)
    10.109.89.1.67 > 10.109.89.111.68: BOOTP/DHCP, Reply, length 301, xid 0xa0e6bef2, secs 1, Flags [none]
	  Your-IP 10.109.89.111
	  Server-IP 10.109.89.1
	  Client-Ethernet-Address 00:16:3e:d1:3b:30
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Offer
	    Server-ID Option 54, length 4: 10.109.89.1
	    Lease-Time Option 51, length 4: 3600
	    RN Option 58, length 4: 1800
	    RB Option 59, length 4: 3150
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    BR Option 28, length 4: 10.109.89.255
	    Default-Gateway Option 3, length 4: 10.109.89.1
	    Domain-Name-Server Option 6, length 4: 10.109.89.1
	    Domain-Name Option 15, length 3: "lxd"
	    Hostname Option 12, length 2: "c1"
09:30:04.116799 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 314)
    0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from 00:16:3e:d1:3b:30, length 286, xid 0xa0e6bef2, secs 1, Flags [none]
	  Client-Ethernet-Address 00:16:3e:d1:3b:30
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: Request
	    Client-ID Option 61, length 7: ether 00:16:3e:d1:3b:30
	    Parameter-Request Option 55, length 11: 
	      Subnet-Mask, Default-Gateway, Hostname, Domain-Name
	      Domain-Name-Server, MTU, Static-Route, Classless-Static-Route
	      Option 119, NTP, Option 120
	    MSZ Option 57, length 2: 576
	    Server-ID Option 54, length 4: 10.109.89.1
	    Requested-IP Option 50, length 4: 10.109.89.111
	    Hostname Option 12, length 2: "c1"
09:30:04.120258 IP (tos 0xc0, ttl 64, id 19836, offset 0, flags [none], proto UDP (17), length 329)
    10.109.89.1.67 > 10.109.89.111.68: BOOTP/DHCP, Reply, length 301, xid 0xa0e6bef2, secs 1, Flags [none]
	  Your-IP 10.109.89.111
	  Server-IP 10.109.89.1
	  Client-Ethernet-Address 00:16:3e:d1:3b:30
	  Vendor-rfc1048 Extensions
	    Magic Cookie 0x63825363
	    DHCP-Message Option 53, length 1: ACK
	    Server-ID Option 54, length 4: 10.109.89.1
	    Lease-Time Option 51, length 4: 3600
	    RN Option 58, length 4: 1800
	    RB Option 59, length 4: 3150
	    Subnet-Mask Option 1, length 4: 255.255.255.0
	    BR Option 28, length 4: 10.109.89.255
	    Default-Gateway Option 3, length 4: 10.109.89.1
	    Domain-Name-Server Option 6, length 4: 10.109.89.1
	    Domain-Name Option 15, length 3: "lxd"
	    Hostname Option 12, length 2: "c1"

So it’s only sending option 50 (requested IP) in the Request, after the initial Discover.
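If you want to check a longer capture without reading it by eye, something like the following could pull out the message type and any requested IP per packet. It is only a sketch that assumes the verbose tcpdump text format shown above (the regular expressions are written against these exact lines), and the sample is two packets trimmed down from the first capture.

```python
import re

# Patterns written against the `tcpdump -v` lines shown in this thread.
MSG_TYPE = re.compile(r"DHCP-Message Option 53, length 1: (\w+)")
REQUESTED_IP = re.compile(r"Requested-IP Option 50, length 4: ([\d.]+)")
# A new packet starts with an unindented timestamp such as 09:26:30.235578.
PACKET_START = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+ ")

SAMPLE = """\
09:26:30.235578 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 302)
    DHCP-Message Option 53, length 1: Discover
    Hostname Option 12, length 2: "c1"
09:26:30.237641 IP (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 314)
    DHCP-Message Option 53, length 1: Request
    Requested-IP Option 50, length 4: 10.109.89.111
"""


def requested_ips(capture_text):
    """Return (message_type, requested_ip_or_None) for each packet in the capture."""
    packets = []
    current = None
    for line in capture_text.splitlines():
        if PACKET_START.match(line):
            current = {"type": None, "requested_ip": None}
            packets.append(current)
        if current is None:
            continue  # skip anything before the first packet header
        found = MSG_TYPE.search(line)
        if found:
            current["type"] = found.group(1)
        found = REQUESTED_IP.search(line)
        if found:
            current["requested_ip"] = found.group(1)
    return [(p["type"], p["requested_ip"]) for p in packets]


if __name__ == "__main__":
    for msg_type, ip in requested_ips(SAMPLE):
        print(msg_type, ip)
```

A Discover that comes back with no requested IP is the networkd behaviour described here; a dhclient capture would show the address on the Discover as well.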

I’ve also recorded the desired behaviour you describe when using a Xenial image instead.

In netplan based containers, you can see the networkd config file generated by netplan by looking at:

/run/systemd/network/10-netplan-eth0.network

[Match]
Name=eth0

[Network]
DHCP=ipv4
LinkLocalAddressing=ipv6

[DHCP]
ClientIdentifier=mac
RouteMetric=100
UseMTU=true

But there is nothing obvious there that is setting this behaviour. And Anonymize is defaulting to false, so unlikely to be caused by that.

I’ve also observed what you said about physical Focal machines using networkd sending the Option 50 on initial Discover. Although in my case, there was no networkd config file in /run/systemd/network/ and instead it was being controlled by NetworkManager (as this was a desktop machine).

In your observations with a physical machine, was this using networkd with a config file or NetworkManager?

Thanks @tomp for taking a look!

I included that in the “manually configuring IP addresses” - it’s not super desirable as it’s additional administrative overhead. With dhclient everything works how I want it to.

Did you wait for the DHCP lease to expire before starting c1 again? That’s the difference: they get their IP addresses back if they’re up the whole time and restarted, no issues there… the only difference is the behaviour if the addresses expire while the machine is off. It looks like your container was issued a 3600 second lease (that’s 60 minutes, with renewal at 1800 seconds), which would not have expired in the ~4 minutes your container was down.

I think what you’re seeing is the expected behaviour with the Anonymize mode on - it sends the initial DHCP “discover” message, then realizes it’s talking to the same DHCP server, so it sends back an “actually may I have address X instead” which the server OKs. My containers don’t do this, but I think it’s because the lease is expired.

My physical Bionic machine is netplan/systemd-networkd, not network manager, and I’d have to check but I’m 99% sure that the Focal VM I spun up was too - I don’t run the desktop version, which I think is where network-manager comes from?

Even when restarting the containers, I didn’t see option 50 being sent in the Discover request with focal/netplan (container), only with xenial (container) and focal/networkmanager (physical desktop). So it sounds like it mirrors what you were seeing with the lease expiring anyway.

The other thing that complicates this somewhat is that dnsmasq also uses an internal algorithm to hash the mac address to try and allocate the same IP to the machine each time. But this only works if another node hasn’t taken it. So as the active node count increases the chances of getting a stable IP that way reduces too (which may or may not be a factor here).

Yes, but what I meant was this is the correct behaviour for systemd-networkd per the documentation when Anonymize is on (though why it’s on, or if this behaviour is being caused by something else inside LXD, I do not know). It doesn’t immediately send the old IP, in case you connected to a different AP (eg took your machine to Starbucks and connected to their wifi, then your machine would disclose your LAN IP and maybe some other information to the entire broadcast domain). Once it’s sure it’s the same DHCP server listening, then it sends option 50.

dhclient does not have these privacy enhancements (rfc7844), so naturally it sends option 50 immediately… but the crux of the matter is that it sends option 50 even if its local copy of the lease is expired (“trying its luck”) whereas it seems that systemd-networkd does not.

I can’t see any sign that Anonymize is on inside the container (the networkd config file generated by netplan doesn’t specify it at least, and the default is false according to the docs).

Maybe worth reporting this to systemd upstream and see what they say.

I’ll keep digging at it tomorrow. It might not be tied to Anonymize; it might be deciding not to request the old address for some other reason. From what I could tell, Anonymize also sends a pile of extra parameters in option 55 to try and look like Windows. I’m sure I saw this in wireshark earlier, but maybe I’m imagining it because I’m not seeing it now (it’s been a long day).

The other thing that complicates this somewhat is that dnsmasq also uses an internal algorithm to hash the mac address to try and allocate the same IP to the machine each time.

BTW thank you for pointing out the hashing thing, I’m pretty sure that explains why I’m only seeing it with containers that have adjacent IPs… I’m guessing what happens is that the containers collide to the same IP, so it just increments the IP for the second one, which is why they swap when they expire and the containers start in a different order?
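The guess above can be illustrated with a toy model. To be clear, this is not dnsmasq’s actual code: CRC32 stands in for whatever hash dnsmasq uses, the pool bounds are invented, and “probe upward to the next free address” is an assumption chosen to match the symptoms. It only shows how two clients that hash to the same starting address would swap adjacent IPs depending on boot order once nothing remembers their old leases.

```python
import zlib

POOL_START = 2    # hypothetical pool: host parts 2..254 of a /24
POOL_SIZE = 253


def preferred_slot(mac):
    """Stable starting slot derived from the MAC (CRC32 is a stand-in hash)."""
    return zlib.crc32(mac.encode("ascii")) % POOL_SIZE


def allocate(mac, in_use, slot_for=preferred_slot):
    """Take the preferred slot if free, else probe upward to the next free one."""
    slot = slot_for(mac)
    for step in range(POOL_SIZE):
        host = POOL_START + (slot + step) % POOL_SIZE
        if host not in in_use:
            in_use.add(host)
            return host
    raise RuntimeError("address pool exhausted")


def boot(macs_in_order, slot_for=preferred_slot):
    """Allocate addresses in boot order, with no leases remembered from before."""
    in_use = set()
    return {mac: allocate(mac, in_use, slot_for) for mac in macs_in_order}


if __name__ == "__main__":
    # Force a collision so the effect is visible: both clients prefer slot 148.
    same_slot = lambda mac: 148
    print(boot(["container-a", "container-b"], same_slot))
    print(boot(["container-b", "container-a"], same_slot))
```

In the forced-collision demo, whichever container boots first gets .150 and the other gets .151, so reversing the boot order swaps them. A client that hashes to a free slot keeps the same address on every boot, which would also explain why most containers look stable.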

Hmm, double checking my work, I think I was mistaken about the physical machines and VMs being different from the containers… it seems from reading the code and experimenting that systemd-networkd has no facility to remember its leases between boots, and the reason my VMs got the same IP each time is that my home network also runs dnsmasq, so per the hashing mechanism they are given the same IP address as well. :frowning:

Looking at the code more closely, the last_addr isn’t used between reboots, it’s used if the carrier disappears and then comes back (eg disconnecting from a wifi network, and reconnecting, and the anonymize feature relates to that). So yeah, it definitely doesn’t look to be anonymize related, it’s just a missing feature in systemd.

So it seems that there’s really no fixing this issue with systemd-networkd from the look of things, which is a bummer, but not much I can do… at least it’s not an LXD issue.
