IPv4 conflict between all copied VMs using macvlan

enzoaguado · March 23, 2021, 7:49am

I have a VM that I use as a base each time I need a clean dev environment for testing. That base VM is a regular Ubuntu 20.04 image with additional software installed, which I copy with lxc copy my-base-vm some-test-feature. The network is configured to use macvlan:

$ lxc profile show macvlan 
config: {}
description: Gives the container an IP address on LAN via DHCP
devices:
  eth0:
    nictype: macvlan
    parent: enp8s0
    type: nic
name: macvlan

Every time I copy that VM, the IPv4 address is the same on all machines. The IPv6 address is different.
Refreshing the DHCP lease manually does give a new IP, but when the VMs are left running overnight, they go back to the same IP.
ip addr shows a different MAC address on each VM.
Trying to reach the VMs via IP yields mixed results: either things work for one VM only, or the network is completely unreliable.

enzoaguado@chopin:~$ lxc ls
+------------------+---------+------------------------------+----------------------------------------------+-----------------+-----------+----------+
|       NAME       |  STATE  |             IPV4             |                     IPV6                     |      TYPE       | SNAPSHOTS | LOCATION |
+------------------+---------+------------------------------+----------------------------------------------+-----------------+-----------+----------+
| base-vm          | STOPPED |                              |                                              | VIRTUAL-MACHINE | 1         | chopin   |
+------------------+---------+------------------------------+----------------------------------------------+-----------------+-----------+----------+
| ha-anbox-cloud   | RUNNING | 192.168.2.75 (enp5s0)        | fd4e:10c3:a671:4:216:3eff:fe31:c3c7 (enp5s0) | VIRTUAL-MACHINE | 1         | chopin   |
|                  |         | 10.245.1.1 (lxdbr0)          |                                              |                 |           |          |
+------------------+---------+------------------------------+----------------------------------------------+-----------------+-----------+----------+
| mass-app-update  | RUNNING | 192.168.2.75 (enp5s0)        | fd4e:10c3:a671:4:216:3eff:fea7:e045 (enp5s0) | VIRTUAL-MACHINE | 1         | chopin   |
+------------------+---------+------------------------------+----------------------------------------------+-----------------+-----------+----------+

tomp · March 23, 2021, 9:02am

Can you show the output of lxc config show <instance> --expanded for the original instance and the copy please?

enzoaguado · March 23, 2021, 9:11am

Base VM

enzoaguado@chopin:~$ lxc config show base-vm --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20210201)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20210201"
  image.type: disk-kvm.img
  image.version: "20.04"
  limits.cpu: "4"
  limits.memory: 8GB
  security.secureboot: "false"
  user.user-data: |
    #cloud-config
    apt_mirror: http://fr.archive.ubuntu.com/ubuntu/
    ssh_pwauth: yes
    users:
      - name: ubuntu
        passwd: "\$6\$s.w...T22qGFl/"
        lock_passwd: false
        groups: lxd
        shell: /bin/bash
        sudo: ALL=(ALL) NOPASSWD:ALL
  volatile.base_image: 72e36c2e7f8253d6e966d9478a6db8f7165aca4e6216df8cd5bb38057fad51e8
  volatile.eth0.host_name: macf2b2dbd7
  volatile.eth0.hwaddr: 00:16:3e:fe:b9:d5
  volatile.eth0.last_state.created: "false"
  volatile.last_state.power: RUNNING
  volatile.uuid: ea134810-5e7a-4b9d-8ea4-80c6998cc756
devices:
  eth0:
    nictype: macvlan
    parent: enp8s0
    type: nic
  root:
    path: /
    pool: tank
    size: 50GB
    type: disk
ephemeral: false
profiles:
- default
- official-image-vm
- anbox-cloud
- macvlan
stateful: false
description: ""

Copied VM

enzoaguado@chopin:~$ lxc config show ha-anbox-cloud --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20210201)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20210201"
  image.type: disk-kvm.img
  image.version: "20.04"
  limits.cpu: "4"
  limits.memory: 8GB
  security.secureboot: "false"
  user.user-data: |
    #cloud-config
    apt_mirror: http://fr.archive.ubuntu.com/ubuntu/
    ssh_pwauth: yes
    users:
      - name: ubuntu
        passwd: "\$6\$s.wX...22qGFl/"
        lock_passwd: false
        groups: lxd
        shell: /bin/bash
        sudo: ALL=(ALL) NOPASSWD:ALL
  volatile.base_image: 72e36c2e7f8253d6e966d9478a6db8f7165aca4e6216df8cd5bb38057fad51e8
  volatile.eth0.host_name: mac1c087517
  volatile.eth0.hwaddr: 00:16:3e:67:03:e9
  volatile.eth0.last_state.created: "false"
  volatile.last_state.power: RUNNING
  volatile.uuid: b3501c33-7f95-4993-8708-e86baa30824f
devices:
  eth0:
    nictype: macvlan
    parent: enp8s0
    type: nic
  root:
    path: /
    pool: tank
    size: 50GB
    type: disk
ephemeral: false
profiles:
- default
- official-image-vm
- anbox-cloud
- macvlan
stateful: false
description: ""

tomp · March 23, 2021, 9:18am

OK, so they have different MAC addresses and UUIDs, that’s good.

And how are you doing DHCP inside the instances? Can you show the network config that performs the DHCP request?

Also, what DHCP server are you using, is it possible to show the DHCP lease(s) for the instances requested?

enzoaguado · March 23, 2021, 9:32am

If you are talking about the manual renewal, I’m running dhclient -r enp5s0; dhclient enp5s0

It should be the router’s DHCP (UDM).
It shows both machines with different MAC addresses, but lists them with the same IP. On both machines, the hostname is the same and the uptime is incorrect, but the interface is known for being unreliable, so it’s hard to know what is a router bug and what is not.

tomp · March 23, 2021, 9:36am

I meant are you using netplan for configuring the interfaces on boot?

Is it possible you’re running out of DHCP leases on your DHCP server?

If you create a fresh instance does the issue not appear?

enzoaguado · March 23, 2021, 9:46am

It’s a standard ubuntu image, so yes netplan is being used:

network:
    version: 2
    ethernets:
        enp5s0:
            dhcp4: true

There are plenty of available IPs, and one of them gets assigned when I manually refresh the lease in the instance.
Creating a fresh instance with the same network configuration does not trigger the issue. The new instance gets a different IP. It’s also worth noting that this doesn’t happen with containers; the issue only affects VMs.

cemzafer · March 23, 2021, 9:48am

@enzoaguado, what happens if you kill the dhclient process in the instances?

tomp · March 23, 2021, 9:49am

Can you add to your netplan config in each VM:

dhcp-identifier: mac

To rule out issues with DUID.

E.g.

network:
    version: 2
    ethernets:
        enp5s0:
            dhcp4: true
            dhcp-identifier: mac
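
To try this on several VMs without editing the existing file, a drop-in file is one option. This is a minimal sketch, assuming the interface is enp5s0 as above and that netplan merges the drop-in with the image's existing config; the helper name write_dhcp_identifier_dropin and the file name 99-dhcp-identifier.yaml are made up for illustration.

```shell
#!/bin/sh
# Sketch: write a netplan drop-in that makes the DHCP client identify by MAC.
# write_dhcp_identifier_dropin and the file name are illustrative only;
# they are not part of LXD or netplan.
write_dhcp_identifier_dropin() {
    netplan_dir="$1"   # normally /etc/netplan
    iface="$2"         # e.g. enp5s0
    cat > "$netplan_dir/99-dhcp-identifier.yaml" <<EOF
network:
    version: 2
    ethernets:
        $iface:
            dhcp4: true
            dhcp-identifier: mac
EOF
}

# Inside a VM, as root:
#   write_dhcp_identifier_dropin /etc/netplan enp5s0 && netplan apply
```

Passing the directory as an argument lets you write the file to a scratch directory first and inspect it before touching /etc/netplan.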

enzoaguado · March 23, 2021, 9:53am

Adding dhcp-identifier: mac followed by a netplan apply solves the issue. It renews the DHCP lease with a different IP on each VM.
I’ll edit this post in the future if the IP addresses revert to being the same.

enzoaguado · March 23, 2021, 9:54am

It’s the same behavior as asking for a new lease: it does change the IP address for a bit, but a few hours later it reverts to using the same IP as the other VMs.

tomp · March 23, 2021, 9:55am

Excellent thanks.

The VMs are supposed to get a new DUID when copied, based on the VM’s UUID (which is why I checked they were different), so this suggests something has gone wrong there with the DUID regeneration.

@monstermunchkin @stgraber do you know how the DUID regeneration in the VM is supposed to work?

stgraber · March 23, 2021, 12:20pm

I suspect that without the mac stanza in netplan, the VM may rely on /etc/machine-id to derive its DUID. It’d normally be a good idea to clear /etc/machine-id whenever you clone an instance (and if cloud-init is used, I thought it happened automatically through it), as otherwise various bits on the system will use the now duplicated UUID.
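
A quick way to check that suspicion is to compare /etc/machine-id across the copies. The following is a rough sketch, assuming lxc exec works against these VMs (that is, the LXD agent is running inside them); find_duplicate_machine_ids is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report machine-id values shared by more than one instance.
# Input: lines of "<instance-name> <machine-id>" on stdin.
# find_duplicate_machine_ids is an illustrative helper, not an LXD command.
find_duplicate_machine_ids() {
    awk '{ names[$2] = names[$2] " " $1; count[$2]++ }
         END { for (id in count) if (count[id] > 1) print id ":" names[id] }'
}

# On the LXD host (instance names taken from the listing above):
#   for vm in ha-anbox-cloud mass-app-update; do
#       printf '%s %s\n' "$vm" "$(lxc exec "$vm" -- cat /etc/machine-id)"
#   done | find_duplicate_machine_ids
```

If the function prints anything, the listed instances share a machine-id, which would be consistent with them also presenting the same DHCP client identifier.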

tomp · March 23, 2021, 12:21pm

Yeah I thought the copy templates would reset it from the volatile UUID key (which we’ve confirmed is changed on copy).

stgraber · March 23, 2021, 12:24pm

The volatile UUID is exposed to the VM through DMI, it’s not exposed to the container.
We could in theory ship a template for /etc/machine-id which aligns it with volatile.uuid, but this may also conflict with what some distros do to set up that file, or with manual user actions…

When cloning systems there are a variety of files that need to be reset, regardless of if the system is a container, VM or physical box, so it’s probably best to leave that to the user or to specialized tools like cloud-init. The usual suspects there are:

machine-id
any lease file in /var/lib/dhcp
host ssh keys in /etc/ssh/
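
The list above could be handled with something like the following, run inside the copy before it is put to use. This is only a sketch under stated assumptions: reset_clone_identity is a hypothetical helper, it covers just the three items listed, and it relies on systemd regenerating an empty /etc/machine-id at the next boot.

```shell
#!/bin/sh
# Sketch: clear per-machine identity in a freshly copied system.
# reset_clone_identity is an illustrative helper; the optional root
# argument lets it be tried on a scratch directory first.
reset_clone_identity() {
    root="${1:-/}"
    # Empty the machine-id; systemd generates a new one on next boot.
    : > "$root/etc/machine-id"
    # Remove stale DHCP lease files left by dhclient.
    rm -f "$root"/var/lib/dhcp/*.leases
    # Remove host SSH keys; regenerate afterwards, e.g. with ssh-keygen -A.
    rm -f "$root"/etc/ssh/ssh_host_*
}

# Inside the copied VM, as root:
#   reset_clone_identity && ssh-keygen -A && reboot
```

Only files matching ssh_host_* are removed, so sshd_config and other files in /etc/ssh are left alone.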

tomp · March 23, 2021, 12:31pm

Yeah, in this case it’s VMs that are having the issue. I thought the DUID was generated from the DMI machine ID, but maybe it’s written to /etc/machine-id and then not modified on copy.

tomp · March 23, 2021, 12:32pm

@stgraber maybe we should add this to the image template to catch that particular issue at least?

stgraber · March 23, 2021, 12:32pm

Right, the DMI UUID is used as the initial seed for virtual machines but is then kept constant through /etc/machine-id, possibly because some systems incorrectly expose a different identifier on every boot.

stgraber · March 23, 2021, 12:35pm

We do already, all distrobuilder images are supposed to have dhcp-identifier set.
I suspect the image used in this case is a cloud image instead, may be worth filing a bug on Launchpad to get the default network-config updated.