BUG: VMs created with lxc copy get the same IPv4 address from dnsmasq

Using the currently latest LXD v5.6-794016a (rev 23680), I'm seeing that virtual machines created with the lxc copy command get the same IPv4 address as the source VM.

In fact, dnsmasq seems to hand the same IPv4 address to whichever VM started most recently, overwriting the existing lease even though that lease carries the correct MAC address.

Am I missing a step in the cloning workflow, or is this a bug in lxd/dnsmasq?
Is there a fix or workaround so that I can keep the cloning operations automated?

Quick facts:

  • source & destination VMs have different MAC addresses
  • it doesn’t matter whether the source is a VM instance or a VM snapshot
  • it doesn’t matter whether the source instance is stopped or running at copy time

Detailed tests

initial state:

lxc list sqlnode2
+----------+---------+--------------------------+------+-----------------+-----------+
|   NAME   |  STATE  |           IPV4           | IPV6 |      TYPE       | SNAPSHOTS |
+----------+---------+--------------------------+------+-----------------+-----------+
| sqlnode2 | RUNNING | 192.168.111.106 (enp5s0) |      | VIRTUAL-MACHINE | 2         |
+----------+---------+--------------------------+------+-----------------+-----------+

grep sqlnode2 /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases
1665826039 00:16:3e:31:ec:db 192.168.111.106 sqlnode2 ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88

cat /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts/sqlnode2*
00:16:3e:31:ec:db,sqlnode2

lxc exec sqlnode2 -- ip address show dev enp5s0
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:16:3e:31:ec:db brd ff:ff:ff:ff:ff:ff
    inet 192.168.111.106/24 brd 192.168.111.255 scope global dynamic enp5s0
       valid_lft 3471sec preferred_lft 3471sec
    inet6 fe80::216:3eff:fe31:ecdb/64 scope link
       valid_lft forever preferred_lft forever

After lxc copy --instance-only sqlnode2 sqlnode21:

lxc config get sqlnode2 volatile.eth0.hwaddr
00:16:3e:31:ec:db

lxc config get sqlnode21 volatile.eth0.hwaddr
00:16:3e:f7:39:36

cat /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts/sqlnode2*
00:16:3e:f7:39:36,sqlnode21
00:16:3e:31:ec:db,sqlnode2

After starting the newly created sqlnode21:

lxc list sqlnode2
+-----------+---------+--------------------------+------+-----------------+-----------+
|   NAME    |  STATE  |           IPV4           | IPV6 |      TYPE       | SNAPSHOTS |
+-----------+---------+--------------------------+------+-----------------+-----------+
| sqlnode2  | RUNNING | 192.168.111.106 (enp5s0) |      | VIRTUAL-MACHINE | 2         |
+-----------+---------+--------------------------+------+-----------------+-----------+
| sqlnode21 | RUNNING | 192.168.111.106 (enp5s0) |      | VIRTUAL-MACHINE | 0         |
+-----------+---------+--------------------------+------+-----------------+-----------+

lxc exec sqlnode21 -- ip address show dev enp5s0
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:16:3e:f7:39:36 brd ff:ff:ff:ff:ff:ff
    inet 192.168.111.106/24 brd 192.168.111.255 scope global dynamic enp5s0
       valid_lft 3554sec preferred_lft 3554sec
    inet6 fe80::216:3eff:fef7:3936/64 scope link
       valid_lft forever preferred_lft forever

grep sqlnode2 /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases
1665827202 00:16:3e:f7:39:36 192.168.111.106 sqlnode21 ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88

In the last output, we see that dnsmasq simply handed the IPv4 address 192.168.111.106 to the new VM as well.

OK, the root cause seems to be that the DHCP client identifier (last column in the dnsmasq.leases file) is still the same for both the source and the copied VM.
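
That can be confirmed directly from the two lease lines captured above, without touching the VMs. A small sketch (the two lines are sample data copied from the grep outputs in this post; on a real host you would read them out of the leases file instead):

```shell
# The lease lines captured before and after the copy (from the outputs above).
before='1665826039 00:16:3e:31:ec:db 192.168.111.106 sqlnode2 ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88'
after='1665827202 00:16:3e:f7:39:36 192.168.111.106 sqlnode21 ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88'

# The client identifier is the last whitespace-separated field.
id_before=${before##* }
id_after=${after##* }

# Different MACs, different hostnames - but the same client identifier,
# which is why dnsmasq treats both VMs as the same DHCP client.
if [ "$id_before" = "$id_after" ]; then
    echo "same client identifier: $id_before"
fi
```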

As I’m using a Debian OS inside the VMs, it seems to come from the /etc/machine-id file, which indeed has the same content on both VMs.

So I’m currently trying to find a proper way to fix that. My workaround so far is the following workflow:

clone vm

lxc copy --instance-only sqlnode2 sqlnode21

regenerate machine-id on new vm

lxc exec sqlnode21 -- rm -v /var/lib/dbus/machine-id /etc/machine-id
lxc exec sqlnode21 -- dbus-uuidgen --ensure
lxc exec sqlnode21 -- systemd-machine-id-setup
lxc exec sqlnode21 -- grep "[a-z]" /var/lib/dbus/machine-id /etc/machine-id

start source vm first

so it can get back its original ipv4 address:

lxc start sqlnode2

wait for leases file to be updated:

while true; do grep " sqlnode2 " /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases && break;sleep 5;done
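
The polling loop above runs forever if the lease never shows up; a bounded variant could look like the sketch below. It is demonstrated against a throwaway file with a sample lease line, so on a real host you would pass the dnsmasq.leases path from above instead:

```shell
# wait_for_lease NAME FILE TIMEOUT_SECONDS
# Poll FILE until a lease line for NAME appears, or give up after TIMEOUT.
wait_for_lease() {
    name=$1; file=$2; timeout=$3; waited=0
    while ! grep -q " $name " "$file"; do
        if [ "$waited" -ge "$timeout" ]; then return 1; fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demonstration against a sample file (real path would be
# /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases).
leases=$(mktemp)
echo '1665826039 00:16:3e:31:ec:db 192.168.111.106 sqlnode2 ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88' > "$leases"
wait_for_lease sqlnode2 "$leases" 5 && echo "lease present"
rm -f "$leases"
```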

restart new vm

it will now get its own ipv4 address:

lxc restart sqlnode21
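
For automation, the steps above can be collected into a single shell function. This is only a sketch assembled from the commands in this thread, with minimal error handling; it assumes the lxdbr0 leases path shown earlier and that the guest agent is up before the exec calls, so adapt it to your setup:

```shell
LEASES=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases

# clone_vm SOURCE DEST - copy a VM, regenerate its machine-id, and
# restart things in the order that keeps the DHCP leases correct.
clone_vm() {
    src=$1; dst=$2

    # 1. clone the VM (instance only, no snapshots)
    lxc copy --instance-only "$src" "$dst"

    # 2. the copy is created stopped, so start it, then regenerate
    #    its machine-id (may need a short wait for the guest agent)
    lxc start "$dst"
    lxc exec "$dst" -- rm -v /var/lib/dbus/machine-id /etc/machine-id
    lxc exec "$dst" -- dbus-uuidgen --ensure
    lxc exec "$dst" -- systemd-machine-id-setup

    # 3. start the source VM first so it reclaims its original address
    lxc start "$src" || true   # may already be running

    # 4. wait until the source VM's lease is back in the leases file
    while ! grep -q " $src " "$LEASES"; do sleep 5; done

    # 5. restart the new VM so it requests its own address
    lxc restart "$dst"
}
```

Usage would then be a single call, e.g. clone_vm sqlnode2 sqlnode21.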

Can you show lxc config show <instance> --expanded for the source and the copy, please?

here it is for the source:

architecture: x86_64
config:
  environment.TZ: Europe/Vienna
  image.architecture: amd64
  image.description: Debian bullseye amd64 (20220715_06:06)
  image.os: Debian
  image.release: bullseye
  image.serial: "20220715_06:06"
  image.type: disk-kvm.img
  image.variant: default
  limits.cpu: "4"
  limits.memory: 8GB
  user.user-data: |
    ssh_pwauth: yes
    users:
      - name: automation
        passwd: 'xxx'  # Use a pw in /etc/shadow format!
        lock_passwd: false
        groups: lxd
        shell: /bin/bash
        sudo: ALL=(ALL) NOPASSWD:ALL
    # autogrow root partition
    growpart:
      mode: auto
      devices: ['/']
      ignore_growroot_disabled: false
  volatile.base_image: b6c7a8f75b2cabe42c8bbf13ebc2c364005327e1a18c6dc61fe08a2e43a4fdbf
  volatile.cloud-init.instance-id: 2d90ef76-5240-41e4-99e9-abc2b4dcc840
  volatile.eth0.host_name: tap91b7fa4b
  volatile.eth0.hwaddr: 00:16:3e:31:ec:db
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 630aa3f5-1f39-4be9-b88d-f19d1494aa90
  volatile.vsock_id: "49"
devices:
  config:
    source: cloud-init:config
    type: disk
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size: 100GB
    type: disk
ephemeral: false
profiles:
- vms
stateful: false
description: ""

And here for the cloned vm:

architecture: x86_64
config:
  environment.TZ: Europe/Vienna
  image.architecture: amd64
  image.description: Debian bullseye amd64 (20220715_06:06)
  image.os: Debian
  image.release: bullseye
  image.serial: "20220715_06:06"
  image.type: disk-kvm.img
  image.variant: default
  limits.cpu: "4"
  limits.memory: 8GB
  user.user-data: |
    ssh_pwauth: yes
    users:
      - name: automation
        passwd: 'xxx'  # Use a pw in /etc/shadow format!
        lock_passwd: false
        groups: lxd
        shell: /bin/bash
        sudo: ALL=(ALL) NOPASSWD:ALL
    # autogrow root partition
    growpart:
      mode: auto
      devices: ['/']
      ignore_growroot_disabled: false
  volatile.base_image: b6c7a8f75b2cabe42c8bbf13ebc2c364005327e1a18c6dc61fe08a2e43a4fdbf
  volatile.cloud-init.instance-id: d6a536f4-f9ff-472c-b69f-364b250e48a5
  volatile.eth0.host_name: tap8ec6a839
  volatile.eth0.hwaddr: 00:16:3e:3b:48:1e
  volatile.last_state.power: RUNNING
  volatile.uuid: 1448c33e-1f3f-4434-a3fe-11ea2d776fd1
  volatile.vsock_id: "60"
devices:
  config:
    source: cloud-init:config
    type: disk
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size: 100GB
    type: disk
ephemeral: false
profiles:
- vms
stateful: false
description: ""

I’ve just tested this using images:ubuntu/jammy (which doesn’t have cloud-init installed and doesn’t exhibit the problem) and images:ubuntu/jammy/cloud (which has cloud-init installed and does exhibit the problem).

So this seems to be an issue with cloud-init not regenerating the machine-id.

You can see from the configs you pasted above that LXD is doing the right thing, as both VMs have different:

  • volatile.uuid - which is provided to QEMU to set machine ID.
  • volatile.cloud-init.instance-id - which should be used by cloud-init to trigger regenerating configs if it changes.
  • volatile.eth0.hwaddr - the MAC address; if the DHCP client used this as its identifier, the issue would not occur.
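
That per-key comparison can also be scripted. A sketch that checks the three volatile keys differ between the two instances, using the values from the two config dumps above as sample data (on a real host you would pull each value with lxc config get instead):

```shell
# Sample data: the relevant volatile keys from the two config dumps above.
src_cfg='volatile.uuid: 630aa3f5-1f39-4be9-b88d-f19d1494aa90
volatile.cloud-init.instance-id: 2d90ef76-5240-41e4-99e9-abc2b4dcc840
volatile.eth0.hwaddr: 00:16:3e:31:ec:db'
dst_cfg='volatile.uuid: 1448c33e-1f3f-4434-a3fe-11ea2d776fd1
volatile.cloud-init.instance-id: d6a536f4-f9ff-472c-b69f-364b250e48a5
volatile.eth0.hwaddr: 00:16:3e:3b:48:1e'

# For each key, check that the two instances report different values.
all_differ=yes
for key in volatile.uuid volatile.cloud-init.instance-id volatile.eth0.hwaddr; do
    a=$(printf '%s\n' "$src_cfg" | awk -v k="$key:" '$1 == k {print $2}')
    b=$(printf '%s\n' "$dst_cfg" | awk -v k="$key:" '$1 == k {print $2}')
    if [ "$a" = "$b" ]; then all_differ=no; fi
    echo "$key: $a vs $b"
done
echo "all keys differ: $all_differ"
```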

This appears to be the same issue as:

Which then links to:

Perhaps you could post there with your use case.

Thanks very much @tomp for your investigation!
I’ll try to switch to the non-cloud image for now, and I’ll add my use case to the cloud-init issue as well.
