Using the current latest LXD (v5.6-794016a, snap rev 23680), virtual machines created with the lxc copy command get the same IPv4 address as the source VM.
In fact, dnsmasq seems to hand the same IPv4 address to the most recently started VM, overwriting the existing lease so that it now carries the new VM's MAC address.
Am I missing a step in the cloning workflow, or is this a current bug in LXD/dnsmasq?
Is there a fix or workaround that lets me keep the cloning operations automated?
Quick facts:
- Source and destination VMs have different MAC addresses.
- It doesn’t matter whether the source is a VM instance or a VM snapshot.
- It doesn’t matter whether the source instance is stopped or running at copy time.
Detailed tests
Initial state:
lxc list sqlnode2
+----------+---------+--------------------------+------+-----------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+----------+---------+--------------------------+------+-----------------+-----------+
| sqlnode2 | RUNNING | 192.168.111.106 (enp5s0) | | VIRTUAL-MACHINE | 2 |
+----------+---------+--------------------------+------+-----------------+-----------+
grep sqlnode2 /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases
1665826039 00:16:3e:31:ec:db 192.168.111.106 sqlnode2 ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88
cat /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts/sqlnode2*
00:16:3e:31:ec:db,sqlnode2
lxc exec sqlnode2 -- ip address show dev enp5s0
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:16:3e:31:ec:db brd ff:ff:ff:ff:ff:ff
inet 192.168.111.106/24 brd 192.168.111.255 scope global dynamic enp5s0
valid_lft 3471sec preferred_lft 3471sec
inet6 fe80::216:3eff:fe31:ecdb/64 scope link
valid_lft forever preferred_lft forever
After running lxc copy --instance-only sqlnode2 sqlnode21:
lxc config get sqlnode2 volatile.eth0.hwaddr
00:16:3e:31:ec:db
lxc config get sqlnode21 volatile.eth0.hwaddr
00:16:3e:f7:39:36
cat /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts/sqlnode2*
00:16:3e:f7:39:36,sqlnode21
00:16:3e:31:ec:db,sqlnode2
After starting the newly created sqlnode21:
lxc list sqlnode2
+-----------+---------+--------------------------+------+-----------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+-----------+---------+--------------------------+------+-----------------+-----------+
| sqlnode2 | RUNNING | 192.168.111.106 (enp5s0) | | VIRTUAL-MACHINE | 2 |
+-----------+---------+--------------------------+------+-----------------+-----------+
| sqlnode21 | RUNNING | 192.168.111.106 (enp5s0) | | VIRTUAL-MACHINE | 0 |
+-----------+---------+--------------------------+------+-----------------+-----------+
lxc exec sqlnode21 -- ip address show dev enp5s0
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:16:3e:f7:39:36 brd ff:ff:ff:ff:ff:ff
inet 192.168.111.106/24 brd 192.168.111.255 scope global dynamic enp5s0
valid_lft 3554sec preferred_lft 3554sec
inet6 fe80::216:3eff:fef7:3936/64 scope link
valid_lft forever preferred_lft forever
grep sqlnode2 /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases
1665827202 00:16:3e:f7:39:36 192.168.111.106 sqlnode21 ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88
The last output shows that dnsmasq simply handed 192.168.111.106 to the new VM: the lease entry now carries the new VM's MAC address, but the same client ID as the original sqlnode2 lease.
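To make that comparison explicit, here is a small Python sketch that parses the two dnsmasq.leases lines quoted above. It assumes the usual IPv4 lease layout (expiry, MAC, IP, hostname, client ID) and shows that the MAC changed while the client ID, which a DHCP server uses in preference to the MAC when the client sends one, stayed the same:

```python
from typing import NamedTuple


class Lease(NamedTuple):
    """One IPv4 line of a dnsmasq.leases file."""
    expiry: int
    mac: str
    ip: str
    hostname: str
    client_id: str


def parse_lease(line: str) -> Lease:
    # Fields are whitespace-separated; dnsmasq writes "*" for a missing
    # hostname or client ID. DHCPv6 "duid" lines are not handled here.
    expiry, mac, ip, hostname, client_id = line.split()[:5]
    return Lease(int(expiry), mac, ip, hostname, client_id)


# The two lease lines quoted above: before the copy (sqlnode2) and after
# starting the copy (sqlnode21).
before = parse_lease("1665826039 00:16:3e:31:ec:db 192.168.111.106 sqlnode2 "
                     "ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88")
after = parse_lease("1665827202 00:16:3e:f7:39:36 192.168.111.106 sqlnode21 "
                    "ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88")

print("same MAC:", before.mac == after.mac)
print("same client ID:", before.client_id == after.client_id)
print("same IP:", before.ip == after.ip)
```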
I’ve just tested this using images:ubuntu/jammy (which doesn’t have cloud-init installed and doesn’t exhibit the problem) and images:ubuntu/jammy/cloud (which has cloud-init installed and does exhibit the problem).
So this seems to be an issue with cloud-init not regenerating the machine-id.
You can see from the configs you pasted above that LXD is doing the right thing, as the two VMs have different values for:
volatile.uuid - passed to QEMU to set the machine ID.
volatile.cloud-init.instance-id - which cloud-init should use to trigger regenerating its configuration when it changes.
volatile.eth0.hwaddr - the MAC address; if the DHCP client used this as its identifier, the issue would not occur.
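Why the choice of identifier matters can be illustrated with a toy lease table. This is a hypothetical, heavily simplified model (ToyDhcpServer is not dnsmasq's code): it keys each lease on the client identifier when one is sent and on the MAC otherwise. Two clones that send the same machine-id-derived client ID collide, while MAC-based identifiers (RFC 2132 type 1, i.e. "01:" followed by the MAC, roughly what netplan's dhcp-identifier: mac sends) keep them apart:

```python
from typing import Dict, Iterable, Optional


class ToyDhcpServer:
    """Hypothetical lease table for illustration only, not dnsmasq."""

    def __init__(self, pool: Iterable[str]):
        self.free = list(pool)             # unallocated IPv4 addresses
        self.leases: Dict[str, str] = {}   # lease key -> IPv4 address

    def request(self, mac: str, client_id: Optional[str] = None) -> str:
        # A client ID, when present, identifies the client instead of the MAC.
        key = client_id if client_id is not None else mac
        if key not in self.leases:
            self.leases[key] = self.free.pop(0)
        return self.leases[key]


POOL = ["192.168.111.106", "192.168.111.107"]
CLONED_ID = "ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88"

# Both VMs send the same cloned client ID: the second one reuses the lease.
shared = ToyDhcpServer(POOL)
print(shared.request("00:16:3e:31:ec:db", CLONED_ID))
print(shared.request("00:16:3e:f7:39:36", CLONED_ID))

# Each VM sends a client ID built from its own MAC: distinct leases.
by_mac = ToyDhcpServer(POOL)
print(by_mac.request("00:16:3e:31:ec:db", "01:00:16:3e:31:ec:db"))
print(by_mac.request("00:16:3e:f7:39:36", "01:00:16:3e:f7:39:36"))
```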
This appears to be the same issue as:
Which then links to:
Perhaps you could post your use case there.
Thanks very much @tomp for your investigation!
I’ll switch to the non-cloud image for now, but I’ll add my use case to the cloud-init issue as well.
This means that the guest's DHCP client derives its client identifier from /etc/machine-id, which does not change inside the guest between copies.
In comparison, the images:ubuntu/22.04 image from the LXD project uses dhcp-identifier: mac and does not have this issue.
Now, LXD does generate a different UUID for each VM and passes it to QEMU as the VM identifier, and the guest appears to use it initially to populate /etc/machine-id. During a copy, however, the UUID passed to the VM changes, but this does not appear to trigger a refresh of /etc/machine-id.
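The client ID in the lease lines above supports this. Assuming the guest's DHCP client is systemd-networkd (which netplan uses on Ubuntu cloud images), the value decodes as an RFC 4361 identifier: type 255, a 4-byte IAID, then a DUID. A minimal Python decoder:

```python
def decode_client_id(client_id: str) -> dict:
    """Split an RFC 4361 DHCPv4 client ID (type 255 + IAID + DUID)."""
    raw = bytes.fromhex(client_id.replace(":", ""))
    if raw[0] != 0xFF:
        raise ValueError("not an RFC 4361 (IAID + DUID) client ID")
    info = {
        "iaid": raw[1:5].hex(":"),                       # 4-byte IAID
        "duid_type": int.from_bytes(raw[5:7], "big"),    # 2-byte DUID type
    }
    if info["duid_type"] == 2:  # DUID-EN: enterprise number + identifier
        info["enterprise"] = int.from_bytes(raw[7:11], "big")
        info["identifier"] = raw[11:].hex(":")
    return info


print(decode_client_id("ff:49:72:1f:47:00:02:00:00:ab:11:d1:ae:8f:7f:50:7e:b1:88"))
# {'iaid': '49:72:1f:47', 'duid_type': 2, 'enterprise': 43793, 'identifier': 'd1:ae:8f:7f:50:7e:b1:88'}
```

Enterprise number 43793 is the one systemd uses for its vendor DUID, whose identifier bytes are a hash of /etc/machine-id, so two guests with the same machine-id send the same client ID.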
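For the cloud image, one possible workaround is to supply a network configuration that sets dhcp-identifier: mac explicitly. The sketch below only builds a cloud-init network-config (version 2) document for enp5s0; it assumes the target cloud-init accepts the top-level network: key and passes the setting through to netplan, and it does not show how the document is delivered through the LXD instance configuration:

```python
import json


def network_config(interface: str = "enp5s0") -> str:
    """Build a cloud-init network-config (version 2) document asking
    netplan to use a MAC-based DHCP client identifier on one interface."""
    config = {
        "network": {
            "version": 2,
            "ethernets": {
                interface: {
                    "dhcp4": True,
                    # The MAC differs per LXD copy, unlike /etc/machine-id.
                    "dhcp-identifier": "mac",
                },
            },
        },
    }
    # JSON output should also parse as YAML (flow style).
    return json.dumps(config, indent=2)


print(network_config())
```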
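To check whether a copied guest still carries a stale machine-id, one can compare /etc/machine-id with the SMBIOS UUID that QEMU exposes at /sys/class/dmi/id/product_uuid. This Python sketch assumes, as suggested above, that the machine-id was originally seeded from that UUID and that both are shown in the same byte order; a False inside a fresh copy would be consistent with the missing refresh:

```python
import uuid
from pathlib import Path
from typing import Optional


def machine_id_matches_uuid(machine_id: str, vm_uuid: str) -> bool:
    """True if the machine-id equals the VM UUID with dashes removed."""
    return machine_id.strip().lower() == uuid.UUID(vm_uuid.strip()).hex


def read_first_line(path: str) -> Optional[str]:
    # product_uuid is normally readable only by root; None if unavailable.
    try:
        return Path(path).read_text().splitlines()[0]
    except (OSError, IndexError):
        return None


if __name__ == "__main__":
    mid = read_first_line("/etc/machine-id")
    product_uuid = read_first_line("/sys/class/dmi/id/product_uuid")
    try:
        if mid and product_uuid:
            print("machine-id matches VM UUID:",
                  machine_id_matches_uuid(mid, product_uuid))
        else:
            print("could not read machine-id or product_uuid (run as root in the VM)")
    except ValueError:
        print("product_uuid is not a valid UUID on this machine")
```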
@Chad_Smith do you know if cloud-init should refresh /etc/machine-id if the QEMU UUID changes?