Broken DNS, dnsmasq.leases

votsalo · January 16, 2022, 1:25pm

I have a broken LXD DNS situation: if I dig or ping a.lxd from another container, I get the wrong IP address. When I look at the dnsmasq.leases file, the last field for the problem container (a) is ‘*’:

cat /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases
1642337775 00:16:3e:c0:70:b3 10.79.171.31 ssa 01:00:16:3e:c0:70:b3
1642337765 00:16:3e:06:03:6a 10.79.171.57 haproxy 01:00:16:3e:06:03:6a
1642337966 00:16:3e:15:da:52 10.79.171.73 a *
duid 00:01:00:01:24:db:c9:92:96:00:00:2d:0e:8a

(I replaced the real container name with “a” for posting this.)

The IP address for container a is correct in the file above, but a.lxd resolves to 10.79.171.86 from other containers, which therefore cannot reach it by name. As a workaround, I put the correct IP address in the /etc/hosts file of the container that needs to use it. 10.79.171.86 used to be the IP address of the container, until it got changed somehow.
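To make the mismatch easy to reproduce, here is a rough sketch of the comparison. It assumes each lease line has five whitespace-separated fields with the name in field 4 and the IP address in field 3, as in the listing above. The function names lease_ip and compare_lease are made up for illustration, and the commented example assumes dig is installed in the haproxy container.

```shell
# Print the IP recorded for a given name in a dnsmasq lease file.
# Assumes five whitespace-separated fields per lease line, name in field 4.
lease_ip() {
    awk -v name="$2" 'NF == 5 && $4 == name { print $3 }' "$1"
}

# Compare the lease IP with an address obtained elsewhere (e.g. from dig).
# Prints "match" or "mismatch: lease=<ip> resolved=<ip>".
compare_lease() {
    leased=$(lease_ip "$1" "$2")
    if [ "$leased" = "$3" ]; then
        echo "match"
    else
        echo "mismatch: lease=$leased resolved=$3"
    fi
}

# Example, run on the host (assumes dig exists inside the haproxy container):
# resolved=$(lxc exec haproxy -- dig +short a.lxd)
# compare_lease /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases a "$resolved"
```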

How can I fix this? Rebooting didn’t help.

What does the last field in the LXD dnsmasq.leases file mean, and why would it ever be *? On another LXD host, no container has * in that field.
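For checking other hosts, a small sketch like this lists every lease whose last field is *. It assumes five whitespace-separated fields per lease line (the duid line has only two and is skipped). The function name leases_with_star is hypothetical.

```shell
# List leases whose last field is '*', printing name, IP and MAC.
# Assumes five whitespace-separated fields per lease line;
# the two-field 'duid' line is ignored by the NF == 5 check.
leases_with_star() {
    awk 'NF == 5 && $5 == "*" { print $4, $3, $2 }' "$1"
}

# Example (path as in the post):
# leases_with_star /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases
```

On the lease file shown earlier this would report only the entry for container a.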

Can I edit the file by hand, replace ‘*’ with what I guess it should be (01:00:16:3e:15:da:52), and then reboot?

I got into this situation by copying a snapshot of the container, deleting the copy, and copying it again. The second time, I set the copy's hardware address to the value it had after the first copy, so that it would get the same IP address. The steps are roughly as follows, although everything except lxc copy was done through the LXD API:

lxc snapshot a 20220115143000
lxc copy a/20220115143000 b
lxc start b
… test something in b
… find out hwaddr of b
lxc stop b
lxc delete b
lxc copy a/20220115143000 b
… set volatile.eth0.hwaddr of b to the value it had after the first copy.
lxc start b
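As a CLI-only sketch of the rebuild part of that sequence (I actually used the LXD API for most of it, so the lxc config commands here are an assumed equivalent), the steps could be wrapped like this. The function name rebuild_copy is made up, and the NIC is assumed to be eth0, matching volatile.eth0.hwaddr.

```shell
# Rebuild a copy of a snapshot while reusing a previously noted MAC address.
# Arguments: source container, snapshot name, copy name, MAC address to reuse.
# Assumes the NIC is named eth0 and that the copy currently exists and is running.
rebuild_copy() {
    src=$1; snap=$2; copy=$3; hwaddr=$4
    lxc stop "$copy" &&
    lxc delete "$copy" &&
    lxc copy "$src/$snap" "$copy" &&
    lxc config set "$copy" volatile.eth0.hwaddr "$hwaddr" &&
    lxc start "$copy"
}

# Example:
# hwaddr=$(lxc config get b volatile.eth0.hwaddr)   # note it before deleting b
# rebuild_copy a 20220115143000 b "$hwaddr"
```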

I’ve been using this method successfully to rebuild a copy of a container snapshot while keeping the same IP address that the copy had before. But this time it seems to have corrupted the LXD DNS state somehow.