I have a container that is no longer accessible via its .lxd domain name.
I investigated and found that the LXD DHCP server lists this container with two different hwaddrs.
Using ip -4 link in a container shell, I find that its hwaddr is 00:16:3e:67:78:71.
lxc network list-leases lxdbr0 -f compact | grep 00:16:3e:67:78:71 shows:
Note the ‘*’ in the HOSTNAME column. Other containers show the expected hostname in both the ipv4 and ipv6 rows. The name d9-67 does not appear anywhere else in the list-leases output.
/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts/d9-67.eth0 has a different hwaddr:
00:16:3e:a4:a5:fc,d9-67
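To make the mismatch explicit, here is a minimal sketch that compares a dnsmasq.hosts entry with the hwaddr reported inside the container. It assumes the entry is a comma-separated line whose first field is the hwaddr and whose last field is the hostname, as in the line above. The helper names parse_hosts_entry and hwaddr_matches are made up for illustration and are not part of LXD.

```python
from typing import Tuple


def parse_hosts_entry(line: str) -> Tuple[str, str]:
    """Split an entry such as '00:16:3e:a4:a5:fc,d9-67' into (hwaddr, name).

    Assumption: the first comma-separated field is the hwaddr and the
    last one is the hostname; any fields in between are ignored.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    return fields[0].lower(), fields[-1]


def hwaddr_matches(entry_line: str, container_hwaddr: str) -> bool:
    """Return True if the hosts entry carries the same hwaddr as the container."""
    entry_hwaddr, _name = parse_hosts_entry(entry_line)
    return entry_hwaddr == container_hwaddr.strip().lower()


if __name__ == "__main__":
    entry = "00:16:3e:a4:a5:fc,d9-67"        # content of dnsmasq.hosts/d9-67.eth0
    in_container = "00:16:3e:67:78:71"       # hwaddr seen inside the container
    print(hwaddr_matches(entry, in_container))
```

With the two values from this thread the comparison is false, which is the inconsistency described here.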
This happened when I deleted the container and recreated it with the same name.
In order to keep the same ip address it had before, I configured the new container to have the hwaddr of the old container. I have been doing this regularly with most of my containers, and it usually works (the new container gets the ip addresses of the old container).
How can I fix this, and why did it happen? Is it wrong to reuse an old hwaddr the way I did?
Unfortunately no, and this is a production container that I don’t want to disrupt too much.
I’ll see if I can reproduce the steps that I took on a test LXD install.
In the meantime, is it safe to fix or remove the incorrect dnsmasq.hosts/ file?
lxc list shows the other hwaddr:
lxc list -f compact -c n,volatile.eth0.hwaddr d9-67
NAME VOLATILE ETH0 HWADDR
d9-67 00:16:3e:67:78:71
I will probably delete the container and then recreate it without preserving the hwaddr. It will get a new ip address, but hopefully it will get a working dns name. All I need is to make sure a haproxy container can connect to this and other containers by name.
I fixed it by deleting and recreating the container without preserving its hwaddr. It got a new ip address and can be accessed by its .lxd hostname.
I also wrote a test that launches ~100 containers, deletes them, launches them again with their previous hwaddr and pings them from another container by name. It didn’t cause a problem after a few repetitions of the rebuilds. Next, I might mix in container creation without preserving the hwaddr. If the problem is a hwaddr collision, it may take a long time to detect it by testing. It seems that LXD keeps the first 3 bytes of the hwaddr fixed, which leaves the other 3 bytes to avoid collisions (about 16 million distinct hw addresses). I would need to rebuild millions of containers to have a good chance of getting a duplicate hwaddr. It would be more productive to look at the LXD code.
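A rough way to put numbers on the collision idea, as a sketch only. It assumes the 3 free bytes are drawn uniformly at random, which I have not checked against the LXD code. The function names are made up for illustration.

```python
import math

# 3 free bytes -> 2**24 = 16,777,216 possible hwaddrs (uniform choice is an assumption)
ADDRESS_SPACE = 2 ** 24


def collision_probability(n: int, space: int = ADDRESS_SPACE) -> float:
    """Probability that n uniformly random addresses contain at least one duplicate."""
    if n > space:
        return 1.0  # more addresses than slots: a duplicate is certain
    # Sum the logs of P(i-th address differs from all earlier ones) for stability.
    log_all_distinct = sum(math.log1p(-i / space) for i in range(n))
    return -math.expm1(log_all_distinct)


def hit_existing_probability(existing: int, space: int = ADDRESS_SPACE) -> float:
    """Probability that one new random address equals one of `existing` addresses in use."""
    return existing / space


if __name__ == "__main__":
    print(collision_probability(100))      # duplicate within one batch of 100
    print(hit_existing_probability(100))   # one new address vs. 100 in use
```

Under that assumption, with roughly 100 addresses in use, a single new random address hits one of them with a probability of about 6 in a million, so a test at this scale would need a very large number of rebuilds before a collision shows up.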