.lxd domain name does not work for a container

I have a container that is no longer accessible via its .lxd domain name.
I investigated and found that the LXD DHCP server lists this container with two different hwaddr values.

Using ip -4 link in a container shell, I find that its hwaddr is 00:16:3e:67:78:71
lxc network list-leases lxdbr0 -f compact | grep 00:16:3e:67:78:71 shows:

	  *                     00:16:3e:67:78:71  10.131.182.42                           DYNAMIC  
	  d9-67                 00:16:3e:67:78:71  fd42:7a0b:c86c:b8b8:216:3eff:fe67:7871  DYNAMIC  

Note the ‘*’ in the HOSTNAME column. Other containers show the expected hostname in both the IPv4 and IPv6 rows. The name d9-67 does not appear anywhere else in the list-leases output.
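To spot this kind of mismatch across all leases rather than one grep at a time, the rows can be grouped by MAC address. This is only a minimal sketch: it assumes the four-column layout (hostname, MAC, IP, type) seen in the compact output above, and the function names are made up for illustration.

```python
from collections import defaultdict


def hostnames_by_mac(lease_rows):
    """Group the HOSTNAME column by MAC address.

    Assumes each data row has exactly four whitespace-separated fields,
    as in the compact output quoted in this thread. Rows with any other
    field count (such as a header line) are skipped.
    """
    names = defaultdict(set)
    for row in lease_rows.splitlines():
        fields = row.split()
        if len(fields) != 4:
            continue
        hostname, mac, _ip, _lease_type = fields
        names[mac.lower()].add(hostname)
    return names


def inconsistent_macs(lease_rows):
    """Return MACs whose lease rows disagree on the hostname."""
    return {
        mac: sorted(found)
        for mac, found in hostnames_by_mac(lease_rows).items()
        if len(found) > 1
    }
```

Feeding it the text printed by lxc network list-leases lxdbr0 -f compact should flag any hwaddr that appears with both ‘*’ and a real hostname.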

/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts/d9-67.eth0 has a different hwaddr:
00:16:3e:a4:a5:fc,d9-67
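A check like the following could compare the entry in a dnsmasq.hosts file with the hwaddr that LXD reports for the instance. It is a sketch under assumptions: it treats the first comma-separated field as the hwaddr and the last as the name, which matches the line shown above but is not confirmed as the full file format, and both function names are hypothetical.

```python
def parse_hosts_entry(line):
    """Split a dnsmasq.hosts entry into (hwaddr, name).

    Assumption: the first comma-separated field is the hwaddr and the
    last one is the instance name, as in the line quoted in this thread.
    Any fields in between are ignored.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) < 2:
        raise ValueError(f"unexpected dnsmasq.hosts entry: {line!r}")
    return fields[0].lower(), fields[-1]


def hwaddr_matches(entry_line, instance_hwaddr):
    """True if the hosts entry carries the same hwaddr as the instance."""
    file_hwaddr, _name = parse_hosts_entry(entry_line)
    return file_hwaddr == instance_hwaddr.strip().lower()
```

With the values from this thread, hwaddr_matches("00:16:3e:a4:a5:fc,d9-67", "00:16:3e:67:78:71") is False, which is the mismatch described here.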

This happened when I deleted and recreated the container, with the same name.
To keep the same IP address it had before, I configured the new container to have the hwaddr of the old container. I have been doing this regularly with most of my containers, and it usually works (the new container gets the IP addresses of the old container).

How can I fix this, and why did it happen? Is it wrong to reuse an old hwaddr the way I did?

Do you have exact reproducer steps I can try to get dnsmasq into this state?

Thanks

Unfortunately no, and this is a production container that I don’t want to disrupt too much.
I’ll see if I can reproduce the steps that I took on a test LXD install.

In the meantime, is it safe to fix or remove the incorrect dnsmasq.hosts/ file?
lxc list shows the other hwaddr:

lxc list -f compact -c n,volatile.eth0.hwaddr d9-67
  NAME   VOLATILE ETH0 HWADDR  
  d9-67  00:16:3e:67:78:71

I will probably delete the container and then recreate it without preserving the hwaddr. It will get a new IP address, but hopefully it will get a working DNS name. All I need is to make sure an HAProxy container can connect to this and other containers by name.

I fixed it by deleting and recreating the container without preserving its hwaddr. It got a new ip address and can be accessed by its .lxd hostname.

I also wrote a test that launches ~100 containers, deletes them, launches them again with their previous hwaddr, and pings them by name from another container. After a few repetitions of the rebuild, it didn’t cause a problem. I might also mix in container creation without preserving the hwaddr. If the problem is an hwaddr collision, it may take a long time to detect it by testing. It seems that LXD keeps the first 3 bytes of the hwaddr fixed, which leaves the other 3 bytes to avoid collisions (about 16 million distinct hw addresses). I would need to rebuild millions of containers to have a good chance of getting a duplicate hwaddr. It would be more productive to look at the LXD code.
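As a rough check on the collision odds, the chance that any two addresses in a set coincide can be estimated with the usual birthday calculation. This sketch assumes the last 3 bytes are drawn uniformly at random, which is a guess and not something confirmed from the LXD code; the function name is made up.

```python
def collision_probability(n_addresses, free_bytes=3):
    """Probability that at least two of n_addresses random hwaddrs coincide.

    Assumption: only the last `free_bytes` bytes vary, and each address
    is drawn uniformly and independently from that space.
    """
    space = 256 ** free_bytes  # 16,777,216 for 3 free bytes
    p_all_unique = 1.0
    for i in range(n_addresses):
        # The i-th address must avoid the i addresses already drawn.
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique
```

For one batch of 100 containers, collision_probability(100) comes out at about 0.03%, so under this assumption a random collision is unlikely to show up in a test of that size.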
