We build one-shot instances (containers) to test the installability of our software. Our software runs on clusters, so we create many instances. We’re reaching the physical limits of single hosts, so we’re trying to build an LXD cluster. The old setup is as follows:
We have an LXD bridge network called lxd-provision. LXD provides a dnsmasq instance running there, and we use it to hand out random IPs to the instances. Ansible runs through this network to prepare the nodes, install our software and its dependencies, and run some basic tests.
We also create cluster-specific networks (meaning our product’s cluster, not LXD’s) to link the nodes; typically there are 2 such networks (one backend, one frontend), but there can be any number above one. We also run dnsmasq on one of these networks to provide internal DNS that’s associated with the cluster.
One of the aims of this project is to reproduce customers’ setups, so the networking and DNS setup can get quite complex.
I started building an LXD cluster last week. Initially it all seemed easy, especially after watching Stéphane’s videos, but the devil is in the details. First, I didn’t have a free NIC to use OVN on top of (FAN was discarded because of the constraints on the networks), so I had to set up the OVN networks in NAT’ed mode so they could speak to each other through the lxd-provision bridge. This worked, but Ansible was failing because it runs on one of the hosts and can’t reach the lxd-provision network on the other hosts.
Thanks to netplan try I managed to build a bridge on top of the single NIC on each host, so now OVN is no longer working in NAT’ed mode. This also allowed me to build the lxd-provision network on top of OVN.
But OVN networks do not come with a dnsmasq instance, so I tried to run one myself. Now I find that the OVN networks are not visible from the hosts, so I can’t run dnsmasq on the hosts to provide the features listed above.
So, what are our options now? Do we have to spin up an instance and (dis)connect it to/from all the networks just to run dnsmasq on it? Can’t we make the OVN networks reachable from the hosts? I know that will impose some limitations on the networking, but we have been living with them so far.
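To make the first option concrete, here is a minimal sketch of what (dis)connecting a helper instance could look like from our scripts. It only builds the `lxc config device add` / `lxc config device remove` command lines and prints them; it does not run anything. The instance name `dns-helper` and the network names are hypothetical placeholders, and I’m assuming `eth0` is already taken by the instance’s default NIC.

```python
import shlex


def nic_commands(instance, networks, detach=False):
    """Return lxc command lines that add (or remove) one NIC per network.

    Nothing is executed here; the caller decides how to run the commands.
    """
    commands = []
    for index, network in enumerate(networks, start=1):
        # Assumption: eth0 is the instance's default NIC, so extra NICs
        # are numbered from eth1 upwards.
        device = f"eth{index}"
        if detach:
            argv = ["lxc", "config", "device", "remove", instance, device]
        else:
            argv = ["lxc", "config", "device", "add", instance, device,
                    "nic", f"network={network}"]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for line in nic_commands("dns-helper",
                             ["cluster1-backend", "cluster1-frontend"]):
        print(line)
```

Calling the same function with `detach=True` gives the matching removal commands, so the helper could be unplugged when a test cluster is torn down. dnsmasq inside the helper would still need to be configured per network, which this sketch does not cover.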
PS: yes, I know this setup has many SPOFs, but we don’t care much. Since everything is (semi)automated through some scripts and everything else is handled via a git repo, we can replace those SPOFs very quickly if needed.