Serving DNS over OVN networks and accessing the instances from the hosts

We build one-shot instances (containers) to test the installability of our software. It runs on clusters, so we create many instances. We’re reaching the physical limits of a single host, so we’re trying to set up an LXD cluster. The old setup is as follows:

We have an LXD bridge network called lxd-provision. LXD runs a dnsmasq instance on it, which we use to hand out random IPs to the instances; Ansible then runs through it to prepare the nodes, install our software and its dependencies, and run some basic tests.

We also create our_cluster-specific networks (meaning our product’s cluster, not LXD’s) to link the nodes; typically there are 2 such networks (one backend, one frontend), but it can be any number above one. We also run dnsmasq on one of these networks to provide internal DNS associated with the cluster.

One of the aims of this project is to reproduce customers’ setups, so the networking and DNS configuration can get quite complex.

I started building an LXD cluster last week. Initially it all seems easy, especially after watching Stéphane’s videos, but the devil is in the details. First, I didn’t have a free NIC to run OVN on top of (FAN was discarded because of the constraints on the networks), so I had to set up the OVN networks in NAT’ed mode so they could speak to each other through the lxd-provision bridge. This worked, but Ansible was failing because it runs on one of the hosts and can’t reach the lxd-provision network on the other hosts.

Thanks to netplan try I managed to build a bridge on top of the single NIC on each host, so OVN no longer needs to run in NAT’ed mode. This also allowed me to build the lxd-provision network on top of OVN.
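For reference, the per-host change is roughly the following sketch, assuming the NIC is enp2s0 and the bridge ends up as lan0 (the names that appear in the output further down); the addresses are placeholders, not our real ones:

# sketch: write a netplan snippet that moves the NIC under a bridge
sudo tee /etc/netplan/99-lan0.yaml <<'EOF'
network:
  version: 2
  ethernets:
    enp2s0:
      dhcp4: false
  bridges:
    lan0:
      interfaces: [enp2s0]
      addresses: [192.0.2.10/24]
      routes:
        - to: default
          via: 192.0.2.1
      nameservers:
        addresses: [192.0.2.1]
EOF
# test interactively; netplan try rolls back automatically if you lose connectivity and can't confirm
sudo netplan try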

But OVN networks do not come with a dnsmasq instance, so I tried to run one myself. Now I find that the OVN networks are not visible from the hosts, so I can’t run dnsmasq on the hosts to provide the features listed above.

So, what are our options now? Do we have to spin up an instance and (dis)connect it to all the networks just to run dnsmasq on it? Can’t we make the OVN networks reachable from the hosts? I know that would impose some limitations on the networking, but we have been living with them so far.

PS: yes, I know this setup has many SPOFs, but we don’t care much. Since everything is (semi-)automated through some scripts and everything else is handled via a git repo, we can replace those SPOFs very quickly if needed.

BTW, thanks a lot to tomp for all the support in the IRC channel.


Let’s start with understanding why you need a dnsmasq instance rather than relying on OVN’s DNS services (which provide DNS records for each instance NIC, along with external DNS resolution by way of forwarding requests to specified upstream DNS servers).

We also run dnsmasq on one of these networks to provide internal DNS associated with the cluster.

One of the aims of this project is to reproduce customers’ setups, so the networking and DNS configuration can get quite complex.

I don’t need DNS to resolve individual instances’ names (although that’s very welcome), but we need to resolve the DNS names associated with the HTTPS endpoints in our cluster, so our tests can check that services are answering where they should and under the names they should. These can even be names that upstream servers would resolve, but not to IPs in our instances; for instance, if Canonical were one of our clients and we wanted to test a deployment in your DNS domains, we would need names like s3-uk.ubuntu.com to resolve, within the instances, to IPs pointing at the test instances rather than at the real service.

Hmm, so if I could run a DNS server that could be modified on the fly… I could add the arbitrary names there and forward all DNS traffic to it?

This is an example of a ‘simple’ DNS setup, but the names can be completely arbitrary, even outside the ‘base’ domain:

        "raw.dnsmasq" = <<EOF
address=/s3-admin.singlenode.cloudian.eu/10.11.1.151
address=/cmc.singlenode.cloudian.eu/10.11.1.151
address=/iam.singlenode.cloudian.eu/10.11.1.151
address=/s3-sqs.singlenode.cloudian.eu/10.11.1.151
address=/s3-eu-1.singlenode.cloudian.eu/10.11.1.151
address=/the-one.singlenode.cloudian.eu/10.11.1.151
address=/s3-website-eu-1.singlenode.cloudian.eu/10.11.1.151
address=/completely.arbitrary.name.com/10.11.1.151
address=/singlenode/10.11.1.151
EOF
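Checking that is then just a matter of resolving the names from inside an instance and comparing against the expected IP; a sketch, assuming dig is available in the instance:

# both should print 10.11.1.151, per the raw.dnsmasq block above
lxc exec c2r2dc3p3n-node1 -- dig +short s3-admin.singlenode.cloudian.eu
lxc exec c2r2dc3p3n-node1 -- dig +short completely.arbitrary.name.com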

From the docs:

The built-in DNS server supports only zone transfers through AXFR. It cannot be directly queried for DNS records. Therefore, the built-in DNS server must be used in combination with an external DNS server

Sounds like now I have to run a DNS server that can do AXFR and then tell OVN to use that, instead of a cheapo dnsmasq running on the network :frowning:

If you don’t mind the names being resolvable from all of the OVN networks, then you could use the existing dnsmasq of an LXD managed bridge (i.e. lxdbr0) that you’re using as the uplink network for your OVN networks.

Please can you show lxc network ls along with lxc network show <net> for the uplink and ovn networks?

That is for exporting LXD DNS records (including manually added ones) to an upstream DNS server.
In principle you could use that functionality and then set up a DNS server that syncs from it, to be used for upstream DNS requests from your OVN networks.
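Very roughly, and hedging that the exact behaviour depends on the LXD version and zone configuration, the moving parts would look something like this (the zone name, listen address and port below are made up):

# expose LXD's built-in DNS server (address and port are placeholders)
lxc config set core.dns_address 10.11.12.1:8853
# create a DNS zone and make it the forward zone of a network (zone name is made up)
lxc network zone create lxd.internal
lxc network set lxd-provision dns.zone.forward lxd.internal
# an external DNS server (bind, etc.) would then sync that zone over AXFR; a manual check:
dig @10.11.12.1 -p 8853 axfr lxd.internal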

cloudian@uk-lxd2:~/mdione/automation-frameworks/terrible$ lxc network ls
+---------------+----------+---------+-------------------+---------------------------+-------------+---------+---------+
|     NAME      |   TYPE   | MANAGED |       IPV4        |           IPV6            | DESCRIPTION | USED BY |  STATE  |
+---------------+----------+---------+-------------------+---------------------------+-------------+---------+---------+
| br-int        | bridge   | NO      |                   |                           |             | 0       |         |
+---------------+----------+---------+-------------------+---------------------------+-------------+---------+---------+
| c2r2dc3p3n    | ovn      | YES     | 10.254.254.254/24 |                           |             | 4       | CREATED |
+---------------+----------+---------+-------------------+---------------------------+-------------+---------+---------+
| enp2s0        | physical | NO      |                   |                           |             | 0       |         |
+---------------+----------+---------+-------------------+---------------------------+-------------+---------+---------+
| lan0          | bridge   | NO      |                   |                           |             | 1       |         |
+---------------+----------+---------+-------------------+---------------------------+-------------+---------+---------+
| lxd-provision | ovn      | YES     | 10.127.117.1/24   | fd42:2ff9:9655:63c9::1/64 |             | 4       | CREATED |
+---------------+----------+---------+-------------------+---------------------------+-------------+---------+---------+
| lxdovn2       | bridge   | NO      |                   |                           |             | 0       |         |
+---------------+----------+---------+-------------------+---------------------------+-------------+---------+---------+
| ovn-overlay   | physical | YES     |                   |                           |             | 2       | CREATED |
+---------------+----------+---------+-------------------+---------------------------+-------------+---------+---------+
cloudian@uk-lxd2:~/mdione/automation-frameworks/terrible$ lxc network show lxd-provision
config:
  bridge.mtu: "1442"
  ipv4.address: 10.127.117.1/24
  ipv4.dhcp: "true"
  ipv4.nat: "true"
  ipv6.address: fd42:2ff9:9655:63c9::1/64
  ipv6.nat: "true"
  network: ovn-overlay
description: ""
name: lxd-provision
type: ovn
used_by:
- /1.0/instances/c2r2dc3p3n-node1
- /1.0/instances/c2r2dc3p3n-node1
- /1.0/instances/c2r2dc3p3n-node2
- /1.0/instances/c2r2dc3p3n-node2
managed: true
status: Created
locations:
- uk-lxd1
- uk-lxd2.cloudian.com

That’s how I had it at some point.

Or should I just create a bridged network on one of the hosts, then the lxd-provision OVN network ‘on top of it’ (but kinda ‘barely touching it’) in the whole cluster, and keep using that for Ansible? (Meanwhile, I’m testing the LXD connector for Ansible, but I’m hitting issues at another point of the stack, namely Terraform.)

But I don’t need that, it’s just more work to get the same thing I had before :slight_smile:

lxc network show ovn-overlay please

I know

… or I could create a bridged network with dnsmasq running on it, as I was doing before, and then use it as the uplink network for the OVN networks (remember there can be several per our_cluster!)? Mind you, these internal networks never routed outside; the nodes always used the lxd-provision network as their default GW and for general internet/DNS access.

I don’t know about Ansible, I’m just focusing on the DNS record issue first.

Yes, you could do that. That is what I was getting at: the OVN networks will then use the managed bridge uplink’s dnsmasq for upstream DNS.
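As a sketch (the bridge name, subnet and ranges below are placeholders; on a cluster the bridge has to be created per member with --target first, then once cluster-wide):

# managed bridge that will act as the OVN uplink
lxc network create provisionbr0 \
    ipv4.address=10.11.12.1/24 \
    ipv4.dhcp.ranges=10.11.12.2-10.11.12.100 \
    ipv4.ovn.ranges=10.11.12.101-10.11.12.199
# push the arbitrary records into the bridge's dnsmasq
lxc network set provisionbr0 raw.dnsmasq "address=/s3-admin.singlenode.cloudian.eu/10.11.1.151"
# create the OVN network(s) with that bridge as their uplink
lxc network create c2r2dc3p3n --type=ovn network=provisionbr0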

Ansible just needs ssh, or the LXD connector, working from the (primary) host to each instance.

config:
  ipv4.ovn.ranges: 10.11.12.101-10.11.12.199
  volatile.last_state.created: "false"
description: ""
name: ovn-overlay
type: physical
used_by:
- /1.0/networks/c2r2dc3p3n
- /1.0/networks/lxd-provision
managed: true
status: Created
locations:
- uk-lxd2.cloudian.com
- uk-lxd1

I used lan0 as the parent for this network. That’s a Linux bridge on top of the NIC. I think @stgraber will appreciate the phrase “usine à gaz” (a gas factory, i.e. an over-complicated contraption) :slight_smile:

OK, so you won’t be able to do that by default, because OVN provides private networks.
You would then need a per-instance IP or port reachable from the host, which means not using NAT, which contradicts the original discussion we had on IRC about not needing inbound requests to those instances on the OVN network.

Using Ansible with lxc exec seems like the best way, as it avoids networking altogether.
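A minimal sketch of that, assuming the community.general collection is installed and the instances have Python available for Ansible’s modules:

# plain exec, no instance networking involved
lxc exec c2r2dc3p3n-node1 -- hostname
# drive Ansible through the LXD connection plugin instead of SSH
ansible all -i 'c2r2dc3p3n-node1,c2r2dc3p3n-node2,' -c community.general.lxd -m ping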

Right, I must have lost that in translation (I was thinking of HTTP requests from 3rd-party nodes, not TCP/IP connections). I also thought OVN networks would be reachable from the hosts that, well, host them.