DNS operations in tunnel network mode

Hi there,

I’m having occasional issues with DNS resolution of container hostnames between LXD nodes, and I’m wondering how dnsmasq is supposed to operate in tunnel network mode (GRE).

From my understanding, only one node is supposed to run dnsmasq and allocate IPs. However, a dnsmasq instance is automatically started on every node in my setup (usually on LXD or container start). This makes DNS resolution highly unreliable, since every container seems to obtain a lease from its local, non-synchronized dnsmasq.

  1. What is the expected dnsmasq behaviour in this setup: a single instance, or one running per node?
  2. How do dnsmasq and LXD communicate? Is it only via the --dhcp-leasefile and --dhcp-hostsfile files and regular restarts?
  3. Does the recent work https://github.com/lxc/lxd/pull/5823 change the current behavior in any way?

For now, I kill the misbehaving dnsmasq instances and the containers are happy with that, but I would be very happy to dig into this issue :slight_smile:

Can you show the output of lxc network list and lxc network show for all managed networks on all nodes?

You indeed shouldn’t be running dnsmasq on the nodes that are attached, over the tunnel, to the one node that does run dnsmasq.
The normal way to do this is to set both ipv4.address and ipv6.address to none on those nodes, only having the subnet defined on the main one.

This should ensure that DHCP is only served by the main one over GRE and keeps things consistent.
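
On a standalone (non-clustered) node, that would look roughly like this, where “secondary” and “mybridge” are just placeholders for the extra host and the managed bridge in question:

secondary $ lxc network set mybridge ipv4.address none   # placeholder host/network names
secondary $ lxc network set mybridge ipv6.address none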

I’m testing in a small environment of 2 nodes, “alice” and “bob”. LXD version 3.13 (snap).
I’m not sure how to set ipv4.address to none for only one node.

alice $ lxc network list
+-----------------+----------+---------+-------------+---------+---------+
|      NAME       |   TYPE   | MANAGED | DESCRIPTION | USED BY |  STATE  |
+-----------------+----------+---------+-------------+---------+---------+
| enx001e0630fa86 | physical | NO      |             | 0       |         |
+-----------------+----------+---------+-------------+---------+---------+
| mytunnel        | bridge   | YES     |             | 14      | CREATED |
+-----------------+----------+---------+-------------+---------+---------+
bob $ lxc network list
+-----------------+----------+---------+-------------+---------+---------+
|      NAME       |   TYPE   | MANAGED | DESCRIPTION | USED BY |  STATE  |
+-----------------+----------+---------+-------------+---------+---------+
| enx001e0630cdc7 | physical | NO      |             | 0       |         |
+-----------------+----------+---------+-------------+---------+---------+
| mytunnel        | bridge   | YES     |             | 14      | CREATED |
+-----------------+----------+---------+-------------+---------+---------+
alice $ lxc network show mytunnel
config:
  ipv4.address: 10.25.10.1/24
  ipv4.dhcp.expiry: 12h
  ipv4.nat: "true"
  ipv6.address: fd42:da74:f51:7a04::1/64
  ipv6.dhcp.expiry: 12h
  ipv6.nat: "true"
  tunnel.bob.local: 192.168.1.8
  tunnel.bob.protocol: gre
  tunnel.bob.remote: 192.168.1.9
  tunnel.alice.local: 192.168.1.9
  tunnel.alice.protocol: gre
  tunnel.alice.remote: 192.168.1.8
description: ""
name: mytunnel
type: bridge
used_by:
<redacted>
managed: true
status: Created
locations:
- alice
- bob
bob $ lxc network show mytunnel
config:
  ipv4.address: 10.25.10.1/24
  ipv4.dhcp.expiry: 12h
  ipv4.nat: "true"
  ipv6.address: fd42:da74:f51:7a04::1/64
  ipv6.dhcp.expiry: 12h
  ipv6.nat: "true"
  tunnel.bob.local: 192.168.1.8
  tunnel.bob.protocol: gre
  tunnel.bob.remote: 192.168.1.9
  tunnel.alice.local: 192.168.1.9
  tunnel.alice.protocol: gre
  tunnel.alice.remote: 192.168.1.8
description: ""
name: mytunnel
type: bridge
used_by:
<redacted>
managed: true
status: Created
locations:
- alice
- bob

Thanks for your support!

Oh, you forgot to mention that this is an LXD cluster. That does indeed make this setup a bit more problematic, as in clusters the network config is shared across all nodes, causing all of them to run dnsmasq…

We’ll need to think about the best way to define which node is acting as the router for the network in such setups, and then have LXD operate dnsmasq only on that one node.

@tomp something to keep in mind, not yet sure what will end up being our solution for this

Would it be viable to assign a separate /24 to each node (like the fan does) and then set up routes over the tunnels? That way DHCP would be local to the machine and we could relay DNS requests like we do for the fan network. Not to mention the benefit of not having a large layer 2 broadcast domain.
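
For reference, a fan bridge of that kind would be created with something along these lines; lxdfan0 and the 192.168.1.0/24 underlay are just example values matching the addresses shown above, with each node then getting its own /24 out of the fan overlay:

$ lxc network create lxdfan0 bridge.mode=fan fan.underlay_subnet=192.168.1.0/24   # example name and underlay subnet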

The fan approach certainly avoids a number of issues, but it also makes things like live migration, or even retaining the same address during cold migration, impossible.

I think there’s value in having a way to set up a virtual shared L2, and this can be made to work just fine today outside of a cluster environment. Clustering breaks this by requiring that ipv4.address, ipv6.address, … be identical on all nodes. This normally makes sense, but we may have to think about how to handle exceptions.

Many thanks for your answers.
I’ve tried to reconfigure the network using the fan bridge, but encountered a similar problem (a race condition between multiple dnsmasq answers). My setup is quite special, since I’m using low-frequency nodes and a high-speed network (hence the frequent races).

I’ve tried several approaches to fix the issue: with the fan bridge, I’ve added an ebtables rule to drop multicast traffic crossing the network between nodes. This forces DHCP requests to be broadcast only on the local bridges. Moreover, outbound traffic is no longer redirected to a single “gateway” node, as was the case with the GRE tunnels.

$ sudo ebtables -A FORWARD -p IPv4 -d Multicast -o mytunnel-fan -j DROP

(-p IPv4 is used so that ARP requests can still pass through. I’ve tried to block only requests on ports 67-68, but it does not seem to work properly; I must have missed something.)
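
For reference, a rule restricted to just the DHCP ports would presumably look something like the following; this is an untested sketch, and it may still need a broadcast/multicast destination match to catch the initial discover:

$ sudo ebtables -A FORWARD -p IPv4 --ip-proto udp --ip-dport 67:68 -o mytunnel-fan -j DROP   # untested variant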
