Intermittent failure to resolve hostnames on OVN

I’m trying to understand an intermittent DNS failure I’m seeing on an OVN-backed Incus cluster, and whether this is expected behavior or a misconfiguration on my end.

I’m under the impression that OVN works by intercepting UDP DNS requests headed to the DNS server and responding with its own answer.

I’m running a 3-node Incus cluster (all hosts and affected instances on Debian trixie). The upstream DNS server is outside the OVN network. Instances use systemd-resolved.

What I’m seeing intermittently is:

  • Instance-to-instance name resolution for .incus names suddenly starts returning NXDOMAIN.
  • The NXDOMAIN result is then cached by systemd-resolved.
  • Once cached, instances fail to connect to each other by name until I manually run: resolvectl flush-caches.

My questions are:

  1. Is this behavior expected given OVN’s UDP-only DNS interception model?
  2. Is there a recommended configuration to avoid negative caching of .incus records in this setup (for example, exposing an authoritative DNS service for .incus over both UDP and TCP)?
  3. Or is relying on OVN DNS interception for .incus names simply not meant to be robust when combined with systemd-resolved?

Details

All relevant instances are on the same OVN network:

caddy      10.22.22.100  (OVN)
pocket-id  10.22.22.114  (OVN)
lldap      10.22.22.103  (OVN)

(Networking works by IP; the issue is name resolution.)


Observed failure

From one instance (caddy), name resolution for another instance intermittently fails:

incus exec caddy -- ping -c3 pocket-id
ping: pocket-id: Temporary failure in name resolution

At the same time, resolution for another .incus name succeeds:

incus exec caddy -- ping -c3 lldap
PING lldap.incus (10.22.22.103) ...
64 bytes from 10.22.22.103: icmp_seq=1 ttl=64 time=2.95 ms

(This shows OVN connectivity is fine and .incus resolution is not globally broken.)


Resolver state during failure

Querying the resolver directly shows NXDOMAIN for the failing name:

incus exec caddy -- resolvectl query pocket-id
pocket-id: Name 'pocket-id' not found

While a working name is served from cache:

incus exec caddy -- resolvectl query lldap
lldap: 10.22.22.103 -- link: eth0
       (lldap.incus)

-- Data from: cache

Proof of negative caching

The NXDOMAIN result appears to be cached by systemd-resolved:

incus exec caddy -- resolvectl statistics
incus exec caddy -- resolvectl query pocket-id
incus exec caddy -- resolvectl statistics

Relevant delta:

Cache Hits:   1340 → 1342
Cache Misses: 441  → 441

(This indicates the NXDOMAIN is being served from cache, with no new upstream lookup.)
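To repeat the before/after comparison without reading the numbers by eye, the two counters can be pulled out of the `resolvectl statistics` output with a couple of small shell helpers. This is a minimal sketch: `cache_counters` and `cache_delta` are hypothetical helper names, and it assumes the counters appear on lines containing "Cache Hits:" and "Cache Misses:", as in the output quoted above.

```shell
# cache_counters: read `resolvectl statistics` output on stdin and print
# "<hits> <misses>", taken from the last field of the "Cache Hits:" and
# "Cache Misses:" lines.
cache_counters() {
  awk '
    /Cache Hits:/   { hits = $NF }
    /Cache Misses:/ { misses = $NF }
    END { print hits, misses }
  '
}

# cache_delta BEFORE AFTER: each argument is a "<hits> <misses>" pair as
# printed by cache_counters; report how much each counter moved.
cache_delta() {
  # Word-split both pairs into $1..$4: hits0 misses0 hits1 misses1.
  set -- $1 $2
  echo "hits +$(( $3 - $1 )), misses +$(( $4 - $2 ))"
}

# Example, run from the host:
#   before=$(incus exec caddy -- resolvectl statistics | cache_counters)
#   incus exec caddy -- resolvectl query pocket-id
#   after=$(incus exec caddy -- resolvectl statistics | cache_counters)
#   cache_delta "$before" "$after"
```

With the counters quoted above (1340/441 before, 1342/441 after), `cache_delta "1340 441" "1342 441"` prints `hits +2, misses +0`, i.e. the query was answered from the cache with no new upstream lookup.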


Resolver configuration context

incus exec caddy -- resolvectl status
Current DNS Server: 192.168.1.2
DNS Domain: incus

(The upstream DNS server is outside the OVN network.)


Cache flush immediately resolves the issue

incus exec caddy -- resolvectl flush-caches
incus exec caddy -- ping -c3 pocket-id
PING pocket-id.incus (10.22.22.114) ...
64 bytes from 10.22.22.114: icmp_seq=1 ttl=64 time=1.92 ms

(This demonstrates direct causality: cached NXDOMAIN → failure; cache flush → resolution restored.)
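The flush can also be limited to the situation diagnosed here. `resolvectl query` accepts `--cache=no`, which sends the lookup to the network instead of answering it from the local cache, so a name that fails normally but resolves with `--cache=no` points at a stale cached NXDOMAIN. This is a sketch under the assumption that it runs inside the instance (for example in a shell opened with `incus exec caddy -- bash`); `check_stale_negative` is a hypothetical name, and `resolvectl flush-caches` drops the whole cache, not just the one entry.

```shell
# check_stale_negative NAME: detect (and clear) a stale negative cache entry
# in systemd-resolved. Prints a one-line status; returns 0 if NAME resolves,
# possibly after flushing, and 1 if it fails even with the cache bypassed.
check_stale_negative() {
  local name=$1
  # Normal lookup, allowed to be answered from the local cache.
  if resolvectl query "$name" >/dev/null 2>&1; then
    echo "ok: $name resolves"
    return 0
  fi
  # Same lookup with the local cache bypassed.
  if resolvectl query --cache=no "$name" >/dev/null 2>&1; then
    echo "stale: $name only fails from cache, flushing caches"
    resolvectl flush-caches
    return 0
  fi
  echo "fail: $name does not resolve even without the cache"
  return 1
}
```

Usage inside the instance: `check_stale_negative pocket-id`.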

Yeah, you’re correct that OVN implements DNS by generating flow rules that catch specific UDP DNS queries and return a pre-determined response.

TCP DNS queries bypass this mechanism, so a caching resolver performing both UDP and TCP queries would indeed end up with an inconsistent cache.

Because OVN’s DNS acts as an overlay, any query that doesn’t match one of those flow rules (including any query sent over TCP) goes straight to the upstream DNS server, which then determines the response, whether that is NXDOMAIN or something else.
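One way to check this from inside an instance is to send the same query once over UDP and once over TCP and compare the answers. A minimal sketch, assuming `dig` is installed in the instance (on Debian it comes from the `bind9-dnsutils` package); `same_answers` and `compare_transports` are hypothetical helper names. With `+short`, an NXDOMAIN (or any other empty answer) shows up as an empty string.

```shell
# same_answers UDP_ANSWER TCP_ANSWER: print whether two `dig +short`
# answers agree; an empty answer is shown as <no answer>.
same_answers() {
  if [ "$1" = "$2" ]; then
    echo "consistent: ${1:-<no answer>}"
  else
    echo "inconsistent: udp=${1:-<no answer>} tcp=${2:-<no answer>}"
  fi
}

# compare_transports NAME SERVER: ask SERVER for NAME's A record once over
# UDP (+notcp) and once over TCP (+tcp), then compare the two answers.
compare_transports() {
  local udp tcp
  udp=$(dig +notcp +short "@$2" "$1" A)
  tcp=$(dig +tcp +short "@$2" "$1" A)
  same_answers "$udp" "$tcp"
}
```

If the UDP queries are intercepted as described and the upstream server has no records for .incus, `compare_transports pocket-id.incus 192.168.1.2` run inside `caddy` should report the instance address for UDP and no answer for TCP.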

You may be able to tweak your resolver’s configuration to only make UDP queries and to not cache NXDOMAIN responses for that particular domain, but having to configure that in every instance will get annoying.
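On the caching side, systemd-resolved’s `resolved.conf` accepts `Cache=no-negative`, which keeps positive answers cached but stops caching negative ones. This is a global setting rather than one scoped to .incus, and there doesn’t appear to be a `resolved.conf` option that restricts resolved to UDP, so this only covers half of the suggestion. A minimal sketch that writes a drop-in; the function name and the `90-no-negative-cache.conf` file name are hypothetical, and the target directory can be overridden for testing.

```shell
# write_resolved_dropin [DIR]: write a systemd-resolved drop-in that keeps
# positive answers cached but disables caching of negative answers.
# Cache=no-negative applies to all domains, not only .incus.
write_resolved_dropin() {
  local dir=${1:-/etc/systemd/resolved.conf.d}
  mkdir -p "$dir"
  cat > "$dir/90-no-negative-cache.conf" <<'EOF'
[Resolve]
Cache=no-negative
EOF
}
```

Inside each instance, `write_resolved_dropin && systemctl restart systemd-resolved` applies it, which is exactly the per-instance chore mentioned above.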

For production environments, you’re usually better off using network zones with an external authoritative DNS server, then either serving those zones publicly or having your local recursive DNS server handle them.
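A rough outline of the Incus side of that setup: create a zone, attach it to the OVN network as its forward zone, and enable Incus’s built-in DNS listener so the external server can pull the zone. As I understand the Incus documentation, that built-in listener only answers zone transfers (AXFR), not regular queries, so the external authoritative server is what ends up answering clients over both UDP and TCP. All names and addresses below are hypothetical (`incus.example.net`, `ovn0`, `192.168.1.10:1053`, and a peer called `resolver` at `192.168.1.2`); `run` only prints the commands when `DRY_RUN=1`, and the authoritative server’s own configuration is not shown.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# setup_incus_zone ZONE NETWORK LISTEN PEER_IP
# All four arguments are placeholders for your own values.
setup_incus_zone() {
  local zone=$1 network=$2 listen=$3 peer=$4
  # Enable Incus's DNS listener (used by the external server for AXFR).
  run incus config set "core.dns_address=$listen"
  # Create the zone and allow the external server to transfer it.
  run incus network zone create "$zone"
  run incus network zone set "$zone" "peers.resolver.address=$peer"
  # Publish the OVN network's instance records in that zone.
  run incus network set "$network" "dns.zone.forward=$zone"
}

# Preview first, then run for real:
#   DRY_RUN=1 setup_incus_zone incus.example.net ovn0 192.168.1.10:1053 192.168.1.2
#   setup_incus_zone incus.example.net ovn0 192.168.1.10:1053 192.168.1.2
```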