And do you see the same issue when running dig rather than systemd-resolve? The reason I ask is that if you can reproduce it with the dig command, we can then query the other DNS servers in the cluster directly to try to locate the issue.
Global
LLMNR setting: no
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
DNSSEC NTA: 10.in-addr.arpa
16.172.in-addr.arpa
168.192.in-addr.arpa
17.172.in-addr.arpa
18.172.in-addr.arpa
19.172.in-addr.arpa
20.172.in-addr.arpa
21.172.in-addr.arpa
22.172.in-addr.arpa
23.172.in-addr.arpa
24.172.in-addr.arpa
25.172.in-addr.arpa
26.172.in-addr.arpa
27.172.in-addr.arpa
28.172.in-addr.arpa
29.172.in-addr.arpa
30.172.in-addr.arpa
31.172.in-addr.arpa
corp
d.f.ip6.arpa
home
internal
intranet
lan
local
private
test
Link 32 (eth0)
Current Scopes: DNS
DefaultRoute setting: yes
LLMNR setting: yes
MulticastDNS setting: no
DNSOverTLS setting: no
DNSSEC setting: no
DNSSEC supported: no
Current DNS Server: 240.198.210.1
DNS Servers: 240.198.210.1
DNS Domain: lxd
The process of resolving DNS queries in a cluster is as follows:
Container -> local dnsmasq process.
The dnsmasq process tries to answer the query for local instances, and for non-local instances forwards it to the local forkdns process.
The local forkdns process then forwards the DNS request to each of the remote cluster hosts' forkdns processes sequentially, trying to get an answer.
When a remote forkdns process receives the request, it inspects dnsmasq's leases file on its host and tries to answer the query. The answer is returned to the original forkdns process, then to the local dnsmasq process, which relays it to the requestor.
So you can see there are a few moving parts there, and dig's @ notation lets you query each part directly.
For instance, dig @<lxd host IP> -p 1053 will allow you to query the local and remote forkdns processes directly.
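To make the hop-by-hop check concrete, here is a minimal sketch of querying each stage in turn. It assumes dnsmasq is listening on port 53 of the address the container reports as its DNS server (240.198.210.1 in the output above) and that forkdns is on port 1053 as just described. The function names and the DIG override variable are made up for illustration and are not part of LXD.

```shell
#!/bin/sh
# Hypothetical helpers for querying each stage of the lookup chain directly.
# DIG can be overridden (e.g. DIG=echo) to print the arguments instead of running dig.

# Ask the dnsmasq instance on a host (assumed here to listen on port 53).
# $1 = name to resolve, $2 = host's bridge IP
query_dnsmasq() {
    "${DIG:-dig}" "@$2" -p 53 +short A "$1"
}

# Ask the forkdns instance on the same or another host (port 1053).
# $1 = name to resolve, $2 = host's bridge IP
query_forkdns() {
    "${DIG:-dig}" "@$2" -p 1053 +short A "$1"
}
```

Running something like `query_dnsmasq minio-01.lxd 240.198.210.1` and then `query_forkdns minio-01.lxd 240.198.210.1` against each host in turn should show which stage stops returning an answer.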
Yep, I’ve got it listening on the correct IP and port. I see /var/snap/lxd/common/lxd/logs/forkdns.lxdfan0.log is empty on all the hosts, is it possible to increase log level perhaps?
We do this to stop loops in requests. When a query is forwarded from dnsmasq to forkdns, the recursion flag is enabled. The local forkdns then strips it and marks recursion as disabled when forwarding to the remote forkdns process.
The remote forkdns process will only answer non-recursive requests from its local database.
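A rough way to see this from the outside is to send the same question to one forkdns process with and without the recursion flag. The sketch below assumes forkdns is on port 1053 as above; `compare_recursion` and the DIG override are hypothetical names for illustration. As I understand the description, the `+norecurse` query should be answered only from that host's own leases, while the `+recurse` query mimics what dnsmasq sends and should be forwarded on to the other hosts.

```shell
#!/bin/sh
# Hypothetical helper: query one forkdns process with and without the
# recursion-desired flag so the two answers can be compared side by side.
# $1 = name to resolve, $2 = IP of the host whose forkdns is queried
compare_recursion() {
    echo "norecurse (local leases only):"
    "${DIG:-dig}" "@$2" -p 1053 +norecurse +short A "$1"
    echo "recurse (as sent by dnsmasq):"
    "${DIG:-dig}" "@$2" -p 1053 +recurse +short A "$1"
}
```

If a name that lives on another host resolves in the second query but not the first, the forwarding path between the forkdns processes is probably working and the problem lies elsewhere.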
Ok, it seems each forkdns correctly gives the IP for the container running on its host in the last few manual tests I ran. Going to try to automate the test now.
# Ask each host's forkdns (port 1053) for its own container and report empty answers.
while :; do
    output=$(dig minio-01.lxd -p 1053 +norecurse +short A @240.197.151.1)
    if [ -z "$output" ]; then echo "minio-01.lxd / hetzner-01.lxd fails"; fi
    output=$(dig minio-02.lxd -p 1053 +norecurse +short A @240.198.210.1)
    if [ -z "$output" ]; then echo "minio-02.lxd / hetzner-02.lxd fails"; fi
    output=$(dig minio-03.lxd -p 1053 +norecurse +short A @240.200.13.1)
    if [ -z "$output" ]; then echo "minio-03.lxd / hetzner-03.lxd fails"; fi
done
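Since the failures seem intermittent, a variant of the loop above that timestamps each miss may make it easier to line failures up with other logs. This is only a sketch: the name/IP pairs are copied from the loop above, while `CHECKS`, `check_round` and the DIG override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical variant of the loop above: one round checks every host's forkdns
# and prints a timestamped line for each empty answer.
# Each entry pairs a container name with the forkdns IP of the host it runs on.
CHECKS="minio-01.lxd:240.197.151.1 minio-02.lxd:240.198.210.1 minio-03.lxd:240.200.13.1"

check_round() {
    failures=0
    for pair in $CHECKS; do
        name=${pair%%:*}      # part before the colon
        server=${pair#*:}     # part after the colon
        output=$("${DIG:-dig}" "@$server" -p 1053 +norecurse +short A "$name")
        if [ -z "$output" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') $name via $server fails"
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every lookup returned something.
    [ "$failures" -eq 0 ]
}
```

It could be driven with `while :; do check_round; sleep 1; done`, where the one-second pause is an arbitrary choice to avoid hammering forkdns in a tight loop.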