Ubuntu 18.04 infinite Loop dnsmasq systemd-resolved

Hi guys

I’ve been tinkering heavily with this over the last month: basically, how to configure an Ubuntu 18.04 LXD host so that it can resolve container names through the lxdbr0 dnsmasq. What's new in 18.04 (and in some 17.xx releases) is that the host uses systemd-resolved, which has to be configured to also query dnsmasq.
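
A quick way to confirm that a host is in this situation is to check whether /etc/resolv.conf points at the systemd-resolved stub listener 127.0.0.53 (the same address nslookup reports as `Server:` in the notes below). This is a minimal sketch assuming that convention; the function name `uses_resolved_stub` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given resolv.conf (default /etc/resolv.conf)
# sends queries to the systemd-resolved stub listener at 127.0.0.53.
uses_resolved_stub() {
  local conf="${1:-/etc/resolv.conf}"
  # Match an uncommented "nameserver 127.0.0.53" line, tolerating extra spaces/tabs.
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.0\.0\.53([[:space:]]|$)' "$conf"
}

# Usage on a real host:
#   uses_resolved_stub && echo "systemd-resolved stub in use"
```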

I think I found the solution, but I'm still organizing the information about how I ran the tests and what the results were, so that I can share it here in the forum. I ran the tests on Ubuntu 18.04 with Ansible (which automates SSH execution on remote hosts), so they can easily be changed and reproduced. I plan to clean up the test code and upload it all to GitHub, so others can repeat and extend it as needed. That part is still in progress.

In any case, my final conclusion was to apply the following configuration on the LXD host, so that it resolves the containers connected to lxdbr0:

```shell
#!/usr/bin/env bash
# Run as root on the LXD host.
set -o errexit
set -o pipefail
set -o nounset

# Nothing to do if the drop-in file already exists
[[ -r /etc/systemd/resolved.conf.d/lxdbr0.conf ]] && exit 0

# Configure systemd-resolved via the drop-in file /etc/systemd/resolved.conf.d/lxdbr0.conf
# (better than editing the global /etc/systemd/resolved.conf)
# so that it also uses the lxdbr0 dnsmasq as an additional DNS server

# IPv4 address of lxdbr0 without the /prefix, e.g. 10.99.99.1
DNSMASQ_LISTENING_IP=$(/snap/bin/lxc network get lxdbr0 ipv4.address | sed 's_/.*__g')

mkdir -p /etc/systemd/resolved.conf.d

cat <<EOT > /etc/systemd/resolved.conf.d/lxdbr0.conf
[Resolve]
DNS=${DNSMASQ_LISTENING_IP} 8.8.8.8
  # ${DNSMASQ_LISTENING_IP} - the IP of lxdbr0, where the dnsmasq for lxdbr0 is listening
  #
  # 8.8.8.8    - a usable internet DNS server.
  #              Setting DNS= disables the implicit default fallback DNS
  #              (which is only in effect when DNS= was never defined),
  #              so we must also provide a usable internet DNS server,
  #              for example Google's public DNS 8.8.8.8

#Domains=lxd
  # When this option is set, "nslookup C1" and "nslookup C1.lxd" will both work
  # When this option is not set, "nslookup C1" will not work, and only "nslookup C1.lxd" will work
  # I prefer to use the FQDN C1.lxd to avoid possible confusion
EOT

# Apply config changes
systemctl restart systemd-resolved.service

# Ugly hack: restart systemd-resolved.service again after a short delay.
# After the first restart, systemd-resolved sometimes resolves xxx.lxd normally.
# But often it does not yet resolve xxx.lxd consistently (or it works for a while and then starts failing);
# after a second restart it seems to resolve xxx.lxd consistently, without failures.
# So a "safeguard" second restart is added here, even though it should not be necessary.
# My personal guess is that there is some minor bug in the current version of systemd-resolved, but that is an unfounded guess.
sleep 10 ; systemctl restart systemd-resolved.service
```
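
The drop-in generation in the script above can also be sketched as two small functions, so the file contents can be previewed before anything is written under /etc or systemd-resolved is restarted. This is a hedged restructuring, not the tested script: the function names `strip_cidr` and `render_lxdbr0_dropin` are hypothetical, and the CIDR suffix is removed with shell parameter expansion instead of sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: "10.99.99.1/24" -> "10.99.99.1" (same effect as sed 's_/.*__g').
strip_cidr() {
  printf '%s\n' "${1%%/*}"
}

# Hypothetical helper: print the [Resolve] drop-in for the given dnsmasq IP
# plus an internet DNS server (defaults to 8.8.8.8, as in the script above).
render_lxdbr0_dropin() {
  local dnsmasq_ip="$1"
  local upstream_dns="${2:-8.8.8.8}"
  cat <<EOT
[Resolve]
DNS=${dnsmasq_ip} ${upstream_dns}
#Domains=lxd
EOT
}

# Usage (as root), writing the file only after reviewing the preview:
#   ip=$(strip_cidr "$(/snap/bin/lxc network get lxdbr0 ipv4.address)")
#   render_lxdbr0_dropin "$ip"        # preview
#   mkdir -p /etc/systemd/resolved.conf.d
#   render_lxdbr0_dropin "$ip" > /etc/systemd/resolved.conf.d/lxdbr0.conf
```

Keeping the rendering separate from the write makes it easy to diff the generated file against an existing one before restarting anything.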

After the configuration is done, it can be tested with the following:

```shell
#!/usr/bin/env bash
set -o errexit
set -o pipefail
set -o nounset

# Check the changes made
cat /etc/systemd/resolved.conf.d/lxdbr0.conf
#journalctl -xeu systemd-resolved.service
systemd-resolve --status

# Check that resolution works both for xxx.lxd and for internet hostnames.
# For that, resolve www.google.com, then create a group of ephemeral Alpine containers and resolve their names
systemd-resolve www.google.com
TmpTestContainerList=$(echo ResolveTmpTest{1..10})
echo "${TmpTestContainerList}" | xargs -n1 /snap/bin/lxc launch --ephemeral images:alpine/edge
sleep 10 ; /snap/bin/lxc list
for container_running in $(/snap/bin/lxc list -c ns4 --format=csv | grep RUNNING | cut -f1 -d,); do
  # e.g. C1 -> C1.lxd
  systemd-resolve "${container_running}.lxd"
done
# Ephemeral containers are deleted when stopped
echo "${TmpTestContainerList}" | xargs -n1 /snap/bin/lxc stop
sleep 10 ; /snap/bin/lxc list
```
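
The `grep RUNNING | cut -f1 -d,` pipeline in the loop above matches RUNNING anywhere in a line (including inside a container name), and with the `4` column requested, the IPv4 cell can span several lines when a container has more than one address. A slightly stricter sketch asks only for the name and state columns and compares the state field exactly; the function name `running_containers` is hypothetical, and it expects `lxc list --format=csv -c ns` output on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name,state" CSV rows (as printed by
# `lxc list --format=csv -c ns`) on stdin and print the names whose state is RUNNING.
running_containers() {
  awk -F, '$2 == "RUNNING" { print $1 }'
}

# Usage, as a drop-in for the command substitution in the test loop:
#   for c in $(/snap/bin/lxc list --format=csv -c ns | running_containers); do
#     systemd-resolve "${c}.lxd"
#   done
```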
                                                                                                                                                        
```shell
####### NOTES
## NOTE1:
##   When the LXD snap is disabled and then re-enabled, it is also necessary to restart systemd-resolved.service
##   to make resolution of xxx.lxd work again.
##   The same happens if LXD is disabled, the lxcHost is rebooted, and LXD is enabled again: it is then necessary to run
##   "systemctl restart systemd-resolved" for resolution of .lxd to work again
#
#       ub@lxcHost:~$ nslookup C1.lxd
#       Server:         127.0.0.53
#       Address:        127.0.0.53#53
#
#       Non-authoritative answer:
#       Name:   C1.lxd
#       Address: 10.99.99.209
#
#       ub@lxcHost:~$
#       ub@lxcHost:~$ sudo snap disable lxd
#       lxd disabled
#       ub@lxcHost:~$ nslookup C1.lxd
#       Server:         127.0.0.53
#       Address:        127.0.0.53#53
#
#       ** server can't find C1.lxd: NXDOMAIN
#
#       ub@lxcHost:~$ sudo snap enable lxd
#       lxd enabled
#       ub@lxcHost:~$
#       ub@lxcHost:~$ nslookup C1.lxd
#       Server:         127.0.0.53
#       Address:        127.0.0.53#53
#
#       ** server can't find C1.lxd: NXDOMAIN
#
#       ub@lxcHost:~$ sudo systemctl restart systemd-resolved
#       ub@lxcHost:~$ nslookup C1.lxd
#       Server:         127.0.0.53
#       Address:        127.0.0.53#53
#
#       Non-authoritative answer:
#       Name:   C1.lxd
#       Address: 10.99.99.209
#       ** server can't find C1.lxd: NXDOMAIN
#
#       ub@lxcHost:~$ nslookup C1.lxd
#       Server:         127.0.0.53
#       Address:        127.0.0.53#53
#
#       Non-authoritative answer:
#       Name:   C1.lxd
#       Address: 10.99.99.209
#
#       ub@lxcHost:~$ nslookup C1.lxd
#       Server:         127.0.0.53
#       Address:        127.0.0.53#53
#
#       Non-authoritative answer:
#       Name:   C1.lxd
#       Address: 10.99.99.209
```
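
Both the "sleep 10 ; restart" safeguard and NOTE1 come down to waiting for .lxd names to become resolvable again. Instead of a fixed sleep, one option is to poll until a lookup succeeds and only then continue, or restart systemd-resolved once more if it never does. The sketch below is a generic retry loop; the name `retry`, the attempt count and the delay are assumptions, not values from the tests above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY seconds between failed attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the last failed attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (assumes a running container named C1, as in the notes):
#   retry 10 2 systemd-resolve C1.lxd >/dev/null \
#     || systemctl restart systemd-resolved.service
```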

Basically, my impression is that the systemd-resolved documentation is not yet as good as it should be for something this important, but it can be done.

Also, once a DNS server is defined for systemd-resolved, the implicit default public DNS servers are no longer used, so we also need to add 8.8.8.8 (for example). On the other hand, I didn't detect any high-CPU problems, but I did notice (and noted it in the script) that in certain situations systemd-resolved needs to be restarted for the changes to take effect. Hopefully interacting via D-Bus, as simo mentioned, would work better and avoid this. In any case it should also work this way without restarts, but in reality it does need a restart in some situations :confused:
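
For reference, the D-Bus route can be driven from the command line: on 18.04, `systemd-resolve --interface=... --set-dns=... --set-domain=...` sends per-link settings to systemd-resolved over D-Bus. Attaching the dnsmasq address and a `~lxd` routing domain to lxdbr0 is intended to send only .lxd queries to dnsmasq, leaving other names on the host's normal upstream servers, with no global DNS= entry and no 8.8.8.8. This is a hedged sketch, not what was tested above: per-link settings are expected to be lost when lxdbr0 disappears (for example on `snap disable lxd`) or on reboot, so they would have to be reapplied. The function name and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point systemd-resolved at the lxdbr0 dnsmasq for *.lxd only,
# using per-link settings (sent over D-Bus by systemd-resolve) instead of a global DNS=.
# With DRY_RUN=1 the command is printed instead of executed.
configure_lxdbr0_link_dns() {
  local dnsmasq_ip="$1"
  local iface="${2:-lxdbr0}"
  # '~lxd' is quoted so the shell never tilde-expands it
  set -- systemd-resolve --interface="$iface" \
    --set-dns="$dnsmasq_ip" --set-domain='~lxd'
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage (as root; the address is lxdbr0's ipv4.address without the /prefix):
#   DRY_RUN=1 configure_lxdbr0_link_dns 10.99.99.1   # preview
#   configure_lxdbr0_link_dns 10.99.99.1
#   systemd-resolve --status                          # check the lxdbr0 link section
```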

I'm trying to find some free time to clean it up and upload it, to share the details and hopefully make it possible for others to test it as needed.

I just noticed some activity in this thread and wanted to share this up front.

Br

Hi @rudiservo, I would like to enable DNS for a 3-node cluster with a FAN network.
Are there any further steps I should take to enable your solution on my 3 nodes?
Thank you!