Ubuntu 18.04: infinite loop between dnsmasq and systemd-resolved

Hi
I followed Simos' blog post on how to resolve container hostnames from the host.

It works fine except for one issue: queries go into an infinite loop between dnsmasq and systemd-resolved, which uses quite a lot of CPU.

Does anyone have a fix for this?


Hi!

You are the second person to report this. It happened to work for me when I tested it, but I did not keep it running long enough to notice whether there was a CPU-load issue.

What's needed is to figure out why this is really happening, so that we can work out how to fix it.

OK, let's do it!

It was a fresh installation of Xubuntu 18.04 64-bit with LUKS and ext4.

Installed with apt:
lxd, strongswan

I did have Docker, but purged it just in case. While debugging I also tried installing the dnsmasq daemon, and purged it afterwards.

These are my config files

/etc/dnsmasq.d/lxd
server=/lxd/10.146.38.1
bind-interfaces
except-interface=lxdbr0

lxd network
config:
  ipv4.address: 10.146.38.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:82c5:af81:fa56::1/64
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/containers/d0
- /1.0/containers/d1
- /1.0/containers/d2
managed: true
status: Created
locations:
- none

I just tried on a fresh system with Xubuntu 18.04 and got the same thing, with just the LXD package installed and nothing else.
Something to do with NetworkManager, maybe?

I already tried your approach on DNS for LXC containers,
but it didn't work.
My limitation is that I know nothing about systemd-resolved or NetworkManager and how they interact. My only other option, which I am not a big fan of, is disabling systemd-resolved and installing dnsmasq for the whole system.

dnsmasq has got a loop-detection option but I do not know how to activate it on the LXD side of things.

My understanding of the problem is that resolved asks dnsmasq, and since dnsmasq also resolves names for the containers, it might in turn ask resolved again and get into a loop. I will run Wireshark on the lxdbr0 interface to confirm my suspicion.
This does not happen only with dig d0.lxd; it goes into a loop just from dig google.com or any DNS record that is not cached.

I will search for an option to query a DNS server only for a specific TLD.

Found something: the loop is an IPv6 query that cannot be answered (because the record does not exist), so it goes round in an infinite loop between 127.0.0.1, 127.0.0.53 and the lxdbr0 IP.
There is also an extra set of initial IPv4 queries between these three, with correct answers, so a DNS query is made to all of them regardless.
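
To check a capture for this kind of loop without reading it packet by packet, one option is to count how often each query repeats. The sketch below is only an illustration: count_queries is a hypothetical helper name, and it assumes tcpdump's usual one-line DNS output, where a query shows up as a type marker such as A? or AAAA? followed by the name.

```shell
#!/bin/sh
# count_queries (hypothetical helper): reads tcpdump text output on stdin and
# prints "<count> <type>? <name>" for every A/AAAA query seen, most frequent first.
count_queries() {
    awk '{
        # Look for a field like "A?" or "AAAA?"; the next field is the queried name.
        for (i = 1; i < NF; i++)
            if ($i ~ /^(A|AAAA)\?$/) seen[$i " " $(i + 1)]++
    }
    END { for (k in seen) print seen[k], k }' | sort -rn
}

# Example use (needs root; captures 500 DNS packets on all interfaces):
#   sudo tcpdump -l -n -i any -c 500 port 53 | count_queries
```

A name that shows up hundreds of times within a few seconds, especially as an AAAA? query, would be consistent with the loop described above.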

Nice!

I just tried with IPv6 working, and I did not see any extra queries.

So the task is to figure out what kind of IPv6 misconfiguration makes this problem appear.

So, disable IPv6 in NetworkManager?

OK, so I found these two Ubuntu bug reports:

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1694156
https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1672099

One solution provided there was to use:

no-resolv
bind-interfaces
interface=lo
server=127.0.0.53

The issue is that I don't have the dnsmasq package installed on my machine; I have dnsmasq-base, which comes as an LXD dependency.
I tried every configuration I could think of in /etc/dnsmasq.d/lxd and it doesn't seem to pick up any of it.

So how do I solve this? Do I try changing things in /var/lib/lxd/networks/lxdbr0/dnsmasq.raw?
Is this the "correct" config file to edit, or do I install the dnsmasq package and work in /etc/dnsmasq.d/lxd?

dnsmasq --strict-order --bind-interfaces --pid-file=/var/lib/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.146.38.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/lib/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/lib/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.146.38.2,10.146.38.254,1h -s lxd -S /lxd/ --conf-file=/var/lib/lxd/networks/lxdbr0/dnsmasq.raw -u lxd
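
The command line above hints at why /etc/dnsmasq.d/lxd has no effect: this dnsmasq instance is started by LXD with an explicit --conf-file, so it most likely never looks at /etc/dnsmasq.d at all. A minimal sketch for pulling that path out of a running instance follows; dnsmasq_conf_files is a hypothetical helper name, and the idea that dnsmasq.raw is generated from the network's raw.dnsmasq key is my understanding rather than something confirmed in this thread.

```shell
#!/bin/sh
# dnsmasq_conf_files (hypothetical helper): reads one or more dnsmasq command
# lines on stdin and prints the value of every --conf-file= option found.
dnsmasq_conf_files() {
    tr ' ' '\n' | sed -n 's/^--conf-file=//p'
}

# Example use against the running processes:
#   ps -C dnsmasq -o args= | dnsmasq_conf_files
```

If the only path printed is under /var/lib/lxd/networks/, then options are probably better set through the LXD network configuration than by editing files by hand.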

Sorry for being a pain, I have been going around in circles on this for the last couple of days.

I had a look at those two bug reports on Launchpad. The first references an additional report,

which describes an issue similar to the one we have with LXD, except that they get it with libvirt.
Here's the text:

In debugging bug #1694156, I found that ultimately my problem was triggered by a hard-coded /etc/resolvconf/resolv.conf.d/tail I had set once upon a time pointing to my libvirt dnsmasq server. It should not be necessary to manually edit /etc/resolvconf/resolv.conf.d/tail to register dnsmasq; instead, on a system where systemd-resolved is running, libvirt should use the DBUS protocol to register its dnsmasq with systemd-resolved, specifying both SetLinkDNS and SetLinkDomains. This would enable properly-scoped DNS lookups for only the hosts on the libvirt bridge, avoiding any possibility of DNS loops and avoiding the need for manual configuration.

To do this properly, libvirt does need to declare a link domain (SetLinkDomains) that doesn’t conflict with other public DNS, or other non-authoritative DNS that may be configured on the system. I would suggest using just ‘libvirt.’ as a TLD, by default.

For example implementation, please see ./src/dns-manager/nm-dns-systemd-resolved.c:send_updates() in the network-manager source.

It describes how to do this programmatically, which means there should be a way to translate it into additional command-line options for the configuration we do for LXD.

A quick fix for now would be to add "--dns-loop-detect" to the dnsmasq command line at startup.

As for the rest, I do not know where to start fixing it in LXD. As much as I would like to, it would probably take me more than a few weeks full time.

Should I open a ticket on GitHub?

Here is a similar report regarding K8s,

It explains the issue and it’s relevant to LXD.

Below is the man-page of dnsmasq that explains what --dns-loop-detect does.

--dns-loop-detect

Enable code to detect DNS forwarding loops; ie the situation where a query sent to one of the upstream server eventually returns as a new query to the dnsmasq instance. The process works by generating TXT queries of the form <hex>.test and sending them to each upstream server. The hex is a UID which encodes the instance of dnsmasq sending the query and the upstream server to which it was sent. If the query returns to the server which sent it, then the upstream server through which it was sent is disabled and this event is logged. Each time the set of upstream servers changes, the test is re-run on all of them, including ones which were previously disabled.
Source: Man page of DNSMASQ

I do not think that adding '--dns-loop-detect' would have a detrimental effect on the dnsmasq in LXD.

(The rest is about being able to access systemd-resolved programmatically over D-Bus and set the configuration there to fix this.)

Therefore, do open a ticket on GitHub.

As a title, you may use something like: Please add --dns-loop-detect option to dnsmasq run by LXD
Mention that you have tried this and it worked for you. Also, point to this discussion.

Thanks for the help, will do :wink:


The solution posted by Stuart Langridge on Stack Exchange worked for me:

lxc network edit lxdbr0:

config:
  ipv4.address: 10.216.134.1/24
  ipv4.nat: "true"
  ipv6.address: none
  ipv6.nat: "true"
  raw.dnsmasq: |
    auth-zone=lxd
    dns-loop-detect
name: lxdbr0
type: bridge

Add the 3 lines starting with raw.dnsmasq.
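
For anyone who prefers not to go through the interactive editor, the same value can probably be set from the command line. This is a sketch only: raw_dnsmasq_value and set_raw_dnsmasq are hypothetical helper names, and it assumes the lxc client is installed and that the network is called lxdbr0 as in the config above.

```shell
#!/bin/sh
# raw_dnsmasq_value (hypothetical helper): joins its arguments into a
# multi-line value, one dnsmasq option per line.
raw_dnsmasq_value() {
    printf '%s\n' "$@"
}

# set_raw_dnsmasq (hypothetical helper): writes the options to the
# raw.dnsmasq key of the given LXD network. Requires the lxc client.
set_raw_dnsmasq() {
    network=$1
    shift
    lxc network set "$network" raw.dnsmasq "$(raw_dnsmasq_value "$@")"
}

# Example use, equivalent to the three lines added in the editor:
#   set_raw_dnsmasq lxdbr0 auth-zone=lxd dns-loop-detect
```

Running lxc network show lxdbr0 afterwards should display the raw.dnsmasq block, which is a quick way to confirm the value was stored as intended.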

@simos,

The /lib/systemd/system/lxd-host-dns.service syntax you put in your blog post didn't work for me under Ubuntu 18.04. Here is what worked:

[Unit]
Description=LXD host DNS service
After=multi-user.target

[Service]
Type=simple
ExecStart=/usr/local/bin/lxdhostdns_start.sh
RemainAfterExit=true
ExecStop=/usr/local/bin/lxdhostdns_stop.sh
StandardOutput=journal

[Install]
WantedBy=multi-user.target

Notice the Type=simple and After= change.

Thank you for your blog. There is little information about LXD out there, and what you provide is very informative.

Thanks for this.

I updated the post at https://blog.simos.info/how-to-use-lxd-container-hostnames-on-the-host-in-ubuntu-18-04/
to include

  1. the addition of auth-zone=lxd and dns-loop-detect to the LXD-managed network interface lxdbr0.
  2. the changes to the systemd service file.

Please report if I missed anything in the blog post or need to explain something better.

You can do this and it will work out of the box, without any extra service for resolved:

mkdir /etc/systemd/resolved.conf.d
nano /etc/systemd/resolved.conf.d/lxdbr0.conf

and paste this inside

[Resolve]
# the IP of lxdbr0
DNS=10.146.38.1
Domains=lxd
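
The same drop-in can be written without opening an editor. The sketch below is an illustration under assumptions: write_resolved_dropin is a hypothetical helper name, the IP and domain are the ones used in this thread, and the comment is kept on its own line because, as far as I know, systemd configuration files only treat a # at the start of a line as a comment.

```shell
#!/bin/sh
# write_resolved_dropin (hypothetical helper): creates DIR/lxdbr0.conf with a
# [Resolve] section pointing systemd-resolved at the given DNS server and domain.
write_resolved_dropin() {
    dir=$1
    dns_ip=$2
    domain=$3
    mkdir -p "$dir" || return 1
    cat > "$dir/lxdbr0.conf" <<EOF
# DNS server: address of the LXD bridge
[Resolve]
DNS=$dns_ip
Domains=$domain
EOF
}

# Example use (as root), followed by a restart so resolved picks it up:
#   write_resolved_dropin /etc/systemd/resolved.conf.d 10.146.38.1 lxd
#   systemctl restart systemd-resolved
```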

Another thing I found to work was adding a nameserver to raw.dnsmasq, server=8.8.8.8, because dns-loop-detect will otherwise also block normal queries for any other names (google, ubuntu, centos, etc.). I haven't tried auth-zone=lxd yet.


I tried the solutions from the two immediately preceding comments on this page.

@simos, the (permanent, updated) solution that appears in your blogpost failed (on my fresh Ubuntu MATE 18.04 with snap LXD 3.6).

@rudiservo, your 3-line solution seems to work on my (completely fresh, again) system.

@simos, the failure of the “permanent solution” on the blogpost can be seen here (I executed this command right after a boot):

root@ubuntu:~# systemctl status lxd-host-dns.service 
● lxd-host-dns.service - LXD host DNS service
   Loaded: loaded (/etc/systemd/system/lxd-host-dns.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2018-10-20 10:46:39 PDT; 4min 39s ago
  Process: 1745 ExecStart=/usr/local/bin/lxdhostdns_start.sh (code=exited, status=1/FAILURE)
 Main PID: 1745 (code=exited, status=1/FAILURE)

Oct 20 10:46:39 ubuntu systemd[1]: Started LXD host DNS service.
Oct 20 10:46:39 ubuntu lxdhostdns_start.sh[1745]: Device "lxdbr0" does not exist.
Oct 20 10:46:39 ubuntu lxdhostdns_start.sh[1745]: Unknown interface lxdbr0: No such device
Oct 20 10:46:39 ubuntu systemd[1]: lxd-host-dns.service: Main process exited, code=exited, status=1/FAILURE
Oct 20 10:46:39 ubuntu systemd[1]: lxd-host-dns.service: Failed with result 'exit-code'.
root@ubuntu:~# 
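
One possible reading of the log above (an assumption, not something confirmed in this thread) is a boot-time race: the start script ran before LXD had created lxdbr0. A minimal sketch of a wait loop that a start script could call first is shown here; wait_for_iface is a hypothetical helper name, and the optional third argument exists only so the function can be tried against a directory other than /sys/class/net.

```shell
#!/bin/sh
# wait_for_iface NAME [TIMEOUT_SECONDS] [SYSFS_DIR] (hypothetical helper)
# Returns 0 as soon as SYSFS_DIR/NAME exists, or 1 once the timeout is reached.
wait_for_iface() {
    iface=$1
    timeout=${2:-30}
    sysfs=${3:-/sys/class/net}
    waited=0
    while [ ! -e "$sysfs/$iface" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example use at the top of a start script such as lxdhostdns_start.sh:
#   wait_for_iface lxdbr0 30 || { echo "lxdbr0 did not appear" >&2; exit 1; }
```

If the service then succeeds at boot, that would support the timing explanation; if it still fails with the bridge present, the cause lies elsewhere.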

@simos, could you please examine @rudiservo's solution and say whether it is a good one?

It says that it cannot find a lxdbr0 network interface. Perhaps it is lxdbr1 in your case or something else?

No, that is not the case. See here for proof (operating on the VM that was based on your blogpost). Also, your temporary solution works, just not your permanent one:

user@ubuntu:~$ systemctl status lxd-host-dns.service 
● lxd-host-dns.service - LXD host DNS service
   Loaded: loaded (/etc/systemd/system/lxd-host-dns.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sat 2018-10-20 12:46:19 PDT; 1min 5s ago
  Process: 1645 ExecStart=/usr/local/bin/lxdhostdns_start.sh (code=exited, status=1/FAILURE)
 Main PID: 1645 (code=exited, status=1/FAILURE)

Oct 20 12:46:19 ubuntu systemd[1]: Started LXD host DNS service.
Oct 20 12:46:19 ubuntu lxdhostdns_start.sh[1645]: Device "lxdbr0" does not exist.
Oct 20 12:46:19 ubuntu lxdhostdns_start.sh[1645]: Unknown interface lxdbr0: No such device
Oct 20 12:46:19 ubuntu systemd[1]: lxd-host-dns.service: Main process exited, code=exited, status=1/FAILURE
Oct 20 12:46:19 ubuntu systemd[1]: lxd-host-dns.service: Failed with result 'exit-code'.
user@ubuntu:~$ lxc network list
+--------+----------+---------+-------------+---------+
|  NAME  |   TYPE   | MANAGED | DESCRIPTION | USED BY |
+--------+----------+---------+-------------+---------+
| ens33  | physical | NO      |             | 0       |
+--------+----------+---------+-------------+---------+
| lxdbr0 | bridge   | YES     |             | 1       |
+--------+----------+---------+-------------+---------+
user@ubuntu:~$ 

Actually, even this hogs CPU at 100% after a while. Start pinging .lxd domains, wait 2-3 minutes and you will start to see the CPU spike.

After using simos's solution from his blog post, together with the dns-loop-detect option, I cannot nslookup any non-lxd domain from inside an LXD container. I use Ubuntu 18.04 server. Is that your experience too? Can you nslookup non-lxd domains from inside an LXD container?

Thanks