Debian Sid LXD DNS won't resolve


(fosslinux) #1

Hi LXDers!

I installed LXD 3.9 on my Debian Sid system (fully updated), via snap yesterday. Everything was working fine. I turned off my system for the night, no updates or anything.

I booted my system this morning, ran lxc exec testing -- bash and ran apt update. I got the following error:

Err:1 http://security.ubuntu.com/ubuntu bionic-security InRelease        
  Temporary failure resolving 'security.ubuntu.com'
Err:2 http://archive.ubuntu.com/ubuntu bionic InRelease                  
  Temporary failure resolving 'archive.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:4 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
  Temporary failure resolving 'archive.ubuntu.com'

Obviously, this means that there is an issue with DNS.

This failed in every container. To debug it further, I tried to create a fresh container to reproduce the problem, but I got:

➜ lxc launch ubuntu:18.04
Creating the container
Error: Failed container creation: Get https://cloud-images.ubuntu.com/releases/streams/v1/index.json: lookup cloud-images.ubuntu.com on [::1]:53: read udp [::1]:45913->[::1]:53: read: connection refused

(this command was run from the host).

Is LXD causing the problem?

snap is working fine. snap install vscode worked.


(Stéphane Graber) #2

Can you show:

  • ls -lh /etc
  • ls -lh /proc/$(pgrep daemon.start)/root/etc/

This kind of problem usually happens due to /etc/resolv.conf pointing somewhere we’ve not seen before, causing the snap environment to have a broken /etc/resolv.conf.


(fosslinux) #3

https://pastebin.com/J7su687q


(fosslinux) #4

hmm, $(pgrep daemon.start) doesn't return anything


(fosslinux) #5

but yeah, resolv.conf points to /run/NetworkManager/resolv.conf.


(Stéphane Graber) #6

Hmm, /run/NetworkManager is one of those that we do support, so this is a bit odd.

Can you show ps fauxww so I can find a suitable pid to see what’s going on inside the snap?


(fosslinux) #7

@stgraber sorry for the delay, internet went out.

I just rebooted, same error. Here’s a fresh ps fauxww after opening Firefox, gnome-terminal, and running lxc exec dns -- bash and lxc launch ubuntu:18.04.

https://ttm.sh/W3.txt

It’s a bit strange, there doesn’t seem to be any snap/lxd processes.


(fosslinux) #8

@stgraber I found a very hacky solution.

cd /snap/lxd
cp -r 9919 9920
vim 9920/etc/resolv.conf # add nameserver 1.1.1.1
rm current
ln -s 9920 current
cd /var/lib/snapd/snaps
ln -s lxd_9919.snap lxd_9920.snap

and it works.

very weird


(Stéphane Graber) #9

Can you show journalctl -u snap.lxd.daemon -n 600?


(fosslinux) #10

sure. In addition, the *solution* I provided earlier does not work after a reboot, and I can't replicate it working.

https://ttm.sh/BF.txt


(fosslinux) #11

Also, I reinstalled lxd and made a copy of a known working snap. fosslinux.me/lxd_9919.snap


(Stéphane Graber) #12

Does running systemctl reload snap.lxd.daemon after NetworkManager is properly connected to the network help?

This stuff may be a race condition where LXD starts before your /etc/resolv.conf makes sense on the host.
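
A minimal sketch of that check-then-reload sequence, assuming "properly connected" can be approximated by the host's /etc/resolv.conf listing at least one nameserver. The helper names has_nameserver and reload_lxd_if_dns_ready are hypothetical and only for illustration:

```shell
# Hypothetical helper: succeeds if the given resolv.conf has at least one
# non-commented "nameserver <address>" line.
has_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$1" 2>/dev/null
}

# Hypothetical helper: reload the LXD snap daemon only once the host's
# resolver configuration looks usable. Needs root for the systemctl call.
reload_lxd_if_dns_ready() {
    resolv="${1:-/etc/resolv.conf}"
    if has_nameserver "$resolv"; then
        systemctl reload snap.lxd.daemon
    else
        echo "no nameserver in $resolv yet; is NetworkManager connected?" >&2
        return 1
    fi
}

# Usage (as root, after logging in and getting a network connection):
#   reload_lxd_if_dns_ready
```

This only checks that a nameserver entry exists, not that the server actually answers queries.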


(fosslinux) #13

Ah yes, that fixes it!


(Stéphane Graber) #14

Okay, so yeah, looks like NM is starting pretty late on your system, effectively after systemd considers the network as ready, which then causes this issue.

You may be able to put a systemd override file on the snap.lxd.daemon.service unit to have it directly depend on NetworkManager or something, hopefully fixing that race in your environment.
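
One possible shape for such an override, as a sketch only: it assumes NetworkManager-wait-online.service is available on the host (it ships with NetworkManager on Debian) and that ordering LXD after it is enough to close the race, which this thread does not confirm. The function name write_lxd_override is hypothetical:

```shell
# Hypothetical helper: write a systemd drop-in into the given directory that
# makes the unit want, and start after, NetworkManager-wait-online.service.
write_lxd_override() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/override.conf" <<'EOF'
[Unit]
Wants=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service
EOF
}

# Usage (as root):
#   write_lxd_override /etc/systemd/system/snap.lxd.daemon.service.d
#   systemctl daemon-reload
```

Running systemctl edit snap.lxd.daemon.service and pasting the same [Unit] lines should give an equivalent drop-in. The override takes effect on the next boot or the next restart of the unit.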


(fosslinux) #15

Ok, thanks!!