Debian Sid LXD DNS won't resolve

Hi LXDers!

Yesterday I installed LXD 3.9 via snap on my fully updated Debian Sid system. Everything was working fine. I shut down my system for the night, with no updates or anything.

I booted my system this morning, ran lxc exec testing -- bash, and then ran apt update inside the container. I got the following error:

Err:1 http://security.ubuntu.com/ubuntu bionic-security InRelease        
  Temporary failure resolving 'security.ubuntu.com'
Err:2 http://archive.ubuntu.com/ubuntu bionic InRelease                  
  Temporary failure resolving 'archive.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:4 http://archive.ubuntu.com/ubuntu bionic-backports InRelease
  Temporary failure resolving 'archive.ubuntu.com'

Obviously, this means that there is an issue with DNS.

This fails in every container. To debug further, I tried to create a fresh container to reproduce the problem, but got:

➜ lxc launch ubuntu:18.04
Creating the container
Error: Failed container creation: Get https://cloud-images.ubuntu.com/releases/streams/v1/index.json: lookup cloud-images.ubuntu.com on [::1]:53: read udp [::1]:45913->[::1]:53: read: connection refused

(This command was run on the host.)

Is LXD causing the problem?

snap itself is working fine: snap install vscode worked.

Can you show:

  • ls -lh /etc
  • ls -lh /proc/$(pgrep daemon.start)/root/etc/

This kind of problem usually happens when /etc/resolv.conf points somewhere we haven't seen before, which leaves the snap environment with a broken /etc/resolv.conf.
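To check what a resolver would actually read, it helps to list the nameserver lines in a resolv.conf-style file. The function below is a hypothetical helper (list_nameservers is not part of LXD or snapd), written for a POSIX shell with awk. It is relevant to the [::1]:53 error: if LXD is using Go's built-in resolver, as the error format suggests, that resolver falls back to localhost (127.0.0.1:53 and [::1]:53) when the file is unreadable or lists no nameserver, so a broken resolv.conf in the snap's view would plausibly produce exactly that symptom.

```shell
#!/bin/sh
# list_nameservers FILE  (hypothetical helper, not part of LXD or snapd)
# Prints each "nameserver" address found in a resolv.conf-style FILE, one per line.
# Returns 1 with a warning on stderr if FILE is missing, unreadable (including a
# dangling symlink), or contains no nameserver entries.
list_nameservers() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "warning: $file is missing or unreadable" >&2
        return 1
    fi
    # Comment lines start with '#', so their first field never equals "nameserver".
    servers=$(awk '$1 == "nameserver" && NF >= 2 { print $2 }' "$file")
    if [ -z "$servers" ]; then
        echo "warning: no nameserver entries in $file" >&2
        return 1
    fi
    printf '%s\n' "$servers"
}
```

On the host, list_nameservers /etc/resolv.conf reports what the host sees. Run as root against /proc/<pid>/root/etc/resolv.conf, where <pid> is an LXD process taken from ps fauxww, it would show the snap's view instead, since /proc/<pid>/root exposes that process's filesystem root.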

https://pastebin.com/J7su687q

Hmm, $(pgrep daemon.start) doesn't return anything.

But yes, /etc/resolv.conf points to /run/NetworkManager/resolv.conf.

Hmm, /run/NetworkManager is one of the locations we do support, so this is a bit odd.

Can you show the output of ps fauxww so I can find a suitable PID to inspect what's going on inside the snap?

@stgraber sorry for the delay, internet went out.

I just rebooted and got the same error. Here's a fresh ps fauxww, taken after opening Firefox and gnome-terminal and running lxc exec dns -- bash and lxc launch ubuntu:18.04.

https://ttm.sh/W3.txt

It's a bit strange: there don't seem to be any snap/LXD processes.

@stgraber I found a very hacky solution.

cd /snap/lxd
cp -r 9919 9920
vim 9920/etc/resolv.conf # add nameserver 1.1.1.1
rm current
ln -s 9920 current
cd /var/lib/snapd/snaps
ln -s lxd_9919.snap lxd_9920.snap

and it works.

very weird

Can you show journalctl -u snap.lxd.daemon -n 600?

Sure. Also, the *solution* I provided earlier does not work after a reboot, and I can't get it to work again.

https://ttm.sh/BF.txt

Also, I reinstalled LXD and made a copy of a known-working snap: fosslinux.me/lxd_9919.snap

Does running systemctl reload snap.lxd.daemon after NetworkManager is properly connected to the network help?

This may be a race condition where LXD starts before your /etc/resolv.conf makes sense on the host.
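The reload-after-the-network-is-up workaround can be scripted. This is a sketch, not an official LXD tool: the wait_for helper is hypothetical, and it simply polls once per second until a command succeeds or a rough timeout expires.

```shell
#!/bin/sh
# wait_for LIMIT CMD [ARGS...]  (hypothetical helper)
# Re-runs CMD about once per second until it succeeds or roughly LIMIT seconds
# have passed. Returns 0 as soon as CMD succeeds, 1 on timeout.
wait_for() {
    limit="$1"
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For instance, wait_for 120 getent hosts cloud-images.ubuntu.com && sudo systemctl reload snap.lxd.daemon would wait until the host can resolve the image server before reloading LXD. Polling name resolution directly, rather than sleeping for a fixed delay, ties the reload to the condition that actually matters here.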

Ah yes, that fixes it!

Okay, so it looks like NetworkManager (NM) is starting pretty late on your system, effectively after systemd considers the network ready, which then causes this issue.

You may be able to add a systemd override file to the snap.lxd.daemon.service unit so that it depends directly on NetworkManager (or something similar), which should hopefully fix that race in your environment.
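One possible shape for such an override, assuming NetworkManager is what manages the network on the host, is a drop-in that orders the LXD unit after NetworkManager-wait-online.service. The helper below is hypothetical and writes into whatever directory it is given, so it can be tried safely; on a real system the directory would be /etc/systemd/system/snap.lxd.daemon.service.d (root required), followed by sudo systemctl daemon-reload. Running sudo systemctl edit snap.lxd.daemon.service is an interactive alternative that creates the same kind of override.

```shell
#!/bin/sh
# write_lxd_dropin DIR  (hypothetical helper)
# Creates DIR if needed and writes a systemd drop-in that pulls in and orders
# snap.lxd.daemon.service after NetworkManager-wait-online.service.
# Assumption: NetworkManager manages networking on this host.
write_lxd_dropin() {
    dir="$1"
    mkdir -p "$dir"
    printf '%s\n' \
        '[Unit]' \
        'Wants=NetworkManager-wait-online.service' \
        'After=NetworkManager-wait-online.service' \
        > "$dir/wait-for-networkmanager.conf"
}
```

Wants= pulls the wait-online unit into the boot transaction even if it is not enabled, while After= provides the ordering; After= on its own would only order the units, not start the wait-online service.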


Ok, thanks!!