LXD via snapd - containers have no network and IP address after host upgrade

swyngaard · August 14, 2019, 8:51pm

I did an apt-get upgrade and now none of my LXD containers have an IP address. Host is Debian testing/bullseye. This is the output of sudo journalctl -u snap.lxd.daemon | cat:

-- Logs begin at Wed 2019-08-14 15:48:13 SAST, end at Wed 2019-08-14 22:44:49 SAST. --
Aug 14 21:46:42 vetkoek systemd[1]: Started Service for snap application lxd.daemon.
Aug 14 21:46:42 vetkoek lxd.daemon[988]: => Preparing the system
Aug 14 21:46:42 vetkoek lxd.daemon[988]: ==> Loading snap configuration
Aug 14 21:46:42 vetkoek lxd.daemon[988]: ==> Setting up mntns symlink (mnt:[4026532434])
Aug 14 21:46:42 vetkoek lxd.daemon[988]: ==> Setting up persistent shmounts path
Aug 14 21:46:42 vetkoek lxd.daemon[988]: ====> Making LXD shmounts use the persistent path
Aug 14 21:46:42 vetkoek lxd.daemon[988]: ====> Making LXCFS use the persistent path
Aug 14 21:46:42 vetkoek lxd.daemon[988]: ==> Setting up kmod wrapper
Aug 14 21:46:42 vetkoek lxd.daemon[988]: ==> Preparing /boot
Aug 14 21:46:42 vetkoek lxd.daemon[988]: ==> Preparing a clean copy of /run
Aug 14 21:46:42 vetkoek lxd.daemon[988]: ==> Preparing a clean copy of /etc
Aug 14 21:46:43 vetkoek lxd.daemon[988]: ==> Setting up ceph configuration
Aug 14 21:46:43 vetkoek lxd.daemon[988]: ==> Setting up LVM configuration
Aug 14 21:46:43 vetkoek lxd.daemon[988]: ==> Rotating logs
Aug 14 21:46:43 vetkoek lxd.daemon[988]: ==> Escaping the systemd cgroups
Aug 14 21:46:43 vetkoek lxd.daemon[988]: ====> Detected cgroup V1
Aug 14 21:46:43 vetkoek lxd.daemon[988]: ==> Escaping the systemd process resource limits
Aug 14 21:46:43 vetkoek lxd.daemon[988]: ==> Disabling shiftfs on this kernel (auto)
Aug 14 21:46:43 vetkoek lxd.daemon[988]: ==> Detected kernel with partial AppArmor support
Aug 14 21:46:43 vetkoek lxd.daemon[988]: => Starting LXCFS
Aug 14 21:46:43 vetkoek lxd.daemon[988]: => Starting LXD
Aug 14 21:46:43 vetkoek lxd.daemon[988]: mount namespace: 5
Aug 14 21:46:43 vetkoek lxd.daemon[988]: hierarchies:
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   0: fd:   6: pids
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   1: fd:   7: memory
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   2: fd:   8: freezer
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   3: fd:   9: devices
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   4: fd:  10: cpuset
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   5: fd:  11: blkio
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   6: fd:  12: perf_event
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   7: fd:  13: rdma
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   8: fd:  14: cpu,cpuacct
Aug 14 21:46:43 vetkoek lxd.daemon[988]:   9: fd:  15: net_cls,net_prio
Aug 14 21:46:43 vetkoek lxd.daemon[988]:  10: fd:  16: name=systemd
Aug 14 21:46:43 vetkoek lxd.daemon[988]:  11: fd:  17: unified
Aug 14 21:46:43 vetkoek lxd.daemon[988]: t=2019-08-14T21:46:43+0200 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
Aug 14 21:47:02 vetkoek lxd.daemon[988]: t=2019-08-14T21:47:02+0200 lvl=eror msg="Failed to bring up network" err="Failed to run: dnsmasq --strict-order --bind-interfaces --pid-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --no-ping --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.249.196.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.249.196.2,10.249.196.254,1h -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd: " name=lxdbr0
Aug 14 21:47:02 vetkoek lxd.daemon[988]: => LXD is ready
Aug 14 22:38:08 vetkoek lxd.daemon[988]: t=2019-08-14T22:38:08+0200 lvl=warn msg="Detected poll(POLLNVAL) event."

I tried refreshing the snap with snap refresh lxd and restarting the daemon with sudo systemctl restart snap.lxd.daemon, but nothing changed. From the log above, there seems to be an issue with dnsmasq, but I’m not sure how to resolve it.

stgraber · August 14, 2019, 9:01pm

Indeed, something must be wrong with dnsmasq, but we’ve sure seen more useful errors: the message in your log is empty after the dnsmasq command line.

The most common source of problems is having something else bound to port 53.
You can check that with netstat -lnp (run it as root so that the owning process is shown for every socket).
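
If netstat is not installed, the same question ("is something already holding this port?") can be probed directly. The following is only a minimal sketch, not anything LXD itself does: port_in_use is a hypothetical helper name, and it assumes that a failed bind with EADDRINUSE means another process owns the port.

```python
import errno
import socket


def port_in_use(address, port, kind=socket.SOCK_DGRAM):
    """Return True if binding (address, port) fails with EADDRINUSE.

    kind is socket.SOCK_DGRAM for UDP or socket.SOCK_STREAM for TCP;
    DNS uses both, so port 53 is worth checking with each.
    """
    with socket.socket(socket.AF_INET, kind) as probe:
        try:
            probe.bind((address, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Anything else (EACCES when not root, EADDRNOTAVAIL for an
            # address the host does not have) is a different problem.
            raise
        return False
```

Run as root, port_in_use("0.0.0.0", 53) and port_in_use("0.0.0.0", 53, socket.SOCK_STREAM) would report whether any IPv4 address on the host already has port 53 taken. It does not say which process holds the port; netstat -lnp is still needed for that.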

You may also find some dnsmasq log messages in /var/log/syslog.

You can also try to nudge the LXD networking code by making a trivial change which would cause it to reload, possibly printing a more useful error:

lxc network set lxdbr0 bridge.mtu 1500
lxc network unset lxdbr0 bridge.mtu

In a working setup, you’d want to run both of those to undo the temporary change, but if dnsmasq still fails, the former will effectively never take effect, so there is no point in then running the latter.
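
The set-then-unset logic can be sketched as a small wrapper. This is an illustration only: nudge_network and its run parameter are hypothetical names, and it assumes that lxc exits with a non-zero status and writes the error to stderr when the change cannot be applied.

```python
import subprocess


def nudge_network(bridge="lxdbr0", run=subprocess.run):
    """Set and then unset bridge.mtu so that LXD reloads the bridge.

    Returns the error text from the failed `set`, or None if both
    commands succeeded. `run` is injectable so the logic can be
    exercised without an LXD installation.
    """
    set_cmd = ["lxc", "network", "set", bridge, "bridge.mtu", "1500"]
    unset_cmd = ["lxc", "network", "unset", bridge, "bridge.mtu"]

    result = run(set_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # The change never took effect, so there is nothing to undo.
        return result.stderr.strip()

    # The bridge reloaded cleanly; undo the temporary change.
    run(unset_cmd, capture_output=True, text=True, check=True)
    return None
```

Calling nudge_network() on the affected host would then either return None or hand back whatever error lxc printed, which is hopefully more informative than the empty one in the daemon log.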

swyngaard · August 14, 2019, 9:08pm

Never mind, I had recently installed the polipo package, which automatically pulled in dnsmasq as a dependency. I just removed those packages and my containers are getting IP addresses again.