DHCP stopped working

I have a system where the containers don’t get an IP.
I don’t see a dnsmasq process running on my system.
But the pid file (/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.pid) still records one:

$ cat /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.pid
name: dnsmasq
args: [--keep-in-foreground, --strict-order, --bind-interfaces, --except-interface=lo,
--pid-file=, --no-ping, --interface=lxdbr0, --quiet-dhcp, --quiet-dhcp6, --quiet-ra,
--listen-address=, --dhcp-no-override, --dhcp-authoritative, --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases,
--dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts, --dhcp-range,
',,1h', -s, lxd, -S, /lxd/, --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw,
-u, lxd, -g, lxd]
apparmor: lxd_dnsmasq-lxdbr0_</var/snap/lxd/common/lxd>
pid: 11307
stdout: ""
stderr: ""
uid: 0
gid: 0
set_groups: false

There might be one running. I tried to start dnsmasq by hand in a snap shell:

dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --pid-file= --no-ping --interface=lxdbr0 --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address= --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range ',,1h' -s lxd -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd -g lxd

dnsmasq: cannot create DHCP socket: Permission denied

That does not work.

$ lxc network show lxdbr0 
  ipv4.nat: "true"
  ipv6.address: none
description: ""
name: lxdbr0
type: bridge
- /1.0/instances/lol-internet
- /1.0/profiles/default
managed: true
status: Created
- consul-4
- consul-5
- server1
- consul-2
- consul-3

The LXD bridge shows that it is used by instances such as lol-internet.

How can I debug why dnsmasq is not running?
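A first check is whether the PID recorded in that file still belongs to a live process. Below is a rough sketch of that check, not anything LXD ships: the helper names `pid_from_file` and `pid_status` are made up for illustration, and it assumes the file contains a top-level `pid:` line as in the output above and that /proc is readable on the host.

```shell
#!/bin/sh
# Hypothetical helpers, assuming an LXD-style process file with a "pid: N" line.

# Print the first numeric value of a top-level "pid:" field in file $1.
pid_from_file() {
    sed -n 's/^pid:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Print "no-pid", "alive <pid>" or "stale <pid>" for the process file $1.
# A /proc/<pid> entry only proves that some process has this PID,
# not that it is dnsmasq, so treat "alive" as a hint.
pid_status() {
    pid=$(pid_from_file "$1")
    if [ -z "$pid" ]; then
        echo "no-pid"
    elif [ -d "/proc/$pid" ]; then
        echo "alive $pid"
    else
        echo "stale $pid"
    fi
}

# Usage (path taken from the post above):
#   pid_status /var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.pid
```

If this reports a stale PID, the file is just left over from a dnsmasq that has already exited.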

Please show the output of sudo ss -ulpn on your host.

$ sudo ss -ulpn
State        Recv-Q       Send-Q              Local Address:Port                Peer Address:Port       Process                                                                                                 
UNCONN       0            0                           *           users:(("rpcbind",pid=632,fd=5),("systemd",pid=1,fd=93))                                               
UNCONN       0            0                         *                                                                                                                  
UNCONN       0            0                    *           users:(("docker-proxy",pid=1456,fd=4))                                                                 
UNCONN       0            0                   *           users:(("nomad",pid=764,fd=10))                                                                        
UNCONN       0            0                   *           users:(("docker-proxy",pid=1508,fd=4))                                                                 
UNCONN       0            0                         *           users:(("rpc.mountd",pid=694,fd=16))                                                                   
UNCONN       0            0                          *                                                                                                                  
UNCONN       0            0                         *           users:(("rpc.mountd",pid=694,fd=12))                                                                   
UNCONN       0            0                         *           users:(("rpc.mountd",pid=694,fd=8))                                                                    
UNCONN       0            0                   *           users:(("docker-proxy",pid=1483,fd=4))                                                                 
UNCONN       0            0                               *:53                             *:*           users:(("consul",pid=746,fd=15))                                                                       
UNCONN       0            0                               *:8301                           *:*           users:(("consul",pid=746,fd=14))                                                                       
UNCONN       0            0                               *:8302                           *:*           users:(("consul",pid=746,fd=9))                                                                        
UNCONN       0            0                            [::]:111                         [::]:*           users:(("rpcbind",pid=632,fd=7),("systemd",pid=1,fd=96))                                               
UNCONN       0            0                            [::]:50391                       [::]:*                                                                                                                  
UNCONN       0            0                            [::]:38357                       [::]:*           users:(("rpc.mountd",pid=694,fd=14))                                                                   
UNCONN       0            0                            [::]:2049                        [::]:*                                                                                                                  
UNCONN       0            0                            [::]:51833                       [::]:*           users:(("rpc.mountd",pid=694,fd=10))                                                                   
UNCONN       0            0                            [::]:44889                       [::]:*           users:(("rpc.mountd",pid=694,fd=18)) 

This shows at least that something is running on port 53.

It looks like consul is running a DNS server on :53. I'm not sure why this ever worked, but when I stop consul and restart LXD, dnsmasq starts correctly.
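For anyone hitting the same thing, the port check can be narrowed down instead of reading the whole table. This is a minimal sketch: `listeners_on_port` is a made-up helper name, and it assumes the column layout of the `ss -ulpn` output shown above (local address in column 4, process in column 6).

```shell
#!/bin/sh
# Hypothetical helper: read `ss -ulpn` output on stdin and print the
# local address and owning process for sockets bound to port $1.
listeners_on_port() {
    awk -v port="$1" '
        $1 == "State" { next }             # skip the header line
        {
            n = split($4, parts, ":")      # port is the last ":"-separated field
            if (parts[n] == port) print $4, $6
        }'
}

# Usage:
#   sudo ss -ulpn | listeners_on_port 53
```

A wildcard listener such as `*:53` held by another service is what keeps dnsmasq from binding on the bridge here.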

The question is why it is nowhere visible that the dnsmasq server fails to start up.

You should see something in the LXD logs about failing to start dnsmasq if it cannot listen. It could also be something we potentially add to the new persistent warnings system.

CC @stgraber @monstermunchkin

I tried to run sudo lxd --debug --group lxd and there was no output saying that dnsmasq failed to start. So in which logs would this be visible?

I re-created the scenario and saw this in the logs:

journalctl -b | grep dnsmasq
dnsmasq[751328]: failed to create listening socket for Address already in use
dnsmasq[751328]: FAILED to start up

However you’re correct that LXD’s own logs do not register that dnsmasq started and then immediately exited, so I’ve added this to LXD now:

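Until LXD reports this itself, the journal is the place to look. The filter below is only a sketch: `dnsmasq_failures` is a made-up name, and the pattern is based solely on the two log lines quoted above, so other dnsmasq error messages may not match it.

```shell
#!/bin/sh
# Hypothetical helper: keep only dnsmasq journal lines that mention a failure.
# The pattern is derived from the two messages quoted in this thread.
dnsmasq_failures() {
    grep -E 'dnsmasq\[[0-9]+\]: .*(failed|FAILED)'
}

# Usage:
#   journalctl -b --no-pager | dnsmasq_failures
```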