`lxd init` fails to initialize network bridge due to DNS/DHCP error on Debian 11

tzima · April 25, 2023, 9:58pm

Hello.

I’m trying to install lxd on Debian 11.6 from a snap package.

Steps to reproduce (running as root on the host machine):

snap install lxd --channel=5.0/stable
lxd init

Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (ceph, cephobject, dir, lvm, zfs, btrfs) [default=zfs]: dir
Would you like to connect to a MAAS server? (yes/no) [default=no]:   
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like the LXD server to be available over the network? (yes/no) [default=no]:  
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: no
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

It fails with the following error:

Error: Failed to create local member network "lxdbr0" in project "default": The DNS and DHCP service exited prematurely: Process exited with non-zero value 2 ("dnsmasq: failed to create listening socket for 10.45.46.1: Address already in use")

Diagnostics:

I have dnsmasq running and listening on port 53:

# lsof -n -i :53
COMMAND    PID    USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
dnsmasq 774631 dnsmasq    4u  IPv4 2061807      0t0  UDP *:domain 
dnsmasq 774631 dnsmasq    5u  IPv4 2061808      0t0  TCP *:domain (LISTEN)
dnsmasq 774631 dnsmasq    6u  IPv6 2061809      0t0  UDP *:domain 
dnsmasq 774631 dnsmasq    7u  IPv6 2061810      0t0  TCP *:domain (LISTEN)

When I stop dnsmasq (systemctl stop dnsmasq) and run lxd init again with the same answers as shown above (except that I reuse the now-existing storage pool), the network bridge is created and no error is shown.

I verified the configuration by running:

# lxc network list
+---------+----------+---------+----------------+---------------------------+-------------+---------+---------+
|  NAME   |   TYPE   | MANAGED |      IPV4      |           IPV6            | DESCRIPTION | USED BY |  STATE  |
+---------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| docker0 | bridge   | NO      |                |                           |             | 0       |         |
+---------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| enp3s0  | physical | NO      |                |                           |             | 0       |         |
+---------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| lxdbr0  | bridge   | YES     | 10.123.65.1/24 | fd42:4146:88d7:b594::1/64 |             | 1       | CREATED |
+---------+----------+---------+----------------+---------------------------+-------------+---------+---------+
| virbr0  | bridge   | NO      |                |                           |             | 0       |         |
+---------+----------+---------+----------------+---------------------------+-------------+---------+---------+

And also:

# lxc network info lxdbr0
Name: lxdbr0
MAC address: 00:16:3e:2b:f4:50
MTU: 1500
State: up
Type: broadcast

IP addresses:
  inet	10.123.65.1/24 (global)
  inet6	fd42:4146:88d7:b594::1/64 (global)

Network usage:
  Bytes received: 0B
  Bytes sent: 0B
  Packets received: 0
  Packets sent: 0

Bridge:
  ID: 8000.00163e2bf450
  STP: false
  Forward delay: 1500
  Default VLAN ID: 1
  VLAN filtering: true
  Upper devices:

However, when I create a new container, the networking doesn’t work in the guest system:

# lxc launch images:ubuntu/20.04 demo
Creating demo
Starting demo                               
# lxc exec demo -- ping 8.8.4.4
ping: connect: Network is unreachable

Needless to say, 8.8.4.4 is reachable from the host system.

The guest system has no IPv4 address assigned, but it does have an IPv6 address:

# lxc exec demo -- ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
9: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:0a:e6:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fd42:f718:a7db:f3bf:216:3eff:fe0a:e60c/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 3595sec preferred_lft 3595sec
    inet6 fe80::216:3eff:fe0a:e60c/64 scope link 
       valid_lft forever preferred_lft forever

The guest system can access a remote IPv6 address (though DNS doesn’t work in the guest system):

# host seznam.cz | grep IPv6
seznam.cz has IPv6 address 2a02:598:a::79:222
seznam.cz has IPv6 address 2a02:598:2::1222
# lxc exec demo -- ping -c1 2a02:598:a::79:222
PING 2a02:598:a::79:222(2a02:598:a::79:222) 56 data bytes
64 bytes from 2a02:598:a::79:222: icmp_seq=1 ttl=53 time=9.03 ms

--- 2a02:598:a::79:222 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 9.028/9.028/9.028/0.000 ms

I can also connect to the guest system from the host system. When I run lxc exec demo -- nc -6 -l -p 8080 in one terminal and then echo "hello" | nc -6 fd42:f718:a7db:f3bf:216:3eff:fe0a:e60c 8080 in the other terminal, it works as expected.

I previously also tried to directly attach enp3s0 (the host machine’s physical interface) to the guest system, which made IPv4 networking and DNS available to the guest system, but that’s obviously not a proper solution: I need the guest system to have its own network card and IP address so that the guest system and the host system can talk to each other over the network.

Testing again with the most recent version:

I also tried to install the most recent version instead of the LTS version:

# snap install lxd
lxd 5.13-cea5ee2 from Canonical✓ installed

Running lxd init again failed with the very same error as shown at the beginning of the post.

Side-note (a different minor bug):

The snap package cannot be easily removed using snap remove lxd:

# snap remove lxd
2023-04-25T23:26:04+02:00 INFO Waiting for "snap.lxd.daemon.service" to stop.
error: cannot perform the following tasks:
- Stop snap "lxd" services (systemctl command [--no-reload enable snap.lxd.user-daemon.unix.socket snap.lxd.daemon.unix.socket snap.lxd.activate.service] failed with exit status 1: Failed to enable unit: Unit file snap.lxd.user-daemon.unix.socket does not exist.
)
- Remove data for snap "lxd" (24758) (unlinkat /var/snap/lxd/common/shmounts: device or resource busy)

It’s necessary to first manually run umount /var/snap/lxd/common/shmounts and only then can the snap package be uninstalled normally.

Question:

Please tell me how to properly install LXD on Debian 11 and configure the networking. By default, I want all guest systems to a) have access to the Internet, b) be accessible over the network from the host system (so that I can run e.g. sshd or a web server in the guest system). In specific circumstances where that won’t be required, I also want to be able to either block the Internet access or detach the network interface altogether – but I already know how to do the latter and I’m pretty sure I would be able to figure out the former from the documentation as well.

Please note that I’m a complete beginner with regard to LXC/LXD. Any help will be greatly appreciated.

Thanks.

PS: I’m sorry about the syntax highlighting in the code-blocks above, but I didn’t find a way to disable it. I simply used Markdown’s triple backticks and didn’t specify any language; it seems to assume some language on its own and none of ```raw, ```none or ```ascii forced it off.

RichardHuxton · April 26, 2023, 5:12pm

I can’t solve all of this, but I can definitely point you in the right direction.

You are running dnsmasq on the host bound to all network interfaces, so when lxd tries to start its own dnsmasq on lxdbr0 to hand out addresses to containers, that fails. That is why stopping dnsmasq on your machine let the network come up.
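One way to confirm the wildcard binding is to look at the NAME column of the lsof output in the question: an entry starting with `*:` means the socket is bound to every address. The following is only a minimal sketch; the function name `has_wildcard_listener` is made up for illustration, and it assumes the nine-column lsof layout shown in the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of lxd or dnsmasq): reads `lsof -n -i :53`
# style output on stdin and succeeds if the given command holds a socket
# whose NAME column (9th field) starts with "*:", i.e. a wildcard bind.
has_wildcard_listener() {
    # $1 = command name to look for, e.g. dnsmasq
    awk -v cmd="$1" '
        $1 == cmd && $9 ~ /^\*:/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a real host (run as root, as in the question):
#   lsof -n -i :53 | has_wildcard_listener dnsmasq && echo "bound to all interfaces"
```

Once dnsmasq is restricted to specific interfaces, the NAME column should show concrete addresses such as 127.0.0.1:domain instead of *:domain, and the check should no longer succeed.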

You don’t need to go that far, though. I think what you probably want is to make sure the following lines are uncommented in /etc/dnsmasq.conf (use your own interface name in place of eth0; going by your network list that would be enp3s0):

interface=lo
interface=eth0

bind-interfaces

The comments around that should make sense.
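To double-check which of those directives are actually active after editing, something like the following could be used. This is a sketch under assumptions: the function name `active_binding_lines` is hypothetical, and it only inspects the single file you pass in, not any files pulled in via conf-dir.

```shell
#!/bin/sh
# Hypothetical helper: prints the uncommented interface= and bind-interfaces
# lines of a dnsmasq config file. Commented-out lines (starting with #) and
# other directives such as bind-dynamic are not printed.
active_binding_lines() {
    # $1 = path to a dnsmasq config file, e.g. /etc/dnsmasq.conf
    grep -E '^[[:space:]]*(interface=|bind-interfaces([[:space:]]|$))' "$1"
}

# Usage on the host:
#   active_binding_lines /etc/dnsmasq.conf
#   systemctl restart dnsmasq
```

If bind-interfaces does not appear in the output, dnsmasq will most likely keep binding the wildcard address and lxd’s own dnsmasq will still fail to start on lxdbr0.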

Then, so that you can use DNS from your host (e.g. ping mycontainer.lxd), create a file /etc/dnsmasq.d/lxd containing the following single line:

server=/lxd/10.123.65.1

That tells dnsmasq to forward any name lookups for *.lxd to your lxdbr0 address. Note that you may get a different address if you reinstall lxd.
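Because the bridge address can change, the line could be generated from whatever lxd currently reports rather than typed by hand. A minimal sketch, assuming `lxc network get lxdbr0 ipv4.address` returns the address in CIDR form (such as 10.123.65.1/24); the function name `lxd_dns_forward_line` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turns a bridge address in CIDR form into the dnsmasq
# "server=" line that forwards *.lxd lookups to that address.
lxd_dns_forward_line() {
    # $1 = bridge address with prefix length, e.g. 10.123.65.1/24
    # ${1%/*} strips the trailing "/24" part, leaving only the address.
    printf 'server=/lxd/%s\n' "${1%/*}"
}

# Usage on the host (assumes the bridge is called lxdbr0):
#   lxd_dns_forward_line "$(lxc network get lxdbr0 ipv4.address)" > /etc/dnsmasq.d/lxd
#   systemctl restart dnsmasq
```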

Then, if restarting the containers doesn’t give them an IP address, check your firewall.

As for routing traffic: if you do have a firewall, make sure it is set up to route traffic through lxdbr0. For ufw that might be something like this:

ufw allow in on lxdbr0
ufw route allow in on lxdbr0
ufw route allow out on lxdbr0

As far as your problems with snap and uninstalling go, I’m afraid I can’t help much. I’ve found snap (on non-Ubuntu distros anyway) to be temperamental, and I find some of its core design decisions questionable, but if you want to run lxd you don’t really have any choice.

tzima · April 27, 2023, 3:39am

Wow, thank you for a fast reply with such a thorough explanation! That solved it for me. The networking now works and I can access my containers through the *.lxd domain, which I didn’t even realize was possible (I would otherwise go with /etc/hosts, which would be sufficient, but this is much much better). I’m marking the problem as solved, thank you for your help.