LXD host default route changed

ilg · March 10, 2021, 11:47am

Hello!

I installed LXD 4.11 via snap on Raspbian (Debian, 64-bit).
The cluster has four machines, connected over WiFi.
The cluster network has two bridges configured.

$ lxc network list
+--------+----------+---------+--------------+------+----------------+---------+---------+
|  NAME  |   TYPE   | MANAGED |     IPV4     | IPV6 |  DESCRIPTION   | USED BY |  STATE  |
+--------+----------+---------+--------------+------+----------------+---------+---------+
| eth0   | physical | NO      |              |      |                | 0       |         |
+--------+----------+---------+--------------+------+----------------+---------+---------+
| k3sbr0 | bridge   | YES     | 10.0.10.1/24 | none |                | 8       | CREATED |
+--------+----------+---------+--------------+------+----------------+---------+---------+
| lxdbr0 | bridge   | YES     | 10.0.8.1/24  | none | Default bridge | 1       | CREATED |
+--------+----------+---------+--------------+------+----------------+---------+---------+
| wlan0  | physical | NO      |              |      |                | 0       |         |
+--------+----------+---------+--------------+------+----------------+---------+---------+


$ lxc network show k3sbr0
config:
  ipv4.address: 10.0.10.1/24
  ipv4.dhcp: "true"
  ipv4.dhcp.gateway: 10.0.10.1
  ipv4.dhcp.ranges: 10.0.10.2-10.0.10.200
  ipv4.nat: "true"
  ipv6.address: none
  tunnel.lan.id: "10"
  tunnel.lan.protocol: vxlan
description: ""
name: k3sbr0
type: bridge
used_by:
- /1.0/instances/kmaster-1
- /1.0/instances/kmaster-2
- /1.0/instances/kmaster-3
- /1.0/instances/kworker-1
- /1.0/instances/kworker-2
- /1.0/instances/kworker-3
- /1.0/instances/kworker-4
- /1.0/profiles/k3s
managed: true
status: Created
locations:
- raspb-001
- raspb-002
- raspb-003
- raspb-004

$ lxc network info k3sbr0
Name: k3sbr0
MAC address: 00:16:3e:84:d0:fc
MTU: 1400
State: up

Ips:
  inet	10.0.10.1
  inet6	fe80::216:3eff:fe84:d0fc

Network usage:
  Bytes received: 2.68MB
  Bytes sent: 100.11MB
  Packets received: 50632
  Packets sent: 84424


$ ip r
default via 10.0.10.1 dev k3sbr0-lan proto dhcp src 10.0.10.68 metric 206 mtu 1400
default via 192.168.8.1 dev wlan0 proto dhcp src 192.168.8.111 metric 303
10.0.8.0/24 dev lxdbr0 proto kernel scope link src 10.0.8.1
10.0.10.0/24 dev k3sbr0 proto kernel scope link src 10.0.10.1
10.0.10.0/24 dev k3sbr0-lan proto dhcp scope link src 10.0.10.68 metric 206 mtu 1400
169.254.0.0/16 dev k3sbr0-mtu scope link src 169.254.118.34 metric 205
169.254.0.0/16 dev lxdbr0-mtu scope link src 169.254.234.204 metric 208
169.254.0.0/16 dev veth39cb3eb6 scope link src 169.254.139.134 metric 210
169.254.0.0/16 dev vetha1b1be2d scope link src 169.254.168.45 metric 212
192.168.8.0/24 dev wlan0 proto dhcp scope link src 192.168.8.111 metric 303

k3sbr0 was created using the following command:

$ lxc network create k3sbr0 \
  ipv4.address=10.0.10.1/24 \
  ipv4.dhcp.ranges=10.0.10.2-10.0.10.200 ipv4.dhcp=true ipv4.nat=true \
  ipv4.dhcp.gateway=10.0.10.1 \
  ipv6.address=none \
  tunnel.lan.id=10 tunnel.lan.protocol=vxlan

192.168.8.0/24 is the network address of the hosts.

The problem is that on every LXD host a default gateway entry gets added that points to the k3sbr0-lan interface, which makes the internet unreachable from the host and from the containers. If I remove the gateway manually, everything works fine for some time, and then some process adds the route back.

So the first entry in the routing table above is causing the problem and should not be there.

It looks like some vxlan tunnel specific setting is causing the problem. Any help is appreciated.
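
For context on why the first entry breaks connectivity: when several default routes exist, the kernel prefers the one with the lowest metric, so the k3sbr0-lan route (metric 206) wins over the wlan0 route (metric 303). Below is a minimal sketch that lists default routes in preference order. The function name default_routes_by_metric is made up for illustration, and it only parses text in the ip r format shown above.

```shell
# Print "metric device" for every default route read from stdin,
# sorted so that the preferred (lowest-metric) route comes first.
# Hypothetical helper; expects input in the format printed by `ip r`.
default_routes_by_metric() {
  awk '$1 == "default" {
         dev = ""; metric = 0
         for (i = 2; i < NF; i++) {
           if ($i == "dev")    dev = $(i + 1)
           if ($i == "metric") metric = $(i + 1)
         }
         print metric, dev
       }' | sort -n
}

# Usage on a live host:
#   ip r | default_routes_by_metric
```

On the routing table above, the first line printed would be the k3sbr0-lan route, which is the one the host actually uses.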

tomp · March 10, 2021, 5:54pm

Can you please show the problem routing table, using ip r on the host?

tomp · March 10, 2021, 5:55pm

And does the route get re-added if LXD is reloaded?

ilg · March 10, 2021, 9:28pm

@tomp thanks for your reply. The route gets re-added even if I do nothing. I have not explicitly checked reloading LXD; I can verify that as well.

The ip r command output is in the message above; here is the interesting part of it:

$ ip r
default via 10.0.10.1 dev k3sbr0-lan proto dhcp src 10.0.10.68 metric 206 mtu 1400
default via 192.168.8.1 dev wlan0 proto dhcp src 192.168.8.111 metric 303

The second default gateway is the one that should stay, as it points to the physical interface. For now I have added a cron script to remove the first entry, but of course I would prefer a proper fix for this problem.
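
The cron script itself is not shown in this thread. As a rough sketch of what such a stopgap could look like, the helper below reads ip r output and prints an ip route del command for every default route that is not on the interface to keep. The name stray_default_route_cmds is hypothetical, and the helper only prints commands rather than running them.

```shell
# Read `ip r` output on stdin and print a delete command for each
# default route whose device differs from the one given as $1.
# Hypothetical helper: it prints commands, it does not execute them.
stray_default_route_cmds() {
  awk -v keep="$1" '$1 == "default" {
         via = ""; dev = ""
         for (i = 2; i < NF; i++) {
           if ($i == "via") via = $(i + 1)
           if ($i == "dev") dev = $(i + 1)
         }
         if (dev != keep)
           print "ip route del default via " via " dev " dev
       }'
}

# Usage on a live host (review the output before piping it to sh as root):
#   ip r | stray_default_route_cmds wlan0
```

This only treats the symptom: the DHCP client will add the route again on its next lease renewal.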

I do not have access to the box right now, but I remember seeing an interesting message in the LXD host logs about vxlan transitioning to another address, or something similar. Unfortunately I cannot remember the full log message right now. I can check it tomorrow. I am just wondering whether that could be causing the routing table updates.

tomp · March 10, 2021, 10:42pm

It looks like your host is running a DHCP client that is getting a DHCP lease over both interfaces. Turn that off and it should stop.
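
One possible way to turn it off for the LXD interfaces only, sketched under the assumption that the DHCP client on these hosts is dhcpcd (the Raspbian default at the time; the thread does not confirm which client is in use). dhcpcd.conf has a denyinterfaces option that accepts shell-style patterns. The helper name deny_lxd_interfaces and the pattern list are illustrative and should be adjusted to the actual interface names.

```shell
# Append a denyinterfaces line to the given dhcpcd config file,
# unless the exact same line is already present.
# Assumption: the host uses dhcpcd; the pattern list is illustrative.
deny_lxd_interfaces() {
  conf="$1"
  line='denyinterfaces k3sbr0* lxdbr0* veth*'
  grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Usage (as root), followed by a restart of the DHCP client:
#   deny_lxd_interfaces /etc/dhcpcd.conf
#   systemctl restart dhcpcd
```

With the bridge, tunnel, and veth interfaces excluded, the DHCP client should request a lease on wlan0 only, leaving a single default route.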

ilg · March 10, 2021, 10:51pm

@tomp, yes, that is true. I should probably have mentioned that in the first place: the hosts get static DHCP leases from the server. I can provision them with static IPs instead; I will try that tomorrow or the day after.

Thank you for your help!

tomp · March 10, 2021, 10:53pm

Yes that will cause them to get a default route from both interfaces.