IPv6 route lost after a random amount of time

Hi! I’m experimenting with LXD clustering (snap 3.6).
At the moment, I have two armv7 hosts on Ubuntu 18.04 and a single bridged network, manually tunneled with GRE.

When Alpine (3.8 and edge) containers start, they can communicate with each other and with the internet, over both IPv4 and IPv6. However, after some time, IPv6 connectivity is completely lost.

I’ve noticed some strange output regarding the “expires” duration. The output below looks the same both right after container startup and during the IPv6 failure.

# ip -6 route # in container
fd42:da74:f51:7a04::/64 dev eth0  metric 256  expires 0sec
fe80::/64 dev eth0  metric 256 
default via fe80::20f8:b4ff:fee9:e852 dev eth0  metric 1024  expires 0sec
default via fe80::a456:d6ff:fe02:ac4b dev eth0  metric 1024  expires 0sec
ff00::/8 dev eth0  metric 256

Restarting the container’s network with service networking restart restores IPv6 connectivity.
I’ve not observed this issue on Debian containers (yet).
What can I do to investigate and solve this issue? Thanks!
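
For completeness, this is the restart/check sequence I run inside an Alpine container (busybox tools; the ping target is just an example public IPv6 address, nothing specific to my setup):

# inside the container
service networking restart
ip -6 route                        # check whether the default routes are back
ping6 -c 3 2001:4860:4860::8888    # example external IPv6 address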

Edit: Debian containers are also affected after some time

It’s a weird issue. I am trying to figure out a way to reproduce it.

I suppose you have tried such a host (armv7) with Alpine containers in a typical setup, and IPv6 works fine there. Is that correct?

Also, you can use tcpdump/tshark to capture packets and see what’s going on on the network.
The straightforward theory (and the one to test first) is that the Alpine containers are failing to make specific IPv6 network connections.
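
For example, something along these lines on the host should capture the router advertisements, neighbour discovery and DHCPv6 traffic on the bridge (I’ll call it br0 here; adjust the name to your setup):

# on the host: ICMPv6 (incl. router advertisements/neighbour discovery) plus DHCPv6
sudo tcpdump -ni br0 -w ipv6-debug.pcap 'icmp6 or udp port 546 or udp port 547'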

Also, if you can measure the expiration time (how long it takes until IPv6 starts to fail), you can convert it to seconds or minutes and see whether it’s a common whole number.
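
A crude way to pin the timing down is a plain shell loop inside a container (nothing LXD-specific here), e.g.:

# log the IPv6 routes once a minute to see exactly when the "expires" timers run out
while true; do date; ip -6 route; echo; sleep 60; done >> /tmp/route6.log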

Failing all that, you would need to document your setup (bridged network and how you manually tunnel with GRE).

Thanks for your feedback!
I’m trying a typical setup (that is, without tunnels) right now.
However, it took more than 24 hours before the last failure, so I’m waiting to gather fresher data before reporting back.

So, for around 24 hours everything worked fine.
However, I created additional containers and ended up in a very unstable situation where no IPv6 connectivity was available from any container.

I had to lxc restart --all to get back to a stable environment (which is fine as a temporary workaround).

I have several questions that might help me fully understand the issue:

  1. Is the network configuration synchronized between hosts in the cluster? I had to execute the following command on several hosts to apply the custom dnsmasq configuration (see the check sketched after this list):
lxc network set br0 raw.dnsmasq "address=/example.com/10.25.10.xxx"
  2. Is dnsmasq restarted during container (re)configuration? How could that affect containers?
  3. I tried to extract some relevant information from tcpdump, but I’ll probably have to record more traces. I checked the logs for warnings, but there was nothing to report. Is there something I could do to increase LXD verbosity, especially during failures?
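
For what it’s worth, this is how I have been comparing the two hosts so far (just the standard lxc and ps commands; I’m not sure lxc network get reflects per-host state, hence question 1):

# run on host0 and on host1, then compare
lxc network get br0 raw.dnsmasq    # what LXD has stored for the key
ps aux | grep dnsmasq              # what the running dnsmasq was actually started with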

Here is my network configuration, with two hosts on the “external” network, 192.168.1.8 and 192.168.1.9.

config:
  ipv4.address: 10.25.10.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:da74:f51:7a04::1/64
  ipv6.nat: "true"
  raw.dnsmasq: address=/example.com/10.25.10.xxx
  tunnel.host0.local: 192.168.1.8
  tunnel.host0.protocol: gre
  tunnel.host0.remote: 192.168.1.9
  tunnel.host1.local: 192.168.1.9
  tunnel.host1.protocol: gre
  tunnel.host1.remote: 192.168.1.8
description: ""
name: br0
type: bridge
used_by:
  [redacted]
managed: true
status: Created
locations:
- host0
- host1
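
(The dump above is in the format printed by lxc network show; running the same command on the other member makes it easy to compare the stored configuration.)

lxc network show br0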

Thanks a lot!

Some new insights!

I tried to increase the DHCP lease time of my network.
That restarted the dnsmasq service on host0, but not on host1.
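
For reference, the lease time on an LXD bridge is normally controlled by the ipv4.dhcp.expiry key, so the change would have been something like (reconstructing the exact command; 12h matches what shows up in the dnsmasq command line below):

lxc network set br0 ipv4.dhcp.expiry 12h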

Here is a shortened result of ps aux | grep dns

Host0:

dnsmasq [...] --dhcp-range 10.25.10.2,10.25.10.254,12h

Host1:

dnsmasq [...] --dhcp-range 10.25.10.2,10.25.10.254,1h

Then, I restarted the networking service in containers multiple times, and it seemed that sometimes the lease was 1h, and sometimes the lease was 12h.

udhcpc: started, v1.28.4
udhcpc: sending discover
udhcpc: sending select for 10.25.10.101
udhcpc: lease of 10.25.10.101 obtained, lease time 43200
[...]
udhcpc: started, v1.28.4
udhcpc: sending discover
udhcpc: sending select for 10.25.10.101
udhcpc: lease of 10.25.10.101 obtained, lease time 3600 

When the lease duration was 43200 (12h), the container had IPv6; when it was 3600 (1h), it did not.
I killed dnsmasq manually on host1, and now both the lease time and the connectivity seem to be fine.
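
In case someone wants to reproduce the workaround, this is roughly what I did (commands from memory; the dnsmasq PID will obviously differ):

# on host1: find the extra dnsmasq serving br0 and kill it
pgrep -af dnsmasq
sudo kill <pid-of-the-br0-dnsmasq>

# inside a container: request a fresh lease and check the routes again
udhcpc -i eth0 -n -q
ip -6 route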

TL;DR: there seems to be a race condition between dnsmasq instances when GRE tunnels are used. Multiple dnsmasq processes end up running with conflicting configurations, whereas only one seems to be required.
Am I right?