Random IPv6 address dropouts in lxdbr0

I am seeing random occurrences where the IPv6 address of a container is somehow unassigned/dropped. It hangs my active SSH connections into the container, so it is very obvious. It takes a good 5 minutes or more to resolve itself, at which point the hung connections go back to normal.

Any ideas what might cause this? Below are observations from the host machine and the container shortly after it happens.

Host Info:

OS: Ubuntu 18.04.5 LTS
LXD: 4.16

LXD network config for lxdbr0:

config:
  dns.search: lxd,corp.terasci.com
  ipv4.address: 10.11.12.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:c8f3:56ae:8db::1/64
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/build-armbian
managed: true
status: Created
locations:
- none

Container config:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 21.04 amd64 (release) (20210616)
  image.label: release
  image.os: ubuntu
  image.release: hirsute
  image.serial: "20210616"
  image.type: squashfs
  image.version: "21.04"
  security.syscalls.intercept.mount: "true"
  security.syscalls.intercept.mount.allowed: devtmpfs
  volatile.base_image: f1d9d2d7ea5d90691c4559f0bdb1b68598041f0c90678451695b5d7e8a98d327
  volatile.eth0.host_name: vethfc8ff1ce
  volatile.eth0.hwaddr: 00:16:3e:fa:f9:5e
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: df7b3c90-f517-41d6-82df-ef328bbe7b38
devices: {}
ephemeral: false
profiles:
- default
- loop
stateful: false
description: ""

Interface details for ‘lxdbr0’ on host:

anderson-ryzen9:~$ ip addr show dev lxdbr0
4: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:dc:42:47 brd ff:ff:ff:ff:ff:ff
    inet 10.11.12.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fd42:c8f3:56ae:8db::1/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fedc:4247/64 scope link 
       valid_lft forever preferred_lft forever

DNS lookup for container name ‘build-armbian.lxd’:

anderson-ryzen9:~$ dig -tAAAA build-armbian.lxd

; <<>> DiG 9.11.3-1ubuntu1.15-Ubuntu <<>> -tAAAA build-armbian.lxd
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28439
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;build-armbian.lxd.		IN	AAAA

;; ANSWER SECTION:
build-armbian.lxd.	0	IN	AAAA	fd42:c8f3:56ae:8db:216:3eff:fefa:f95e

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Wed Aug 04 12:21:42 PDT 2021
;; MSG SIZE  rcvd: 74

Interface details from within the container:

build-armbian:~$ ip addr
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:fa:f9:5e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.11.12.199/24 brd 10.11.12.255 scope global dynamic eth0
       valid_lft 2848sec preferred_lft 2848sec
    inet6 fe80::216:3eff:fefa:f95e/64 scope link 
       valid_lft forever preferred_lft forever

Ping, from the host, to the IPv6 address returned from DNS:

anderson-ryzen9:~$ ping6 build-armbian.lxd
PING build-armbian.lxd(fd42:c8f3:56ae:8db:216:3eff:fefa:f95e (fd42:c8f3:56ae:8db:216:3eff:fefa:f95e)) 56 data bytes
From anderson-ryzen9 (fd42:c8f3:56ae:8db::1) icmp_seq=1 Destination unreachable: Address
From anderson-ryzen9 (fd42:c8f3:56ae:8db::1) icmp_seq=196 Destination unreachable: Address unreachable
64 bytes from fd42:c8f3:56ae:8db:216:3eff:fefa:f95e (fd42:c8f3:56ae:8db:216:3eff:fefa:f95e): icmp_seq=198 ttl=64 time=1024 ms
64 bytes from fd42:c8f3:56ae:8db:216:3eff:fefa:f95e (fd42:c8f3:56ae:8db:216:3eff:fefa:f95e): icmp_seq=197 ttl=64 time=2048 ms
64 bytes from fd42:c8f3:56ae:8db:216:3eff:fefa:f95e (fd42:c8f3:56ae:8db:216:3eff:fefa:f95e): icmp_seq=199 ttl=64 time=0.027 ms
64 bytes from fd42:c8f3:56ae:8db:216:3eff:fefa:f95e (fd42:c8f3:56ae:8db:216:3eff:fefa:f95e): icmp_seq=200 ttl=64 time=0.056 ms

So at this point the connection has returned, according to the ping. Below are the interface details again after the connections are back.

Interface details for ‘lxdbr0’ on host:

anderson-ryzen9:~$ ip addr show dev lxdbr0
4: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:dc:42:47 brd ff:ff:ff:ff:ff:ff
    inet 10.11.12.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fd42:c8f3:56ae:8db:216:3eff:fedc:4247/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 2679sec preferred_lft 2679sec
    inet6 fd42:c8f3:56ae:8db::1/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fedc:4247/64 scope link 
       valid_lft forever preferred_lft forever

Interface details from within the container:

build-armbian:~$ ip addr
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:fa:f9:5e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.11.12.199/24 brd 10.11.12.255 scope global dynamic eth0
       valid_lft 2716sec preferred_lft 2716sec
    inet6 fd42:c8f3:56ae:8db:216:3eff:fefa:f95e/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 3594sec preferred_lft 3594sec
    inet6 fe80::216:3eff:fefa:f95e/64 scope link 
       valid_lft forever preferred_lft forever

No indication of any problems on ‘dmesg’ output on the host.

Not sure where to look next?

The first thing I would check is that you don’t have a firewall on the host or upstream interfering with NDP packets, either router advertisements from LXD’s dnsmasq or router solicitations from your instances towards LXD’s dnsmasq.

Ok, I am not well versed in IPv6, any suggestions on how to investigate? I am familiar with tshark to watch the interface, but I don’t really know what to look for.

IPv6 addresses assigned by router advertisements (RAs) have a lifetime; you can see this in your ip addr output above:

    inet6 fd42:c8f3:56ae:8db:216:3eff:fefa:f95e/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 3594sec preferred_lft 3594sec

Lifetime of 3594sec, so just under 1 hour left.

You would expect LXD’s dnsmasq to advertise itself more frequently than that, thus renewing the allocation lifetime.

However, if that NDP router advertisement is not getting through, then it may be that your container is expiring its IP and then re-requesting it via a router solicitation, which is why it comes back shortly afterwards.

I would run tcpdump -i eth0 -nn ip6 inside the container and look out for RAs from LXD’s dnsmasq.
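To also see what lifetimes the RAs are actually carrying, a more verbose capture on the host side should decode the prefix information option of each advertisement (assuming tcpdump is installed on the host):

```shell
# Capture ICMPv6 on the bridge and decode each packet. For router
# advertisements, -v prints the prefix option with its valid/preferred
# lifetimes, so you can compare them against the advertisement interval.
sudo tcpdump -i lxdbr0 -nn -v icmp6
```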

Thanks for the explanation, that helped me learn a few bits I didn’t know.

When I run tcpdump from the container I don’t see any output, possibly due to it being an unprivileged container?

I tried from the host instead with tcpdump -nn -i lxdbr0 icmp6 and see the following neighbor solicitation/advertisements often:

12:07:27.691069 IP6 fe80::216:3eff:fedc:4247 > fd42:c8f3:56ae:8db:216:3eff:fefa:f95e: ICMP6, neighbor solicitation, who has fd42:c8f3:56ae:8db:216:3eff:fefa:f95e, length 32
12:07:27.691133 IP6 fd42:c8f3:56ae:8db:216:3eff:fefa:f95e > fe80::216:3eff:fedc:4247: ICMP6, neighbor advertisement, tgt is fd42:c8f3:56ae:8db:216:3eff:fefa:f95e, length 24
12:07:32.810971 IP6 fe80::216:3eff:fefa:f95e > fe80::216:3eff:fedc:4247: ICMP6, neighbor solicitation, who has fe80::216:3eff:fedc:4247, length 32
12:07:32.811059 IP6 fe80::216:3eff:fedc:4247 > fe80::216:3eff:fefa:f95e: ICMP6, neighbor advertisement, tgt is fe80::216:3eff:fedc:4247, length 24
12:07:37.930984 IP6 fe80::216:3eff:fedc:4247 > fe80::216:3eff:fefa:f95e: ICMP6, neighbor solicitation, who has fe80::216:3eff:fefa:f95e, length 32
12:07:37.931050 IP6 fe80::216:3eff:fefa:f95e > fe80::216:3eff:fedc:4247: ICMP6, neighbor advertisement, tgt is fe80::216:3eff:fefa:f95e, length 24
12:08:12.746988 IP6 fe80::216:3eff:fedc:4247 > fd42:c8f3:56ae:8db:216:3eff:fefa:f95e: ICMP6, neighbor solicitation, who has fd42:c8f3:56ae:8db:216:3eff:fefa:f95e, length 32
12:08:12.747041 IP6 fd42:c8f3:56ae:8db:216:3eff:fefa:f95e > fe80::216:3eff:fedc:4247: ICMP6, neighbor advertisement, tgt is fd42:c8f3:56ae:8db:216:3eff:fefa:f95e, length 24
12:08:17.866972 IP6 fe80::216:3eff:fefa:f95e > fe80::216:3eff:fedc:4247: ICMP6, neighbor solicitation, who has fe80::216:3eff:fedc:4247, length 32
12:08:17.867047 IP6 fe80::216:3eff:fedc:4247 > fe80::216:3eff:fefa:f95e: ICMP6, neighbor advertisement, tgt is fe80::216:3eff:fedc:4247, length 24
12:08:22.986985 IP6 fe80::216:3eff:fedc:4247 > fe80::216:3eff:fefa:f95e: ICMP6, neighbor solicitation, who has fe80::216:3eff:fefa:f95e, length 32
12:08:22.987049 IP6 fe80::216:3eff:fefa:f95e > fe80::216:3eff:fedc:4247: ICMP6, neighbor advertisement, tgt is fe80::216:3eff:fefa:f95e, length 24
12:08:25.358257 IP6 fe80::216:3eff:fedc:4247 > ff02::1: ICMP6, router advertisement, length 88
...
12:16:14.360471 IP6 fe80::216:3eff:fedc:4247 > ff02::1: ICMP6, router advertisement, length 88

There are router advertisements going to ff02::1. Some reading indicates that is the ‘all nodes’ multicast address. Yet the advertisements must not be reaching the container, because the valid_lft and preferred_lft continue to drop until the IPv6 assignment is removed:

build-armbian:~$ ip addr
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:fa:f9:5e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.11.12.199/24 brd 10.11.12.255 scope global dynamic eth0
       valid_lft 2971sec preferred_lft 2971sec
    inet6 fe80::216:3eff:fefa:f95e/64 scope link 
       valid_lft forever preferred_lft forever

I also just noticed the following:

12:24:01.905938 IP6 fe80::216:3eff:fedc:4247 > ff02::1: ICMP6, router advertisement, length 88
// IP6 address, in container, dropped between these two RAs.
12:32:52.574991 IP6 fe80::216:3eff:fedc:4247 > ff02::1: ICMP6, router advertisement, length 88
// IP6 address, in container, comes back around the same time as the RA?

I realized that tcpdump worked fine inside some of our production containers, so I decided to investigate why I couldn’t get any output from tcpdump in this build container. An strace on a simple tcpdump --version command shows the following:

write(2, "tcpdump version 4.9.3\n", 22) = -1 EACCES (Permission denied)
write(2, "libpcap version 1.10.0 (with TPA"..., 41) = -1 EACCES (Permission denied)
write(2, "OpenSSL 1.1.1j  16 Feb 2021\n", 28) = -1 EACCES (Permission denied)

Odd, it doesn’t have permission to write to stderr… So, a look at dmesg shows the following:

[1975423.197142] audit: type=1400 audit(1628192679.034:632): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-build-armbian_<var-snap-lxd-common-lxd>" profile="tcpdump" name="/dev/pts/0" pid=18362 comm="tcpdump" requested_mask="wr" denied_mask="wr" fsuid=1000000 ouid=1001001

Apparently, running tcpdump over SSH doesn’t work because AppArmor is blocking operation="file_inherit" on name="/dev/pts/0", hence the EACCES errors.

Running tcpdump from an lxc exec build-armbian -- /bin/bash console doesn’t suffer the same problem.
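A possible workaround over SSH (a guess on my part, not something I have verified) would be to keep tcpdump away from the inherited pty entirely by writing the raw capture to a file, then decoding it somewhere unaffected, e.g. on the host:

```shell
# Inside the container (over SSH): write raw packets to a file so tcpdump
# never writes to the AppArmor-denied /dev/pts device.
sudo tcpdump -i eth0 -nn icmp6 -w /tmp/icmp6.pcap

# Then on the host: pull the capture out of the container and decode it there.
lxc file pull build-armbian/tmp/icmp6.pcap .
tcpdump -nn -r icmp6.pcap
```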

To confirm, try running again in the container with the -l flag, as that should allow tcpdump to produce output in an unprivileged container.

That flag didn’t change anything; it still generates AppArmor DENIED messages.

I can run tcpdump from an lxc exec console and can confirm that the container is seeing RAs from dnsmasq:

anderson@anderson-ryzen9:~$ lxc exec build-armbian -- /bin/bash
root@build-armbian:~# tcpdump -i eth0 -nn icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
...
13:01:53.126486 IP6 fe80::216:3eff:fedc:4247 > ff02::1: ICMP6, router advertisement, length 88
...
13:10:32.470577 IP6 fe80::216:3eff:fedc:4247 > ff02::1: ICMP6, router advertisement, length 88
...
13:18:46.471694 IP6 fe80::216:3eff:fedc:4247 > ff02::1: ICMP6, router advertisement, length 88

Not sure why dnsmasq is sending them from an fe80:: address when the addresses it hands out are in fd42:c8f3:56ae:8db::/64?

That’ll be its link-local address. That part is normal: RFC 4861 requires router advertisements to be sent from the router’s link-local address, so you will never see RAs sourced from the fd42: prefix.

Do you have any firewalls running on host or inside container?

Also your address lifetime looks quite low, can you show output of lxc network show lxdbr0?
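One way to probe this directly (assuming the ndisc6 package is installable in the container) is to solicit an RA on demand and print what comes back, including the advertised prefix lifetimes:

```shell
# rdisc6 sends a router solicitation on eth0 and prints any router
# advertisement received, including the prefix valid/preferred lifetimes.
sudo apt install ndisc6
sudo rdisc6 eth0
```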

Both iptables and ip6tables are installed on the host and in the container, but they don’t have any rules.

Host:

anderson@anderson-ryzen9:~$ sudo ip6tables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
anderson@anderson-ryzen9:~$ sudo ip6tables -nL -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

Container:

anderson@build-armbian:~$ sudo ip6tables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
anderson@build-armbian:~$ sudo ip6tables -nL -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

And nft list ruleset on both?

Host:

anderson-ryzen9:~$ nft list ruleset

Command 'nft' not found, but can be installed with:

sudo apt install nftables

Container:

anderson@build-armbian:~$ nft list ruleset
Command 'nft' not found, but can be installed with:
sudo apt install nftables

And lxc network show lxdbr0?

BTW, I just checked on my LXD host, and each time an RA is received by the container, the IPv6 address lifetime resets back to its maximum value again. Check that yours is doing the same; if not, then that’s likely the issue. Also, mine is an order of magnitude larger than your lifetime, suggesting something may have set yours too low for the RA advertisement interval.
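A simple way to check that from inside the container is to watch the lifetime alongside the earlier tcpdump (interface name assumed to be eth0):

```shell
# Refresh every 10s: if RAs are being applied, valid_lft jumps back to its
# maximum when one arrives; if it only ever counts down, RAs are being ignored.
watch -n 10 'ip -6 addr show dev eth0 scope global'
```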

anderson@anderson-ryzen9:~$ lxc network show lxdbr0
config:
  dns.search: lxd,corp.terasci.com
  ipv4.address: 10.11.12.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:c8f3:56ae:8db::1/64
  ipv6.nat: "true"
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/build-aosp-1404
- /1.0/instances/build-armbian
- /1.0/instances/build-ipxe
- /1.0/instances/build-stretch
- /1.0/profiles/default
managed: true
status: Created
locations:
- none

I posted my lxdbr0 configuration above.

A few other notes:

  1. The highest lifetime I have seen on the fd42:c8f3:56ae:8db::/64 addresses is 3600 sec (1 hr). This is true both for the interface in the container and for the lxdbr0 interface on the host. The lifetime counts down and expires on both the host and container interfaces in the same way.

  2. dnsmasq does send an RA every 8-10 minutes, so it is frequent enough that the 3600 sec lifetime above should not be an issue. However, it sends the RA from the fe80:: address on the host lxdbr0 interface. Is it possible this causes the container to ignore it, or not apply it, because the RA is not coming from the fd42:c8f3:56ae:8db::/64 network?

  3. I do not see this issue on our production cluster where the containers are connected to a pre-configured br0 bridge to our office network. The production cluster does not use lxdbr0 and therefore there is no dnsmasq involved.
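One more thing that might be worth ruling out for note 2 (a guess on my part): the container kernel can be configured to ignore RAs entirely, regardless of their source address. The relevant sysctls inside the container would be:

```shell
# 0 = RAs ignored; 1 = accepted unless forwarding is on; 2 = accepted always.
sysctl net.ipv6.conf.eth0.accept_ra
# If this is 1 while accept_ra is 1, the kernel ignores RAs on eth0.
sysctl net.ipv6.conf.eth0.forwarding
# 0 here disables SLAAC address configuration from RA prefixes.
sysctl net.ipv6.conf.eth0.autoconf
```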

Does it improve if you run lxc network set lxdbr0 ipv6.dhcp.stateful=true?

You may need to restart your containers afterwards so they take an address via DHCPv6.

No change. I set that option, restarted the container and waited for the next RA on tcpdump. After the next RA the lifetime on eth0, in the container, was still dropping.