Network problem - suddenly no traffic between container(s) and host (and as a result, no DHCP)

Once every X days I end up with the following scenario:

  1. I notice there is no “internet” in a container
  2. I notice that this container has no IP (eth0 is not configured)
  3. I notice that all other containers have the same problem :slight_smile:
  4. manually kick-starting the DHCP client does not work (it times out) - see the sketch after this list
  5. dnsmasq is alive (and listening, no errors logged)
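For reference, steps 4 and 5 look roughly like this (a sketch - the exact DHCP client depends on the container image):

    # inside an affected container: a manual DHCP attempt just times out
    sudo dhclient -v eth0

    # on the host: the dnsmasq spawned for lxdbr0 is alive and listening
    pgrep -af dnsmasq
    sudo ss -ulpn | grep dnsmasq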

It is hard for me to correlate this with other events such as putting the laptop to sleep, turning WiFi/LAN on/off, etc.

It can be “fixed” by service lxd restart, but this is a PITA because all the containers get stopped ;/

LXD (4.14) from snap on Ubuntu 20.04

Facts and tests (facts apply to all containers on this machine)
A) eth0 in container (A) is up. I am able to address the interface manually (x.y.z.201/24)
B) I can do the same in another container (B) (for example x.y.z.202/24)
C) lxdbr0 is UP and addressed (x.y.z.1/24)
D) I can ping A <-> B
-> tcpdump inside the containers shows the packets
-> tcpdump on the host (on -i lxdbr0) shows nothing
E) I CAN NOT ping/tcp/udp A <-> HOST or B <-> HOST
F) I created another interface on the host (TMP1), addressed it x.y.z.222/24 and added it to the bridge (lxdbr0) - see the sketch after this list
-> it can ping HOST (and is visible in tcpdump on the host)
-> it can not ping containers (A) or (B)
G) iptables has no DROP rules
H) ebtables is empty
I) there is no conflict in address space with other interfaces
J) the routing table is OK (both on the host and in the containers)
K) net.ipv4.ip_forward is 1
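For reference, test (F) was done roughly like this (a sketch assuming a veth pair - one end enslaved to the bridge, the addressed end emulating another bridge port; x.y.z as above):

    # create a veth pair and attach one end to lxdbr0
    sudo ip link add TMP1 type veth peer name TMP1-br
    sudo ip link set TMP1-br master lxdbr0 up
    sudo ip addr add x.y.z.222/24 dev TMP1
    sudo ip link set TMP1 up

    ping -I TMP1 -c 3 x.y.z.1      # host: replies, visible in tcpdump on lxdbr0
    ping -I TMP1 -c 3 x.y.z.201    # container A: no response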

Sounds like a firewall issue.

Have you checked nftables?

sudo apt install nftables -y
sudo nft list ruleset

Added to the (ip|eb|nf)tables collection :slight_smile: Here is the output:

 table bridge filter {
	chain INPUT {
		type filter hook input priority filter; policy accept;
	}

	chain FORWARD {
		type filter hook forward priority filter; policy accept;
	}

	chain OUTPUT {
		type filter hook output priority filter; policy accept;
	}
 }

and I did check whether there is any DROP rule in iptables (none found).
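For reference, the check was more or less:

    sudo iptables-save | grep -i drop    # no output
    sudo ebtables -L                     # default ACCEPT policies, no rules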

Next time it occurs, please provide the output of ip a and ip r on the host and inside two of the affected containers.

Got this situation right now (it has been like this since yesterday :stuck_out_tongue:).
Keeping it in that state because I want to permanently fix it :slight_smile:
For now I only keep 2 containers alive (the rest are stopped)

HOST
enx00e04c680556 is my Ethernet connection
wlp59s0 is WiFi (state: down)

28:0?>ip r
default via 192.168.0.1 dev enx00e04c680556 proto dhcp metric 100 
192.168.0.0/24 dev enx00e04c680556 proto kernel scope link src 192.168.0.185 metric 100 
192.168.250.0/24 dev lxdbr0 proto kernel scope link src 192.168.250.1 

28:0?>ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: wlp59s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether dc:fb:48:5f:38:22 brd ff:ff:ff:ff:ff:ff
17: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:35:ad:d5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.250.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::6af9:a797:2635:be4f/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
30: vethbc77751d@if29: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP group default qlen 1000
    link/ether 66:a2:df:2b:91:91 brd ff:ff:ff:ff:ff:ff link-netnsid 2
35: vethba5a46e8@if34: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master lxdbr0 state UP group default qlen 1000
    link/ether 36:16:68:4c:ee:f1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
36: enx00e04c680556: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:e0:4c:68:05:56 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.185/24 brd 192.168.0.255 scope global dynamic noprefixroute enx00e04c680556
       valid_lft 67882sec preferred_lft 67882sec
    inet6 2a02:a31a:4240:f100:8e65:ec66:fafe:af3c/64 scope global dynamic noprefixroute 
       valid_lft 935630sec preferred_lft 330830sec
    inet6 fe80::c7ff:7705:c652:916d/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

# brctl show
bridge name	bridge id		STP enabled	interfaces
lxdbr0		8000.00163e35add5	yes		vethba5a46e8
							vethbc77751d

CONTAINER A

ubuntu@pwn1:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
29: eth0@if30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:46:be:c2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.250.130/24 brd 192.168.250.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::216:3eff:fe46:bec2/64 scope link 
       valid_lft forever preferred_lft forever

ubuntu@pwn1:~$ ip r s
default via 192.168.250.1 dev eth0 
192.168.250.0/24 dev eth0 proto kernel scope link src 192.168.250.130

Container B: same (except the address is …250.120)

And are those addresses in the containers added manually by yourself, or via DHCP?

I’m trying to understand the problem clearly as earlier you mentioned there were no IPs, and then later you mentioned that you can ping between containers (the two scenarios being mutually exclusive).

Manually, of course (to be able to try to send packets/ping/etc.).
Here is a fresh container:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
37: eth0@if38: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:15:50:c6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::216:3eff:fe15:50c6/64 scope link 
       valid_lft forever preferred_lft forever

And the output of ip r s is empty

I see, thanks - it wasn’t clear to me.

Can you try running sudo tcpdump -i lxdbr0 -nn on the host and then try pinging the lxdbr0 IP from one of the containers that has the manually added IPs, and see what you get on the host-side from tcpdump?

Please can I see the output of sudo iptables-save as well.

FROM HOST (ping 192.168.250.1 running inside both containers)

15:32:19.673832 ARP, Request who-has 192.168.250.1 tell 192.168.250.120, length 28
15:32:20.692403 ARP, Request who-has 192.168.250.1 tell 192.168.250.130, length 28

(a lot of ARP requests, no response!)

The same is visible inside the container(s) on eth0

I tried to add a static ARP entry inside the container:
> arp -s 192.168.250.1 00:16:3e:35:ad:d5 # this is the MAC address of lxdbr0
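(For completeness, the equivalent with the newer iproute2 tooling - not what I originally ran:)

    sudo ip neigh replace 192.168.250.1 lladdr 00:16:3e:35:ad:d5 dev eth0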

After that, ICMP requests (with no response) are visible in tcpdump (on the host and inside the container):

13:36:43.130122 IP 192.168.250.120 > 192.168.250.1: ICMP echo request, id 19546, seq 30, length 64
13:36:44.154127 IP 192.168.250.120 > 192.168.250.1: ICMP echo request, id 19546, seq 31, length 64

And here is the iptables output:

iptables-save 
# Generated by iptables-save v1.8.4 on Thu May 20 15:40:21 2021
*raw
:PREROUTING ACCEPT [714102:674799576]
:OUTPUT ACCEPT [249872:21154928]
COMMIT
# Completed on Thu May 20 15:40:21 2021
# Generated by iptables-save v1.8.4 on Thu May 20 15:40:21 2021
*mangle
:PREROUTING ACCEPT [4593:2510428]
:INPUT ACCEPT [4593:2510428]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [4391:460657]
:POSTROUTING ACCEPT [4410:462492]
:LIBVIRT_PRT - [0:0]
-A POSTROUTING -j LIBVIRT_PRT
-A POSTROUTING -o lxdbr0 -p udp -m udp --dport 68 -m comment --comment "generated for LXD network lxdbr0" -j CHECKSUM --checksum-fill
COMMIT
# Completed on Thu May 20 15:40:21 2021
# Generated by iptables-save v1.8.4 on Thu May 20 15:40:21 2021
*nat
:PREROUTING ACCEPT [179:80974]
:INPUT ACCEPT [179:80974]
:OUTPUT ACCEPT [568:47177]
:POSTROUTING ACCEPT [546:42982]
:LIBVIRT_PRT - [0:0]
-A POSTROUTING -j LIBVIRT_PRT
-A POSTROUTING -s 192.168.250.0/24 ! -d 192.168.250.0/24 -m comment --comment "generated for LXD network lxdbr0" -j MASQUERADE
COMMIT
# Completed on Thu May 20 15:40:21 2021
# Generated by iptables-save v1.8.4 on Thu May 20 15:40:21 2021
*filter
:INPUT ACCEPT [4593:2510428]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [4391:460657]
:LIBVIRT_FWI - [0:0]
:LIBVIRT_FWO - [0:0]
:LIBVIRT_FWX - [0:0]
:LIBVIRT_INP - [0:0]
:LIBVIRT_OUT - [0:0]
-A INPUT -j LIBVIRT_INP
-A INPUT -i lxdbr0 -p tcp -m tcp --dport 53 -m comment --comment "generated for LXD network lxdbr0" -j ACCEPT
-A INPUT -i lxdbr0 -p udp -m udp --dport 53 -m comment --comment "generated for LXD network lxdbr0" -j ACCEPT
-A INPUT -i lxdbr0 -p udp -m udp --dport 67 -m comment --comment "generated for LXD network lxdbr0" -j ACCEPT
-A FORWARD -j LIBVIRT_FWX
-A FORWARD -j LIBVIRT_FWI
-A FORWARD -j LIBVIRT_FWO
-A FORWARD -o lxdbr0 -m comment --comment "generated for LXD network lxdbr0" -j ACCEPT
-A FORWARD -i lxdbr0 -m comment --comment "generated for LXD network lxdbr0" -j ACCEPT
-A OUTPUT -j LIBVIRT_OUT
-A OUTPUT -o lxdbr0 -p tcp -m tcp --sport 53 -m comment --comment "generated for LXD network lxdbr0" -j ACCEPT
-A OUTPUT -o lxdbr0 -p udp -m udp --sport 53 -m comment --comment "generated for LXD network lxdbr0" -j ACCEPT
-A OUTPUT -o lxdbr0 -p udp -m udp --sport 67 -m comment --comment "generated for LXD network lxdbr0" -j ACCEPT
COMMIT
# Completed on Thu May 20 15:40:21 2021

Are you running lldpd on your host by any chance? We saw something similar in this thread recently.

No lldpd here (or any similar stuff).
From a network perspective it is standard vanilla Ubuntu 20.04

Can I get a login to the box?

(I’ve sent the info in a PM)

Ah sorry I missed you, I was working on something else.

I was thinking more about a remote console (like SSH; some people use TeamViewer).

TeamViewer works for me! Just PM me when you are online.

Still unresolved. Any other ideas on how to investigate / fix? (except lxd restart …)

I just spotted your problem: your external Ethernet device’s subnet is too large and overlaps with your lxdbr0 subnet.

Hrm, actually it isn’t, sorry - but I’d be interested to know whether the problem goes away if you change your lxdbr0 subnet.

I’ve seen issues in the past where the routing table causes response packets from lxdbr0 to be sent out of a different interface.
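A quick way to check that theory next time it happens (a sketch, using the addresses from this thread):

    # ask the kernel which route/interface it would use to reply to a container
    ip route get 192.168.250.130
    # on a healthy setup this should look like:
    #   192.168.250.130 dev lxdbr0 src 192.168.250.1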

Can you show output of:

 bridge link show

It is magically back to normal (as always - after an lxd service restart, this time forced by the OS) :frowning: :frowning:
Waiting for this to happen again (and it will)

(posting the output of this command in the “working” state anyway - but AFAIR it was the same)

# bridge link show
46: veth7d9131f5@if45: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master lxdbr0 state forwarding priority 32 cost 2 
48: vethe9f19c50@if47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master lxdbr0 state forwarding priority 32 cost 2