Network issues - How to troubleshoot?

Wow, thanks! You guys are the best. Thanks @stgraber and thanks @tomp.

My ISP moved my VPS to another host system to rule out a problem with the host itself. Now I will keep using and testing it. If it happens again, I will try the routed NIC type to see whether the problem is related to the managed bridge.

Until then I wish you both the best! :slight_smile:

In the meantime, it still happens when I work a lot on the server. So my newest theory is that I have to increase net.core.netdev_max_backlog and txqueuelen, as mentioned here: https://linuxcontainers.org/lxd/docs/master/production-setup.

When running tcpdump on the host's eth0, I see a lot of packets being sent to the server when I hit the save button inside a program just once… So I thought the connection interruption comes from the queue being too small for the number of packets I am sending. It seems logical that my connection then gets dropped, but why does the server need 5 minutes or more to give Cloudflare back the connection to my IP address, while others can still connect without a problem? What do you think?
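One way to test the queue theory directly, rather than inferring it from tcpdump, is to watch the kernel's own drop counters while reproducing the save. A minimal sketch, assuming a Linux host: it reads /proc/net/softnet_stat, where the second (hexadecimal) column of each per-CPU line counts packets dropped because the input backlog sized by net.core.netdev_max_backlog was already full. Drops caused by a too-short txqueuelen are not counted there; those appear in the qdisc statistics from tc -s qdisc show dev eth0. The function names are illustrative only.

```python
from __future__ import annotations

import time
from pathlib import Path


def parse_softnet_drops(text: str) -> list[int]:
    """Per-CPU 'dropped' counts from the text of /proc/net/softnet_stat.

    One line per CPU, all columns hexadecimal. Column 2 counts packets
    dropped because the CPU's input backlog (capped by
    net.core.netdev_max_backlog) was already full.
    """
    drops = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            drops.append(int(fields[1], 16))
    return drops


def read_softnet_drops(path: str = "/proc/net/softnet_stat") -> list[int]:
    """Read the live counters (Linux only)."""
    return parse_softnet_drops(Path(path).read_text())


def new_drops(before: list[int], after: list[int]) -> int:
    """Total backlog drops between two snapshots, summed over all CPUs."""
    return sum(after) - sum(before)


if __name__ == "__main__":
    before = read_softnet_drops()
    time.sleep(30)  # reproduce the heavy workload (e.g. hit save) meanwhile
    after = read_softnet_drops()
    print("backlog drops in the last 30 s:", new_drops(before, after))
```

If this number stays at zero while the connection drops, the backlog is probably not the culprit and raising netdev_max_backlog is unlikely to help.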

To make ip link set eth0 txqueuelen 10000 persistent on Ubuntu 20.04 with netplan and systemd, I read that you need a udev rule:

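Create the file (local rules belong in /etc/udev/rules.d; /lib/udev/rules.d is for packaged rules):
/etc/udev/rules.d/60-persistent-txqueuelen.rules
Put this in the file:
KERNEL=="eth[01]", RUN+="/sbin/ip link set %k txqueuelen 10000"
Run:
sudo udevadm control --reload
sudo udevadm trigger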
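A note on the KERNEL== pattern: udev treats [...] as a single-character class, like a shell glob, so a comma inside it is a literal character rather than a separator. eth[0,1] therefore matches eth0, eth1 and the (nonexistent) name "eth,", while eth[01] matches exactly eth0 and eth1. Harmless in practice, but easy to misread. The sketch below approximates the match with Python's fnmatch, assuming udev's bracket handling agrees with it for simple patterns like these (udev's | alternatives are not modelled).

```python
from fnmatch import fnmatchcase


def udev_kernel_match(pattern: str, name: str) -> bool:
    """Approximate udev's KERNEL=="pattern" test with shell-style globbing.

    Assumption: for simple patterns using *, ? and [...], udev matches the
    way fnmatch does. udev's '|' alternatives are not modelled here.
    """
    return fnmatchcase(name, pattern)


if __name__ == "__main__":
    names = ["eth0", "eth1", "eth2", "eth,", "lxdbr0"]
    for pattern in ("eth[0,1]", "eth[01]"):
        print(pattern, "->", [n for n in names if udev_kernel_match(pattern, n)])
```

Running it prints eth[0,1] -> ['eth0', 'eth1', 'eth,'] and eth[01] -> ['eth0', 'eth1'].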

Regarding the “Server Changes” section that comes before that on the same page, do I need to set those values too? And how do I check which values the kernel is using right now?
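To check the values currently in effect: sysctl keys map one-to-one onto files under /proc/sys (dots become slashes), and each interface's queue length is exposed at /sys/class/net/<iface>/tx_queue_len. These are the same numbers reported by sysctl net.core.netdev_max_backlog and by the qlen field of ip link show eth0. A minimal read-only sketch, assuming a Linux host with an interface named eth0; the helper names are hypothetical, and any other key from the “Server Changes” list can be passed to read_sysctl the same way.

```python
from pathlib import Path


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file, e.g. net.core.x -> net/core/x.

    Assumes no component of the key itself contains a dot.
    """
    return Path(root, *key.split("."))


def read_sysctl(key: str, root: str = "/proc/sys") -> str:
    """Current value of a sysctl, as the kernel reports it."""
    return sysctl_path(key, root).read_text().strip()


def read_txqueuelen(iface: str, root: str = "/sys/class/net") -> int:
    """Current transmit queue length of a network interface."""
    return int(Path(root, iface, "tx_queue_len").read_text())


if __name__ == "__main__":
    print("net.core.netdev_max_backlog =", read_sysctl("net.core.netdev_max_backlog"))
    print("eth0 txqueuelen =", read_txqueuelen("eth0"))
```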

Hey @tomp, I read here (might be old info) that netplan might still have bugs with IPv6, so I would like to set up systemd-networkd directly. Is there anything I should add to this config that might explain why the route expired and went missing?

root@vm:/run/systemd/network# cat 10-netplan-eth0.network
[Match]
MACAddress=00:00:56:00:89:f9

[Network]
LinkLocalAddressing=ipv6
Address=1.1.1.4/32
Address=::/64
Gateway=fe80::1
DNS=1.1.1.2
DNS=1.1.1.3
DNS=::53:1
DNS=::53:2
Domains=invalid

[Route]
Destination=0.0.0.0/0
Gateway=1.1.1.1
GatewayOnlink=true

The address ::/64 looks odd to me. Have you hidden it on purpose, or is that the literal value?

I have hidden it on purpose.
I have since learned that getting rid of netplan is not that easy… I will have to figure out how to work alongside it…

Hi,

I thought netplan was easily removed:

nano /etc/default/grub

GRUB_CMDLINE_LINUX="netcfg/do_not_use_netplan=true"

sudo update-grub

apt install ifupdown

add the usual old family faves in /etc/network/interfaces

blow away /etc/netplan

That usually works for me. I loathe netplan and always run ifupdown or ifupdown2 if I can.

Cheers!

Jon.


Hey @bodleytunes, thanks for those steps. So I can't use networkd on its own? It has to be either netplan or ifupdown?

Since I changed the netplan config from an explicit IPv6 route to simply specifying gateway6, the route to lxdbr0 has remained stable on the host.

cat > /etc/netplan/01-netcfg.yaml <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      match:
        macaddress: $macaddress
      addresses:
        - $ipv4address/32
        - $ipv6address/128
      gateway6: fe80::1
      routes:
        - to: 0.0.0.0/0
          via: $ipv4gateway
          on-link: true
      nameservers:
        search: [ invalid ]
        addresses:
          - 1.1.1.1
          - 1.0.0.1
          - 2606:4700:4700::1111
          - 2606:4700:4700::1001
EOF
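Because the heredoc delimiter is unquoted (<<EOF rather than <<'EOF'), the shell substitutes $macaddress, $ipv4address, $ipv6address and $ipv4gateway before writing the file, and an unset variable silently becomes an empty string (leaving entries like "- /32"). Python's string.Template uses the same $name syntax, so the hypothetical render_netplan below renders an identical template but raises KeyError when a value is missing. The addresses in the usage example are documentation ranges, not real ones.

```python
from string import Template

# Same body as the heredoc above; only the $variables get filled in.
NETPLAN_TEMPLATE = Template("""\
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      match:
        macaddress: $macaddress
      addresses:
        - $ipv4address/32
        - $ipv6address/128
      gateway6: fe80::1
      routes:
        - to: 0.0.0.0/0
          via: $ipv4gateway
          on-link: true
      nameservers:
        search: [ invalid ]
        addresses:
          - 1.1.1.1
          - 1.0.0.1
          - 2606:4700:4700::1111
          - 2606:4700:4700::1001
""")


def render_netplan(**values: str) -> str:
    # substitute() raises KeyError for any missing variable, whereas the
    # unquoted shell heredoc would expand an unset variable to "".
    return NETPLAN_TEMPLATE.substitute(**values)


if __name__ == "__main__":
    # Hypothetical documentation addresses, not the poster's real ones.
    print(render_netplan(
        macaddress="00:00:5e:00:53:01",
        ipv4address="192.0.2.10",
        ipv6address="2001:db8::10",
        ipv4gateway="192.0.2.1",
    ))
```

The rendered text can then be written to /etc/netplan/01-netcfg.yaml and checked with netplan try before it is made permanent.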

Is it normal for the containers to receive router advertisements and have expiring routes?

1234:1234:1234:8614::/64 dev eth0 proto ra metric 100 expires 3439sec pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
default via fe80::216:3eff:fe15:deb9 dev eth0 proto ra metric 100 expires 1639sec mtu 1500 pref medium
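Routes marked proto ra were learned from router advertisements (here from lxdbr0), and their expires value is the remaining lifetime from the last RA received. The kernel normally resets it whenever a new RA arrives, so a counting-down expiry on its own is expected; what would point to a problem is the value actually reaching zero and the route disappearing. The sketch below parses ip -6 route output like the lines above so two snapshots taken some time apart can be compared. It assumes the iproute2 output format shown here, and the wait time between snapshots is an assumption you may need to adjust to the RA interval.

```python
from __future__ import annotations

import re
import subprocess
import time
from dataclasses import dataclass

_PROTO = re.compile(r"\bproto (\S+)")
_EXPIRES = re.compile(r"\bexpires (\d+)sec\b")
WAIT_SECONDS = 600  # assumption: longer than the RA interval on lxdbr0


@dataclass
class Route:
    dest: str                 # first token: prefix or "default"
    proto: str | None         # e.g. "ra" for router-advertisement routes
    expires_sec: int | None   # remaining lifetime, None if permanent


def parse_ip_route(text: str) -> list[Route]:
    """Parse `ip -6 route` output lines into Route records."""
    routes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        proto = _PROTO.search(line)
        expires = _EXPIRES.search(line)
        routes.append(Route(
            dest=line.split()[0],
            proto=proto.group(1) if proto else None,
            expires_sec=int(expires.group(1)) if expires else None,
        ))
    return routes


def ra_routes(routes: list[Route]) -> list[Route]:
    """Only the routes learned from router advertisements."""
    return [r for r in routes if r.proto == "ra"]


def refreshed(before: list[Route], after: list[Route]) -> dict[str, bool]:
    """For RA routes present in both snapshots: did the lifetime go back up?"""
    old = {r.dest: r.expires_sec for r in ra_routes(before)}
    return {
        r.dest: r.expires_sec > old[r.dest]
        for r in ra_routes(after)
        if old.get(r.dest) is not None and r.expires_sec is not None
    }


def snapshot() -> list[Route]:
    out = subprocess.run(["ip", "-6", "route"], capture_output=True,
                         text=True, check=True).stdout
    return parse_ip_route(out)


if __name__ == "__main__":
    first = snapshot()
    time.sleep(WAIT_SECONDS)
    print(refreshed(first, snapshot()))
```

A True for a destination means an RA refreshed that route between the two snapshots; a route that vanishes from the second snapshot is the case worth investigating.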

I noticed some weirdness in the output of dmesg -H:

[Aug30 17:46] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=93.174.93.195 DST=111.111.111.170 LEN=57 TOS=0x00 PREC=0x00 TTL=248 ID=54321 PROTO=UDP SPT=58636 DPT=40876 LEN=37 
[  +5.894403] lxdbr0: port 7(vethfe0624ee) entered disabled state
[  +0.539836] device vethfe0624ee left promiscuous mode
[  +0.000098] lxdbr0: port 7(vethfe0624ee) entered disabled state
[  +0.830375] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=45.129.33.10 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=54375 PROTO=TCP SPT=41764 DPT=27046 WINDOW=1024 RES=0x00 SYN URGP=0 
[ +27.300507] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=94.102.49.159 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=900 PROTO=TCP SPT=52292 DPT=37723 WINDOW=1024 RES=0x00 SYN URGP=0 
[  +7.622822] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=93.174.89.20 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=61263 PROTO=TCP SPT=43347 DPT=823 WINDOW=1024 RES=0x00 SYN URGP=0 
[Aug30 17:47] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=45.129.33.60 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=15275 PROTO=TCP SPT=53172 DPT=36854 WINDOW=1024 RES=0x00 SYN URGP=0 
[ +17.561439] lxdbr0: port 7(veth35319660) entered blocking state
[  +0.000001] lxdbr0: port 7(veth35319660) entered disabled state
[  +0.000493] device veth35319660 entered promiscuous mode
[  +0.036503] lxdbr0: port 8(veth6bd7dc70) entered blocking state
[  +0.000003] lxdbr0: port 8(veth6bd7dc70) entered disabled state
[  +0.000672] device veth6bd7dc70 entered promiscuous mode
[  +0.000062] lxdbr0: port 8(veth6bd7dc70) entered blocking state
[  +0.000002] lxdbr0: port 8(veth6bd7dc70) entered forwarding state
[  +0.217573] audit: type=1400 audit(1598802460.791:662): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxd-s2c1_</var/snap/lxd/common/lxd>" pid=595814 comm="apparmor_parser"
[  +0.108423] eth0: renamed from vethdb504781
[  +0.013310] eth1: renamed from veth5e1b841b
[  +0.019589] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[  +0.000160] lxdbr0: port 7(veth35319660) entered blocking state
[  +0.000003] lxdbr0: port 7(veth35319660) entered forwarding state
[  +1.676233] audit: type=1400 audit(1598802462.611:663): apparmor="STATUS" operation="profile_load" label="lxd-s2c1_</var/snap/lxd/common/lxd>//&:lxd-s2c1_<var-snap-lxd-common-lxd>:unconfined" name="nvidia_modprobe" pid=595980 comm="apparmor_parser"
[  +0.000017] audit: type=1400 audit(1598802462.611:664): apparmor="STATUS" operation="profile_load" label="lxd-s2c1_</var/snap/lxd/common/lxd>//&:lxd-s2c1_<var-snap-lxd-common-lxd>:unconfined" name="nvidia_modprobe//kmod" pid=595980 comm="apparmor_parser"
[  +0.058755] audit: type=1400 audit(1598802462.667:665): apparmor="STATUS" operation="profile_load" label="lxd-s2c1_</var/snap/lxd/common/lxd>//&:lxd-s2c1_<var-snap-lxd-common-lxd>:unconfined" name="lsb_release" pid=595979 comm="apparmor_parser"
[  +0.006956] audit: type=1400 audit(1598802462.675:666): apparmor="STATUS" operation="profile_load" label="lxd-s2c1_</var/snap/lxd/common/lxd>//&:lxd-s2c1_<var-snap-lxd-common-lxd>:unconfined" name="/usr/lib/snapd/snap-confine" pid=595981 comm="apparmor_parser"
[  +0.000011] audit: type=1400 audit(1598802462.675:667): apparmor="STATUS" operation="profile_load" label="lxd-s2c1_</var/snap/lxd/common/lxd>//&:lxd-s2c1_<var-snap-lxd-common-lxd>:unconfined" name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=595981 comm="apparmor_parser"
[  +0.008783] audit: type=1400 audit(1598802462.683:668): apparmor="STATUS" operation="profile_load" label="lxd-s2c1_</var/snap/lxd/common/lxd>//&:lxd-s2c1_<var-snap-lxd-common-lxd>:unconfined" name="/usr/bin/man" pid=595978 comm="apparmor_parser"
[  +0.000006] audit: type=1400 audit(1598802462.683:669): apparmor="STATUS" operation="profile_load" label="lxd-s2c1_</var/snap/lxd/common/lxd>//&:lxd-s2c1_<var-snap-lxd-common-lxd>:unconfined" name="man_filter" pid=595978 comm="apparmor_parser"
[  +0.000002] audit: type=1400 audit(1598802462.683:670): apparmor="STATUS" operation="profile_load" label="lxd-s2c1_</var/snap/lxd/common/lxd>//&:lxd-s2c1_<var-snap-lxd-common-lxd>:unconfined" name="man_groff" pid=595978 comm="apparmor_parser"
[  +0.004065] audit: type=1400 audit(1598802462.687:671): apparmor="STATUS" operation="profile_load" label="lxd-s2c1_</var/snap/lxd/common/lxd>//&:lxd-s2c1_<var-snap-lxd-common-lxd>:unconfined" name="/usr/sbin/tcpdump" pid=595982 comm="apparmor_parser"
[  +4.561504] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=94.102.53.112 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=53564 PROTO=TCP SPT=59434 DPT=41968 WINDOW=1024 RES=0x00 SYN URGP=0 
[ +23.690990] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=94.102.49.159 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=44840 PROTO=TCP SPT=52292 DPT=38484 WINDOW=1024 RES=0x00 SYN URGP=0 
[Aug30 17:48] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=94.102.59.98 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=5997 PROTO=TCP SPT=50353 DPT=5690 WINDOW=1024 RES=0x00 SYN URGP=0 
[ +17.380563] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=68.96.20.177 DST=111.111.111.170 LEN=60 TOS=0x00 PREC=0x00 TTL=57 ID=34563 DF PROTO=TCP SPT=43784 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0 
[Aug30 17:49] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=91.236.116.38 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=247 ID=14419 PROTO=TCP SPT=41480 DPT=3388 WINDOW=1024 RES=0x00 SYN URGP=0 
[ +21.983854] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=62.234.178.25 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=242 ID=60921 PROTO=TCP SPT=50991 DPT=21271 WINDOW=1024 RES=0x00 SYN URGP=0 
[ +12.673545] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=87.251.74.18 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=246 ID=26010 PROTO=TCP SPT=54355 DPT=10025 WINDOW=1024 RES=0x00 SYN URGP=0 
[Aug30 17:50] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:40:56:89:28:99:3a:4d:30:af:08:00 SRC=165.3.91.27 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=53 ID=36179 PROTO=TCP SPT=61159 DPT=23 WINDOW=116 RES=0x00 SYN URGP=0 
[  +0.475815] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=191.232.211.54 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=242 ID=54321 PROTO=TCP SPT=60130 DPT=8080 WINDOW=65535 RES=0x00 SYN URGP=0 
[ +14.865665] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:40:56:89:28:99:3a:4d:30:af:08:00 SRC=185.176.27.178 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=246 ID=35387 PROTO=TCP SPT=62000 DPT=59515 WINDOW=1024 RES=0x00 SYN URGP=0 
[ +22.587530] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=213.217.1.36 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=21759 PROTO=TCP SPT=62000 DPT=25722 WINDOW=1024 RES=0x00 SYN URGP=0 
[Aug30 17:51] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=92.246.159.109 DST=111.111.111.170 LEN=44 TOS=0x00 PREC=0x00 TTL=49 ID=26666 PROTO=TCP SPT=19725 DPT=8080 WINDOW=24613 RES=0x00 SYN URGP=0 
[ +20.152102] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=45.129.33.13 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=55654 PROTO=TCP SPT=56372 DPT=7939 WINDOW=1024 RES=0x00 SYN URGP=0 
[ +20.869292] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=185.153.199.187 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=247 ID=56485 PROTO=TCP SPT=8080 DPT=1428 WINDOW=1024 RES=0x00 SYN URGP=0 
[Aug30 17:52] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:40:56:89:28:99:3a:4d:30:af:08:00 SRC=185.176.27.186 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=247 ID=53668 PROTO=TCP SPT=62000 DPT=46438 WINDOW=1024 RES=0x00 SYN URGP=0 
[ +10.847237] veth5e1b841b: renamed from eth1
[  +0.017105] lxdbr0: port 7(veth35319660) entered disabled state
[  +0.000296] lxdbr0: port 8(veth6bd7dc70) entered disabled state
[  +0.143257] device veth35319660 left promiscuous mode
[  +0.000120] lxdbr0: port 7(veth35319660) entered disabled state
[  +0.081489] device veth6bd7dc70 left promiscuous mode
[  +0.001227] lxdbr0: port 8(veth6bd7dc70) entered disabled state
[ +22.648392] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=45.129.33.6 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=46294 PROTO=TCP SPT=41084 DPT=11976 WINDOW=1024 RES=0x00 SYN URGP=0 
[Aug30 17:53] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=45.129.33.6 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=57508 PROTO=TCP SPT=41084 DPT=10976 WINDOW=1024 RES=0x00 SYN URGP=0 
[  +4.996664] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=45.129.33.6 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=62069 PROTO=TCP SPT=41084 DPT=11287 WINDOW=1024 RES=0x00 SYN URGP=0 
[  +8.248187] device vethff74d912 left promiscuous mode
[  +0.000123] lxdbr0: port 5(vethff74d912) entered disabled state
[  +0.140385] HTB: quantum of class 10010 is big. Consider r2q change.
[  +0.031202] lxdbr0: port 5(vethe976afea) entered blocking state
[  +0.000003] lxdbr0: port 5(vethe976afea) entered disabled state
[  +0.000311] device vethe976afea entered promiscuous mode
[  +0.000031] lxdbr0: port 5(vethe976afea) entered blocking state
[  +0.000002] lxdbr0: port 5(vethe976afea) entered forwarding state
[  +0.041472] eth0: renamed from vethb9d45f92
[ +11.604593] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:40:56:89:28:99:3a:4d:30:af:08:00 SRC=185.176.27.106 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=246 ID=19044 PROTO=TCP SPT=62000 DPT=34602 WINDOW=1024 RES=0x00 SYN URGP=0 
[Aug30 17:54] [UFW BLOCK] IN=eth0 OUT= MAC=00:50:56:41:57:99:28:99:3b:4d:23:91:08:00 SRC=80.82.65.74 DST=111.111.111.170 LEN=40 TOS=0x00 PREC=0x00 TTL=248 ID=54921 PROTO=TCP SPT=58855 DPT=10000 WINDOW=1024 RES=0x00 SYN URGP=0
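Most of this excerpt is UFW dropping unsolicited connection attempts to random ports, which looks like routine background scanning of a public IP. The lines that concern the bridge are the lxdbr0 port transitions, which typically accompany a container's veth pair being attached or detached (for example on start, stop or restart). A small sketch that separates the two so the bridge events can be lined up with the outage window; it assumes the dmesg line format shown above, and the script name in the comment is hypothetical. Running dmesg -T instead of -H prints absolute timestamps, which are easier to compare with the uptime checker.

```python
from __future__ import annotations

import re
import sys
from collections import Counter

_UFW_SRC = re.compile(r"\[UFW BLOCK\].*?\bSRC=(\S+)")
_BRIDGE = re.compile(
    r"^\[\s*([^\]]*?)\s*\].*?lxdbr0: port (\d+)\(([^)]+)\) entered (\w+) state"
)


def summarize_dmesg(text: str) -> tuple[Counter, list[tuple[str, ...]]]:
    """Separate UFW noise from lxdbr0 port state changes in dmesg output.

    Returns (UFW-blocked packet count per source IP,
             [(timestamp, bridge port, veth name, new state), ...] in order).
    """
    blocked: Counter = Counter()
    transitions = []
    for line in text.splitlines():
        ufw = _UFW_SRC.search(line)
        if ufw:
            blocked[ufw.group(1)] += 1
            continue
        bridge = _BRIDGE.search(line)
        if bridge:
            transitions.append(bridge.groups())
    return blocked, transitions


if __name__ == "__main__":
    # e.g.  dmesg -T | python3 summarize_dmesg.py
    blocked, transitions = summarize_dmesg(sys.stdin.read())
    print("top UFW-blocked sources:", blocked.most_common(5))
    for ts, port, veth, state in transitions:
        print(f"[{ts}] lxdbr0 port {port} ({veth}) -> {state}")
```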

Does this show a problem @tomp?

Was the outage happening at that time? Also, what was happening on the host to trigger those?

I am sorry, I am not sure what was happening on the host.
An uptime monitoring service showed one container as down for the usual 5 minutes or so, from 17:56 to 18:02. So again, very selective.

I have since noticed that my office's ISP runs a DNS resolver that does not support IPv6, so I have also switched my office DNS to Cloudflare's and Google's resolvers. Could this possibly have had anything to do with my selective outages?

When the container does not have a netplan config set up, is it normal for it to have expiring routes, since it gets those RAs from lxdbr0?