Speed of default route out from an OVN network on Incus

I’ve been having some issues with speed inside containers, particularly with “apt” doing updates. Since access *to* the containers seemed fast, I sort of assumed there was some bandwidth throttling going on.

Apparently not.

Inside a container on an OVN network “private” (range 10.4.0.0/22, default gateway 10.4.0.1, which disappears off into Incus / OVN and presents as a zero-hop route to my Internet router), I see 5 Mbit/s outbound. Which explains why I thought it was slow.

After adding another interface (a local bridge) and making it the default route, I get 800 Mbit/s.
I can sort of live with that, but I’m clearly doing something “wrong” somewhere … and as traceroute is of no help, I’m not sure where to start …
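For reference, the obvious places to start looking (and where the answer eventually turned out to be) are the interface settings inside the container and the network’s own configuration. A rough sketch, assuming an instance called “c1” and the OVN network called “private” (substitute your own names):

# ip -d link show eth0              # inside the container: per-NIC details, including MTU
# incus network show private        # on the host: the OVN network's configuration
# incus config show c1 --expanded   # on the host: the devices actually attached to the instance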

Default setup, routing via the OVN gateway:

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.4.0.1        0.0.0.0         UG    1024   0        0 eth0
10.4.0.0        0.0.0.0         255.255.252.0   U     1024   0        0 eth0
10.4.0.1        0.0.0.0         255.255.255.255 UH    1024   0        0 eth0
10.4.0.14       0.0.0.0         255.255.255.255 UH    1024   0        0 eth0

# speedtest
Testing download speed................................................................................
Download: 11.82 Mbit/s
Testing upload speed......................................................................................................
Upload: 107.03 Mbit/s

If I now attach a local bridge to the instance and tweak the network scripts to ignore the static route coming from eth0, so the default route comes via the bridge instead, I get a very different result (a rough sketch of those steps is further down):

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.28.135.1     0.0.0.0         UG    1024   0        0 eth1
10.4.0.0        0.0.0.0         255.255.252.0   U     1024   0        0 eth0
10.4.0.14       0.0.0.0         255.255.255.255 UH    1024   0        0 eth0
10.28.135.0     0.0.0.0         255.255.255.0   U     1024   0        0 eth1
10.28.135.1     0.0.0.0         255.255.255.255 UH    1024   0        0 eth1
10.103.0.0      10.4.0.1        255.255.255.0   UG    0      0        0 eth0

# speedtest
Testing download speed................................................................................
Download: 440.44 Mbit/s
Testing upload speed......................................................................................................
Upload: 107.17 Mbit/s
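For reference, roughly what I did to get this second setup, as a sketch (the instance name “c1” and bridge network name “lanbr0” are placeholders; the tweak to the network scripts boils down to the route change shown):

# incus config device add c1 eth1 nic network=lanbr0

and then, inside the container:

# ip route replace default via 10.28.135.1 dev eth1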

So I’m getting roughly a 40x performance increase (in one direction) by going out via a local bridge interface rather than the OVN gateway interface. My mind boggles at where and how that amount of speed is being lost via the OVN gateway.

Note: “traceroute” output is identical in both cases … (!)
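In hindsight, “tracepath” (from iputils) is more useful than traceroute here, since as well as the hops it reports the path MTU it discovers along the way (any external host will do as a target):

# tracepath -n 8.8.8.8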

Ok, well, I can see the problem: it’s the ******** MTU again. The network MTU is set to 1300, which seems to work, at least for incoming traffic and traffic within the OVN network. Outgoing traffic is slow, and when I dump the back-end network segment during a speed test I see:

14:25:56.516748 IP 192.168.1.16 > 104.17.147.22: ICMP 192.168.1.16 unreachable - \
    need to frag (mtu 1300), length 542
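A quick way to confirm that from inside the container is a don’t-fragment ping of a known size; ICMP echo adds 28 bytes of headers, so a 1272-byte payload makes a 1300-byte packet (sizes here just illustrate the boundary):

# ping -M do -s 1272 -c 3 8.8.8.8    # 1272 + 28 = 1300 bytes on the wire
# ping -M do -s 1270 -c 3 8.8.8.8    # 1298 bytes on the wire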

But then if I manually change the MTU to 1298 inside the container, I’m up to full speed. If I then change the NETWORK MTU to 1298, I’m back to frag errors … and if I then manually lower the container MTU to 1296, it works again.

So the instance MTU seems to need to be 2 less than the network MTU (!)
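My workaround for now is to leave the network MTU alone and drop the container side by two; inside the container (eth0 being the OVN NIC) that’s just:

# ip link set dev eth0 mtu 1298

Depending on how the container configures its networking, that may get reset by a DHCP renewal or a reboot, so to stick it also needs to go into the container’s own network configuration.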

This kind of explains many of the problems I’ve been seeing; I just don’t understand how this can be, or how to manage it …

Host MTUs are set to 1500.
OVN-IC now runs over WireGuard, which has an MTU of 1420.
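If tunnel overhead is what’s eating the difference, the arithmetic would be: the default OVN MTU in Incus (1442, if I’ve read the docs right) on a 1500-byte underlay leaves 58 bytes of headroom for the Geneve encapsulation, so on a 1420-byte WireGuard underlay the overlay MTU presumably needs to be no more than 1420 - 58 = 1362. Something like this should set it on the network (“private” and the bridge.mtu key are what I’d expect to use here; treat both as things to double-check):

# incus network set private bridge.mtu 1362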