Speed of default route out from an OVN network on Incus

I’ve been having some issues with the speed inside containers, particularly for “apt” doing updates. As the speed of access to the containers seemed to be good, I sort of assumed there was some bandwidth throttling going on.

Apparently not.

Inside a container on an OVN network “private” on the range 10.4.0.0/22, the default outbound route is 10.4.0.1, which disappears off into Incus / OVN and produces a zero-hop route to my Internet router. There I see 5 Mbit/s, which explains why I thought it was slow.

After adding another interface (a local bridge) and making it the default route, I get 800 Mbit/s.
I can kind of live with that, but I’m clearly doing something “wrong” somewhere … and as traceroute is of no help, I’m not sure where to start …
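For anyone wanting to reproduce the comparison, the second interface can be added along these lines; this is only a sketch, assuming the instance is called “demo” and using a hypothetical managed bridge called “localbr0” (the in-container tweak to prefer the bridge’s route is distro-specific and not shown):

# incus network create localbr0
# incus config device add demo eth1 nic network=localbr0 name=eth1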

Default setup, routing via the OVN gateway;

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.4.0.1        0.0.0.0         UG    1024   0        0 eth0
10.4.0.0        0.0.0.0         255.255.252.0   U     1024   0        0 eth0
10.4.0.1        0.0.0.0         255.255.255.255 UH    1024   0        0 eth0
10.4.0.14       0.0.0.0         255.255.255.255 UH    1024   0        0 eth0

# speedtest
Testing download speed................................................................................
Download: 11.82 Mbit/s
Testing upload speed......................................................................................................
Upload: 107.03 Mbit/s

If I now attach a local bridge to the instance and tweak the network scripts to ignore the static route coming from eth0, so it uses the new static route from the bridge, I get a very different result;

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.28.135.1     0.0.0.0         UG    1024   0        0 eth1
10.4.0.0        0.0.0.0         255.255.252.0   U     1024   0        0 eth0
10.4.0.14       0.0.0.0         255.255.255.255 UH    1024   0        0 eth0
10.28.135.0     0.0.0.0         255.255.255.0   U     1024   0        0 eth1
10.28.135.1     0.0.0.0         255.255.255.255 UH    1024   0        0 eth1
10.103.0.0      10.4.0.1        255.255.255.0   UG    0      0        0 eth0

# speedtest
Testing download speed................................................................................
Download: 440.44 Mbit/s
Testing upload speed......................................................................................................
Upload: 107.17 Mbit/s

So I’m getting a roughly 40x performance increase (in one direction) by going out via a local bridge interface rather than the OVN gateway interface. My mind boggles at where and how that amount of speed is being lost via the OVN gateway.

Note: “traceroute” in both instances is the same … (!)
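One hedged aside: traceroute won’t show anything MTU-related, but tracepath reports the path MTU it discovers per hop, so it may be more revealing here (1.1.1.1 is just an arbitrary external target):

# tracepath -n 1.1.1.1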

Ok, well I can see the problem: it’s the ******** MTU again. The network MTU is set to 1300, which seems to work; at least incoming traffic and traffic within the OVN network look good. Outgoing traffic is slow, and when I dump the back-end network segment during a speed test I see;

14:25:56.516748 IP 192.168.1.16 > 104.17.147.22: ICMP 192.168.1.16 unreachable - \
    need to frag (mtu 1300), length 542
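If anyone wants to watch for just these errors while repeating the test, a filter along these lines should do it; the interface name is a placeholder for whatever the back-end / uplink interface is called on the host:

# tcpdump -ni <uplink-iface> 'icmp[0] == 3 and icmp[1] == 4'

(ICMP type 3 is destination unreachable, code 4 is “fragmentation needed”.)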

But then if I manually change the MTU to 1298 inside the container, I’m up to full speed. If I then change the NETWORK MTU to 1298, I’m back to frag errors … and if I then manually lower the container MTU to 1296, it’s working again.

So the instance MTU seems to need to be 2 less than the network MTU (!)

This kind of explains many of the problems I’ve been seeing; I just don’t understand how this can be, or how to manage it …

Host MTUs are set to 1500.
OVN-IC is now run over WireGuard, which has an MTU of 1420.
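For what it’s worth, a rough back-of-envelope budget for the tunnel stack, assuming an IPv4 underlay and no Geneve options (both assumptions on my part): 1420 (WireGuard MTU) minus 20 (outer IPv4) minus 8 (outer UDP) minus 8 (base Geneve header):

# echo $(( 1420 - 20 - 8 - 8 ))
1384

Geneve options or an IPv6 underlay would eat further into that, but none of the standard headers obviously accounts for an odd 2-byte difference.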

In this configuration, i.e. traffic routing out from an OVN network via a default route, it would appear Geneve is adding two bytes to the encapsulation header somewhere between the instance interface and the host interface, so the instance MTU needs to be two less than the Incus network MTU. I can’t prove this; however, it is the consistently observed effect.

You can work around this with cloud-init by inserting a lower MTU into the instance when it boots (or by hard-coding it into the instance), and although this works fine, a better solution, at least for me, turned out to be simply not using OVN.

It should be easy enough to prove:

ping -Mdo -s1272 x.x.x.x    # sends datagrams of size 1300
ping -Mdo -s1270 x.x.x.x    # sends datagrams of size 1298

-Mdo means “set the DF (don’t-fragment) bit”, and 28 bytes is the size of IP header + ICMP header.

In normal operation, PMTU discovery should mean you don’t have to worry about this, i.e. TCP should automatically adjust its segment size to avoid fragmentation. Are you blocking ICMP anywhere?
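A quick, admittedly crude, way to eyeball that on the host, assuming nftables and/or legacy iptables are in play:

# nft list ruleset | grep -i icmp
# iptables -S | grep -i icmp

Anything dropping destination-unreachable (type 3, code 4) along the path would break PMTU discovery.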

🙂

PMTU seems not to work in this scenario.
The maximum MTU I can get away with for the OVN network is 1360.

If I set the network MTU to 1360, I get no traffic, or very little. (Geneve silently drops traffic over a certain size, so connectivity for the most part doesn’t work.) This is because the container inherits its MTU from the network.

If I set the container MTU (manually) to 1358, everything works perfectly.
So, I lower the network MTU to 1358. Problem is back, so I set the container MTU to 1356.
… repeat …
Eventually the container MTU hits 1280 and Incus won’t let me go any lower (!)
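Spelled out as commands, each round of that loop looks roughly like this; I’m assuming bridge.mtu is the relevant key for the OVN network type and that the container interface is eth0:

# incus network set private bridge.mtu 1358
# incus exec demo -- ip link set dev eth0 mtu 1356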

Practical example using what I still have set up (it’s my IC link that’s stuffed; locally I can still operate): I have an OVN network called “private” with the MTU set to 1300.

In a container called “demo”;

# ifconfig eth0 mtu 1300

# ping -M do 1.1.1.1 -s 1272
PING 1.1.1.1 (1.1.1.1) 1272(1300) bytes of data.
1280 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=9.82 ms

# ping -M do 1.1.1.1 -s 1273
PING 1.1.1.1 (1.1.1.1) 1273(1301) bytes of data.
ping: local error: message too long, mtu=1300

Which makes you think maybe this MTU is Ok. I’ve previously done “apt install speedtest-cli” because it seems to be a decent test for this issue …

# speedtest
Retrieving speedtest.net configuration...
Testing download speed................................................................................
Download: 9.55 Mbit/s
Testing upload speed......................................................................................................
Upload: 105.82 Mbit/s

Maybe this seems Ok at first sight, but let’s try lowering the container MTU by 2;

# ifconfig eth0 mtu 1298
# ping -M do 1.1.1.1 -s 1270
PING 1.1.1.1 (1.1.1.1) 1270(1298) bytes of data.
1278 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=9.91 ms

# ping -M do 1.1.1.1 -s 1271
PING 1.1.1.1 (1.1.1.1) 1271(1299) bytes of data.
ping: local error: message too long, mtu=1298

This is what I would expect (?), however if I try a speedtest again;

# speedtest
Retrieving speedtest.net configuration...
Testing download speed................................................................................
Download: 831.29 Mbit/s
Testing upload speed......................................................................................................
Upload: 108.64 Mbit/s

Now (!) initially this only showed up as really slow “apt update/upgrade” … I spent a significant amount of time investigating apparently slow mirrors before realizing it was “all” traffic. If you tcpdump the traffic, the output is pretty horrible, but the crux is that there are lots of fragmentation errors.

Either way, in this scenario the additional encapsulation is causing a problem that isn’t immediately obvious or fixable. (I’ve posted this before; no solutions.) My solution was to use cloud-init to install a client-side MTU 20 less than the network MTU, which was a 100% success, and I did this on the profile so it’s very easy to set up and use … just a problem for OCI containers, which don’t support cloud-init.
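For reference, the profile change was along these lines; a sketch only (not my exact config), assuming a version-2 (netplan-style) network config, an interface named eth0 and the “default” profile:

# cat > netcfg.yaml <<'EOF'
version: 2
ethernets:
  eth0:
    dhcp4: true
    mtu: 1280
EOF
# incus profile set default cloud-init.network-config "$(cat netcfg.yaml)"

1280 being 20 less than the 1300 network MTU in my case.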

That is indeed weird. If you use tcpdump, do you see actual fragments arriving? This would imply it’s OVN that’s doing the fragmentation on incoming packets from speedtest.net. Maybe it can be configured to send ICMP error instead?

Where is the container picking up its initial MTU from? If it’s from incus, maybe it should be setting the MTU lower than the broken OVN “MTU”.
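Something like this would show both sides, assuming the network is called “private”, the container “demo”, and that bridge.mtu is the relevant network key:

# incus network get private bridge.mtu
# incus exec demo -- ip link show dev eth0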

Ok, so just to sharpen the focus a little: speedtest.net was an example; the problem affects all traffic passing from the OVN network out through the network’s default gateway. It does not affect traffic within the OVN network or over the OVN-IC link.

Running it again now, I can see the frags are from “other” traffic; traffic from the speedtest looks like this (when it’s not working);

12:53:06.264161 IP xx.132.7.130.8080 > 192.168.1.16.40982: Flags [.], seq 329893:331141, ack 317, win 204, options [nop,nop,TS val 569559975 ecr 2121858768], length 1248: HTTP
12:53:06.264195 IP 192.168.1.16.40982 > xx.132.7.130.8080: Flags [.], ack 332389, win 779, options [nop,nop,TS val 2121858786 ecr 569559975,nop,nop,sack 1 {333637:334885}], length 0
12:53:06.264902 IP xx.132.7.130.8080 > 192.168.1.16.40970: Flags [.], seq 859045:860293, ack 317, win 204, options [nop,nop,TS val 569559974 ecr 2121858771], length 1248: HTTP
12:53:06.264963 IP 192.168.1.16.40970 > xx.132.7.130.8080: Flags [.], ack 861541, win 1445, options [nop,nop,TS val 2121858787 ecr 569559974,nop,nop,sack 2 {865285:866533}{862789:864037}], length 0
12:53:06.265232 IP xx.132.7.130.8080 > 192.168.1.16.40996: Flags [.], seq 427237:428485, ack 317, win 204, options [nop,nop,TS val 569559976 ecr 2121858767], length 1248: HTTP
12:53:06.265773 IP 192.168.1.16.40996 > xx.132.7.130.8080: Flags [.], ack 406021, win 909, options [nop,nop,TS val 2121858787 ecr 569559913,nop,nop,sack 3 {427237:428485}{440965:442213}{433477:434725}], length 0
12:53:06.265950 IP xx.132.7.130.8080 > 192.168.1.16.40956: Flags [.], seq 957637:958885, ack 317, win 204, options [nop,nop,TS val 569559975 ecr 2121858770], length 1248: HTTP
12:53:06.266461 IP xx.132.7.130.8080 > 192.168.1.16.38242: Flags [.], seq 518341:519589, ack 315, win 204, options [nop,nop,TS val 569559976 ecr 2121858757], length 1248: HTTP
12:53:06.266486 IP 192.168.1.16.38242 > xx.132.7.130.8080: Flags [.], ack 517093, win 501, options [nop,nop,TS val 2121858788 ecr 569559904,nop,nop,sack 1 {518341:523333}], length 0
12:53:06.266574 IP 192.168.1.16.40956 > xx.132.7.130.8080: Flags [.], ack 950149, win 1683, options [nop,nop,TS val 2121858788 ecr 569559963,nop,nop,sack 3 {957637:958885}{966373:967621}{963877:965125}], length 0

And when it is working;

12:55:02.676438 IP xx.132.7.130.8080 > 192.168.1.16.36666: Flags [.], seq 20077250:20078496, ack 317, win 204, options [nop,nop,TS val 569676379 ecr 2121975180], length 1246: HTTP
12:55:02.676439 IP xx.132.7.130.8080 > 192.168.1.16.36666: Flags [.], seq 20078496:20079742, ack 317, win 204, options [nop,nop,TS val 569676379 ecr 2121975180], length 1246: HTTP
12:55:02.676439 IP xx.132.7.130.8080 > 192.168.1.16.36666: Flags [.], seq 20079742:20080988, ack 317, win 204, options [nop,nop,TS val 569676379 ecr 2121975180], length 1246: HTTP
12:55:02.676440 IP xx.132.7.130.8080 > 192.168.1.16.36666: Flags [.], seq 20080988:20082234, ack 317, win 204, options [nop,nop,TS val 569676379 ecr 2121975180], length 1246: HTTP
12:55:02.676440 IP xx.132.7.130.8080 > 192.168.1.16.36664: Flags [.], seq 19988333:19989579, ack 0, win 204, options [nop,nop,TS val 569676377 ecr 2121975178], length 1246: HTTP
12:55:02.676441 IP xx.132.7.130.8080 > 192.168.1.16.36664: Flags [.], seq 19989579:19990825, ack 0, win 204, options [nop,nop,TS val 569676377 ecr 2121975178], length 1246: HTTP
12:55:02.676442 IP xx.132.7.130.8080 > 192.168.1.16.36664: Flags [.], seq 19990825:19992071, ack 0, win 204, options [nop,nop,TS val 569676377 ecr 2121975178], length 1246: HTTP
12:55:02.676442 IP xx.132.7.130.8080 > 192.168.1.16.36664: Flags [.], seq 19992071:19993317, ack 0, win 204, options [nop,nop,TS val 569676377 ecr 2121975178], length 1246: HTTP
12:55:02.676481 IP 192.168.1.16.36666 > xx.132.7.130.8080: Flags [.], ack 20079742, win 30702, options [nop,nop,TS val 2121975198 ecr 569676379], length 0
12:55:02.676482 IP 192.168.1.16.36664 > xx.132.7.130.8080: Flags [.], ack 19990825, win 30702, options [nop,nop,TS val 2121975198 ecr 569676377], length 0
12:55:02.676491 IP 192.168.1.16.36666 > xx.132.7.130.8080: Flags [.], ack 20082234, win 30711, options [nop,nop,TS val 2121975198 ecr 569676379], length 0

In this case it would seem 192.168.1.16 is the OVN network uplink address.

The container’s initial MTU defaults to the MTU of the OVN network, which was the problem for me as Incus apparently has no facility to provide the container with a different or lower MTU.

BUT, in the final analysis, if I don’t run OVN, I don’t have a problem - this issue is sort of moot.