Hi, as my network is now starting to look moderately stable, I’ve been looking at the network performance, in particular the uplink between my site and the cloud edge. I have some numbers, but I’ve no frame of reference as to whether they’re OK or could be better.
What I have
Instance => Incus OVN => IC => (tinc VPN trunk over 100M link) => IC => Incus OVN => Instance
Ping latency on the uplink is ~ 25ms.
Does this look to be good / bad / ugly … ?
(can I do better, or is this just expected overhead?)
Confusingly, if I run it node-to-node over the VPN link, where I would expect a figure somewhere between the two, it actually comes in ~20% slower than the instance-to-instance speed (!)
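For reference, the two tests are along these lines (addresses here are placeholders; iperf3 runs as a server on the far end and I point a client at it in each case):

```
# far end (the remote instance, or the remote node, respectively):
iperf3 -s

# instance-to-instance: run inside the local container against the far instance
iperf3 -c <far-instance-address> -t 30

# node-to-node over the VPN: run on the local host against the far node's tunnel address
iperf3 -c <far-node-vpn-address> -t 30
```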
Hi, I’m only using containers … if I run locally via the VPN link I can get up to 250 Mbits/sec, so I’m thinking it’s not a VPN/CPU issue … although I’m about to switch to AES to see if I can get a hardware boost. One end has access to 4 CPUs, the other 2. Both ends are ARM64.
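In case it helps anyone following along, the cipher change I’m about to try is roughly this in tinc.conf (just a sketch, assuming tinc 1.0-style options, where the defaults are blowfish/sha1):

```
# /etc/tinc/<netname>/tinc.conf
# AES should benefit from the ARM64 crypto extensions, unlike the default blowfish
Cipher = aes-256-cbc
Digest = sha256
```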
Ok, as I don’t know how your network is configured, I’m not entirely sure how to interpret those results … on the one hand you seem to be limited to 1G per channel, but on the other you’re getting 2.5G over 4 channels …
I’ve tried mine with -P4 and I do get more throughput, indeed much closer to the speed I get with no encapsulation or VPN in the way.
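For the record, that’s just adding parallel streams to the otherwise identical client command, something like:

```
# four parallel TCP streams instead of one
iperf3 -c <far-end-address> -t 30 -P 4
```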
In my case, it’s two clusters with a 3Gbps residential internet connection between the two (wireguard used between both sites). It’s not unusual for a single stream over my internet connection to cap at around gigabit speed; multiple streams usually get me past this and close to the real internet speed.
Mmm, on the one hand tinc works very well as a mesh and is easy to deploy, but looking at the way it’s working under load I’m beginning to wonder whether it’s the right solution. I’ve noticed that when iperf3 runs it loads “tincd” on all three cluster nodes, despite traffic only flowing through the one node where the process is running. Not sure whether this is a function of the mesh or whether I’ve something set wrong.
It’s just occurred to me that when I said the VPN is running on the node at each end, the far-end node is actually a cloud server instance… Many thanks for those numbers, I’ll do a little more digging. What I have can obviously be improved.
Mmm … I’m confusing myself a little here. I reverted to testing the VPN at node level and got worse performance, until I dropped the MTU to 1300, at which point I got twice the throughput. As I increase the MTU, the performance gets progressively worse.
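In case it’s useful, one way to clamp the tunnel MTU is in the tinc-up script on each node; a minimal sketch (the tunnel address is a placeholder, and $INTERFACE is set by tincd):

```
#!/bin/sh
# /etc/tinc/<netname>/tinc-up
ip link set dev "$INTERFACE" mtu 1300
ip addr add <tunnel-address>/24 dev "$INTERFACE"
ip link set dev "$INTERFACE" up
```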
Right, setting the correct MTU value is important to get the best performance.
I faced a similar issue while configuring my WireGuard VPN. During my research I came across the following gist, Wireguard Optimal MTU, which goes into quite some detail on how to find the correct value.
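If I remember the gist correctly, the approach boils down to probing the path MTU with the DF bit set and then subtracting the WireGuard overhead (60 bytes over IPv4, 80 over IPv6), roughly:

```
# Find the largest ICMP payload that gets through unfragmented; on-the-wire size is
# payload + 8 (ICMP) + 20 (IPv4), so -s 1472 tests a 1500-byte path:
ping -M do -s 1472 <far-end-public-address>

# If 1472 passes, the path MTU is 1500, so the WireGuard interface MTU would be
# 1500 - 80 = 1420 (IPv6 endpoints / the safe default) or 1500 - 60 = 1440 (IPv4 only).
```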
Hi, many thanks. I think I’m going to try switching to WG, as that seems like the next thing to try. I had thought I’d got the MTU right for tinc, but something still doesn’t seem to sit right. I’ll try that gist and see what happens … either way it’ll be interesting to see if my provisioner can cope with the switch without completely borking the running cluster …
Which seems a lot better; moreover, the CPU usage has dropped from “very significant” to “undetectable”. Altogether it seems like a no-brainer choosing wg over tinc … although I found I had to manually mesh all the nodes to make Geneve happy.
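For the record, the manual mesh is just a wg-quick config on each node with one [Peer] block per other node; a trimmed-down sketch of the shape of it (keys, addresses and the exact MTU are placeholders for whatever suits your path):

```
# /etc/wireguard/wg0.conf on node 1
[Interface]
Address = 10.99.0.1/24
ListenPort = 51820
PrivateKey = <node-1-private-key>
# leave headroom below the path MTU for the Geneve overlay on top
MTU = 1420

# node 2
[Peer]
PublicKey = <node-2-public-key>
Endpoint = <node-2-address>:51820
AllowedIPs = 10.99.0.2/32
PersistentKeepalive = 25

# node 3
[Peer]
PublicKey = <node-3-public-key>
Endpoint = <node-3-address>:51820
AllowedIPs = 10.99.0.3/32
PersistentKeepalive = 25
```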
Many thanks for the help … I almost seem to have run out of things to fix …