Slow VM network speed compared to host (0.8 ms vs 30+ ms ping)

So what I've done so far is build a two-node cluster with some ZFS storage. On one node I've added a VLAN which has a /29 assigned to it. I added a bridge on the host and attached that bridge to the VM running in Incus. So far so good: I can ping it, reach it from the internet, and pretty much everything works, but it's 'slow'.
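For reference, roughly what that looked like (the instance, device, and bridge names here are placeholders, not the exact ones I used):

# Pass the host bridge to the VM as a bridged NIC
incus config device add myvm eth0 nic nictype=bridged parent=br-vlan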

With slow I don't mean 20 kB/s on a gigabit link; I still get around 400 Mbps, but on the host I get 980. If I ping Google or any other standard domain I get about 0.8 ms on the host, but inside the VM it is 30 ms. I can understand some amount of loss, but these numbers are pretty big. I thought I had everything up and running and was pretty happy until I noticed this. I use Hetzner servers, whose vSwitch uses an MTU of 1400; the host NIC doesn't have this. This is the only thing that 'might' have an impact, but to be honest I don't play around much in that area. (Before I set the 1400 I only got 20 kB/s with connection resets; after I added it to the bridge I got 'better' speeds.)
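For what it's worth, this is roughly how I set the MTU on the host bridge (the bridge name is a placeholder):

# Match the host bridge MTU to Hetzner's vSwitch MTU of 1400
ip link set dev br-vlan mtu 1400

For an Incus-managed bridge, the equivalent would be the bridge.mtu option, e.g. incus network set testbr0 bridge.mtu 1400.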

Any suggestions on what to look otherwise? I think i am close, but cannot seem to pin point it.

Btw, everything else i got up and running within a day so kudos for incus.

What network performance do you get from a VM that's connected to a regular Incus bridge rather than Hetzner's vSwitch?

That would help confirm that Incus itself isn't causing the slowdown but that there's something odd going on with the vSwitch.

Fair point. I know I tried without it, but it was complaining about openvswitch not being there. Since I don't know that much in that area I'll read up on it and see if there is a difference. Indeed, it is always good to test multiple situations. Will let you know. Thanks for the quick reply.

incus network create testbr0
incus launch images:debian/12 d12 --vm --network testbr0

And then test the network from within d12.
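For example, something like (any target you've been testing against works):

incus exec d12 -- ping -c 4 google.com

followed by your usual throughput test from inside the VM.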

If I try that it says 'network not pending on any node, use target'. When I use --target it adds it, but then it is in state "pending" and the status says: Could not load network state: Network interface "testbr0" not found. I did remove my host bridge; from what I gathered it's best to manage the bridge in Incus, that's what I understand from the docs now.

Update: I had to define this on both nodes, of course, before I could create it, but that was not clear to me. Once I added the bridge on the other node and then issued the create command again, it created the bridge. Testing further now.
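For anyone hitting the same error, the working sequence was roughly this (node names are placeholders):

# Define the network on every cluster member first, then create it globally
incus network create testbr0 --target node1
incus network create testbr0 --target node2
incus network create testbr0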

OK, so with a test VM on a bridge with a 10.x segment it seems fine (testing latency only for now):

root@composed-orca:~# ping google.com
PING google.com (216.58.211.238) 56(84) bytes of data.
64 bytes from mad01s24-in-f14.1e100.net (216.58.211.238): icmp_seq=1 ttl=59 time=0.860 ms
64 bytes from mad07s20-in-f14.1e100.net (216.58.211.238): icmp_seq=2 ttl=59 time=1.08 ms
64 bytes from mad07s20-in-f14.1e100.net (216.58.211.238): icmp_seq=3 ttl=59 time=1.09 ms
64 bytes from mad01s24-in-f238.1e100.net (216.58.211.238): icmp_seq=4 ttl=59 time=1.10 ms

These times are excellent.

So how can I make a managed bridge from my interface? I have created enp41s0.4000, which is on VLAN 4000 on the host. This one has the connection to my /29 subnet.

Trying with bridge.external_interfaces=enp41s0.4000 ;p
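Spelled out, the attempt was something like this (the network name is a placeholder; in a cluster the per-node --target creates from above apply first):

# Attach the existing VLAN sub-interface to a managed bridge
incus network create vlanbr0 bridge.external_interfaces=enp41s0.4000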

So I managed to get that up and running; since it was a cluster I had to target each node and then create. I tried creating something on both nodes. The weird part is that, say, xs4all.nl comes back with 1 ms pings, but google/facebook are 30+ ms. If I do a speedtest from the datacenter, the VM itself also reaches max output, so throughput is good. Yet all pings from the host itself are low: the speedtest always shows 30 ms from the VM and 1.x ms on the host.

So then it sounds like it's an issue with the vSwitch from Hetzner perhaps? Configuration-wise?

And lastly, to confirm: if I just make a bridge with a 10.x range it goes full speed.

Definitely sounds like the traffic going through that vSwitch is somehow using a different path than what's coming directly out of the host. So not an Incus issue but something odd going on with Hetzner's setup.
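One way to confirm that would be to compare the paths hop by hop, e.g.:

traceroute -n 8.8.8.8                      # on the host (direct uplink)
incus exec <vm> -- traceroute -n 8.8.8.8   # inside the VM (via the vSwitch)

If the first few hops differ, the vSwitch traffic is being routed somewhere else.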

Yeah, and now to find that needle in that pile of Hetzner ;p I've got a support ticket open, but I don't know if they support these types of setups (or only best effort).

Also a long stretch, but related: the IP range I was given is recorded in the GeoIP database as located in Germany while it is actually in Helsinki. So if I run speedtest-cli on the server it shows me a list of servers in Germany and picks the fastest one there, which is already 25-30 ms away. If I run that same test on the host it knows it is in Helsinki and I get a quick reply. Speeds are still a bit meh, but this at least gives me a reason why some domains have high latency: it's simply sending me to the wrong geo-located pod/hosting/server. I never knew this could have such an impact. Still, the test bridge does 110M down while on the vSwitch (from Hetzner) I only get 50M.
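To take GeoIP out of the equation, speedtest-cli can be pinned to a nearby server; the grep pattern and the server ID below are just examples:

# List servers, find one in Helsinki, then test against it explicitly
speedtest-cli --list | grep -i helsinki
speedtest-cli --server <server-id>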

If I download a file directly it is still fast, so long story short: I think this GeoIP database really screws with certain things. I've contacted MaxMind to update the GeoIP location of the subnet range.