Packet forwarding through LXD container - packet drops on ingress veth/vtap interface

lxd

(Rob) #1

Hi, I have been testing a scenario in which an LXD container is set up to simply route/forward IP packets from one IP interface out of another. We would like the LXD container to logically act as a virtual router, with the goal of the highest throughput possible. Packets-per-second throughput on the LXD container has been pretty bad (~40k pps) due to packet drops showing on the vtap/veth interface connecting an OVS bridge up to the LXD container. It seems as if network packet processing for the LXD container is getting restricted somehow. I have dug through the cgroups config and tried applying ‘limits.ingress: 10Gbit’ to my LXD profile, but it does not seem to help. softirq processing on the LXD host stays at roughly the equivalent of one core of utilization (in aggregate, spread across multiple cores), and the drops shown under ‘/proc/net/softnet_stat’ suggest the processing queue is not being serviced quickly enough. CPU utilization inside the LXD container is basically nil, and the LXD host is moderately loaded on the cores doing the softirq processing. Is there some other inherent network I/O restriction placed on an LXD container, or possibly by cgroups?
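For anyone wanting to watch the drop counters mentioned above, here is a minimal sketch in Python that parses the contents of `/proc/net/softnet_stat`. It assumes the usual kernel layout: one row per CPU, hexadecimal fields, with the first three columns being packets processed, packets dropped because the per-CPU backlog queue was full, and time_squeeze events. The function names `parse_softnet_stat` and `total_dropped` are made up for illustration.

```python
def parse_softnet_stat(text):
    """Parse the text of /proc/net/softnet_stat into one dict per row.

    Assumption: fields are hexadecimal and the first three columns are
    processed, dropped and time_squeeze. Later columns vary by kernel
    version and are ignored here.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": len(rows),  # row index, normally one row per CPU
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,
        })
    return rows


def total_dropped(rows):
    """Sum the dropped counter across all parsed rows."""
    return sum(r["dropped"] for r in rows)
```

On the LXD host, something like `total_dropped(parse_softnet_stat(open("/proc/net/softnet_stat").read()))` sampled before and after a test run should show whether the backlog drops are still increasing.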

Topology:

      iperf3 |------------| (ovs-br1)----(LXD-container)----(ovs-br2) |------------| iperf3
      traffic1                           LXD server UUT                              traffic2
      1.1.1.2                            1.1.1.1 2.2.2.1                             2.2.2.2

Secondary question: where would I see ‘limits.egress’ LXC config show up within the cgroups hierarchy?

We are ultimately using the OpenStack “VLAN provider” model to set this up, which basically bridges traffic into the LXD host and then wires the LXD container up via vtap interfaces. I see the same behavior with both Ubuntu 16.04 and 18.04 as the LXD host, and with both ‘Version: 2.0.11-0ubuntu1~16.04.4’ and ‘Version: 3.0.1-0ubuntu1~18.04.1’, in the more simplified scenario shown above. Below is the basic profile being used.

root@compute3:/sys/fs/cgroup/cpu/lxc/c3# lxc profile show dual
config: {}
description: Default LXD profile
devices:
  eth0:
    limits.egress: 10Gbit
    limits.ingress: 10Gbit
    nictype: bridged
    parent: ovs-br1
    type: nic
  eth1:
    limits.egress: 10Gbit
    limits.ingress: 10Gbit
    nictype: bridged
    parent: ovs-br2
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: dual
used_by:
- /1.0/containers/c1

Any help or suggestions would be greatly appreciated!


#2

(I added some markup to your post to make it easier to read)

There are some changes that you can make on the host to achieve better throughput,


Can you try them and see how they help?


(Brian Mullan) #3

Also, remember your LXD container is still using your host kernel.

What is your throughput when using just the host and not the container?


(Jon Clayton) #4

Also, what about performance with Linux Bridge versus OVS?


(Rob) #5

Thanks for the link, Simos. “net.core.netdev_max_backlog = 182757” did the trick for correcting the packet drops on the veth interfaces!!

-Rob
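For reference, a small Python sketch of how the setting above could be checked and written out for persistence. It assumes the standard procfs location for the sysctl, and that the rendered line would be placed in a file under `/etc/sysctl.d/` and loaded with `sysctl --system`. The helper names `read_backlog` and `sysctl_conf_line` are hypothetical, and 182757 is simply the value that worked in this thread, not a general recommendation.

```python
from pathlib import Path

# Standard procfs path for the net.core.netdev_max_backlog sysctl.
BACKLOG_PATH = Path("/proc/sys/net/core/netdev_max_backlog")


def read_backlog(path=BACKLOG_PATH):
    """Return the current per-CPU backlog queue limit as an integer."""
    return int(Path(path).read_text().strip())


def sysctl_conf_line(value):
    """Render a sysctl.d-style line for the given backlog value."""
    if value <= 0:
        raise ValueError("backlog must be a positive integer")
    return f"net.core.netdev_max_backlog = {value}"
```

Calling `read_backlog()` on the host before and after the change confirms that the new value is active; the kernel default is 1000, which is easy to overrun when veth traffic is queued through the per-CPU backlog.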