SR-IOV vs. macvlan: Isolation and Performance


(Sean McNamara) #1

I’m using the latest LXD snap on Ubuntu 18.04 (fully patched and up to date). My networking hardware is a 10 Gbps Intel X450-AT2, which uses the ixgbe driver.

I have some containers using macvlan and others using SR-IOV for Ethernet. Both work great, with one exception: the first time a container using SR-IOV starts after each boot, there’s often a long (30-60s) “dead zone” with no packets crossing the NIC while the driver creates the virtual functions.

Performance seems similar on the surface. Can anyone explain how these two approaches to LXD networking are likely to differ, both in terms of performance and network isolation?

Are SR-IOV guest NICs more isolated / more secure than macvlan?

(Stéphane Graber) #2

They’re similar in many ways, though since SR-IOV is implemented in hardware, I would expect slightly better performance, mostly visible as lower CPU usage under full load.

Bonding with SR-IOV is generally not a thing, which is a big issue for many.
Switches do not understand multiple LACP sessions on the same set of ports, so you can’t pass two SR-IOV devices to a container and have it bond them. Nor can you have the host itself run a bond on the physical functions when containers are attached to them with SR-IOV (nothing will detect this situation; it will just fail in weird ways).

The other main limitation is the number of VFs supported by the hardware. This varies by hardware generation, usually ranging from 8 to 32 or so.
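
To check the VF limit on a particular card, the kernel exposes SR-IOV counters in sysfs for each physical function. The following is a minimal sketch, assuming the standard `sriov_totalvfs` and `sriov_numvfs` attributes are present under the interface's `device` directory; the function name `vf_capacity` and the `sysfs_root` parameter are illustrative, not part of LXD.

```python
from pathlib import Path
from typing import Tuple


def vf_capacity(iface: str, sysfs_root: str = "/sys/class/net") -> Tuple[int, int]:
    """Return (total_vfs, enabled_vfs) for the physical function `iface`.

    Reads the kernel's SR-IOV sysfs attributes. `sysfs_root` is a
    hypothetical parameter that exists only to make the function easy
    to test against a fake directory tree.
    """
    device = Path(sysfs_root) / iface / "device"
    # Maximum number of VFs the hardware/driver supports on this PF.
    total = int((device / "sriov_totalvfs").read_text().strip())
    # Number of VFs currently enabled on this PF.
    enabled = int((device / "sriov_numvfs").read_text().strip())
    return total, enabled
```

Calling `vf_capacity("enp3s0f0")` (the interface name here is a placeholder) raises `FileNotFoundError` if the interface does not expose SR-IOV attributes.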

(Sean McNamara) #3

Thanks for the info, Stéphane. Great stuff!

Impressively, this high-end server chipset has 63 functions per NIC, and two separate NICs, for a total of 126 functions (although two of those are the mandatory default functions, so 124 functions in addition to the ones that come “out of the box” without SR-IOV). One of the NICs has no Ethernet cable attached to it, but I don’t think I’ll need 63 functions, because I barely have that many IPs assigned to my server. I’ll probably convert my containers over to SR-IOV for the probable performance benefit.

Do you know if there is a way to enforce a specific MAC to be used on a VF, to prevent root in the container from setting their own MAC? Ideally it would just drop any packets at the physical NIC driver if they are marked with the wrong MAC. Currently the only way my hosting provider’s switch knows which guest it’s talking to is via MAC address. This would close a hole that’s been in my LXD networking strategy for a long time.

I’m currently using lxc config to set the (initial) MAC on the guest’s VF but I assume it’s just as easy to change the MAC guest-side with SR-IOV as it is with macvlan.

(Stéphane Graber) #4

I don’t believe there is a way to prevent changing the MAC. It’d be nice if there was some flag on the parent NIC to prevent MAC spoofing, but I don’t believe this is a thing.

And indeed there’s little we can do about this at the LXD level as the traffic is entirely invisible from the host point of view (that’s the whole point). For bridged networking we can put ebtables rules in place to filter traffic coming from unexpected MAC addresses but I can’t think of a way to do that for sriov or macvlan.
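
For the bridged case, the kind of filter described above can be sketched as follows. This only builds the command lines and does not run them. The function name `mac_filter_rules` and the host-side veth name are hypothetical, and the rule layout is an assumption about how such a filter might look, not a statement of what LXD actually installs. The `-s ! <mac>` negation form follows the legacy ebtables man page; other ebtables builds may expect the `!` in a different position.

```python
from typing import List


def mac_filter_rules(host_veth: str, allowed_mac: str) -> List[List[str]]:
    """Build ebtables commands that drop frames entering the bridge from
    `host_veth` unless their source MAC equals `allowed_mac`.

    Returns argv lists only; nothing is executed here.
    """
    rules = []
    # INPUT covers frames addressed to the host itself,
    # FORWARD covers frames bridged on to other ports.
    for chain in ("INPUT", "FORWARD"):
        rules.append([
            "ebtables", "-t", "filter", "-A", chain,
            "-i", host_veth,          # host side of the container's veth pair
            "-s", "!", allowed_mac,   # any source MAC other than the allowed one
            "-j", "DROP",
        ])
    return rules
```

Each returned list could be passed to `subprocess.run(..., check=True)` as root. Nothing comparable is available for sriov or macvlan, since that traffic never crosses a host bridge.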

(Sean McNamara) #5

A couple of hardware manufacturers and virtualization providers mention “MAC spoofing” in the same articles as SR-IOV networking. Along those lines, do you know if there is something LXD could do to make use of this?

For example, here’s info about Mellanox adapters:

Here’s a Red Hat Bugzilla about it with libvirt:

And here’s the official Red Hat libvirt documentation relevant to SR-IOV:

This looks like someone having the opposite problem, with spoofing preventing their changing the MAC address in the guest (this is what I want to happen with LXD):

My Intel ixgbe X450-AT2 looks like it nominally supports MAC spoofing detection, because ip link show spews one of these lines for each VF attached to the NIC:

vf 2 MAC 86:3d:ec:90:80:bc, spoof checking on, link-state auto, trust off, query_rss off
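
To audit these per-VF settings across many VFs, a line like the one above can be parsed programmatically. This is a minimal sketch that assumes the exact output format quoted here; other iproute2 versions print the VF line differently, so the pattern may need adjusting. The name `parse_vf_line` is illustrative.

```python
import re
from typing import Dict, Optional

# Matches the VF line format quoted in this thread; not guaranteed to
# match the output of every iproute2 version.
VF_LINE = re.compile(
    r"vf\s+(?P<index>\d+)\s+MAC\s+(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
    r".*?spoof checking\s+(?P<spoofchk>on|off)"
    r".*?trust\s+(?P<trust>on|off)",
    re.IGNORECASE,
)


def parse_vf_line(line: str) -> Optional[Dict[str, object]]:
    """Extract VF index, MAC, spoof-check and trust flags from one line.

    Returns None if the line does not look like a VF entry.
    """
    m = VF_LINE.search(line)
    if m is None:
        return None
    return {
        "index": int(m.group("index")),
        "mac": m.group("mac").lower(),
        "spoofchk": m.group("spoofchk").lower() == "on",
        "trust": m.group("trust").lower() == "on",
    }
```

For the line quoted above, this yields index 2, MAC `86:3d:ec:90:80:bc`, spoof checking enabled and trust disabled.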

So if the spoof checking is on but I can change the MAC in the container, that makes me suspect LXD needs to somehow tell the kernel which MAC it should be so it can know whether it’s being spoofed or not.
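
One way to test that suspicion by hand is to assign the VF’s MAC from the host side with `ip link set ... vf N mac ...` and make sure `spoofchk` is on, then try changing the MAC inside the container. The sketch below only builds those command lines. The helper name `pin_vf_mac_commands` and the parent interface name are hypothetical, and whether this actually stops a container (as opposed to a VM) from changing its MAC is exactly the open question here, so treat it as an experiment rather than a fix.

```python
import re
from typing import List

MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)


def pin_vf_mac_commands(pf: str, vf_index: int, mac: str) -> List[List[str]]:
    """Build host-side `ip link` commands that set the administrative MAC
    of a VF on physical function `pf` and enable spoof checking for it.

    Returns argv lists only; nothing is executed here.
    """
    if vf_index < 0:
        raise ValueError("vf_index must be non-negative")
    if not MAC_RE.match(mac):
        raise ValueError("invalid MAC address: %r" % mac)
    base = ["ip", "link", "set", "dev", pf, "vf", str(vf_index)]
    return [
        base + ["mac", mac.lower()],   # MAC the PF driver associates with this VF
        base + ["spoofchk", "on"],     # ask the NIC to check source MACs from this VF
    ]
```

Each list can be run as root with `subprocess.run(cmd, check=True)`; the result can then be compared against the `ip link show` output for the parent interface.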


(Stéphane Graber) #6

I wonder if the spoof checking feature maybe only works when the PCI device of the VF is passed through VFIO to a VM, in which case the driver in the VM treats it as a PF and the card knows not to allow further MAC changes.

For containers, the logic is a bit different: the VF remains attached to the same host driver as the PF, so the kernel may just allow MAC changes, since you are the owner of that VF and haven’t passed it to anyone through VFIO.