Incorrect TCP checksums

Hi there,

I’m running LXD 3.0.1, installed via packages, on Ubuntu 18.04.

$ lxc info
[...]
driver_version: 3.0.1
kernel: Linux
kernel_architecture: x86_64
kernel_version: 4.15.0-32-generic

I’d like to couple a FreeBSD application running in VirtualBox with LXD containers. For example:

 |-------------|   |--------|   |-----------------|   |-------------------|
 | FreeBSD em0 |---| lxdbr2 |---| vethO54XQB@if13 |---| eth0 Ubuntu 18.04 |
 | 172.27.35.1 |   | no IP  |   |                 |   | 172.27.35.10      |
 |-------------|   |--------|   |-----------------|   |-------------------|

The VirtualBox network interface is configured as “Bridged Adapter” using lxdbr2.

$ lxc network show lxdbr2
config:
   dns.domain: my.lxd
   ipv4.address: none
   ipv4.firewall: "no"
   ipv4.nat: "false"
   ipv6.address: none
   ipv6.firewall: "no"
   ipv6.nat: "false"
   description: Test network
name: lxdbr2
type: bridge
used_by:
- /1.0/containers/ubuntu
managed: true
status: Created
locations:
- none

While overall communication seems to work, it is impossible to create a TCP connection between the LXD containers and the VirtualBox appliance. I’ve tracked this down to incorrect TCP checksums.

root@freebsd# tcpdump -i em0 -n port 22 -vv -XX
[...]
12:25:41.286051 IP (tos 0x0, ttl 64, id 12400, offset 0, flags [DF], proto TCP (6), length 60)
  172.27.35.10.51740 > 172.27.35.1.22: Flags [S], cksum 0x9e70 (incorrect -> 0xc715), seq 3047848567, win 29200, options [mss 1460,sackOK,TS val 568677470 ecr 0,nop,wscale 7], length 0
    0x0000:  0800 2766 8716 0016 3e45 28b9 0800 4500  ..'f....>E(...E.
    0x0010:  003c 3070 4000 4006 6c0a ac1b 230a ac1b  .<0p@.@.l...#...
    0x0020:  2301 ca1c 0016 b5aa 7a77 0000 0000 a002  #.......zw......
    0x0030:  7210 9e70 0000 0204 05b4 0402 080a 21e5  r..p..........!.
    0x0040:  545e 0000 0000 0103 0307                 T^........
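
The dump can be cross-checked by hand. The following is a rough sketch, not part of the original capture: it recomputes the TCP checksum from the hex words shown above. The helper name csum_fold is made up for illustration, and the word lists are copied from the dump, with the checksum field replaced by 0000 for the full calculation.

```shell
#!/bin/sh
# One's-complement sum of 16-bit hex words, folded to 16 bits (RFC 1071 style).
# The function body runs in a subshell so its variables stay private.
csum_fold() (
  sum=0
  for w in "$@"; do
    sum=$(( sum + 0x$w ))
  done
  while [ "$sum" -gt 65535 ]; do
    sum=$(( (sum & 65535) + (sum >> 16) ))
  done
  printf '%04x\n' "$sum"
)

# Pseudo-header: source IP, destination IP, protocol (6), TCP length (40 bytes).
PSEUDO="ac1b 230a ac1b 2301 0006 0028"
# TCP header and options from the dump, with the checksum field set to 0000.
TCP="ca1c 0016 b5aa 7a77 0000 0000 a002 7210 0000 0000 \
0204 05b4 0402 080a 21e5 545e 0000 0000 0103 0307"

# Sum over the pseudo-header only (word splitting of the lists is intentional).
partial=$(csum_fold $PSEUDO)
# Complemented sum over pseudo-header plus TCP segment: the real checksum.
full=$(printf '%04x' $(( 0xffff ^ 0x$(csum_fold $PSEUDO $TCP) )))

echo "pseudo-header sum: 0x$partial"
echo "full checksum:     0x$full"
```

This prints 0x9e70 for the pseudo-header sum and 0xc715 for the full checksum. So the value on the wire (0x9e70) is exactly the pseudo-header sum, which is what one would expect if the sender left the rest of the calculation to a checksum-offloading device, and 0xc715 matches the value tcpdump expected.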

A workaround has been suggested here:
https://patchwork.ozlabs.org/patch/261822/

root@ubuntu# ethtool -K eth0 tx off
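
To confirm that the workaround actually changes the interface, the offload state can be queried before and after. This is a minimal sketch, assuming ethtool is installed inside the container; the two helper names are hypothetical and only wrap the lowercase -k (query) and uppercase -K (set) options.

```shell
# Hypothetical helper: print the checksum-offload features of interface $1.
show_csum_offload() {
  ethtool -k "$1" | grep -i 'checksumming'
}

# Hypothetical helper: disable TX checksum offload on interface $1
# (the same workaround as above; needs root).
disable_tx_csum() {
  ethtool -K "$1" tx off
}
```

Inside the container, running show_csum_offload eth0, then disable_tx_csum eth0, then show_csum_offload eth0 again should show tx-checksumming going from on to off.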

Where’s the right place to fix this issue?

As mentioned in the thread you linked to, a missing checksum isn’t a problem on its own, since it should get computed at some point along the line. Since we’ve never seen a report of this, and a lot of our users run virtual machines and use veth networking for their containers, I’m inclined to think that the issue is either in the NIC driver inside the VM or in the host hypervisor.

It could effectively be that they end up filling the checksum field with garbage when it’s empty, causing the issue you see above.

It may be interesting to switch between the different types of virtual NICs in your VM to see if that makes a difference (to isolate which driver may be at fault).

I’ve tried multiple NIC drivers from VirtualBox, and those that work show the same issue (esp. “Intel PRO/1000 MT Server (82545EM)” and “Intel PRO/1000 MT Desktop (82540EM)”). I don’t think that the NIC driver inside the VM is causing the issue here, because it isn’t responsible for the checksums of incoming packets. The outgoing packets do have a correct checksum.

The incoming packets generated inside the Ubuntu LXD container lack the correct checksum.

Strangely, when I add an appropriate IP address to lxdbr2 by hand, it is possible to connect from the LXD/VirtualBox hypervisor, while connecting from inside an LXD container fails.

As far as I understand this issue, not having the checksum isn’t a problem if the NIC supports checksum offloading. But in our scenario there is no real NIC on the Linux side, so there is no point along the line that is responsible for calculating the checksum.

So it’s absolutely expected that a veth device will not fill in the checksum, the general thought there being that since it’s a completely virtual card, it doesn’t make sense to validate packet integrity.

But I’d have expected the kernel to fill it in when the packet heads out of the VM towards the host, unless that NIC also advertises hardware checksumming, causing the kernel to skip that too, at which point it’d become a hypervisor issue.

Can you check if the network interface in the VM advertises checksum offloading and if it does, whether turning that one off causes the kernel to fill the checksum when the packet leaves the VM?

Hardware checksums are turned off by default:

root@freebsd:~ # ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=98<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
[...]

When enabling txcsum (or rxcsum) within the FreeBSD virtual machine, the NIC loses network connectivity and resets these flags. I’ve tested this with the two drivers mentioned above.

After playing around with various Linux virtual network devices, I’ve found a solution to my problem. I have to use a TAP device instead of the LXD bridge and attach VirtualBox to it:

$ sudo ip tuntap add dev vbox-tap0 mode tap
$ sudo brctl addif lxdbr2 vbox-tap0
$ sudo ifconfig vbox-tap0 up
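
The same three steps can also be written with iproute2 alone, which may help on systems where brctl (bridge-utils) or ifconfig (net-tools) is not installed. This is only a sketch of an equivalent: the helper name setup_vbox_tap is made up for illustration, and it has to run as root.

```shell
# Hypothetical helper: create a TAP device and attach it to an existing bridge.
# Needs root; $1 is the bridge name, $2 the TAP device name.
setup_vbox_tap() {
  ip tuntap add dev "$2" mode tap &&
  ip link set dev "$2" master "$1" &&
  ip link set dev "$2" up
}
```

From a root shell it would be called as setup_vbox_tap lxdbr2 vbox-tap0.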

My working setup looks as follows:

 |-------------|   |---------|   |--------|   |-----------------|   |-------------------|
 | FreeBSD em0 |---|vbox-tap0|---| lxdbr2 |---| vethO54XQB@if13 |---| eth0 Ubuntu 18.04 |
 | 172.27.35.1 |   | no IP   |   | no IP  |   |                 |   | 172.27.35.10      |
 |-------------|   |---------|   |--------|   |-----------------|   |-------------------|

How can I configure this permanently in Ubuntu 18.04 LTS? I’m having some problems with netplan here, because lxdbr2 doesn’t exist yet when the hypervisor’s network is configured.
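
One possible way around the ordering problem, sketched here purely as an assumption and not as a tested netplan or LXD configuration, is to attach the TAP device from a boot-time script that first waits for the bridge to appear. The helper name wait_for_iface is hypothetical; it polls /sys/class/net, where Linux lists existing network interfaces.

```shell
# Hypothetical helper: wait up to $2 seconds (default 30) for interface $1.
# Returns 0 as soon as the interface exists, 1 if it never shows up.
wait_for_iface() {
  iface=$1
  tries=${2:-30}
  while [ ! -e "/sys/class/net/$iface" ]; do
    tries=$(( tries - 1 ))
    [ "$tries" -le 0 ] && return 1
    sleep 1
  done
  return 0
}

# Example use in a root script that runs after LXD has started:
#   wait_for_iface lxdbr2 30 || exit 1
#   ip tuntap add dev vbox-tap0 mode tap
#   brctl addif lxdbr2 vbox-tap0
#   ifconfig vbox-tap0 up
```

How such a script gets started at boot (and after LXD has created lxdbr2) is left open here; that part depends on the init setup of the host.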