Can't get OVN network forward working in cluster environment

So for me it works on Focal:

ovn-nbctl --version
ovn-nbctl 20.03.2
Open vSwitch Library 2.13.5
DB Schema 5.20.0

And not on Jammy:

ovn-nbctl --version
ovn-nbctl 22.03.0
Open vSwitch Library 2.17.0
DB Schema 6.1.0

And you’re also running something newer than Focal. What OS and version is this?

I’ve compiled OVN myself on Arch Linux before the official release of 22.03. Therefore I should be able to compile an older version as well.
However, I’m not sure about the compatibility with Open vSwitch, so maybe I have to downgrade Open vSwitch as well in this case?

In any case: should we open a bug report in OVN?

Looks like there is a difference in the OVN generated logical flows between those versions:

My forwarder listen address is 10.64.199.200.

Focal:

ovn-sbctl list logical_flow | grep 10.64.199.200
actions             : "eth.dst = eth.src; eth.src = 00:16:3e:71:62:e8; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = 00:16:3e:71:62:e8; arp.tpa = arp.spa; arp.spa = 10.64.199.200; outport = \"lxd-net2-lr-lrp-int\"; flags.loopback = 1; output;"
match               : "inport == \"lxd-net2-lr-lrp-int\" && arp.tpa == 10.64.199.200 && arp.op == 1"
match               : "ct.new && ip4.dst == 10.64.199.200"
actions             : "reg0[6] = 1; ct_snat(10.64.199.200);"
actions             : "eth.dst = eth.src; eth.src = 00:16:3e:71:62:e8; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = 00:16:3e:71:62:e8; arp.tpa = arp.spa; arp.spa = 10.64.199.200; outport = \"lxd-net2-lr-lrp-ext\"; flags.loopback = 1; output;"
match               : "inport == \"lxd-net2-lr-lrp-ext\" && arp.tpa == 10.64.199.200 && arp.op == 1 && is_chassis_resident(\"cr-lxd-net2-lr-lrp-ext\")"
match               : "flags[1] == 0 && arp.op == 1 && arp.tpa == { 10.64.199.200, 10.153.218.1}"
match               : "((ip4.src == 10.153.218.2)) && ip4.dst == 10.64.199.200"
match               : "flags[1] == 0 && arp.op == 1 && arp.tpa == { 10.64.199.200, 10.54.44.11}"
match               : "ip && ip4.dst == 10.64.199.200"
match               : "ct.new && ip && ip4.dst == 10.64.199.200 && is_chassis_resident(\"cr-lxd-net2-lr-lrp-ext\")"
match               : "ct.est && ip && ip4.dst == 10.64.199.200 && is_chassis_resident(\"cr-lxd-net2-lr-lrp-ext\")"
match               : "ip && ip4.dst == 10.64.199.200"

Jammy:

ovn-sbctl list logical_flow | grep 10.64.199.200
actions             : "reg0[1] = 0; reg1 = 10.64.199.200; ct_lb(backends=10.82.96.2);"
match               : "ct.new && ip4.dst == 10.64.199.200"
actions             : "reg0 = 10.64.199.200; ct_dnat;"
match               : "ip && ip4.dst == 10.64.199.200"
match               : "ct.new && ip4 && reg0 == 10.64.199.200 && is_chassis_resident(\"cr-lxd-net3-lr-lrp-ext\")"
match               : "ct.est && ip4 && reg0 == 10.64.199.200 && ct_label.natted == 1 && is_chassis_resident(\"cr-lxd-net3-lr-lrp-ext\")"

Interestingly, the missing ARP reply bit is concerning:

actions             : "eth.dst = eth.src; eth.src = 00:16:3e:71:62:e8; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = 00:16:3e:71:62:e8; arp.tpa = arp.spa; arp.spa = 10.64.199.200; outport = \"lxd-net2-lr-lrp-ext\"; flags.loopback = 1; output;"
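A quick way to check for that ARP responder flow without eyeballing the grep output is sketched below. It is only a rough helper: the function name `has_arp_responder` is made up for illustration, and it assumes the `ovn-sbctl list logical_flow` output format shown above, where the responder's actions line contains `/* ARP reply */` and `arp.spa = <ip>;`.

```shell
#!/bin/sh
# has_arp_responder IP
# Reads `ovn-sbctl list logical_flow` output on stdin and succeeds (exit 0)
# only if some line is an ARP reply action whose sender address is IP.
# Hypothetical helper; assumes the output format quoted in this thread.
has_arp_responder() {
    ip=$1
    # First keep only ARP reply actions, then look for our IP as arp.spa.
    grep -F '/* ARP reply */' | grep -F "arp.spa = ${ip};" >/dev/null
}

# Example (on a host with access to the OVN southbound DB):
#   ovn-sbctl list logical_flow | has_arp_responder 10.64.199.200 \
#       && echo "ARP responder flow present" \
#       || echo "ARP responder flow missing"
```

Going by the dumps above, this should report the flow as present for the Focal output and missing for the Jammy output.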

Yep I’ll open an issue.

Would it help if I determined the first version in which it stops working? I think I can easily switch OVN versions for testing.

Yes that would be useful.

Also for your info, I’m using the LXD snap package for both and this comes bundled with the latest version of the OVN client. However the actual logical flows are generated by the host’s version of OVN.

My output from your command above is:

# ovn-sbctl list port_binding | grep bbb.76.20.84
nat_addresses       : ["00:16:3e:3e:25:19 172.17.2.100 bbb.76.20.84 is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"]
nat_addresses       : ["00:16:3e:3e:25:19 bbb.76.20.84 is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")", "00:16:3e:3e:25:19 172.17.2.100 is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"]

Sorry, I posted the wrong example (although even when corrected it still shows the issue). Can you get the output of:

ovn-sbctl list logical_flow | grep <forward_ip>

Just wondered where I got the other command from :slight_smile:

# ovn-sbctl list logical_flow | grep bbb.76.20.84
actions             : "reg1 = bbb.76.20.84; ct_lb(backends=10.161.64.2);"
match               : "ct.new && ip4.dst == bbb.76.20.84"
match               : "ct.est && ip4 && reg0 == bbb.76.20.84 && ct_label.natted == 1 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
actions             : "reg0 = bbb.76.20.84; ct_dnat;"
match               : "ip && ip4.dst == bbb.76.20.84"
match               : "ct.new && ip4 && reg0 == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"

Yep so you’re missing the /* ARP reply */ bit too.

OK, so I’ll test the other OVN versions and look for the corresponding actions line.

Hi @tomp,

seems like 21.06 is the first “faulty” version:

# ovn-nbctl --version
ovn-nbctl 21.03.0
Open vSwitch Library 2.15.90
DB Schema 5.31.0

# ovn-sbctl list logical_flow | grep bbb.76.20.84
match               : "ct.est && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
actions             : "eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa = arp.spa; arp.spa = bbb.76.20.84; outport = inport; flags.loopback = 1; output;"
match               : "inport == \"lxd-net11-lr-lrp-int\" && arp.op == 1 && arp.tpa == bbb.76.20.84"
match               : "ip && ip4.dst == bbb.76.20.84"
actions             : "reg1 = bbb.76.20.84; ct_lb(backends=10.161.64.2);"
match               : "ct.new && ip4.dst == bbb.76.20.84"
match               : "ct.new && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
actions             : "eth.dst = eth.src; eth.src = xreg0[0..47]; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = xreg0[0..47]; arp.tpa = arp.spa; arp.spa = bbb.76.20.84; outport = inport; flags.loopback = 1; output;"
match               : "inport == \"lxd-net11-lr-lrp-ext\" && arp.op == 1 && arp.tpa == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"

# ovn-nbctl --version
ovn-nbctl 21.06.0
Open vSwitch Library 2.15.90
DB Schema 5.32.0

# ovn-sbctl list logical_flow | grep bbb.76.20.84
match               : "ct.est && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
match               : "ip && ip4.dst == bbb.76.20.84"
match               : "inport == \"lxd-net11-lr-lrp-int\" && arp.op == 1 && arp.tpa == { bbb.76.20.84 }"
actions             : "reg1 = bbb.76.20.84; ct_lb(backends=10.161.64.2);"
match               : "ct.new && ip4.dst == bbb.76.20.84"
match               : "inport == \"lxd-net11-lr-lrp-ext\" && arp.op == 1 && arp.tpa == { bbb.76.20.84 } && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
match               : "ct.new && ip && ip4.dst == bbb.76.20.84 && is_chassis_resident(\"cr-lxd-net11-lr-lrp-ext\")"
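To check a host against the versions compared above, something like the following might do. It is a sketch only: `ovn_version` and `is_affected` are made-up names, and the 21.06 threshold comes purely from the results reported in this thread (21.03.0 good; 21.06.0 and 22.03.0 bad), so it says nothing about later releases.

```shell
#!/bin/sh
# ovn_version: reads `ovn-nbctl --version` output on stdin and prints the
# OVN version from the first "ovn-nbctl X.Y.Z" line, e.g. "21.03.0".
ovn_version() {
    awk '$1 == "ovn-nbctl" { print $2; exit }'
}

# is_affected VERSION: succeeds if VERSION is 21.06 or newer.
# Assumption: the threshold is taken from this thread only.
is_affected() {
    major=${1%%.*}          # "21.06.0" -> "21"
    rest=${1#*.}            # "21.06.0" -> "06.0"
    minor=${rest%%.*}       # "06.0"    -> "06"
    minor=${minor#0}        # drop one leading zero: "06" -> "6"
    minor=${minor:-0}       # guard against an empty result
    [ "$major" -gt 21 ] || { [ "$major" -eq 21 ] && [ "$minor" -ge 6 ]; }
}

# Example:
#   v=$(ovn-nbctl --version | ovn_version)
#   is_affected "$v" && echo "OVN $v: ARP reply flow may be missing"
```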

I also ran a git bisect afterwards, and the first bad commit is ea6ee901ff9107a084bc830a8a38c4e0bd9f75f7, whose main purpose seems to be changes to the ARP handling.
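For anyone repeating the bisect, the good/bad decision could be automated roughly as below. This is a sketch and not the exact procedure used here: `bisect_verdict` is a made-up name, the flow dump is assumed to have been saved to a file beforehand, and building each commit and restarting OVN before dumping the flows is left to a wrapper script. The exit codes follow the `git bisect run` convention (0 = good, 125 = skip, any other code from 1 to 127 = bad).

```shell
#!/bin/sh
# bisect_verdict IP FLOWFILE
# FLOWFILE holds saved `ovn-sbctl list logical_flow` output for the commit
# under test. Returns:
#   0   good: an ARP reply action for IP exists
#   1   bad:  flows mention IP but no ARP reply action is present
#   125 skip: no flow mentions IP at all (forward probably not set up)
bisect_verdict() {
    ip=$1
    flows=$2
    grep -qF "$ip" "$flows" 2>/dev/null || return 125
    if grep -F '/* ARP reply */' "$flows" | grep -F "arp.spa = ${ip};" >/dev/null; then
        return 0
    fi
    return 1
}

# Example wrapper usage (check.sh is hypothetical, tag names are assumed):
#   git bisect start v21.06.0 v21.03.0    # bad revision first, then good
#   git bisect run ./check.sh             # check.sh builds, dumps flows,
#                                         # then calls bisect_verdict
```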

Since LXD seems to reset the route after boot, I’ll go with version 21.03.0 for now.

One other question regarding forwards: since I can only configure a container IP address as the target (not the container name), the normal approach is to pin the target container to a fixed IP, correct?

Thanks for this, it’s immensely valuable. :slight_smile:

No problem! Thanks for your great help @tomp! I was quite frustrated in between, because I still don’t understand enough about networks and I am very happy that it works now.
At least I have now learned a few things again :smiley:

I’m going to log an issue at GitHub - ovn-org/ovn: Open Virtual Network now.
