BGP session redundancy, cluster w/ OVN

schwim · February 3, 2023, 11:06pm

Hi!

I have a bare-metal cluster of 3 servers deployed w/ Juju. I have an OVN overlay configured and a routed subnet configured across the cluster. The OVN Uplink is attached to br0, which is connected to the 10.0.208.0/24 subnet. BGP is configured, and the first LXD node is advertising routes via BGP to the rest of the network via the IP assigned by the OVN uplink.

Uplink:
  bgp.peers.rr01.address: 10.0.208.212
  bgp.peers.rr01.asn: "65000"
  ipv4.gateway: 10.0.208.254/24
  ipv4.ovn.ranges: 10.0.208.224-10.0.208.239
  ipv4.routes: 10.0.209.0/26

OVN network:
  ipv4.address: 10.0.209.1/26
  ipv4.nat: "false"
  network: UPLINK-control-plane
  volatile.network.ipv4.address: 10.0.208.224

Testing the most obvious failure mode by powering off the 1st cluster node causes BGP to time out as expected. I was hoping one of the other nodes would pick up where it left off, but this doesn’t seem to be the case: no other node takes over, either by re-using the dead node’s IP address or by peering from its own IP.

Is there some way to configure BGP redundancy such that BGP will restart on a different cluster node should the active BGP node fail?

Thanks!
Greg

stgraber · February 3, 2023, 11:26pm

You should have all 3 servers listen on BGP and have a session between your router and each of the servers. All 3 servers will announce the exact same route, so BGP on your router will then only lose the route should all 3 go away.

schwim · February 4, 2023, 5:39pm

This is what I was hoping. I believe I found the problem.

Specifically, the core.bgp* settings need to be applied to each cluster member individually. I’d made the mistake of thinking they were cluster-wide. After setting them, I restarted LXD on each server and the BGP sessions established.
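
For anyone landing here later, this is a rough sketch of the per-member setup, not the exact commands I ran. The member names (node1..node3) and the 10.0.208.11-13 addresses are made-up placeholders; substitute the names from `lxc cluster list` and each member's real address on the uplink subnet. It only prints the commands so they can be reviewed first, and it assumes `--target` is used to scope a server setting to one member.

```shell
#!/bin/sh
# Hypothetical name=address pairs, one per cluster member (placeholders).
members="node1=10.0.208.11 node2=10.0.208.12 node3=10.0.208.13"

# Print the per-member BGP settings as lxc commands (dry run, nothing is applied).
bgp_cmds() {
  for entry in $members; do
    name=${entry%%=*}   # text before the '='
    addr=${entry#*=}    # text after the '='
    # Listen address for BGP on this member.
    echo "lxc config set core.bgp_address ${addr} --target ${name}"
    # Router ID; reusing the member's address here is an assumption.
    echo "lxc config set core.bgp_routerid ${addr} --target ${name}"
  done
}

# core.bgp_asn also has to be set; it is left out because the ASN
# used on the LXD side is not stated in this thread.
bgp_cmds
```

If the printed commands look right, they can be run by piping the output to `sh`. The router then needs one session per member, as described above.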

Thanks!
Greg

tomp · February 5, 2023, 6:52am

Yeah the thinking is that you might have different peer addresses for each member.

schwim · February 5, 2023, 7:51pm

Makes sense. Might be nice to tie it to an interface on the node. Example:

lxc config set core.bgp_address eth0
lxc config set core.bgp_routerid eth0

Seems like a feature request.