Help selecting good network hardware for LXD

Hello!

I’m looking for advice on procuring networking hardware, optimized for use with LXD.

I have some challenging constraints, and so far I have failed to find hardware that meets them.

  • No more than 2U of rack space, ideally less.
  • Redundant hardware (e.g. 2 devices for fault tolerance).
  • 18–24 10Gbit ports per device.
  • Able to run Ubuntu to support a MAAS provisioning infrastructure (could be in containers).
  • Able to support OVN networking.
  • Ideally able to host a monitoring stack (COS Lite?).
  • Open source everything.
  • Budget around 5000 USD.

I’ve been trying to get my head around picking the networking hardware for this, but have so far failed, and I would love to hear whether the community has experience here.

We intend to use Juju to bootstrap the LXD cluster (possibly from MAAS running inside the networking hardware), including monitoring.
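For reference, the bootstrap flow we have in mind looks roughly like this. This is a CLI sketch rather than a tested recipe; the cloud name and machine count are placeholders for our environment.

```shell
# Hypothetical sketch: register our MAAS as a Juju cloud, bootstrap a
# controller on it, then have MAAS provision machines for the LXD cluster.
# "my-maas" is a placeholder name.
juju add-cloud my-maas          # interactive: cloud type "maas" + endpoint URL
juju add-credential my-maas     # interactive: MAAS API key
juju bootstrap my-maas maas-controller
juju add-machine -n 3           # MAAS deploys three nodes to form the cluster
```

The open question is whether the MAAS side of this can live on the switches themselves rather than on a separate server.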

Any help much appreciated.

As it stands, I’m not aware of any hardware that fits the bill exactly.

In general, I suspect the right direction would be a white-box switch that has a suitable selection of ports and is capable of running an open-source NOS like SONiC.

I believe some of those NOSes can at least run containers (likely Docker), with some potentially able to run VMs too (hardware dependent). Most switches capable of running such a stack tend to be reasonably expensive, so they should have the usual set of features you’d expect, like redundant PSUs and some kind of MLAG implementation.

OVN is a bit of a tricky one. We’d love for a switch to be able to act as the active chassis for OVN networks and effectively bridge from the physical to the virtual world right on the switch.

It may be possible to do this using containers or something similar today. The issue is that you don’t really want OVN to run entirely on the switch’s management processor, and you certainly don’t want all your OVN traffic to have to go through the internal switch port, as those are often quite speed-restricted.

Ideally what we’d want is a switch that supports hardware flow offloading in a way that’s compatible with Open vSwitch, effectively something similar to what LXD uses on Mellanox ConnectX cards. With hardware supporting something like that, running the OVN chassis on the switch would make a lot of sense. Until then, you’re better off distributing your OVN chassis across servers and just having the top-of-rack switch provide an uplink VLAN to the servers.
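For context, on NICs that do support it (e.g. ConnectX in switchdev mode), Open vSwitch hardware offload is typically enabled along these lines. This is a configuration sketch; the PCI address is a placeholder for whatever your NIC reports.

```shell
# Configuration sketch (assumes an offload-capable NIC such as ConnectX;
# the PCI address 0000:01:00.0 is a placeholder).
# Switch the NIC's embedded switch into switchdev mode so flows can be
# programmed into hardware:
devlink dev eswitch set pci/0000:01:00.0 mode switchdev
# Tell Open vSwitch to offload datapath flows via TC:
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch-switch
# Verify: offloaded flows are listed separately from software flows.
ovs-appctl dpctl/dump-flows type=offloaded
```

It’s exactly this kind of capability that’s missing from switch-grade hardware today, which is why the OVN chassis is better left on the servers for now.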

That makes a lot of sense, I’ll have a look at SONiC.

Yes, we have come to that conclusion as well. We can reduce our requirement to just being able to run MAAS, so that we can at least bootstrap our clusters with Juju.

We have discussed looking for some “side-by-side” hardware to provide a remedy for the SPOF in the switching infrastructure. We have tested an active-backup setup that provides a reasonable recovery time in case one of the switches dies. We can do this with very cheap hardware, but it doesn’t scale well and we can’t run anything on those devices.
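For what it’s worth, the active-backup pattern we tested boils down to bonding each server’s uplinks across the two switches. A minimal sketch with iproute2, where the interface names and address are placeholders:

```shell
# Sketch: active-backup bond spanning two ToR switches, so either switch
# can die without dropping the server's uplink.
# eth0 -> switch A, eth1 -> switch B (placeholder names).
ip link add bond0 type bond mode active-backup miimon 100
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up
ip addr add 10.0.0.10/24 dev bond0   # example address
cat /proc/net/bonding/bond0          # shows which slave is currently active
```

The nice part is this needs no cooperation from the switches (unlike MLAG), which is why it works even with very cheap hardware; the downside is only one link carries traffic at a time.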

It’s scary to have a hardware SPOF for LXD clusters as the number of containers grows. Active-passive mode is at least something.

I would love to collaborate with the LXD team to find an open-source platform for LXD that we can standardize on, as we plan to build multiple of these clusters going forward. I can spend more time on this research, but it has proven difficult so far for this specific problem.

Yeah. Keeping the cost of the solution to a minimum is exactly what we are trying to do. There’s no point in building a “small cloud” if the price only fits a large pocket…