[LXD] Network load-balancers (OVN)

tomp · June 10, 2022, 9:31am


Project	LXD
Status	Implemented
Author(s)	@tomp
Approver(s)	@stgraber
Release	LXD 5.4
Internal ID	LX017

Abstract

Implement a new network API for load balancers on OVN networks. Those should be based (initially) on the capabilities of current OVN load balancers and will feel very similar to our current forwards support, mostly with the difference that load balancers will support multiple targets for the same port.

Rationale

The rationale for this work is to allow ports on an external IP to be forwarded to multiple endpoints inside an OVN network, in order to provide load distribution for a service running across multiple instances.

Specification

Design

A load balancer will consume an entire external listen IP and therefore it cannot be shared with a network forward. This is because potentially in the future we reserve the possibility of adding additional features (such as TLS termination) that would require transparently changing the implementation of a load balancer from OVN to an application level load balancer, which would then require the entire listen IP to be forward to the container running the application.

Whole IP load balancing doesn’t appear to work for ICMP (see below) I would propose that we prevent the use of the equivalent of the default target_address option that a network forward supports.

The design introduces the concept of named load balancer backends, which (at this time) are made up of a single target IP address and one or more target ports. These named backends can then used as targets in the load balancer port definitions.

By using named backends, and storing target port specifications inside them, it means that each backend can use different target ports (although each target port specification must be compatible with the listen port configuration(s) they are used with). Eventually it will also mean that different config (such as status checks) can be defined for each named backend.

Additionally it also means that the same named backend can be used as a target in multiple load balancer port specifications, which simplifies updating the target address of a backend if it needs to change in the future.

By adding named backends to each port specification, rather than the load balancer as a whole, it means different ports can use different backends.

API changes

For the network load balancer feature a new API extension will be added called network_load_balancer with the following API endpoints and structures added:

Create and edit a network load balancer

POST /1.0/networks/<network>/load-balancers
PUT /1.0/networks/<network>/load-balancers/<listen_address>

Using the following new API structures respectively:

type NetworkLoadBalancersPost struct {
	NetworkLoadBalancerPut `yaml:",inline"`

	// The listen address of the load balancer
	// Example: 192.0.2.1
	ListenAddress string `json:"listen_address" yaml:"listen_address"`
}

type NetworkLoadBalancerPut struct {
	// Description of the load balancer listen IP
	// Example: My public IP load balancer
	Description string `json:"description" yaml:"description"`

	// Load balancer configuration map (refer to doc/network-load-balancers.md)
	// Example: {"user.mykey": "foo"}
	Config map[string]string `json:"config" yaml:"config"`

	// Backends (optional)
	Backends []NetworkLoadBalancerBackend `json:"backends" yaml:"backends"`

	// Port forwards (optional)
	Ports []NetworkLoadBalancerPort `json:"ports" yaml:"ports"`
}

type NetworkLoadBalancerBackend struct {
	// Name of the load balancer backend
	// Example: c1-http
	Name string `json:"name" yaml:"name"`

	// Description of the load balancer backend
	// Example: C1 webserver
	Description string `json:"description" yaml:"description"`

	// TargetPort(s) to forward ListenPorts to (allows for many-to-one)
	// Example: 80,81,8080-8090
	TargetPort string `json:"target_port" yaml:"target_port"`

	// TargetAddress to forward ListenPorts to
	// Example: 198.51.100.2
	TargetAddress string `json:"target_address" yaml:"target_address"`
}

type NetworkLoadBalancerPort struct {
	// Description of the load balancer port
	// Example: My web server load balancer
	Description string `json:"description" yaml:"description"`

	// Protocol for load balancer port (either tcp or udp)
	// Example: tcp
	Protocol string `json:"protocol" yaml:"protocol"`

	// ListenPort(s) of load balancer (comma delimited ranges)
	// Example: 80,81,8080-8090
	ListenPort string  `json:"listen_port" yaml:"listen_port"`

	// TargetBackend backend names to load balance ListenPorts to
	// Example: ["c1-http","c2-http"]
	TargetBackend []string `json:"target_backend" yaml:"target_backend"`
}

Delete a network load balancer

DELETE /1.0/networks/<network>/load-balancer/<listen_address>

List network load balancers

GET /1.0/networks/network/load-balancers
GET /1.0/networks/<network>/load-balancers/<listen_address>

Returns a list or single record (respectively) of this new NetworkLoadBalancer structure:

type NetworkLoadBalancer struct {
	NetworkLoadBalancerPut `yaml:",inline"`

	// The listen address of the load balancer
	// Example: 192.0.2.1
	ListenAddress string `json:"listen_address" yaml:"listen_address"`

	// What cluster member this record was found on
	// Example: lxd01
	Location string `json:"location" yaml:"location"`
}

CLI changes

For external IP load balancing there will be a new sub-command added to the lxc network command called load-balancer.

E.g.

lxc network load-balancer create <network> <listen_address> [key=value...]
lxc network load-balancer backend add <network> <listen_address> <backend_name> <target_address> [<target_port(s)>]
lxc network load-balancer port add <network> <listen_address> <protocol> <listen_port(s)> <backend_name[,backend_name...]>
lxc network load-balancer port remove <network> <listen_address> [<protocol>] [<listen_port(s)>] [--force]
lxc network load-balancer backend remove <network> <listen_address> <backend_name> 
lxc network load-balancer delete <network> <listen_address>
lxc network load-balancer show <network> <listen_address>
lxc network load-balancer edit <network> <listen_address>
lxc network load-balancer set <network> <key>=<value>...
lxc network load-balancer unset <network> <key>
lxc network load-balancer get <network> <key>
lxc network load-balancer list <network>

Database changes

There will be two new tables added called networks_load_balancers and networks_load_balancers_config.

CREATE TABLE "networks_load_balancers" (
	id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
	network_id INTEGER NOT NULL,
	node_id INTEGER,
	listen_address TEXT NOT NULL,
	description TEXT NOT NULL,
	backends TEXT NOT NULL,
	ports TEXT NOT NULL,
	UNIQUE (network_id, node_id, listen_address),
	FOREIGN KEY (network_id) REFERENCES "networks" (id) ON DELETE CASCADE,
	FOREIGN KEY (node_id) REFERENCES nodes(id) ON DELETE CASCADE
);

CREATE TABLE "networks_load_balancers_config" (
	id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
	networks_load_balancer_id INTEGER NOT NULL,
	key VARCHAR(255) NOT NULL,
	value TEXT NOT NULL,
	UNIQUE (networks_load_balancer_id, key),
	FOREIGN KEY (networks_load_balancer_id) REFERENCES "networks_load_balancers" (id) ON DELETE CASCADE
);

Because each OVN network has its own virtual router connected to the uplink network, any external IPs being forwarded will need the virtual router to respond to ARP/NDP requests on its uplink interface. As such a specific listen_address can only be used on a single network at any one time (although there can be multiple per-port entries setup on the same external IP load balancing to different instances inside the same network).

Also the IPs will be stored in canonical form so that database queries can be done on them irrespective of the format specified by the user.

Upgrade handling

As these are new features, no upgrade handling is required.

Further information

At this time load balancers will not support health monitoring of backends due to limitations in the OVN implementation. This means that should one backend fail, the load balancer will fail in handling any requests destined for that backend.

The limitations in the current OVN health check system are:

Health checks must be associated with a logical switch port, meaning they must be tightly coupled to an instance NIC rather than target IP address. This is incompatible with our requirements.
Health checks seemingly do not work unless the source address of the health check is reachable inside the OVN network, as such it appears that a local port is required to be added to each OVN network with an IP for use as the healthcheck source.
Health checks don’t work with IPv6 backends.

According to Ubuntu Manpage: ovn-nb - OVN_Northbound database schema

OVN supports health checks for load balancer endpoints, for IPv4 load balancers only.

tomp · June 10, 2022, 1:25pm

@stgraber have confirmed that basic TCP and UDP load balancing works with OVN:

Basic test:

+--------+---------+--------------------+-----------------------------------------------+-----------------+-----------+
|  NAME  |  STATE  |        IPV4        |                     IPV6                      |      TYPE       | SNAPSHOTS |
+--------+---------+--------------------+-----------------------------------------------+-----------------+-----------+
| c1     | RUNNING | 10.218.58.2 (eth0) | fd42:eb2c:99c5:d6ee:216:3eff:fe24:b7f (eth0)  | CONTAINER       | 0         |
+--------+---------+--------------------+-----------------------------------------------+-----------------+-----------+
| c2     | RUNNING | 10.218.58.3 (eth0) | fd42:eb2c:99c5:d6ee:216:3eff:fe38:73c2 (eth0) | CONTAINER       | 0         |
+--------+---------+--------------------+-----------------------------------------------+-----------------+-----------+

Inside each container install dnsmasq and configure a special fake DNS name that can be used to confirm the request is being handled by the relevant container.

apt install dnsmasq
nano /etc/dnsmasq.d/tomp.conf
interface=eth0
bind-interfaces
interface-name=<container_name>.lxd,eth0 # e.g. c1 or c2
host-record=foo.com,127.0.0.<container number> # e.g. 1 or 2
systemctl start dnsmasq

Now create a network forward to c1 on TCP and UDP port 53:

 lxc network forward show ovntest 10.0.0.1
description: ""
config: {}
ports:
- description: ""
  protocol: tcp
  listen_port: "53"
  target_port: ""
  target_address: 10.218.58.2
- description: ""
  protocol: udp
  listen_port: "53"
  target_port: ""
  target_address: 10.218.58.2
listen_address: 10.0.0.1
location: ""

This results in the following OVN config:

sudo ovn-nbctl list load_balancer
_uuid               : 3f257424-46ca-4701-965b-ae7925f426e3
external_ids        : {}
health_check        : []
ip_port_mappings    : {}
name                : lxd-net29-lb-10.0.0.1-udp
options             : {}
protocol            : udp
selection_fields    : []
vips                : {"10.0.0.1:53"="10.218.58.3:53"}

_uuid               : 7c70bd65-7ec8-4e03-935c-9de2027dfc7b
external_ids        : {}
health_check        : []
ip_port_mappings    : {}
name                : lxd-net29-lb-10.0.0.1-tcp
options             : {}
protocol            : tcp
selection_fields    : []
vips                : {"10.0.0.1:53"="10.218.58.3:53"}

Test load balancing isn’t happening by running this multiple times from outside of the OVN network, expect to see 127.0.0.1 as response always.

dig +tcp @10.0.0.1 foo.com
dig @10.0.0.1 foo.com
;; ANSWER SECTION:
foo.com.		0	IN	A	127.0.0.1

Now manually edit the config to load balance:

sudo ovn-nbctl set load_balancer lxd-net29-lb-10.0.0.1-tcp vips='{"10.0.0.1:53"="10.218.58.2:53,10.218.58.3:53"}'
sudo ovn-nbctl set load_balancer lxd-net29-lb-10.0.0.1-udp vips='{"10.0.0.1:53"="10.218.58.2:53,10.218.58.3:53"}'

Now re-run the dig test repeatedly:

dig +tcp @10.0.0.1 foo.com
dig @10.0.0.1 foo.com
;; ANSWER SECTION:
foo.com.		0	IN	A	127.0.0.n # Should change between 127.0.0.1 and 127.0.0.2

tomp · June 10, 2022, 1:33pm

Interestingly whole IP load balancing works too for TCP and UDP, but when sending ICMP ping to the external IP, it always forwards to the first IP.

stgraber · June 16, 2022, 4:46pm

Fine with me. I indeed expect a load-balancer to require port definitions.

stgraber · June 16, 2022, 4:48pm

tomp:

The design introduces the concept of named load-balancer backends, which (at this time) are made up of a single target IP address and one or more target ports. These named backends can then used as targets in the load-balancer listen port definitions.

By using named backends, and storing target port specifications inside them, it means that each backend can use different target ports. Eventually it will also mean that different config (such as status checks) can be defined for each named backend.

Additionally it also means that the same named backend can be used as a target in multiple load-balancer listen port specifications, which simplifies updating the target address of a backend if it needs to change in the future.

By adding named backends to each listen port specification, rather than the load-balancer as a whole, it means different listen ports can use different backends.

We’re going to need some YAML examples to make this easier to understand

tomp · June 16, 2022, 6:13pm

Yep my intention was to add them if you’re happy with the proposed CLI structure so far.

stgraber · June 16, 2022, 6:45pm

Yeah, CLI structure looks fine to me.

stgraber · June 17, 2022, 5:18pm

Spec reviewed and approved

tomp · July 7, 2022, 9:03am