[LXD] Floating IP addresses

tomp · August 5, 2021, 10:19am


Project	LXD
Status	Implemented
Author(s)	@tomp
Approver(s)	@stgraber
Release	4.18
Internal ID	LX005

Abstract

The aim of this project is to add the ability for instances to have external IP addresses forwarded to them, both for bridged and ovn network types.

We also want to allow ovn networks to be able to specify a different outbound NAT address than their external assigned IP on the uplink network.

The three desired features are:

One-to-one forward of entire single external IP to single internal IP.
One-to-many sharing of a single external IP across multiple internal IPs, where specific listen port(s) on the external IP are each forwarded to a single internal IP and port(s).
Allow ovn networks to specify their outbound source NAT IP from an external address on the uplink network (outside of the uplink network’s subnet).

Terminology

The following terms will be used both in this document and in the resulting API and CLI commands.

External IP address - an IP address that is outside of the network’s own subnet and routed to the parent network’s uplink. These are the addresses that we want to be able to forward to an instance’s internal IP(s).
Internal IP address - an existing IP address on an instance NIC that is within the network’s subnet. Can be either the primary allocated static address or a dynamic address added within the instance.
Uplink - an existing LXD network that is connected to the network’s router to provide external connectivity. These networks also define what external IP subnets are available for use to forward IPs into instances via their ipv{n}.routes setting.
Project restricted.networks.subnets - A project can also define a subset of the uplink’s external subnets that are allowed for use as external IPs in the networks within that project.

Rationale

We already support forwarding external IPs from the uplink network directly into an instance NIC (without being rewritten to an internal IP destination) using the bridged NIC’s ipv{n}.routes setting or the ovn NIC’s ipv{n}.routes.external (which will also be coming to bridged NICs soon by way of [LXD] BGP address/route advertisement).

However this requires fully allocating an entire IP address to a single instance, and it requires that the instance has the external IP added as an alias inside it (which requires additional setup inside the instance by the user).

Allowing a user to forward an external IP (or selected ports of it) to an existing internal IP will allow more efficient use of the limited public IPv4 addresses available. It will also simplify providing external inbound connectivity, as there is no need to modify the network settings inside the container (i.e. having to set up an IP alias for the external IP as described above).

The rationale for allowing ovn networks to specify their outbound source NAT address is feature parity with bridge networks: an ovn network will be able to specify a source NAT address in one of the uplink network’s ipv{n}.routes external subnets. Currently, the ovn virtual routers can only use source NAT IPs that are in the uplink network’s own subnet.

Specification

Design

The decision to associate external IP forwards with a network rather than a particular instance NIC device was intentional. The idea is that because both bridge and ovn networks allow internal IPs to be used in an ad hoc, dynamic fashion, it may be that an internal IP is moved between instances without the knowledge of LXD. By allowing an IP forward to be created at the network level and forwarded to a specific internal IP, the external IP forward will “follow” the internal IP even if it is moved.

Also for port-based forwards (unlike the existing NIC level ipv{n}.routes setting), there can be multiple instance NICs effectively using the same external IP, so defining it against an instance NIC didn’t make a lot of sense.

So keeping all the external->internal forwarding features together at the network level seemed sensible.

For the ovn network type we intend to use the Load_Balancer functionality provided by OVN.

This provides for both whole IP and individual port forwarding (of either TCP or UDP). It also allows for loopback requests from the target instance to the external IP/port and handles source NAT (mostly) correctly (see below).

For the bridge network type we can extend the existing firewall package (along with the associated xtables and nftables drivers) to provide forwarding functionality using DNAT rules, and loopback support using br_netfilter, hairpin bridge ports and SNAT rules (similar to what we do today with proxy devices in NAT mode).

API changes

For the network forwards feature a new API extension will be added called network_forward with the following API endpoints and structures added:

Create and edit a network forward

POST /1.0/networks/<network>/forwards
PUT /1.0/networks/<network>/forwards/<listen_address>

Using the following new API structures respectively:

type NetworkForwardsPost struct {
	NetworkForwardPut `yaml:",inline"`

	// The listen address of the forward
	// Example: 192.0.2.1
	ListenAddress string `json:"listen_address" yaml:"listen_address"`
}

type NetworkForwardPut struct {
	// Description of the forward listen IP
	// Example: My public IP forward
	Description string `json:"description" yaml:"description"`

	// Forward configuration map (refer to doc/network-forwards.md)
	// Example: {"user.mykey": "foo"}
	Config map[string]string `json:"config" yaml:"config"`

	// Port forwards (optional)
	Ports []NetworkForwardPort `json:"ports" yaml:"ports"`
}

type NetworkForwardPort struct {
	// Description of the forward port
	// Example: My web server forward
	Description string `json:"description" yaml:"description"`

	// Protocol for port forward (either tcp or udp)
	// Example: tcp
	Protocol string `json:"protocol" yaml:"protocol"`

	// ListenPort(s) to forward (comma delimited ranges)
	// Example: 80,81,8080-8090
	ListenPort string `json:"listen_port" yaml:"listen_port"`

	// TargetPort(s) to forward ListenPorts to (allows for many-to-one)
	// Example: 80,81,8080-8090
	TargetPort string `json:"target_port" yaml:"target_port"`

	// TargetAddress to forward ListenPorts to
	// Example: 198.51.100.2
	TargetAddress string `json:"target_address" yaml:"target_address"`
}
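
As a rough illustration of how these structures could serialize into a request body for POST /1.0/networks/<network>/forwards, here is a minimal sketch. The struct definitions are copied from above with their comments trimmed; the helpers encodeForward and decodeForward and all sample values are assumptions made for this example and are not part of LXD.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NetworkForwardPort mirrors the port structure proposed above.
type NetworkForwardPort struct {
	Description   string `json:"description" yaml:"description"`
	Protocol      string `json:"protocol" yaml:"protocol"`
	ListenPort    string `json:"listen_port" yaml:"listen_port"`
	TargetPort    string `json:"target_port" yaml:"target_port"`
	TargetAddress string `json:"target_address" yaml:"target_address"`
}

// NetworkForwardPut mirrors the editable part of a forward.
type NetworkForwardPut struct {
	Description string               `json:"description" yaml:"description"`
	Config      map[string]string    `json:"config" yaml:"config"`
	Ports       []NetworkForwardPort `json:"ports" yaml:"ports"`
}

// NetworkForwardsPost mirrors the creation structure.
type NetworkForwardsPost struct {
	NetworkForwardPut `yaml:",inline"`
	ListenAddress     string `json:"listen_address" yaml:"listen_address"`
}

// encodeForward (hypothetical helper) renders a forward as an indented JSON body.
func encodeForward(f NetworkForwardsPost) ([]byte, error) {
	return json.MarshalIndent(f, "", "  ")
}

// decodeForward (hypothetical helper) parses a JSON body back into the structure.
func decodeForward(b []byte) (NetworkForwardsPost, error) {
	var f NetworkForwardsPost
	err := json.Unmarshal(b, &f)
	return f, err
}

func main() {
	// Sample values only: a default target plus one TCP port override.
	f := NetworkForwardsPost{
		NetworkForwardPut: NetworkForwardPut{
			Description: "My public IP forward",
			Config:      map[string]string{"target_address": "198.51.100.2"},
			Ports: []NetworkForwardPort{{
				Description:   "My web server forward",
				Protocol:      "tcp",
				ListenPort:    "80,81,8080-8090",
				TargetAddress: "198.51.100.3",
			}},
		},
		ListenAddress: "192.0.2.1",
	}

	body, err := encodeForward(f)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because the embedded NetworkForwardPut carries no json tag, encoding/json promotes its fields, so description, config and ports appear at the top level of the body alongside listen_address.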

Delete a network forward

DELETE /1.0/networks/<network>/forwards/<listen_address>

List network forwards

GET /1.0/networks/<network>/forwards
GET /1.0/networks/<network>/forwards/<listen_address>

Returns a list or single record (respectively) of this new NetworkForward structure:

type NetworkForward struct {
	NetworkForwardPut `yaml:",inline"`

	// The listen address of the forward
	// Example: 192.0.2.1
	ListenAddress string `json:"listen_address" yaml:"listen_address"`

	// What cluster member this record was found on
	// Example: lxd01
	Location string `json:"location" yaml:"location"`
}

OVN source NAT

For the OVN network source NAT address feature a new API extension called network_ovn_nat_address will be added to align with the earlier network_nat_address which added the feature to the bridge network type.

Because we intend for the ipv{n}.nat.address setting to be set to an address in the uplink network’s ipv{n}.routes external subnets, it will be outside of the uplink network’s own subnet, and thus the specified source NAT address (or a wider subnet containing it) must be routed to the OVN network’s virtual router. As such, we will require that the uplink network’s ovn.ingress_mode be set to routed and that the required routes have been added on the uplink network (either manually or via BGP).
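
The subnet check implied above could look something like the following sketch. It assumes the uplink’s ipv{n}.routes value is a comma-separated list of CIDR subnets; the function name addressInRoutes is hypothetical and this is not LXD’s actual validation code.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// addressInRoutes (hypothetical helper) reports whether addr falls inside any
// of the CIDR subnets listed in routes. routes is assumed to be a
// comma-separated list of CIDRs.
func addressInRoutes(addr string, routes string) (bool, error) {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false, fmt.Errorf("invalid IP address %q", addr)
	}

	for _, r := range strings.Split(routes, ",") {
		r = strings.TrimSpace(r)
		if r == "" {
			continue
		}

		_, subnet, err := net.ParseCIDR(r)
		if err != nil {
			return false, fmt.Errorf("invalid route %q: %w", r, err)
		}

		if subnet.Contains(ip) {
			return true, nil
		}
	}

	return false, nil
}

func main() {
	// Sample values only.
	ok, err := addressInRoutes("192.0.2.1", "192.0.2.0/24, 2001:db8::/64")
	fmt.Println(ok, err)
}
```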

CLI changes

For external IP forwarding there will be a new sub-command added to the lxc network command called forward.

E.g.

lxc network forward create <network> <listen_address> [key=value...] [--target=<member>]
lxc network forward port add <network> <listen_address> <protocol> <listen_port(s)> <target_address> [<target_port(s)>] [--target=<member>]
lxc network forward port remove <network> <listen_address> [<protocol>] [<listen_port(s)>] [--force] [--target=<member>]
lxc network forward delete <network> <listen_address> [--target=<member>]
lxc network forward show <network> <listen_address> [--target=<member>]
lxc network forward edit <network> <listen_address> [--target=<member>]
lxc network forward set <network> <listen_address> <key>=<value>... [--target=<member>]
lxc network forward unset <network> <listen_address> <key> [--target=<member>]
lxc network forward get <network> <listen_address> <key> [--target=<member>]
lxc network forward list <network>

The available forward config keys will be user.* and target_address. The latter sets the default target address for traffic not matched by any of the port overrides. It is optional; if it is not set, unmatched traffic will be dropped.
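
The intended precedence between port overrides and the default target can be sketched as follows. The types forward and portForward and the function resolveTarget are hypothetical names used only for illustration, and each port entry is assumed to have already been expanded to a single listen port.

```go
package main

import "fmt"

// portForward is a simplified, hypothetical port override with one listen port.
type portForward struct {
	Protocol      string
	ListenPort    int
	TargetAddress string
}

// forward is a simplified, hypothetical view of a network forward.
type forward struct {
	DefaultTarget string // value of the target_address config key, empty if unset
	Ports         []portForward
}

// resolveTarget returns the address traffic should be forwarded to, or false
// if the traffic matches nothing and would be dropped.
func resolveTarget(f forward, protocol string, port int) (string, bool) {
	// Port overrides take priority over the default target.
	for _, p := range f.Ports {
		if p.Protocol == protocol && p.ListenPort == port {
			return p.TargetAddress, true
		}
	}

	// Fall back to the default target, if one is configured.
	if f.DefaultTarget != "" {
		return f.DefaultTarget, true
	}

	return "", false
}

func main() {
	// Sample values only.
	f := forward{
		DefaultTarget: "198.51.100.2",
		Ports:         []portForward{{Protocol: "tcp", ListenPort: 80, TargetAddress: "198.51.100.3"}},
	}

	fmt.Println(resolveTarget(f, "tcp", 80))
	fmt.Println(resolveTarget(f, "udp", 53))
}
```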

For lxc network forward port remove, the --force flag will remove all port forwards that match. Without it, an error will be returned if multiple port forwards match.
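
A minimal sketch of that matching behaviour, assuming that an omitted protocol or listen port matches any value and that a removal matching nothing is also an error (the latter is not stated in this spec). The names portEntry and removePorts are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// portEntry is a simplified, hypothetical port forward record.
type portEntry struct {
	Protocol   string
	ListenPort string
}

// removePorts returns the entries left after removing those that match.
// An empty protocol or listenPort acts as a wildcard (assumption).
func removePorts(ports []portEntry, protocol, listenPort string, force bool) ([]portEntry, error) {
	kept := make([]portEntry, 0, len(ports))
	matched := 0

	for _, p := range ports {
		protocolMatches := protocol == "" || p.Protocol == protocol
		portMatches := listenPort == "" || p.ListenPort == listenPort

		if protocolMatches && portMatches {
			matched++
			continue
		}

		kept = append(kept, p)
	}

	if matched == 0 {
		return nil, errors.New("no matching port forward found")
	}

	// Without --force, refuse to remove more than one entry.
	if matched > 1 && !force {
		return nil, errors.New("multiple port forwards match, use --force to remove them all")
	}

	return kept, nil
}

func main() {
	// Sample values only.
	ports := []portEntry{{"tcp", "80"}, {"udp", "80"}, {"tcp", "443"}}

	_, err := removePorts(ports, "", "80", false)
	fmt.Println("without --force:", err)

	kept, err := removePorts(ports, "", "80", true)
	fmt.Println("with --force:", kept, err)
}
```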

For the OVN network source NAT address feature we will add two new config keys:

ipv4.nat.address
ipv6.nat.address

E.g. lxc network set <network> ipv4.nat.address=192.0.2.1

Database changes

There will be two new tables added called networks_forwards and networks_forwards_config.

CREATE TABLE "networks_forwards" (
	id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
	network_id INTEGER NOT NULL,
	node_id INTEGER,
	listen_address TEXT NOT NULL,
	description TEXT NOT NULL,
	ports TEXT NOT NULL,
	UNIQUE (network_id, node_id, listen_address),
	FOREIGN KEY (network_id) REFERENCES "networks" (id) ON DELETE CASCADE,
	FOREIGN KEY (node_id) REFERENCES nodes(id) ON DELETE CASCADE
);

CREATE TABLE "networks_forwards_config" (
	id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
	network_forward_id INTEGER NOT NULL,
	key VARCHAR(255) NOT NULL,
	value TEXT,
	UNIQUE (network_forward_id, key),
	FOREIGN KEY (network_forward_id) REFERENCES "networks_forwards" (id) ON DELETE CASCADE
);

Because each OVN network has its own virtual router connected to the uplink network, any external IPs being forwarded will need the virtual router to respond to ARP/NDP requests on its uplink interface. As such a specific listen_address can only be used on a single network at any one time (although there can be multiple per-port forwards set up on the same external IP forwarding to different instances inside the same network).

Also the IPs will be stored in canonical form so that database queries can be done on them irrespective of the format specified by the user.
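
One way such canonicalisation could be done is to parse the user-supplied address and re-render it with Go’s standard net package, as in this sketch. The function name canonicalAddress is hypothetical and this is not necessarily how LXD implements it.

```go
package main

import (
	"fmt"
	"net"
)

// canonicalAddress (hypothetical helper) parses an IPv4 or IPv6 address and
// returns the string form produced by net.IP.String, so that equivalent
// spellings of the same address compare equal when stored.
func canonicalAddress(s string) (string, error) {
	ip := net.ParseIP(s)
	if ip == nil {
		return "", fmt.Errorf("invalid IP address %q", s)
	}

	return ip.String(), nil
}

func main() {
	// Two spellings of the same IPv6 address, plus an IPv4 address.
	for _, s := range []string{"FD42:B545:2E58:EC06:0:0:0:12", "fd42:b545:2e58:ec06::12", "192.0.2.1"} {
		c, err := canonicalAddress(s)
		fmt.Println(s, "->", c, err)
	}
}
```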

Upgrade handling

As these are new features, no upgrade handling is required.

Further information

Whilst the OVN Load_Balancer meets nearly all our requirements, there does appear to be a bug in the version provided in Ubuntu Focal with regards to loopback requests from the target instance to the external IP/port.

When having a setup like below, where we have one entire external IP (fd42:b545:2e58:ec06::12) forwarding to an internal IP (fd42:3242:1613:9c39:216:3eff:fe80:6179), and one or more port based forwardings to the same target IP (fd42:3242:1613:9c39:216:3eff:fe80:6179) but listening on a different external IP (fd42:b545:2e58:ec06::11):

E.g.

ovn-nbctl --may-exist lb-add test '[fd42:b545:2e58:ec06::12]' '[fd42:3242:1613:9c39:216:3eff:fe80:6179]'
ovn-nbctl --may-exist lb-add test '[fd42:b545:2e58:ec06::11]:80' '[fd42:3242:1613:9c39:216:3eff:fe80:6179]:80' tcp
ovn-nbctl --may-exist lb-add test '[fd42:b545:2e58:ec06::11]:81' '[fd42:3242:1613:9c39:216:3eff:fe80:6179]:80' tcp

When making a loopback request from the target IP (fd42:3242:1613:9c39:216:3eff:fe80:6179), the source address of the looped back request to [fd42:b545:2e58:ec06::11]:80 is incorrectly being rewritten to fd42:b545:2e58:ec06::12 rather than the expected fd42:b545:2e58:ec06::11 which is causing connection resets.

It looks like it is picking the first entry with the same target address for SNAT rather than also considering the target port too.

This is happening on both IPv4 and IPv6.

A more recent version of OVN will be tried and, if the problem still exists, it will be reported to the upstream project.

tomp · August 6, 2021, 1:19pm

Ready for review @stgraber

stgraber · August 6, 2021, 3:57pm

I’d use Address instead of IP as we’ve never used IP in our API so far.

stgraber · August 6, 2021, 3:57pm

Typo, should be string

stgraber · August 6, 2021, 4:03pm

Looks good to me overall. For some reason I had in my head that we’d have a single forward entry per address, effectively allowing for /1.0/networks/NAME/forwards/ADDRESS and that the struct there would then specify if we’re doing one-to-one or one-to-many and encapsulate the different targets for the different protocols and ports.

Your approach should work too, though it comes with having to associate a name for each forward rather than being able to easily query them by IP.

Ah and you’re missing a Protocol field I believe.

tomp · August 6, 2021, 4:13pm

Ah yes protocol is missing indeed, thanks.

RE having multiple potential rows per external address, I figured this would be easier to understand as each forward entry (IP and optional port set) would likely be for a particular service/role, allowing for more helpful naming/descriptions to be used.

E.g. Web server, Mail server etc.

I did specifically make the address field itself separate from the ports though so that we can easily retrieve the forward entries for a specific IP for both validation, and if needed in the future, a way to report on all forwards for an IP.

stgraber · August 6, 2021, 4:17pm

type Forward struct {
    // Required
    Address string

    // Optional
    Description string

    // Optional
    DefaultTarget string

    // Optional
    Ports []ForwardPort
}

type ForwardPort struct {
    // Optional
    Description string

    // Required
    Protocol string

    // Required, supports ranges
    Port string

    // Optional, single-port (allows for many-to-one)
    TargetPort string

    // Required
    TargetAddress string
}

tomp · August 6, 2021, 4:19pm

Have added protocol now.

tomp · August 6, 2021, 4:34pm

Certainly that API structure can be supported with the current CLI and DB proposal, but I tend to start with the CLI and work it backward from an end-user perspective. Do you see that structure altering the CLI behaviour (well it would have to because of the lack of name ofc, but would name just be replaced with external_address?), but would there also need to be an additional set of commands to manage the ports on an IP (like ACL rules basically)?

Also, a question just popped into my head that I need to test, and that is OVN’s behaviour when you have a 1:1 forward to IP A, and a 1:many port forward on the same external IP but forwarding to IP B.
Hopefully OVN does the sensible thing and forwards it to IP B, but I’ve learnt to never assume with OVN!

And with xtables/nftables I’d imagine we’d need to be careful with the rule ordering to ensure we get the behaviour we want there.

I’d originally envisioned adding a validation check that prevents the use of mixed 1:1 and 1:many rules on a single external address (to sidestep those issues), but the struct you propose there effectively formalises the concept of a “port override” forward and falling back to the default forward, something I’ve yet to check OVN actually supports.

tomp · August 6, 2021, 4:45pm

When this isn’t supplied, do you mean that it would forward each listen port to the equivalent target port?

OVN’s load balancer doesn’t support port ranges so we’d need to expand them like we do for the firewall driver for proxy devices.
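
A minimal sketch of such an expansion, assuming the comma delimited range syntax from the ListenPort example (80,81,8080-8090). The function names expandPorts and parsePort are hypothetical and this is not the existing firewall driver code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePort (hypothetical helper) parses a single port number in 1-65535.
func parsePort(s string) (int, error) {
	p, err := strconv.Atoi(strings.TrimSpace(s))
	if err != nil || p < 1 || p > 65535 {
		return 0, fmt.Errorf("invalid port %q", s)
	}

	return p, nil
}

// expandPorts (hypothetical helper) turns a spec such as "80,81,8080-8090"
// into a list of individual ports, in the order given.
func expandPorts(spec string) ([]int, error) {
	var ports []int

	for _, part := range strings.Split(spec, ",") {
		bounds := strings.SplitN(part, "-", 2)

		start, err := parsePort(bounds[0])
		if err != nil {
			return nil, err
		}

		// A single port is treated as a range of length one.
		end := start
		if len(bounds) == 2 {
			end, err = parsePort(bounds[1])
			if err != nil {
				return nil, err
			}
		}

		if end < start {
			return nil, fmt.Errorf("invalid port range %q", part)
		}

		for p := start; p <= end; p++ {
			ports = append(ports, p)
		}
	}

	return ports, nil
}

func main() {
	ports, err := expandPorts("80,81,8080-8082")
	fmt.Println(ports, err)
}
```

Each expanded port could then be emitted as its own load balancer entry, at the cost of one entry per port for wide ranges.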

stgraber · August 6, 2021, 4:48pm

Yeah, indeed, the common case would be using the same port.

stgraber · August 6, 2021, 4:49pm

I indeed need to think a bit about the CLI. I suspect it’d be somewhat similar to what we did for ACL, where you’ll be able to add/remove a port, similar to what we do with rules.

tomp · August 6, 2021, 4:50pm

The good news is OVN seems to do the right thing with regards to using 1:1 and 1:many rules on the same external IP (I tried to break it but wasn’t able to, that being said the records seem to get added in a particular order irrespective of the order of the commands I run suggesting there is some special logic at play).

tomp · August 6, 2021, 4:52pm

So like:

lxc network forward add <network> <external_address> [<default_target_address>]
lxc network forward port add <network> <external_address> <target_address> <protocol> <port range>[,<port range>]
lxc network forward port remove <network> <external_address> <target_address> <protocol> <port range>[,<port range>]

stgraber · August 6, 2021, 4:53pm

Probably add-port as add port feels a bit weird, but yeah, something like that.
And edit would obviously let you do it all at once, same as ACLs.

tomp · August 6, 2021, 4:54pm

Yep true, that was a mistake, I changed it round to “forward port add” like “acl rule add”.

tomp · August 9, 2021, 11:04am

I’ve updated the design with what we discussed on Friday. Thanks

stgraber · August 9, 2021, 2:16pm

Should this one be in Put instead?

stgraber · August 9, 2021, 2:19pm

I think this should be a single port.
The cases we care about are:

one-to-one with differing source/target (both take single port)
one-to-one with same source/target (only ListenPort needed)
many-to-one (ListenPort takes range, TargetPort takes single port)
many-to-many (requires same source/target so TargetPort isn’t needed)

As a result there’s no cases where we need TargetPort to take a range. It’s either a single port or is unset.
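
The four cases above reduce to one rule: an unset TargetPort means each listen port maps to itself, and a single TargetPort means every listen port maps to it. A sketch of that rule follows, assuming the listen ports have already been expanded into a list and using 0 to mean "unset"; the function name mapPorts is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// mapPorts (hypothetical helper) pairs each listen port with its target port.
// targetPort == 0 means TargetPort is unset, so the same port is used.
func mapPorts(listenPorts []int, targetPort int) (map[int]int, error) {
	if len(listenPorts) == 0 {
		return nil, errors.New("at least one listen port is required")
	}

	mapping := make(map[int]int, len(listenPorts))
	for _, lp := range listenPorts {
		if targetPort == 0 {
			// One-to-one or many-to-many with the same source and target port.
			mapping[lp] = lp
		} else {
			// One-to-one with a differing port, or many-to-one.
			mapping[lp] = targetPort
		}
	}

	return mapping, nil
}

func main() {
	// Sample values only.
	sameMapping, _ := mapPorts([]int{80, 81}, 0)
	manyToOne, _ := mapPorts([]int{8080, 8081}, 80)

	fmt.Println(sameMapping)
	fmt.Println(manyToOne)
}
```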

stgraber · August 9, 2021, 2:19pm

Odd indentation here