Abstract
Provide the ability to specify peering relationships between OVN networks (including across projects) so that network traffic between the OVN networks stays within the OVN subsystem and doesn’t leave OVN and then re-enter.
Rationale
Currently traffic between OVN networks exits the OVN subsystem via the source network’s virtual router and goes into the uplink network where it may then re-enter the OVN subsystem via the target network’s virtual router. This is inefficient and means that the network bandwidth is limited by the uplink network’s capabilities. If the OVN setup is using faster networking for internal traffic, then it would also be possible to use the same faster networking capabilities for OVN<->OVN traffic by allowing peering relationships to be configured between OVN networks.
Specification
Design
OVN supports creating peering links between virtual routers by adding router ports to each router and setting the peer property of the router ports to the respective router port name.
For example, to create a peering link between two existing virtual routers:
- lxd-net1-lr (LAN subnets 10.110.120.0/24, fd42:7832:3b4e:cffb::/64)
- lxd-net2-lr (LAN subnets 10.105.164.0/24, fd42:5389:62b9:be7c::/64)
We can create two router ports, one on each router and reference the other as the peer.
To avoid having to set up a separate peering subnet, the peering ports reuse the MAC and IP addresses of the virtual router’s port on the internal LAN, albeit with a single-host subnet (e.g. /32 for IPv4 and /128 for IPv6). This effectively adds the same address to multiple ports on each router. This becomes important when setting up static routes that use the peering connection: we need to explicitly specify the router port to use, because OVN cannot deduce the correct port for the peering link automatically.
ovn-nbctl lrp-add lxd-net1-lr lxd-net1-lr-lrp-net-2 00:16:3e:d6:73:26 10.110.120.1/32 fd42:7832:3b4e:cffb::1/128 peer=lxd-net2-lr-lrp-net-1
ovn-nbctl lrp-add lxd-net2-lr lxd-net2-lr-lrp-net-1 00:16:3e:3f:e4:9f 10.105.164.1/32 fd42:5389:62b9:be7c::1/128 peer=lxd-net1-lr-lrp-net-2
LXD will then need to set up static routes on the respective virtual routers for the peered local subnets. The static routes will need to use the target router’s IP that was added on the peering ports as the nexthop address and explicitly specify the local peering router port to use for egress traffic, e.g.:
ovn-nbctl lr-route-add lxd-net1-lr 10.105.164.0/24 10.105.164.1 lxd-net1-lr-lrp-net-2
ovn-nbctl lr-route-add lxd-net1-lr fd42:5389:62b9:be7c::/64 fd42:5389:62b9:be7c::1 lxd-net1-lr-lrp-net-2
ovn-nbctl lr-route-add lxd-net2-lr 10.110.120.0/24 10.110.120.1 lxd-net2-lr-lrp-net-1
ovn-nbctl lr-route-add lxd-net2-lr fd42:7832:3b4e:cffb::/64 fd42:7832:3b4e:cffb::1 lxd-net2-lr-lrp-net-1
This will then allow traffic to flow between networks without leaving the OVN subsystem.
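The two steps above (one peering port per router, then one static route per remote subnet with an explicit egress port) can be sketched as a small Go helper that builds the ovn-nbctl argument lists. This is only an illustration: the Router type, its field names and peeringArgs are hypothetical and are not part of LXD.

```go
package main

import (
	"fmt"
	"strings"
)

// Router holds the values one side of the peering needs. The type and its
// field names are hypothetical; they only mirror the example commands.
type Router struct {
	Name     string   // logical router, e.g. "lxd-net1-lr"
	PeerPort string   // new peering port on this router
	MAC      string   // MAC reused from the router's internal LAN port
	Addrs    []string // router LAN addresses as /32 and /128 prefixes
	Subnets  []string // LAN subnets behind this router
	NextHops []string // router address inside each subnet, same order and length as Subnets
}

// peeringArgs returns the ovn-nbctl argument lists to run against the local
// router: one lrp-add for the peering port, then one lr-route-add per remote
// subnet with the egress port given explicitly.
func peeringArgs(local, remote Router) [][]string {
	port := []string{"lrp-add", local.Name, local.PeerPort, local.MAC}
	port = append(port, local.Addrs...)
	port = append(port, "peer="+remote.PeerPort)

	cmds := [][]string{port}
	for i, subnet := range remote.Subnets {
		cmds = append(cmds, []string{
			"lr-route-add", local.Name, subnet, remote.NextHops[i], local.PeerPort,
		})
	}
	return cmds
}

func main() {
	net1 := Router{
		Name: "lxd-net1-lr", PeerPort: "lxd-net1-lr-lrp-net-2", MAC: "00:16:3e:d6:73:26",
		Addrs:    []string{"10.110.120.1/32", "fd42:7832:3b4e:cffb::1/128"},
		Subnets:  []string{"10.110.120.0/24", "fd42:7832:3b4e:cffb::/64"},
		NextHops: []string{"10.110.120.1", "fd42:7832:3b4e:cffb::1"},
	}
	net2 := Router{
		Name: "lxd-net2-lr", PeerPort: "lxd-net2-lr-lrp-net-1", MAC: "00:16:3e:3f:e4:9f",
		Addrs:    []string{"10.105.164.1/32", "fd42:5389:62b9:be7c::1/128"},
		Subnets:  []string{"10.105.164.0/24", "fd42:5389:62b9:be7c::/64"},
		NextHops: []string{"10.105.164.1", "fd42:5389:62b9:be7c::1"},
	}

	// Print the commands for both directions of the peering.
	for _, pair := range [][2]Router{{net1, net2}, {net2, net1}} {
		for _, args := range peeringArgs(pair[0], pair[1]) {
			fmt.Println("ovn-nbctl " + strings.Join(args, " "))
		}
	}
}
```

Building argument lists rather than shell strings keeps quoting out of the picture if the commands are later executed directly.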
Route tables (avoiding asymmetric routing for NIC routes)
Because LXD’s OVN implementation supports routing additional prefixes to ovn NICs by specifying ipv{n}.routes and/or ipv{n}.routes.external, Instance NICs could be configured to send packets destined for the peer network using a source address outside of the source network’s primary subnet. This would lead to asymmetric routing (where the return packet leaves the OVN subsystem) and cause unexpected behaviour when using stateful ACLs or external firewalls.
To avoid this LXD will identify all of the possible prefixes being used by the peered network and add static routes for those prefixes to the local virtual router pointing towards the peer connection (and vice versa on the peered network’s virtual router).
Any changes to the peered network’s prefixes will be automatically applied to the local router’s routing table. This allows the peered network to indirectly influence the routing table of the local router.
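As a rough sketch of the prefix collection described above, the following Go function merges a network's own subnets with the prefixes routed to its NICs and returns a de-duplicated, sorted list. The function name and both inputs are hypothetical stand-ins for LXD's network and NIC configuration; how LXD actually gathers these values is not specified here.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// exportedPrefixes returns the de-duplicated set of prefixes a peer would
// need static routes for: the network's own subnets plus every prefix routed
// to its NICs (ipv{n}.routes and ipv{n}.routes.external).
func exportedPrefixes(subnets []string, nicRoutes []string) ([]netip.Prefix, error) {
	all := append(append([]string{}, subnets...), nicRoutes...)

	seen := map[netip.Prefix]bool{}
	var out []netip.Prefix
	for _, s := range all {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			return nil, fmt.Errorf("invalid prefix %q: %w", s, err)
		}

		p = p.Masked() // normalise e.g. 192.168.2.1/24 to 192.168.2.0/24
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}

	// Sort by textual form so that the result is stable between runs.
	sort.Slice(out, func(i, j int) bool { return out[i].String() < out[j].String() })
	return out, nil
}

func main() {
	prefixes, err := exportedPrefixes(
		[]string{"192.168.2.1/24"},
		[]string{"198.51.100.0/24", "192.168.2.0/24"},
	)
	if err != nil {
		panic(err)
	}
	for _, p := range prefixes {
		fmt.Println(p)
	}
}
```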
Future work: Route prefix filtering
A possible extension in the future would be to add a prefix filter setting to the local peer connection entry to only allow specific prefixes to be added to the routing table. This would ensure that if the target network later adds a NIC-level route that conflicts with addressing inside the source network, that route is not automatically exported to the source network, where it would cause network disruption. It would also make it possible to create peer connections to multiple networks whose prefixes conflict with each other but not with the source network. In this way the source network can select which prefix is reachable over which peer connection, rather than potentially importing a set of conflicting prefixes from the multiple peer networks.
Mutual peering
It will be possible for peering relationships to be established between OVN networks in different LXD projects.
Because OVN subnets are not guaranteed to be unique (even within a single LXD deployment) it is possible for overlapping subnets to be used in multiple OVN networks. As such a peering relationship between OVN networks needs to be mutually agreed by both sides, and the peering validation process will check that the route prefixes being exchanged will not cause conflicts.
Example workflow:
Create two OVN networks, each in a different project.
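A minimal sketch of such a conflict check, assuming that "conflict" means any overlap between a local prefix and a prefix offered by the peer (the exact rule LXD applies is not defined here). The function name checkPeerPrefixes is hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
)

// checkPeerPrefixes returns an error if any local prefix overlaps any prefix
// that the peer network would export. An IPv4 prefix never overlaps an IPv6
// prefix, so mixed lists are fine.
func checkPeerPrefixes(local, remote []string) error {
	parse := func(in []string) ([]netip.Prefix, error) {
		out := make([]netip.Prefix, 0, len(in))
		for _, s := range in {
			p, err := netip.ParsePrefix(s)
			if err != nil {
				return nil, fmt.Errorf("invalid prefix %q: %w", s, err)
			}
			out = append(out, p.Masked())
		}
		return out, nil
	}

	l, err := parse(local)
	if err != nil {
		return err
	}
	r, err := parse(remote)
	if err != nil {
		return err
	}

	for _, lp := range l {
		for _, rp := range r {
			if lp.Overlaps(rp) {
				return fmt.Errorf("local prefix %s overlaps peer prefix %s", lp, rp)
			}
		}
	}
	return nil
}

func main() {
	// Distinct /24 networks do not conflict.
	fmt.Println(checkPeerPrefixes([]string{"192.168.1.0/24"}, []string{"192.168.2.0/24"}))
	// A peer exporting a covering /16 does.
	fmt.Println(checkPeerPrefixes([]string{"192.168.1.0/24"}, []string{"192.168.0.0/16"}))
}
```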
lxc network create ovn1 --type=ovn --project project1 \
network=myuplink \
ipv4.address=192.168.1.1/24
lxc network create ovn2 --type=ovn --project project2 \
network=myuplink \
ipv4.address=192.168.2.1/24
Initiate peer connection from ovn1 towards ovn2.
The initiator has to know the correct project and network name to succeed. If either is incorrect, no error message will be returned; this avoids users in one project being able to enumerate existing projects or existing networks in another project.
lxc network peer create ovn1 mypeer-ovn2 project2/ovn2 --project project1
lxc network peer ls ovn1 --project project1
+-------------+-------------+---------------+---------+
| NAME | DESCRIPTION | PEER | STATE |
+-------------+-------------+---------------+---------+
| mypeer-ovn2 | | project2/ovn2 | PENDING |
+-------------+-------------+---------------+---------+
Confirm peer connection from ovn2 towards ovn1.
The user has to know the correct project and network name to succeed. If either is incorrect, no error message will be returned.
lxc network peer create ovn2 mypeer-ovn1 project1/ovn1 --project project2
lxc network peer ls ovn2 --project project2
+-------------+-------------+---------------+---------+
| NAME | DESCRIPTION | PEER | STATE |
+-------------+-------------+---------------+---------+
| mypeer-ovn1 | | project1/ovn1 | CREATED |
+-------------+-------------+---------------+---------+
lxc network peer ls ovn1 --project project1
+-------------+-------------+---------------+---------+
| NAME | DESCRIPTION | PEER | STATE |
+-------------+-------------+---------------+---------+
| mypeer-ovn2 | | project2/ovn2 | CREATED |
+-------------+-------------+---------------+---------+
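The workflow above can be modelled with a small in-memory sketch: a peer stays pending until the target network creates a matching peer pointing back, and a request towards an unknown project or network simply stays pending without an error. PeerStore, NetworkRef and their methods are hypothetical and only illustrate the state transitions; they do not reflect how LXD stores peers.

```go
package main

import "fmt"

// NetworkRef identifies a network by project and name.
type NetworkRef struct {
	Project string
	Network string
}

type peerEntry struct {
	name    string
	target  NetworkRef
	created bool
}

// PeerStore is a hypothetical in-memory stand-in for the peers database.
type PeerStore struct {
	peers map[NetworkRef][]*peerEntry
}

func NewPeerStore() *PeerStore {
	return &PeerStore{peers: map[NetworkRef][]*peerEntry{}}
}

// Create records a peer from local towards target and returns its state.
// If the target already has a pending peer pointing back at local, both
// sides become CREATED. An unknown target is not reported as an error.
func (s *PeerStore) Create(local NetworkRef, name string, target NetworkRef) string {
	entry := &peerEntry{name: name, target: target}
	for _, other := range s.peers[target] {
		if other.target == local && !other.created {
			other.created = true
			entry.created = true
			break
		}
	}
	s.peers[local] = append(s.peers[local], entry)
	return s.State(local, name)
}

// State returns PENDING or CREATED, or an empty string if no such peer exists.
func (s *PeerStore) State(local NetworkRef, name string) string {
	for _, e := range s.peers[local] {
		if e.name == name {
			if e.created {
				return "CREATED"
			}
			return "PENDING"
		}
	}
	return ""
}

func main() {
	s := NewPeerStore()
	ovn1 := NetworkRef{Project: "project1", Network: "ovn1"}
	ovn2 := NetworkRef{Project: "project2", Network: "ovn2"}

	fmt.Println(s.Create(ovn1, "mypeer-ovn2", ovn2)) // initiating side
	fmt.Println(s.Create(ovn2, "mypeer-ovn1", ovn1)) // confirming side
	fmt.Println(s.State(ovn1, "mypeer-ovn2"))        // initiating side after confirmation
}
```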
ACL considerations
It is hoped that OVN will eventually allow us to identify traffic going to/from a peer router port and reference that in ACL rules using the peer name. As such we will ensure that peer names are usable in ACL rules when prefixed with the special @ character (that already cannot be used in ACL names) to indicate a specific network port subject.
Peer names will follow the same naming restrictions as ACLs:
- Be between 1 and 63 characters long
- Be made up exclusively of letters, numbers and dashes from the ASCII table
- Not start with a digit or a dash
- Not end with a dash
As well as:
- Must not be “internal” or “external” - this is so they won’t conflict with the reserved @internal and @external subjects.
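The naming rules listed above translate directly into a validation function. The following is a minimal sketch; validatePeerName is a hypothetical name and the error messages are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// validatePeerName checks a peer name against the rules listed above.
func validatePeerName(name string) error {
	if len(name) < 1 || len(name) > 63 {
		return errors.New("name must be between 1 and 63 characters long")
	}

	// Only ASCII letters, digits and dashes are allowed.
	for _, r := range name {
		isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		isDigit := r >= '0' && r <= '9'
		if !isLetter && !isDigit && r != '-' {
			return fmt.Errorf("name contains invalid character %q", r)
		}
	}

	// All bytes are ASCII at this point, so byte indexing is safe.
	first, last := name[0], name[len(name)-1]
	if first == '-' || (first >= '0' && first <= '9') {
		return errors.New("name must not start with a digit or a dash")
	}
	if last == '-' {
		return errors.New("name must not end with a dash")
	}

	// Reserved so that @internal and @external stay unambiguous in ACL rules.
	if name == "internal" || name == "external" {
		return fmt.Errorf("name %q is reserved", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"mypeer-ovn2", "internal", "1peer", "peer-", ""} {
		fmt.Printf("%q: %v\n", name, validatePeerName(name))
	}
}
```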
Currently the ACL will classify traffic on the peer connection as @external as it does with traffic going to/from the uplink network.
Although OVN itself doesn’t support identifying traffic from the peer connection as a different specific port on the internal LAN (it all appears to come from the router’s port connected to the LAN), as the peer connection has a specific set of target prefixes associated with it, we could potentially create an ACL address set containing those prefixes. We would then need to ensure that traffic from those prefixes not coming from the peer connection was dropped and any traffic coming from an address outside of that address set through the peer connection was also dropped. At that point we could be confident that any packets matching a source address in the address set for the peer connection could only have come from the peer connection itself.
This has been tested to work using router policies in OVN.
E.g. the following allows packets from lxd-net2-lr’s subnet 10.105.164.0/24 arriving at lxd-net1-lr’s peer router port, and drops all other traffic arriving at the port.
ovn-nbctl lr-policy-add lxd-net1-lr 100 "ip4.src == 10.105.164.0/24 && inport == \"lxd-net1-lr-lrp-net-2\"" allow
ovn-nbctl lr-policy-add lxd-net1-lr 99 "inport == \"lxd-net1-lr-lrp-net-2\"" drop
This provides the foundations for ensuring that packets arriving at a virtual peer router port match the prefixes expected for the peer, and equally for ensuring that packets arriving from the external virtual router port (connected to the uplink network) do not come from prefixes expected to come from the peer connection. In this way a named ACL address set that references the peer connection name would be able to reliably enforce policies between networks.
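To illustrate how the router policies shown above could be derived from a peer's prefix list, here is a hedged Go sketch. The Policy type and peerPortPolicies are hypothetical. The priorities 100 and 99 follow the example commands, and using ip6.src for IPv6 prefixes is an assumption, since only the IPv4 form appears in the tested example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Policy is a hypothetical holder for the arguments of one lr-policy-add call.
type Policy struct {
	Priority int
	Match    string
	Action   string
}

// peerPortPolicies returns one allow policy per expected peer prefix for
// traffic arriving on the given peering port, followed by a lower priority
// drop policy for everything else arriving on that port.
func peerPortPolicies(port string, peerPrefixes []string) ([]Policy, error) {
	var out []Policy
	for _, s := range peerPrefixes {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			return nil, fmt.Errorf("invalid prefix %q: %w", s, err)
		}

		field := "ip6.src" // assumption: IPv6 counterpart of ip4.src
		if p.Addr().Is4() {
			field = "ip4.src"
		}

		out = append(out, Policy{
			Priority: 100,
			Match:    fmt.Sprintf("%s == %s && inport == %q", field, p.Masked(), port),
			Action:   "allow",
		})
	}

	out = append(out, Policy{Priority: 99, Match: fmt.Sprintf("inport == %q", port), Action: "drop"})
	return out, nil
}

func main() {
	policies, err := peerPortPolicies("lxd-net1-lr-lrp-net-2", []string{"10.105.164.0/24"})
	if err != nil {
		panic(err)
	}
	for _, p := range policies {
		// %q re-quotes the match expression for use on a shell command line.
		fmt.Printf("ovn-nbctl lr-policy-add lxd-net1-lr %d %q %s\n", p.Priority, p.Match, p.Action)
	}
}
```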
API changes
For the network peers feature a new API extension will be added called network_peer with the following API endpoints and structures added:
Create and edit a network peer
POST /1.0/networks/<network>/peers
PUT /1.0/networks/<network>/peers/<name>
Using the following new API structures respectively:
type NetworkPeersPost struct {
	NetworkPeerPut `yaml:",inline"`

	// Name of the peer
	// Example: project1-network1
	Name string `json:"name" yaml:"name"`

	// Name of the target project
	// Example: project1
	TargetProject string `json:"target_project" yaml:"target_project"`

	// Name of the target network
	// Example: network1
	TargetNetwork string `json:"target_network" yaml:"target_network"`
}

type NetworkPeerPut struct {
	// Description of the peer
	// Example: Peering with network1 in project1
	Description string `json:"description" yaml:"description"`

	// Peer configuration map (refer to doc/network-peers.md)
	// Example: {"user.mykey": "foo"}
	Config map[string]string `json:"config" yaml:"config"`
}
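As an illustration of what a create request body could look like on the wire, the sketch below repeats the two structures and encodes a NetworkPeersPost with encoding/json. The helper encodePeerRequest and the example values are hypothetical. Because the embedded NetworkPeerPut has no json tag, its fields are flattened into the top-level object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NetworkPeerPut and NetworkPeersPost repeat the structures defined above.
type NetworkPeerPut struct {
	Description string            `json:"description" yaml:"description"`
	Config      map[string]string `json:"config" yaml:"config"`
}

type NetworkPeersPost struct {
	NetworkPeerPut `yaml:",inline"`

	Name          string `json:"name" yaml:"name"`
	TargetProject string `json:"target_project" yaml:"target_project"`
	TargetNetwork string `json:"target_network" yaml:"target_network"`
}

// encodePeerRequest returns the JSON body for POST /1.0/networks/<network>/peers.
func encodePeerRequest(req NetworkPeersPost) (string, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	body, err := encodePeerRequest(NetworkPeersPost{
		NetworkPeerPut: NetworkPeerPut{
			Description: "Peering with ovn2",
			Config:      map[string]string{"user.mykey": "foo"},
		},
		Name:          "mypeer-ovn2",
		TargetProject: "project2",
		TargetNetwork: "ovn2",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```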
Delete a network peer
DELETE /1.0/networks/<network>/peers/<name>
List network peers
GET /1.0/networks/<network>/peers
GET /1.0/networks/<network>/peers/<name>
Returns a list or single record (respectively) of this new NetworkPeer structure:
type NetworkPeer struct {
	NetworkPeerPut `yaml:",inline"`

	// Name of the peer
	// Read only: true
	// Example: project1-network1
	Name string `json:"name" yaml:"name"`

	// Name of the target project
	// Read only: true
	// Example: project1
	TargetProject string `json:"target_project" yaml:"target_project"`

	// Name of the target network
	// Read only: true
	// Example: network1
	TargetNetwork string `json:"target_network" yaml:"target_network"`

	// The state of the peering
	// Read only: true
	// Example: Pending
	Status string `json:"status" yaml:"status"`
}
CLI changes
There will be a new sub-command added to the lxc network command called peer.
E.g. for managing peer relationships:
lxc network peer ls <network>
lxc network peer create <network> <peer name> <[target project/]target_network>
lxc network peer show <network> <peer name>
lxc network peer edit <network> <peer name>
lxc network peer set <network> <peer name> <key>=<value>...
lxc network peer unset <network> <peer name> <key>
lxc network peer get <network> <peer name> <key>
lxc network peer delete <network> <peer name>
Database changes
There will be two new tables added called networks_peers and networks_peers_config.
CREATE TABLE "networks_peers" (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    network_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    description TEXT NOT NULL,
    target_network_project TEXT NULL,
    target_network_name TEXT NULL,
    target_network_id INTEGER NULL,
    UNIQUE (network_id, name),
    UNIQUE (network_id, target_network_project, target_network_name),
    UNIQUE (network_id, target_network_id),
    FOREIGN KEY (network_id) REFERENCES "networks" (id) ON DELETE CASCADE
);

CREATE TABLE "networks_peers_config" (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    network_peer_id INTEGER NOT NULL,
    key VARCHAR(255) NOT NULL,
    value TEXT,
    UNIQUE (network_peer_id, key),
    FOREIGN KEY (network_peer_id) REFERENCES "networks_peers" (id) ON DELETE CASCADE
);
Upgrade handling
As these are new features, no upgrade handling is required.
Further information
The target_network_project and target_network_name fields in the networks_peers table are used only during the initial peering process. Once both sides have mutually agreed the connection, the target_network_id field will be populated on both sides with the ID of the respective peered network, and the target_network_project and target_network_name fields will be cleared. This is so that if we later add the ability to rename ovn networks, the peerings will reflect the updated name (by looking it up via ID).
The state of the network peering will be derived from the value of target_network_id. If it is <=0 then the state is “Pending”, if it is >0 then it is “Created”.
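The state derivation can be sketched as a one-line rule. The function name peerStatus is hypothetical, and the sketch assumes a NULL target_network_id column is read as 0.

```go
package main

import "fmt"

// peerStatus derives the peering state from target_network_id.
// Assumption: a NULL column value has been read as 0 by the caller.
func peerStatus(targetNetworkID int64) string {
	if targetNetworkID > 0 {
		return "Created"
	}
	return "Pending"
}

func main() {
	fmt.Println(peerStatus(0))  // not yet confirmed by the other side
	fmt.Println(peerStatus(12)) // both sides agreed, ID of the peered network stored
}
```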
A network peering will be considered as a “user” of the peered network, and so will prevent network deletion until the peering is deleted.