OVN setup examples

I have a whole sequence of warnings in the syslog, I think from an earlier try at getting OVN working:

# journalctl -b 0 -o short-precise -u ovn* -u ovs* -u openvswitch* -u lxd -e
Apr 28 11:50:25.447981 albans ovn-controller[6331]: ovs|00315|patch|WARN|Bridge 'lxdovn25' not found for network 'k8sbr0'
Apr 28 11:51:25.448890 albans ovn-controller[6331]: ovs|00316|patch|WARN|Bridge 'lxdovn25' not found for network 'k8sbr0'
Apr 28 11:52:25.449200 albans ovn-controller[6331]: ovs|00317|patch|WARN|Bridge 'lxdovn25' not found for network 'k8sbr0'
Apr 28 11:53:25.449496 albans ovn-controller[6331]: ovs|00318|patch|WARN|Bridge 'lxdovn25' not found for network 'k8sbr0'
Apr 28 11:54:25.450258 albans ovn-controller[6331]: ovs|00319|patch|WARN|Bridge 'lxdovn25' not found for network 'k8sbr0'
Apr 28 11:55:25.450096 albans ovn-controller[6331]: ovs|00320|patch|WARN|Bridge 'lxdovn25' not found for network 'k8sbr0'
Apr 28 11:56:25.450353 albans ovn-controller[6331]: ovs|00321|patch|WARN|Bridge 'lxdovn25' not found for network 'k8sbr0'
Apr 28 11:57:25.450862 albans ovn-controller[6331]: ovs|00322|patch|WARN|Bridge 'lxdovn25' not found for network 'k8sbr0'
Apr 28 11:58:25.451603 albans ovn-controller[6331]: ovs|00323|patch|WARN|Bridge 'lxdovn25' not found for network 'k8sbr0'

There are many more before these, and they’re continuing.

$ lxc network list
+---------+----------+---------+--------------+------+---------------------------+---------+---------+
|  NAME   |   TYPE   | MANAGED |     IPV4     | IPV6 |        DESCRIPTION        | USED BY |  STATE  |
+---------+----------+---------+--------------+------+---------------------------+---------+---------+
| br-int  | bridge   | NO      |              |      |                           | 0       |         |
+---------+----------+---------+--------------+------+---------------------------+---------+---------+
| dmz0    | physical | NO      |              |      |                           | 0       |         |
+---------+----------+---------+--------------+------+---------------------------+---------+---------+
| eth0    | physical | NO      |              |      |                           | 0       |         |
+---------+----------+---------+--------------+------+---------------------------+---------+---------+
| lxdbr0  | bridge   | YES     | 10.99.0.1/16 | none | Default local LXD network | 2       | CREATED |
+---------+----------+---------+--------------+------+---------------------------+---------+---------+
| lxdfan0 | bridge   | YES     |              |      | LXD cluster network       | 3       | CREATED |
+---------+----------+---------+--------------+------+---------------------------+---------+---------+
# ovs-vsctl show
466b9882-6a72-4934-9dc9-1e939bb97950
    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.13.1"

Any advice on what I should delete from where to fix this?

Can you show output of sudo ovn-nbctl show please

And also sudo ovs-vsctl list open_vswitch


# ovn-nbctl show produces no output.

# ovn-sbctl show
Chassis "486b381c-b94b-4172-978f-90635f048955"
    hostname: albans.domuz
    Encap geneve
        ip: "10.1.0.215"
        options: {csum="true"}
# ovs-vsctl list open_vswitch
_uuid               : 466b9882-6a72-4934-9dc9-1e939bb97950
bridges             : [04e7f203-69e8-4365-9e40-282877f98a80]
cur_cfg             : 17
datapath_types      : [netdev, system]
datapaths           : {}
db_version          : "8.2.0"
dpdk_initialized    : false
dpdk_version        : none
external_ids        : {hostname=albans.domuz, ovn-bridge-mappings="k8sbr0:lxdovn25", ovn-encap-ip="10.1.0.215", ovn-encap-type=geneve, ovn-remote="unix:/var/run/ovn/ovnsb_db.sock", rundir="/var/run/openvswitch", system-id="486b381c-b94b-4172-978f-90635f048955"}
iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 17
other_config        : {}
ovs_version         : "2.13.1"
ssl                 : []
statistics          : {}
system_type         : ubuntu
system_version      : "20.04"

It’s the ovn-bridge-mappings key that is the issue.

If you do sudo ovs-vsctl remove open_vswitch . external_ids ovn-bridge-mappings that should clear it.
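One way to spot a stale mapping like this before removing it is to compare the bridges named in ovn-bridge-mappings with the bridges OVS actually has. The following is only a minimal sketch: missing_bridges is a hypothetical helper name, and it assumes the mapping value and the bridge list are passed in as plain strings.

```shell
#!/bin/sh
# missing_bridges MAPPINGS BRIDGES
#   MAPPINGS: value of external_ids:ovn-bridge-mappings,
#             e.g. "k8sbr0:lxdovn25" (comma-separated network:bridge pairs)
#   BRIDGES:  whitespace-separated bridge names that exist in OVS
# Prints every bridge that is mapped but does not exist.
missing_bridges() {
    for pair in $(printf '%s' "$1" | tr ',' ' '); do
        bridge=${pair#*:}          # keep the part after "network:"
        found=no
        for b in $2; do
            [ "$b" = "$bridge" ] && found=yes
        done
        if [ "$found" = no ]; then
            echo "$bridge"
        fi
    done
}

# Usage on a host (assumes ovs-vsctl is installed and run as root):
#   mappings=$(ovs-vsctl get open_vswitch . external_ids:ovn-bridge-mappings | tr -d '"')
#   missing_bridges "$mappings" "$(ovs-vsctl list-br)"
```

With the values from this thread, missing_bridges 'k8sbr0:lxdovn25' 'br-int' prints lxdovn25, which is the bridge the ovn-controller warnings complain about.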

Thanks. That’s fixed it. :slight_smile:

As a matter of interest, where is this config stored? I deleted the /var/lib/ovn databases but didn’t find the open_vswitch config files.

Or is there a simple command to wipe all these configs when I want to restart from a clean setup?

They are in /var/lib/openvswitch/ I think.


Hi,
I am also looking for good documentation on how OVN and LXD work together.
Regards.


Have you got the single node example in the docs working? That would be the first step, and then to think about how you want to connect the virtual routers to the external network via a physical port or a bridge (such as lxdbr0).

Thanks @tomp, I’m going to test a little bit in my environment and share the result.
Regards.

Hi @tomp, a quick question about LXD OVN cluster setup:
I have 3 hosts, clustered in LXD, each with ovn-host and ovn-central installed. I’d like OVN to have some resilience, so that if the “controller” (10.1.0.215) fails, the other two continue untroubled. The controller hasn’t much resource, but has the ingress & egress proxies etc.

My specific question is: what should I set external_ids:ovn-remote to?
At present it is a Unix socket; should it be the local IP address or something else?

I have the config:
on each server:

# ovs-vsctl set open_vswitch . \
    external_ids:system-id=$( hostname ) \
    external_ids:ovn-remote=unix:/var/run/ovn/ovnsb_db.sock \
    external_ids:ovn-encap-type=geneve \
    external_ids:ovn-encap-ip=$( hostname -I | grep -o '\b10\.1\.0\.[0-9]\+\b' )

and /etc/default/ovn-central & /etc/default/ovn-host:

OVN_CTL_OPTS= \
  --db-nb-addr=$( hostname -I | grep -o '\b10\.1\.0\.[0-9]\+\b' ) \
  --db-sb-addr=$( hostname -I | grep -o '\b10\.1\.0\.[0-9]\+\b' ) \
  --db-nb-cluster-local-addr=$( hostname -I | grep -o '\b10\.1\.0\.[0-9]\+\b' ) \
  --db-sb-cluster-local-addr=$( hostname -I | grep -o '\b10\.1\.0\.[0-9]\+\b' ) \
  --db-nb-cluster-remote-addr=10.1.0.215 \
  --db-sb-cluster-remote-addr=10.1.0.215 \
  --ovn-northd-nb-db=tcp:10.1.0.215:6641,tcp:10.1.0.213:6641,tcp:10.1.0.214:6641 \
  --ovn-northd-sb-db=tcp:10.1.0.215:6642,tcp:10.1.0.213:6642,tcp:10.1.0.214:6642

I think you set it to the same value you’ve used for the ovn-northd-sb-db setting in OVN_CTL_OPTS, i.e. tcp:10.1.0.215:6642,tcp:10.1.0.213:6642,tcp:10.1.0.214:6642.

Also, ensure LXD knows about the multiple northbound DBs by using:

lxc config set network.ovn.northbound_connection=tcp:10.1.0.215:6641,tcp:10.1.0.213:6641,tcp:10.1.0.214:6641

This is important, as although right now we use ovn-nbctl under the hood, this may not always be the case (we are thinking about interacting with the DB directly), and so LXD will then not use the OVN_CTL_OPTS setting.
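Both settings take the same list of hosts but different ports: 6642 for the southbound DB (ovn-remote) and 6641 for the northbound DB (LXD), matching the OVN_CTL_OPTS above. A minimal sketch for building both strings from one host list, so the two ports are less likely to get mixed up; ovn_conn_string is a hypothetical helper name, and the commands that would apply the values are left commented out.

```shell
#!/bin/sh
# ovn_conn_string PORT IP...
# Prints "tcp:IP:PORT" for every IP, joined with commas.
ovn_conn_string() {
    port=$1
    shift
    result=
    for ip in "$@"; do
        result="${result:+$result,}tcp:$ip:$port"
    done
    printf '%s\n' "$result"
}

hosts="10.1.0.215 10.1.0.213 10.1.0.214"
sb=$(ovn_conn_string 6642 $hosts)   # southbound DB, for OVS ovn-remote
nb=$(ovn_conn_string 6641 $hosts)   # northbound DB, for LXD

echo "ovn-remote: $sb"
echo "network.ovn.northbound_connection: $nb"

# To apply (commented out so the sketch has no side effects):
#   ovs-vsctl set open_vswitch . external_ids:ovn-remote="$sb"
#   lxc config set network.ovn.northbound_connection="$nb"
```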

Seems to be working for now. :slight_smile: :crossed_fingers:

Hmmm … that change seems to have broken LXD - OVN so I’ve reverted both changes. external_ids:ovn-remote is set as unix:/var/run/ovn/ovnsb_db.sock and lxc config network.ovn.northbound_connection is now set as unix:/var/run/ovn/ovnnb_db.sock.

With the suggested settings, edit network lxdbr0 to have:

  ipv4.address: 10.1.1.1/24
  ipv4.dhcp.ranges: 10.1.1.8-10.1.1.127
  ipv4.ovn.ranges: 10.1.1.128-10.1.1.251
  ipv4.routes: 10.3.128.0/17, 241.0.0.0/8

then …

$ lxc network create test-ovn --type=ovn network=lxdbr0
Error: Failed to run: ovn-nbctl --db tcp:10.1.0.215:6641,tcp:10.1.0.213:6641,tcp:10.1.0.214:6641 ha-chassis-group-add lxd-net46: ovn-nbctl: tcp:10.1.0.215:6641,tcp:10.1.0.213:6641,tcp:10.1.0.214:6641: database connection failed (Connection refused)

$ lxc config set network.ovn.northbound_connection=unix:/var/run/ovn/ovnnb_db.sock
$ lxc network delete test-ovn 
Network test-ovn deleted

$ lxc network create test-ovn --type=ovn network=lxdbr0
Error: Failed getting OVS Chassis ID: invalid syntax
# ovs-vsctl list open_vswitch
[sudo] password for albans: 
_uuid               : 466b9882-6a72-4934-9dc9-1e939bb97950
bridges             : [04e7f203-69e8-4365-9e40-282877f98a80]
cur_cfg             : 21
datapath_types      : [netdev, system]
datapaths           : {}
db_version          : "8.2.0"
dpdk_initialized    : false
dpdk_version        : none
external_ids        : {hostname=albans.domuz, ovn-encap-ip="10.1.0.215", ovn-encap-type="geneve,vxlan", ovn-openflow-probe-interval="15000", ovn-remote="tcp:10.1.0.215:6641,tcp:10.1.0.213:6641,tcp:10.1.0.214:6641", ovn-remote-probe-interval="5000", rundir="/var/run/openvswitch", system-id=albans}
iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 21
other_config        : {}
ovs_version         : "2.13.1"
ssl                 : []
statistics          : {}
system_type         : ubuntu
system_version      : "20.04"

# ovs-vsctl show
466b9882-6a72-4934-9dc9-1e939bb97950
    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "2.13.1"

# ovn-nbctl show
switch cec22a68-4d83-47ec-8331-5ad314cfc557 (lxd-net47-ls-ext)
    port lxd-net47-ls-ext-lsp-router
        type: router
        router-port: lxd-net47-lr-lrp-ext
    port lxd-net47-ls-ext-lsp-provider
        type: localnet
        addresses: ["unknown"]
switch fa827850-c229-4602-88a6-2592001ac3e8 (lxd-net48-ls-int)
    port lxd-net48-ls-int-lsp-router
        type: router
        router-port: lxd-net48-lr-lrp-int
switch 607470f8-fe19-4408-80d7-20a736f73a22 (lxd-net47-ls-int)
    port lxd-net47-ls-int-lsp-router
        type: router
        router-port: lxd-net47-lr-lrp-int
switch 44adb685-a6bf-418f-8060-7aa307f5c110 (lxd-net48-ls-ext)
    port lxd-net48-ls-ext-lsp-provider
        type: localnet
        addresses: ["unknown"]
    port lxd-net48-ls-ext-lsp-router
        type: router
        router-port: lxd-net48-lr-lrp-ext
router a31763a4-5706-4c8a-a8f8-e0f9a10ee514 (lxd-net48-lr)
    port lxd-net48-lr-lrp-int
        mac: "00:16:3e:09:d5:5d"
        networks: ["10.230.167.1/24"]
    port lxd-net48-lr-lrp-ext
        mac: "00:16:3e:09:d5:5d"
        networks: ["10.1.1.128/24"]
    nat a11f6768-b77d-4c02-b84d-b7934b35d81c
        external ip: "10.1.1.128"
        logical ip: "10.230.167.0/24"
        type: "snat"
router 91203d42-3c92-4c55-aa05-f23acbd094b3 (lxd-net47-lr)
    port lxd-net47-lr-lrp-ext
        mac: "00:16:3e:f7:eb:62"
        networks: ["10.1.1.128/24"]
    port lxd-net47-lr-lrp-int
        mac: "00:16:3e:f7:eb:62"
        networks: ["10.4.194.1/24"]
    nat bd1ab8ff-0322-48c8-af83-c34fa0eda54a
        external ip: "10.1.1.128"
        logical ip: "10.4.194.0/24"
        type: "snat"

# ovn-sbctl show
<<nothing>>

# ovn-appctl connection-status
not connected

# ovs-vsctl set open_vswitch . \
    external_ids:system-id=$( hostname ) \
    external_ids:ovn-remote-probe-interval=5000 \
    external_ids:ovn-openflow-probe-interval=15000 \
    external_ids:ovn-remote=unix:/var/run/ovn/ovnsb_db.sock \
    external_ids:ovn-encap-type=geneve,vxlan \
    external_ids:ovn-encap-ip=$( hostname -I | grep -o '\b10\.1\.0\.[0-9]\+\b' )

<<stop & start the ovn-* services on each host>>

# ovn-appctl connection-status
connected
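After restarting the services, the status may take a moment to change from "not connected" to "connected". A rough sketch of a polling loop around the same check; wait_connected is a hypothetical helper name, and the one-second interval is an arbitrary choice.

```shell
#!/bin/sh
# wait_connected TRIES COMMAND...
# Runs COMMAND up to TRIES times, one second apart, and succeeds
# as soon as its output is exactly "connected".
wait_connected() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if [ "$("$@" 2>/dev/null)" = "connected" ]; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Usage, with the command used in this thread:
#   wait_connected 30 ovn-appctl connection-status || echo "still not connected"
```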

I’ve edited lxdbr0 to remove the ovn config parameters, so there’s no OVN left in LXD. Is it safe to just delete all the lxd* routers & switches which are listed?
Or better to just clear all the /var/lib/{ovn,ovs,open_vswitch}/* databases and start again?

Left out some detail. On each host:

# ovn-sbctl show
Chassis "486b381c-b94b-4172-978f-90635f048955"
    hostname: albans.domuz
    Encap vxlan
        ip: "10.1.0.215"
        options: {csum="true"}
    Encap geneve
        ip: "10.1.0.215"
        options: {csum="true"}

# ovn-sbctl show
Chassis "0393361e-b4dc-4241-8479-e8ef6849c4c6"
    hostname: grantham.domuz
    Encap geneve
        ip: "10.1.0.213"
        options: {csum="true"}
    Encap vxlan
        ip: "10.1.0.213"
        options: {csum="true"}

# ovn-sbctl show
Chassis "9267a9f6-8de7-45b0-b546-701510dc1591"
    hostname: uxbridge.domuz
    Encap vxlan
        ip: "10.1.0.214"
        options: {csum="true"}
    Encap geneve
        ip: "10.1.0.214"
        options: {csum="true"}

Good morning @tomp and thanks for your help so far.

Removing vxlan from the list of encapsulations got me a step further, and created a new crash:

$ sudo lxc network create test-ovn --type=ovn network=lxdbr0
Error: failed to notify peer 10.1.0.213:8443: Failed adding OVS chassis "0393361e-b4dc-4241-8479-e8ef6849c4c6" with priority 5421 to chassis group "lxd-net50": Failed to run: ovn-nbctl --db unix:/var/lib/snapd/hostfs/run/ovn/ovnnb_db.sock ha-chassis-group-add-chassis lxd-net50 0393361e-b4dc-4241-8479-e8ef6849c4c6 5421: 2021-05-05T05:32:29Z|00002|ovsdb_idl|WARN|OVN_Northbound database lacks BFD table (database needs upgrade?)
2021-05-05T05:32:29Z|00003|ovsdb_idl|WARN|Forwarding_Group table in OVN_Northbound database lacks external_ids column (database needs upgrade?)
2021-05-05T05:32:29Z|00004|ovsdb_idl|WARN|Load_Balancer table in OVN_Northbound database lacks options column (database needs upgrade?)
2021-05-05T05:32:29Z|00005|ovsdb_idl|WARN|Load_Balancer table in OVN_Northbound database lacks selection_fields column (database needs upgrade?)
2021-05-05T05:32:29Z|00006|ovsdb_idl|WARN|Logical_Router_Policy table in OVN_Northbound database lacks external_ids column (database needs upgrade?)
2021-05-05T05:32:29Z|00007|ovsdb_idl|WARN|Logical_Router_Policy table in OVN_Northbound database lacks nexthops column (database needs upgrade?)
2021-05-05T05:32:29Z|00008|ovsdb_idl|WARN|Logical_Router_Policy table in OVN_Northbound database lacks options column (database needs upgrade?)
2021-05-05T05:32:29Z|00009|ovsdb_idl|WARN|Logical_Router_Port table in OVN_Northbound database lacks ipv6_prefix column (database needs upgrade?)
2021-05-05T05:32:29Z|00010|ovsdb_idl|WARN|Logical_Router_Static_Route table in OVN_Northbound database lacks bfd column (database needs upgrade?)
2021-05-05T05:32:29Z|00011|ovsdb_idl|WARN|Logical_Router_Static_Route table in OVN_Northbound database lacks options column (database needs upgrade?)
2021-05-05T05:32:29Z|00012|ovsdb_idl|WARN|Meter table in OVN_Northbound database lacks fair column (database needs upgrade?)
2021-05-05T05:32:29Z|00013|ovsdb_idl|WARN|NAT table in OVN_Northbound database lacks allowed_ext_ips column (database needs upgrade?)
2021-05-05T05:32:29Z|00014|ovsdb_idl|WARN|NAT table in OVN_Northbound database lacks exempted_ext_ips column (database needs upgrade?)
2021-05-05T05:32:29Z|00015|ovsdb_idl|WARN|NAT table in OVN_Northbound database lacks external_port_range column (database needs upgrade?)
2021-05-05T05:32:29Z|00016|ovsdb_idl|WARN|NB_Global table in OVN_Northbound database lacks hv_cfg_timestamp column (database needs upgrade?)
2021-05-05T05:32:29Z|00017|ovsdb_idl|WARN|NB_Global table in OVN_Northbound database lacks nb_cfg_timestamp column (database needs upgrade?)
2021-05-05T05:32:29Z|00018|ovsdb_idl|WARN|NB_Global table in OVN_Northbound database lacks sb_cfg_timestamp column (database needs upgrade?)
ovn-nbctl: lxd-net50: ha_chassi_group name not found

Logs on the 3 hosts:
10.1.0.215 (controller):
<< nothing in that time period >>

10.1.0.213:

May 05 05:32:29.778736 grantham ovsdb-server[4290]: ovs|00005|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
May 05 05:32:29.778861 grantham ovsdb-server[4290]: ovs|00006|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
May 05 05:32:29.885061 grantham ovsdb-server[4290]: ovs|00007|jsonrpc|WARN|unix#2: receive error: Connection reset by peer
May 05 05:32:29.885185 grantham ovsdb-server[4290]: ovs|00008|reconnect|WARN|unix#2: connection dropped (Connection reset by peer)
May 05 05:32:29.908084 grantham ovsdb-server[4290]: ovs|00009|jsonrpc|WARN|unix#3: receive error: Connection reset by peer
May 05 05:32:29.908203 grantham ovsdb-server[4290]: ovs|00010|reconnect|WARN|unix#3: connection dropped (Connection reset by peer)

4290 is the PID of ovnnb_db

10.1.0.214:

May 05 05:32:29.742857 uxbridge ovsdb-server[4583]: ovs|00005|jsonrpc|WARN|unix#0: receive error: Connection reset by peer
May 05 05:32:29.742910 uxbridge ovsdb-server[4583]: ovs|00006|reconnect|WARN|unix#0: connection dropped (Connection reset by peer)
May 05 05:32:29.826524 uxbridge ovsdb-server[4583]: ovs|00007|jsonrpc|WARN|unix#2: receive error: Connection reset by peer
May 05 05:32:29.826584 uxbridge ovsdb-server[4583]: ovs|00008|reconnect|WARN|unix#2: connection dropped (Connection reset by peer)
May 05 05:32:29.841995 uxbridge ovsdb-server[4583]: ovs|00009|jsonrpc|WARN|unix#3: receive error: Connection reset by peer
May 05 05:32:29.842055 uxbridge ovsdb-server[4583]: ovs|00010|reconnect|WARN|unix#3: connection dropped (Connection reset by peer)

4583 is also ovnnb_db.
Apart from what looks to me like a spelling mistake (ha_chassi_group), it looks like there’s a connection problem between the hosts. I’m investigating.

I’m not really clear on what you’re trying to achieve. For instance, why are you using a mixture of geneve and vxlan encapsulation at the same time?

However, what I can say is that:

  1. You must definitely not use the OVN southbound connection details in lxc config set network.ovn.northbound_connection - this will not work, and if it appears to work for you, then something else is wrong.
  2. I’m confused why your ovn-sbctl show command is showing different things on different hosts. This seems wrong to me, as if they are not using the same northbound database as their source. Also, I’m not clear which hosts are involved: do you have separate hosts for the OVN NB and SB database cluster vs the chassis hosts that run the actual LXD instances?
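A quick sanity check for the first point is to confirm that every TCP endpoint in a connection string uses the port of the database it is meant for (6641 northbound, 6642 southbound, as in the OVN_CTL_OPTS earlier in the thread). This is only a sketch under those assumptions; uses_only_port is a hypothetical helper name.

```shell
#!/bin/sh
# uses_only_port CONNECTION_STRING PORT
# Succeeds if every comma-separated endpoint ends in ":PORT".
# unix: sockets have no port and are accepted as they are.
uses_only_port() {
    for endpoint in $(printf '%s' "$1" | tr ',' ' '); do
        case $endpoint in
            unix:*) ;;        # Unix socket, nothing to check
            *:"$2") ;;        # endpoint ends in the expected port
            *)
                echo "unexpected endpoint: $endpoint" >&2
                return 1
                ;;
        esac
    done
    return 0
}

# Usage on a host (assumes ovs-vsctl is installed and run as root):
#   remote=$(ovs-vsctl get open_vswitch . external_ids:ovn-remote | tr -d '"')
#   uses_only_port "$remote" 6642 || echo "ovn-remote is not pointing at the southbound DB"
```

Applied to the ovn-remote value shown in the earlier ovs-vsctl list open_vswitch output (three endpoints on port 6641), the check against 6642 fails.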

My recommendation is to start simple and then work up the complexity from there.
So I would start with a simple 4 node setup; 3x LXD nodes in a cluster and a separate OVN DB node.
Or use one of the nodes as a single NB/SB DB node.

This is what I described here:

Once you’re happy with that, then you can look at adding clustering to the OVN DB itself.
@stgraber has done this with the LXD infrastructure so he may be able to show you the connection strings he used for LXD’s northbound setting and OVS’s external_ids:ovn-remote setting?

I’m going to have a go at setting up a LXD and OVN cluster on 3 nodes later so will post my findings here once done.

I’ve given up, for now, trying to get a 3 node OVN cluster working. I’ve gone the simple route: creating a controller node and setting the other 2 to sync nb & sb from that.

Took about 2 hours, mainly because I had to repeat it 3 times to get it right, and I got a log flood & high CPU because of a typo in the controller config.

If anyone wants to repeat it, the primary sources are:
https://www.ovn.org/support/dist-docs/ovn-ctl.8.html
https://blog.oddbit.com/post/2019-12-19-ovn-and-dhcp/

I’ve now had a chance to put down some notes on setting up an OVN cluster: