Error creating OVN network

I have OVN working on a cluster with a controller + 2 x worker nodes (no NB DB & SB DB sync’ed from controller). When I create a test ovn network I get:

$ lxc network create test-ovn --type=ovn network=lxdbr0
Error: Failed getting OVS Chassis ID: invalid syntax

$ lxc network show lxdbr0 
config:
  bridge.mtu: "7152"
  dns.domain: lxd
  dns.mode: managed
  ipv4.address: 10.3.1.1/24
  ipv4.dhcp.ranges: 10.3.1.8-10.3.1.127
  ipv4.firewall: "false"
  ipv4.nat: "false"
  ipv4.ovn.ranges: 10.3.1.128-10.3.1.251
  ipv4.routes: 10.3.128.0/17, 241.0.0.0/8
  ipv6.address: none
  ipv6.routing: "false"
description: Default local LXD network
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/test-uxbridge-0
- /1.0/instances/test-uxbridge-1
- /1.0/profiles/default
- /1.0/profiles/lxdbr0
managed: true
status: Created
locations:
- albans
- grantham
- uxbridge

What’s wrong with the config?

On the controller (albans 10.1.0.215):

# ovs-vsctl list open_vswitch
_uuid               : 9a88292a-b04a-4a33-8c55-e56ba2924359
bridges             : [b5ca091f-7b36-40a5-b668-2937a357a925]
cur_cfg             : 8
datapath_types      : [netdev, system]
datapaths           : {}
db_version          : "8.2.0"
dpdk_initialized    : false
dpdk_version        : none
external_ids        : {hostname=albans.domuz, ovn-encap-ip="10.1.0.215", ovn-encap-type=geneve, ovn-nb="unix:/var/run/ovn/ovnnb_db.sock", ovn-remote="unix:/var/run/ovn/ovnsb_db.sock", ovn-remote-probe-interval="5000", rundir="/var/run/openvswitch", system-id=albans}
iface_types         : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 8
other_config        : {}
ovs_version         : "2.13.1"
ssl                 : []
statistics          : {}
system_type         : ubuntu
system_version      : "20.04"

# ovs-vsctl show
9a88292a-b04a-4a33-8c55-e56ba2924359
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port ovn-granth-0
            Interface ovn-granth-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.1.0.213"}
        Port ovn-uxbrid-0
            Interface ovn-uxbrid-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.1.0.214"}
    ovs_version: "2.13.1"

# ovn-nbctl --no-leader-only --timeout=1 show
switch 9adccf00-fbe6-4371-b688-cace26dc023a (lxd-net57-ls-int)
    port lxd-net57-ls-int-lsp-router
        type: router
        router-port: lxd-net57-lr-lrp-int
switch 56ef2829-ad4e-4cea-82a0-7ee6f827ed5b (lxd-net57-ls-ext)
    port lxd-net57-ls-ext-lsp-provider
        type: localnet
        addresses: ["unknown"]
    port lxd-net57-ls-ext-lsp-router
        type: router
        router-port: lxd-net57-lr-lrp-ext
router e479e6ea-91ea-45f6-9a8b-f41aac92f685 (lxd-net57-lr)
    port lxd-net57-lr-lrp-ext
        mac: "00:16:3e:1e:e2:84"
        networks: ["10.3.1.128/24"]
    port lxd-net57-lr-lrp-int
        mac: "00:16:3e:1e:e2:84"
        networks: ["10.217.155.1/24"]
    nat 3ff219e7-9217-4a99-a8ee-ab3cbe589bed
        external ip: "10.3.1.128"
        logical ip: "10.217.155.0/24"
        type: "snat"

# ovn-sbctl --no-leader-only --timeout=1 show
Chassis grantham
    hostname: grantham.domuz
    Encap geneve
        ip: "10.1.0.213"
        options: {csum="true"}
Chassis uxbridge
    hostname: uxbridge.domuz
    Encap geneve
        ip: "10.1.0.214"
        options: {csum="true"}
Chassis albans
    hostname: albans.domuz
    Encap geneve
        ip: "10.1.0.215"
        options: {csum="true"}

# ovn-nbctl --no-leader-only --timeout=1 get-connection
ptcp:6641
punix:/var/run/ovn/ovnnb_db.sock

# ovn-sbctl --no-leader-only --timeout=1 get-connection
read-only role="" ptcp:6642
read-write role="" punix:/var/run/ovn/ovnsb_db.sock

# ovn-appctl connection-status
connected

$ ss -tuxlpn | grep -e '^\s*tcp\s.*\b:664[0-5]\b' -e '^\s*u_str\s.*\bovn\b' | sed -r -e 's/\s+$//' ; echo
u_str LISTEN 0      64                    /var/run/ovn/ovn-controller.67480.ctl 871559                                                 * 0                       users:(("ovn-controller",pid=67480,fd=9))
u_str LISTEN 0      64                        /var/run/ovn/ovn-northd.67639.ctl 871339                                                 * 0                       users:(("ovn-northd",pid=67639,fd=9))
u_str LISTEN 0      64                               /var/run/ovn/ovnsb_db.sock 871365                                                 * 0                       users:(("ovsdb-server",pid=67494,fd=15))
u_str LISTEN 0      64                                /var/run/ovn/ovnsb_db.ctl 871367                                                 * 0                       users:(("ovsdb-server",pid=67494,fd=16))
u_str LISTEN 0      64                               /var/run/ovn/ovnnb_db.sock 871970                                                 * 0                       users:(("ovsdb-server",pid=67493,fd=14))
u_str LISTEN 0      64                                /var/run/ovn/ovnnb_db.ctl 871973                                                 * 0                       users:(("ovsdb-server",pid=67493,fd=16))
tcp   LISTEN 0      10                                               10.1.0.215:6641                                             0.0.0.0:*                       users:(("ovsdb-server",pid=67493,fd=15))
tcp   LISTEN 0      10                                               10.1.0.215:6642                                             0.0.0.0:*                       users:(("ovsdb-server",pid=67494,fd=14))

thanks
David

added:

$ uname -a
Linux albans 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

$ snap --version
snap    2.49.2
snapd   2.49.2
series  16
ubuntu  20.04
kernel  5.4.0-72-generic

$ lxc --version
4.13

$ ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.13.1
DB Schema 8.2.0

$ ovn-controller --version
ovn-controller 20.03.1
Open vSwitch Library 2.13.1
OpenFlow versions 0x4:0x4

Can you run:

ovs-vsctl get open_vswitch . external_ids:system-id

And see what the output is.

And also the output of:

lxc config get network.ovn.northbound_connection

# ovs-vsctl get open_vswitch . external_ids:system-id
albans

$ cat /etc/openvswitch/system-id.conf 
albans

$ lxc config get network.ovn.northbound_connection
tcp:10.1.0.215:6641

I’ve just rebooted all the hosts and stopped and restarted the ovn- services (they start in the wrong order and fail). I also tried again with:

lxc config set network.ovn.northbound_connection=unix:/var/run/ovn/ovnnb_db.sock

$ lxc network create test-ovn --type=ovn network=lxdbr0
Error: Failed getting OVS Chassis ID: invalid syntax

So, still the same… I think it’s talking to the DB and barfing.

It uses the same ovs command, though; see https://github.com/lxc/lxd/blob/master/lxd/network/openvswitch/ovs.go#L180

The issue might be this: on every system I’ve seen, the system-id was a UUID and it came out of OVS quoted, so we unquote it. Perhaps using a single word as the system-id has changed the quoting in the output and upset the parser.

I looked at the Go source code and that’s the issue: apparently if the system-id is a UUID it gets quoted, but not if it’s a simple word. I’ll confirm that and put a fix in, but in the meantime switching to the automatically generated system-id should fix it.
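The error text itself is a useful clue. Go's `strconv.Unquote` returns `strconv.ErrSyntax`, whose message is exactly "invalid syntax", when its input is not a quoted literal. Assuming (not confirmed here against the LXD source) that the parser unquotes the `ovs-vsctl` output unconditionally, this minimal sketch reproduces the failure. `naiveParseSystemID` is a hypothetical name, and the UUID is an arbitrary example value.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// naiveParseSystemID is a hypothetical parser that always unquotes the value,
// as if ovs-vsctl always printed the system-id in double quotes.
func naiveParseSystemID(raw string) (string, error) {
	return strconv.Unquote(raw)
}

func main() {
	// Example UUID-style system-id, printed quoted: unquoting succeeds.
	id, err := naiveParseSystemID(`"0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0"`)
	fmt.Println(id, err) // 0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0 <nil>

	// Hostname-style system-id, printed bare as in this thread: unquoting fails.
	_, err = naiveParseSystemID("albans")
	fmt.Println(err)                               // invalid syntax
	fmt.Println(errors.Is(err, strconv.ErrSyntax)) // true
}
```

The bare word fails because `strconv.Unquote` requires matching quote characters at both ends of its input.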

Yup, I got there a few minutes ago. On each host, I did:

# ovs-vsctl remove open_vswitch . external_ids system-id

# rm -v /etc/openvswitch/system-id.conf

<< restart ovn- services >>

on the controller:

$ lxc config set network.ovn.northbound_connection=tcp:10.1.0.215:6641

and then

$ lxc network create test-ovn --type=ovn network=lxdbr0
Network test-ovn created

$ lxc network list
+----------+----------+---------+--------------+------+---------------------------+---------+---------+
|   NAME   |   TYPE   | MANAGED |     IPV4     | IPV6 |        DESCRIPTION        | USED BY |  STATE  |
+----------+----------+---------+--------------+------+---------------------------+---------+---------+
| br-int   | bridge   | NO      |              |      |                           | 0       |         |
+----------+----------+---------+--------------+------+---------------------------+---------+---------+
| dmz0     | physical | NO      |              |      |                           | 0       |         |
+----------+----------+---------+--------------+------+---------------------------+---------+---------+
| eth0     | physical | NO      |              |      |                           | 0       |         |
+----------+----------+---------+--------------+------+---------------------------+---------+---------+
| lxdbr0   | bridge   | YES     | 10.3.1.1/24  | none | Default local LXD network | 5       | CREATED |
+----------+----------+---------+--------------+------+---------------------------+---------+---------+
| lxdfan0  | bridge   | YES     |              |      | LXD cluster network       | 3       | CREATED |
+----------+----------+---------+--------------+------+---------------------------+---------+---------+
| lxdovn22 | bridge   | NO      |              |      |                           | 0       |         |
+----------+----------+---------+--------------+------+---------------------------+---------+---------+
| test-ovn | ovn      | YES     | 10.0.53.1/24 |      |                           | 0       | CREATED |
+----------+----------+---------+--------------+------+---------------------------+---------+---------+

Thanks
David

I used the hostname as the system-id as it makes it much easier to recognise which OVN things belong to which host.

I think this may be the same defect I was hitting with the clustered NB DB config I tried before. It’s certainly the same error message I ended up getting before I gave up and switched to a simpler config.


If the system-id starts with a number, it is quoted in the output. I’ll put a fix into our parser to handle the case where it’s not quoted.
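A tolerant parser could unquote only when the value actually starts with a double quote. This is a minimal sketch, not the actual LXD patch. `parseOVSString` and `chassisID` are hypothetical names. The sketch assumes that Go's `strconv.Unquote` escape rules are close enough to the OVSDB output for ordinary IDs; unusual escape sequences might differ.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// parseOVSString converts one value printed by `ovs-vsctl get` into a plain string.
// Some values come back quoted (e.g. a system-id starting with a digit) and others
// bare (e.g. "albans"), so it only unquotes when a leading quote is present.
func parseOVSString(raw string) (string, error) {
	s := strings.TrimSpace(raw)
	if strings.HasPrefix(s, `"`) {
		return strconv.Unquote(s)
	}
	return s, nil
}

// chassisID is a hypothetical helper: it runs ovs-vsctl (which must be on PATH
// and able to reach the local OVS database) and parses the system-id it prints.
func chassisID() (string, error) {
	out, err := exec.Command("ovs-vsctl", "get", "open_vswitch", ".", "external_ids:system-id").Output()
	if err != nil {
		return "", fmt.Errorf("running ovs-vsctl: %w", err)
	}
	return parseOVSString(string(out))
}

func main() {
	// Both output styles should yield the bare ID.
	for _, raw := range []string{"albans\n", "\"0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0\"\n"} {
		id, err := parseOVSString(raw)
		fmt.Printf("%q -> %q (err=%v)\n", raw, id, err)
	}
	// Only prints something on a host where ovs-vsctl is available.
	if id, err := chassisID(); err == nil {
		fmt.Println("local chassis ID:", id)
	}
}
```

With this approach, a hostname-style system-id such as `albans` and a quoted UUID both parse to the bare ID. A hostname-based system-id therefore should not trigger the "invalid syntax" error.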

This should fix it: