I am trying to set up a cluster of 3 small bare-metal hosts by following the incus-deploy example. I was able to use incus-deploy to get a functioning cluster across several virtual machines, and I’m trying to replicate the same setup on 3 physical hosts. As far as I can tell I have replicated the configuration and steps, but connections to the OVN Northbound (NB) database on port 6641 are refused. The OVN Southbound (SB) database on port 6642 is not reachable either, and I suspect the two are related. I have worked through the troubleshooting steps I could find online. I expect the setup is very close and one small thing is off, but I have not found anything that points to the problem.
root@impa:~# source /etc/ovn/alias.sh
root@impa:~# ovn-nbctl show
ovn-nbctl: ssl:[10.10.30.20]:6641,ssl:[10.10.30.21]:6641,ssl:[10.10.30.22]:6641: database connection failed (Connection refused)
root@impa:/var/log/ovn# ovn-sbctl show
ovn-sbctl: ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642: database connection failed (Connection refused)
root@impa:~# ovs-appctl -t /run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
9a6d
Name: OVN_Northbound
Cluster ID: 52cf (52cf64ee-ca30-4076-b453-a9e413a34e98)
Server ID: 9a6d (9a6df369-bcfc-46a4-b15e-05a476c7936d)
Address: tcp:[10.10.30.20]:6643
Status: cluster member
Role: leader
Term: 8
Leader: self
Vote: self
Last Election started 543700 ms ago, reason: leadership_transfer
Last Election won: 543690 ms ago
Election timer: 1000
Log: [2, 11]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->03e1 ->5531 <-03e1 <-5531
Disconnections: 2
Servers:
03e1 (03e1 at tcp:[10.10.30.21]:6643) next_index=11 match_index=10 last msg 172 ms ago
9a6d (9a6d at tcp:[10.10.30.20]:6643) (self) next_index=10 match_index=10
5531 (5531 at tcp:[10.10.30.22]:6643) next_index=11 match_index=10 last msg 172 ms ago
Looking at the other, functioning incus-deploy cluster, the process that runs the OVN NB database (pid 2835 here) and listens on port 6643 should also be listening on port 6641. On my 3 physical hosts, however, nothing is listening on port 6641 (or 6642), even though port 6643 is listening and active, and I’m not sure what is missing.
root@impa:~# ps fauxww | grep 'ov[s|n]'
root 632 0.0 0.0 12216 6792 ? S<s Nov09 0:02 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root 697 0.2 0.0 532028 10236 ? S<Lsl Nov09 0:44 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root 3139 0.0 0.0 3244 1792 pts/1 S+ 00:53 0:00 | \_ tail -f /var/log/openvswitch/ovs-vswitchd.log /var/log/openvswitch/ovsdb-server.log /var/log/ovn/ovn-controller.log /var/log/ovn/ovn-northd.log /var/log/ovn/ovsdb-server-nb.log /var/log/ovn/ovsdb-server-sb.log
root 1040 0.0 0.0 308788 8980 ? S<sl Nov09 0:01 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/ovn/baremetal.server.key --certificate=/etc/ovn/baremetal.server.crt --ca-cert=/etc/ovn/baremetal.ca.crt --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/var/run/ovn/ovn-controller.pid --detach
root 2552 0.0 0.0 3168 1920 ? Ss 00:17 0:00 /bin/sh /usr/share/ovn/scripts/ovn-ctl run_nb_ovsdb --db-nb-create-insecure-remote=no --db-sb-create-insecure-remote=no --db-nb-addr=[10.10.30.20] --db-sb-addr=[10.10.30.20] --db-nb-cluster-local-addr=[10.10.30.20] --db-sb-cluster-local-addr=[10.10.30.20] --ov-northd-ssl-key=/etc/ovn/baremetal.server.key --ovn-northd-ssl-cert=/etc/ovn/baremetal.server.crt --ovn-northd-ssl-ca-cert=/etc/ovn/baremetal.ca.crt --ovn-northd-nb-db=ssl:[10.10.30.20]:6641,ssl:[10.10.30.21]:6641,ssl:[10.10.30.22]:6641 --ovn-northd-sb-db=ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642
root 2835 0.2 0.0 159840 8576 ? Sl 00:17 0:46 \_ ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db
root 2557 0.0 0.0 3168 1792 ? Ss 00:17 0:00 /bin/sh /usr/share/ovn/scripts/ovn-ctl run_sb_ovsdb --db-nb-create-insecure-remote=no --db-sb-create-insecure-remote=no --db-nb-addr=[10.10.30.20] --db-sb-addr=[10.10.30.20] --db-nb-cluster-local-addr=[10.10.30.20] --db-sb-cluster-local-addr=[10.10.30.20] --ov-northd-ssl-key=/etc/ovn/baremetal.server.key --ovn-northd-ssl-cert=/etc/ovn/baremetal.server.crt --ovn-northd-ssl-ca-cert=/etc/ovn/baremetal.ca.crt --ovn-northd-nb-db=ssl:[10.10.30.20]:6641,ssl:[10.10.30.21]:6641,ssl:[10.10.30.22]:6641 --ovn-northd-sb-db=ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642
root 2843 0.2 0.0 159832 8704 ? Sl 00:17 0:45 \_ ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --remote=punix:/var/run/ovn/ovnsb_db.sock --pidfile=/var/run/ovn/ovnsb_db.pid --unixctl=/var/run/ovn/ovnsb_db.ctl --remote=db:OVN_Southbound,SB_Global,connections --private-key=db:OVN_Southbound,SSL,private_key --certificate=db:OVN_Southbound,SSL,certificate --ca-cert=db:OVN_Southbound,SSL,ca_cert --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers /var/lib/ovn/ovnsb_db.db
root 2845 0.0 0.0 163200 7748 ? S<sl 00:17 0:02 ovn-northd --private-key=/etc/ovn/baremetal.server.key --certificate=/etc/ovn/baremetal.server.crt --ca-cert=/etc/ovn/baremetal.ca.crt -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db=ssl:[10.10.30.20]:6641,ssl:[10.10.30.21]:6641,ssl:[10.10.30.22]:6641 --ovnsb-db=ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642 --no-chdir --log-file=/var/log/ovn/ovn-northd.log --pidfile=/var/run/ovn/ovn-northd.pid --detach
root@impa:~# ss -tulpn | grep ov
tcp LISTEN 0 64 10.10.30.20:6643 0.0.0.0:* users:(("ovsdb-server",pid=2835,fd=20))
tcp LISTEN 0 64 10.10.30.20:6644 0.0.0.0:* users:(("ovsdb-server",pid=2843,fd=20))
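To make the comparison with the working cluster repeatable, the listener check above can be scripted. This is only a rough sketch: `missing_ports` is a made-up helper name, not part of OVN or incus-deploy, and it assumes `ss`-style output on stdin where the local address is the third field after the `LISTEN` state.

```shell
# Hypothetical helper (not an OVN tool): print each expected port that has
# no TCP listener. Reads `ss -tln`-style output on stdin; the expected
# ports are passed as arguments.
missing_ports() {
    # Collect the local port of every LISTEN line. The local address is
    # the third field after the LISTEN state, with or without a Netid column.
    listening=$(awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "LISTEN" && (i + 3) <= NF) {
                addr = $(i + 3)
                sub(/.*:/, "", addr)   # keep only the text after the last colon
                print addr
                break
            }
        }
    }')
    # Report every expected port that is absent from the collected list.
    for port in "$@"; do
        if ! printf '%s\n' "$listening" | grep -qx "$port"; then
            echo "$port"
        fi
    done
}
```

Running `ss -tln | missing_ports 6641 6642 6643 6644` on each host should then print only the ports that have no listener.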
It seems like a lot of the pieces are in place and functioning, but something is missing:
root@impa:~# /usr/share/ovn/scripts/ovn-ctl status_northd
ovn-northd is running with pid 2845
root@impa:~# /usr/share/ovn/scripts/ovn-ctl status_ovsdb
* OVN Northbound DB is running
* OVN Southbound DB is running
root@impa:~# /usr/share/ovn/scripts/ovn-ctl status_controller
ovn-controller is running with pid 1040
root@impa:~# ovs-vsctl list open_vswitch
_uuid : 61dfae23-e8ff-461c-a860-a65202943eed
bridges : [faae3f33-6b45-4810-93f4-5c304f5fbeac]
cur_cfg : 1
datapath_types : [netdev, system]
datapaths : {system=82cfabe9-8b3c-4331-bdd4-4bb381b616cc}
db_version : "8.7.0"
dpdk_initialized : false
dpdk_version : none
external_ids : {hostname=impa, ovn-encap-ip="10.10.30.20", ovn-encap-type=geneve, ovn-remote="ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642", rundir="/var/run/openvswitch", system-id="8c874de6-0571-4ebb-9d6f-127aa32e80ad"}
iface_types : [afxdp, afxdp-nonpmd, bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, srv6, stt, system, tap, vxlan]
manager_options : []
next_cfg : 1
other_config : {ovn-chassis-idx-8c874de6-0571-4ebb-9d6f-127aa32e80ad="", vlan-limit="0"}
ovs_version : "3.4.0"
ssl : []
statistics : {}
system_type : ubuntu
system_version : "22.04"
root@impa:~# ovs-vsctl show
61dfae23-e8ff-461c-a860-a65202943eed
Bridge br-int
fail_mode: secure
datapath_type: system
Port br-int
Interface br-int
type: internal
ovs_version: "3.4.0"
root@impa:/var/log/ovn# cat ovsdb-server-nb.log
2024-11-10T00:00:18.131Z|00033|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2024-11-10T00:17:49.111Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2024-11-10T00:17:49.118Z|00002|raft|INFO|local server ID is 9a6d
2024-11-10T00:17:49.121Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.4.0
2024-11-10T00:17:49.121Z|00004|reconnect|INFO|tcp:[10.10.30.21]:6643: connecting...
2024-11-10T00:17:49.121Z|00005|reconnect|INFO|tcp:[10.10.30.22]:6643: connecting...
2024-11-10T00:17:49.122Z|00006|reconnect|INFO|tcp:[10.10.30.22]:6643: connected
2024-11-10T00:17:49.122Z|00007|reconnect|INFO|tcp:[10.10.30.21]:6643: connected
2024-11-10T00:17:49.443Z|00008|raft|INFO|server 03e1 is leader for term 6
2024-11-10T00:17:49.443Z|00009|raft|INFO|rejecting append_request because previous entry 6,8 not in local log (mismatch past end of log)
2024-11-10T00:17:49.765Z|00010|raft|INFO|tcp:10.10.30.22:36020: learned server ID 5531
2024-11-10T00:17:49.765Z|00011|raft|INFO|tcp:10.10.30.22:36020: learned remote address tcp:[10.10.30.22]:6643
2024-11-10T00:17:49.766Z|00012|raft|INFO|tcp:10.10.30.21:53938: learned server ID 03e1
2024-11-10T00:17:49.766Z|00013|raft|INFO|tcp:10.10.30.21:53938: learned remote address tcp:[10.10.30.21]:6643
2024-11-10T00:17:59.124Z|00014|memory|INFO|8576 kB peak resident set size after 10.0 seconds
2024-11-10T00:17:59.124Z|00015|memory|INFO|atoms:27 cells:34 monitors:0 n-weak-refs:0 raft-connections:4 raft-log:7 txn-history:2 txn-history-atoms:10
2024-11-10T04:40:44.940Z|00016|reconnect|INFO|tcp:[10.10.30.21]:6643: connection closed by peer
2024-11-10T04:40:44.948Z|00017|raft|INFO|server 5531 is leader for term 7
2024-11-10T04:40:45.151Z|00018|raft|INFO|tcp:10.10.30.21:44798: learned server ID 03e1
2024-11-10T04:40:45.151Z|00019|raft|INFO|tcp:10.10.30.21:44798: learned remote address tcp:[10.10.30.21]:6643
2024-11-10T04:40:45.942Z|00020|reconnect|INFO|tcp:[10.10.30.21]:6643: connecting...
2024-11-10T04:40:45.942Z|00021|reconnect|INFO|tcp:[10.10.30.21]:6643: connected
2024-11-10T04:40:53.551Z|00022|raft|INFO|received leadership transfer from 5531 in term 7
2024-11-10T04:40:53.551Z|00023|raft|INFO|term 8: starting election (vote)
2024-11-10T04:40:53.552Z|00024|reconnect|INFO|tcp:[10.10.30.22]:6643: connection closed by peer
2024-11-10T04:40:53.561Z|00025|raft|INFO|term 8: elected leader by 2+ of 3 servers
2024-11-10T04:40:53.764Z|00026|raft|INFO|tcp:10.10.30.22:45418: learned server ID 5531
2024-11-10T04:40:53.764Z|00027|raft|INFO|tcp:10.10.30.22:45418: learned remote address tcp:[10.10.30.22]:6643
2024-11-10T04:40:54.552Z|00028|reconnect|INFO|tcp:[10.10.30.22]:6643: connecting...
2024-11-10T04:40:54.553Z|00029|reconnect|INFO|tcp:[10.10.30.22]:6643: connected
root@impa:/var/log/ovn# ovs-vsctl --format=json --pretty get open_vswitch . external_ids:ovn-remote || echo "returned non-zero exit code: $?"
"ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642"
root@impa:/var/log/ovn# ovs-vsctl --format=json --pretty get open_vswitch . external_ids:ovn-encap-type
geneve
root@impa:/var/log/ovn# ovs-vsctl --format=json --pretty get open_vswitch . external_ids:ovn-encap-ip
"10.10.30.20"
root@impa:/var/log/ovn# ovs-vsctl --format=json --pretty get open_vswitch . external_ids:ovn-is-interconn
ovs-vsctl: no key "ovn-is-interconn" in Open_vSwitch record "." column external_ids
I have rebooted the servers, restarted the services, deleted the OVN databases, reinstalled, and compared the configuration against my working incus-deploy cluster, and I’m not sure what else to check. Any pointers on what I should be looking at would be appreciated.