OVN cluster - ovn-nbctl 6641: database connection failed (Connection refused)

I am trying to set up a cluster of 3 small bare-metal hosts, using incus-deploy as a reference. I previously used incus-deploy to build a working cluster across several virtual machines, and I’m now trying to replicate that on 3 physical hosts. As far as I can tell I have replicated the configuration and steps, but connections to the OVN Northbound (NB) database on port 6641 are refused. Connections to the OVN Southbound (SB) database on port 6642 are refused as well, and I suspect the two are related. I have worked through the troubleshooting advice I could find online. I expect the setup is very close and one small thing is off, but I haven’t found anything that points to the problem.

root@impa:~# source /etc/ovn/alias.sh
root@impa:~# ovn-nbctl show
ovn-nbctl: ssl:[10.10.30.20]:6641,ssl:[10.10.30.21]:6641,ssl:[10.10.30.22]:6641: database connection failed (Connection refused)

root@impa:/var/log/ovn# ovn-sbctl show
ovn-sbctl: ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642: database connection failed (Connection refused)

root@impa:~# ovs-appctl -t /run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound
9a6d
Name: OVN_Northbound
Cluster ID: 52cf (52cf64ee-ca30-4076-b453-a9e413a34e98)
Server ID: 9a6d (9a6df369-bcfc-46a4-b15e-05a476c7936d)
Address: tcp:[10.10.30.20]:6643
Status: cluster member
Role: leader
Term: 8
Leader: self
Vote: self

Last Election started 543700 ms ago, reason: leadership_transfer
Last Election won: 543690 ms ago
Election timer: 1000
Log: [2, 11]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->03e1 ->5531 <-03e1 <-5531
Disconnections: 2
Servers:
    03e1 (03e1 at tcp:[10.10.30.21]:6643) next_index=11 match_index=10 last msg 172 ms ago
    9a6d (9a6d at tcp:[10.10.30.20]:6643) (self) next_index=10 match_index=10
    5531 (5531 at tcp:[10.10.30.22]:6643) next_index=11 match_index=10 last msg 172 ms ago

On my working incus-deploy cluster, the ovsdb-server process that runs the OVN NB database (pid 2835 here) listens on both port 6643 and port 6641. On all 3 of my physical hosts, that process listens on port 6643, and the cluster traffic on it is active, but it does not listen on port 6641. Likewise, nothing listens on port 6642 for the SB database. I’m not sure what is missing.

root@impa:~# ps fauxww | grep 'ov[s|n]'
root         632  0.0  0.0  12216  6792 ?        S<s  Nov09   0:02 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root         697  0.2  0.0 532028 10236 ?        S<Lsl Nov09   0:44 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root        3139  0.0  0.0   3244  1792 pts/1    S+   00:53   0:00  |                       \_ tail -f /var/log/openvswitch/ovs-vswitchd.log /var/log/openvswitch/ovsdb-server.log /var/log/ovn/ovn-controller.log /var/log/ovn/ovn-northd.log /var/log/ovn/ovsdb-server-nb.log /var/log/ovn/ovsdb-server-sb.log
root        1040  0.0  0.0 308788  8980 ?        S<sl Nov09   0:01 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --private-key=/etc/ovn/baremetal.server.key --certificate=/etc/ovn/baremetal.server.crt --ca-cert=/etc/ovn/baremetal.ca.crt --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/var/run/ovn/ovn-controller.pid --detach
root        2552  0.0  0.0   3168  1920 ?        Ss   00:17   0:00 /bin/sh /usr/share/ovn/scripts/ovn-ctl run_nb_ovsdb --db-nb-create-insecure-remote=no --db-sb-create-insecure-remote=no --db-nb-addr=[10.10.30.20] --db-sb-addr=[10.10.30.20] --db-nb-cluster-local-addr=[10.10.30.20] --db-sb-cluster-local-addr=[10.10.30.20] --ov-northd-ssl-key=/etc/ovn/baremetal.server.key --ovn-northd-ssl-cert=/etc/ovn/baremetal.server.crt --ovn-northd-ssl-ca-cert=/etc/ovn/baremetal.ca.crt --ovn-northd-nb-db=ssl:[10.10.30.20]:6641,ssl:[10.10.30.21]:6641,ssl:[10.10.30.22]:6641 --ovn-northd-sb-db=ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642
root        2835  0.2  0.0 159840  8576 ?        Sl   00:17   0:46  \_ ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/var/run/ovn/ovnnb_db.sock --pidfile=/var/run/ovn/ovnnb_db.pid --unixctl=/var/run/ovn/ovnnb_db.ctl --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db
root        2557  0.0  0.0   3168  1792 ?        Ss   00:17   0:00 /bin/sh /usr/share/ovn/scripts/ovn-ctl run_sb_ovsdb --db-nb-create-insecure-remote=no --db-sb-create-insecure-remote=no --db-nb-addr=[10.10.30.20] --db-sb-addr=[10.10.30.20] --db-nb-cluster-local-addr=[10.10.30.20] --db-sb-cluster-local-addr=[10.10.30.20] --ov-northd-ssl-key=/etc/ovn/baremetal.server.key --ovn-northd-ssl-cert=/etc/ovn/baremetal.server.crt --ovn-northd-ssl-ca-cert=/etc/ovn/baremetal.ca.crt --ovn-northd-nb-db=ssl:[10.10.30.20]:6641,ssl:[10.10.30.21]:6641,ssl:[10.10.30.22]:6641 --ovn-northd-sb-db=ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642
root        2843  0.2  0.0 159832  8704 ?        Sl   00:17   0:45  \_ ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --remote=punix:/var/run/ovn/ovnsb_db.sock --pidfile=/var/run/ovn/ovnsb_db.pid --unixctl=/var/run/ovn/ovnsb_db.ctl --remote=db:OVN_Southbound,SB_Global,connections --private-key=db:OVN_Southbound,SSL,private_key --certificate=db:OVN_Southbound,SSL,certificate --ca-cert=db:OVN_Southbound,SSL,ca_cert --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers /var/lib/ovn/ovnsb_db.db
root        2845  0.0  0.0 163200  7748 ?        S<sl 00:17   0:02 ovn-northd --private-key=/etc/ovn/baremetal.server.key --certificate=/etc/ovn/baremetal.server.crt --ca-cert=/etc/ovn/baremetal.ca.crt -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db=ssl:[10.10.30.20]:6641,ssl:[10.10.30.21]:6641,ssl:[10.10.30.22]:6641 --ovnsb-db=ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642 --no-chdir --log-file=/var/log/ovn/ovn-northd.log --pidfile=/var/run/ovn/ovn-northd.pid --detach
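
Both ovsdb-server command lines above carry two --remote arguments. A minimal sketch for pulling them out of a command line, to see which listeners are configured statically. The helper name list_remotes is hypothetical and not part of OVS or OVN:

```shell
# Print the value of every --remote= argument found in a
# space-separated command line read from stdin, one per line.
list_remotes() {
    tr ' ' '\n' | sed -n 's/^--remote=//p'
}
```

For example, `tr '\0' ' ' < /proc/2835/cmdline | list_remotes` prints the two remotes of the NB server. Neither is a ptcp: or pssl: listener. Per the ovsdb-server documentation, the db:OVN_Northbound,NB_Global,connections remote tells the server to read additional connection methods from that column, so the output of `ovn-nbctl --db=unix:/var/run/ovn/ovnnb_db.sock get-connection` (and the ovn-sbctl equivalent against ovnsb_db.sock) is probably the next thing worth comparing against the working cluster.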

root@impa:~# ss -tulpn | grep ov
tcp   LISTEN 0      64       10.10.30.20:6643      0.0.0.0:*    users:(("ovsdb-server",pid=2835,fd=20))  
tcp   LISTEN 0      64       10.10.30.20:6644      0.0.0.0:*    users:(("ovsdb-server",pid=2843,fd=20))  
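
The same listener check can be scripted so it is easy to repeat on each host. A minimal sketch, assuming the four ports seen in this post (6641 and 6642 for clients, 6643 and 6644 for the cluster) and `ss` output in the format shown above. The function name missing_ovn_ports is hypothetical:

```shell
# Read `ss` output from stdin and print each expected OVN port
# that has no LISTEN socket bound to it.
missing_ovn_ports() {
    ss_output=$(cat)
    for port in 6641 6642 6643 6644; do
        # Match the port at the end of the local address column,
        # e.g. "10.10.30.20:6643" followed by whitespace.
        if ! printf '%s\n' "$ss_output" | grep -Eq "LISTEN.*:${port}[[:space:]]"; then
            echo "$port"
        fi
    done
}
```

Run as `ss -tln | missing_ovn_ports`. Fed the two lines above, it prints 6641 and 6642.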

It seems like a lot of the pieces are in place and functioning, but something is missing:

root@impa:~# /usr/share/ovn/scripts/ovn-ctl status_northd
ovn-northd is running with pid 2845

root@impa:~# /usr/share/ovn/scripts/ovn-ctl status_ovsdb
 * OVN Northbound DB is running
 * OVN Southbound DB is running

root@impa:~# /usr/share/ovn/scripts/ovn-ctl status_controller
ovn-controller is running with pid 1040

root@impa:~# ovs-vsctl list open_vswitch
_uuid               : 61dfae23-e8ff-461c-a860-a65202943eed
bridges             : [faae3f33-6b45-4810-93f4-5c304f5fbeac]
cur_cfg             : 1
datapath_types      : [netdev, system]
datapaths           : {system=82cfabe9-8b3c-4331-bdd4-4bb381b616cc}
db_version          : "8.7.0"
dpdk_initialized    : false
dpdk_version        : none
external_ids        : {hostname=impa, ovn-encap-ip="10.10.30.20", ovn-encap-type=geneve, ovn-remote="ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642", rundir="/var/run/openvswitch", system-id="8c874de6-0571-4ebb-9d6f-127aa32e80ad"}
iface_types         : [afxdp, afxdp-nonpmd, bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, srv6, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 1
other_config        : {ovn-chassis-idx-8c874de6-0571-4ebb-9d6f-127aa32e80ad="", vlan-limit="0"}
ovs_version         : "3.4.0"
ssl                 : []
statistics          : {}
system_type         : ubuntu
system_version      : "22.04"

root@impa:~# ovs-vsctl show
61dfae23-e8ff-461c-a860-a65202943eed
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "3.4.0"

root@impa:/var/log/ovn# cat ovsdb-server-nb.log
2024-11-10T00:00:18.131Z|00033|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2024-11-10T00:17:49.111Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2024-11-10T00:17:49.118Z|00002|raft|INFO|local server ID is 9a6d
2024-11-10T00:17:49.121Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.4.0
2024-11-10T00:17:49.121Z|00004|reconnect|INFO|tcp:[10.10.30.21]:6643: connecting...
2024-11-10T00:17:49.121Z|00005|reconnect|INFO|tcp:[10.10.30.22]:6643: connecting...
2024-11-10T00:17:49.122Z|00006|reconnect|INFO|tcp:[10.10.30.22]:6643: connected
2024-11-10T00:17:49.122Z|00007|reconnect|INFO|tcp:[10.10.30.21]:6643: connected
2024-11-10T00:17:49.443Z|00008|raft|INFO|server 03e1 is leader for term 6
2024-11-10T00:17:49.443Z|00009|raft|INFO|rejecting append_request because previous entry 6,8 not in local log (mismatch past end of log)
2024-11-10T00:17:49.765Z|00010|raft|INFO|tcp:10.10.30.22:36020: learned server ID 5531
2024-11-10T00:17:49.765Z|00011|raft|INFO|tcp:10.10.30.22:36020: learned remote address tcp:[10.10.30.22]:6643
2024-11-10T00:17:49.766Z|00012|raft|INFO|tcp:10.10.30.21:53938: learned server ID 03e1
2024-11-10T00:17:49.766Z|00013|raft|INFO|tcp:10.10.30.21:53938: learned remote address tcp:[10.10.30.21]:6643
2024-11-10T00:17:59.124Z|00014|memory|INFO|8576 kB peak resident set size after 10.0 seconds
2024-11-10T00:17:59.124Z|00015|memory|INFO|atoms:27 cells:34 monitors:0 n-weak-refs:0 raft-connections:4 raft-log:7 txn-history:2 txn-history-atoms:10
2024-11-10T04:40:44.940Z|00016|reconnect|INFO|tcp:[10.10.30.21]:6643: connection closed by peer
2024-11-10T04:40:44.948Z|00017|raft|INFO|server 5531 is leader for term 7
2024-11-10T04:40:45.151Z|00018|raft|INFO|tcp:10.10.30.21:44798: learned server ID 03e1
2024-11-10T04:40:45.151Z|00019|raft|INFO|tcp:10.10.30.21:44798: learned remote address tcp:[10.10.30.21]:6643
2024-11-10T04:40:45.942Z|00020|reconnect|INFO|tcp:[10.10.30.21]:6643: connecting...
2024-11-10T04:40:45.942Z|00021|reconnect|INFO|tcp:[10.10.30.21]:6643: connected
2024-11-10T04:40:53.551Z|00022|raft|INFO|received leadership transfer from 5531 in term 7
2024-11-10T04:40:53.551Z|00023|raft|INFO|term 8: starting election (vote)
2024-11-10T04:40:53.552Z|00024|reconnect|INFO|tcp:[10.10.30.22]:6643: connection closed by peer
2024-11-10T04:40:53.561Z|00025|raft|INFO|term 8: elected leader by 2+ of 3 servers
2024-11-10T04:40:53.764Z|00026|raft|INFO|tcp:10.10.30.22:45418: learned server ID 5531
2024-11-10T04:40:53.764Z|00027|raft|INFO|tcp:10.10.30.22:45418: learned remote address tcp:[10.10.30.22]:6643
2024-11-10T04:40:54.552Z|00028|reconnect|INFO|tcp:[10.10.30.22]:6643: connecting...
2024-11-10T04:40:54.553Z|00029|reconnect|INFO|tcp:[10.10.30.22]:6643: connected

root@impa:/var/log/ovn# ovs-vsctl --format=json --pretty get open_vswitch . external_ids:ovn-remote || echo "returned non-zero exit code: $?"
"ssl:[10.10.30.20]:6642,ssl:[10.10.30.21]:6642,ssl:[10.10.30.22]:6642"

root@impa:/var/log/ovn# ovs-vsctl --format=json --pretty get open_vswitch . external_ids:ovn-encap-type
geneve

root@impa:/var/log/ovn# ovs-vsctl --format=json --pretty get open_vswitch . external_ids:ovn-encap-ip
"10.10.30.20"

root@impa:/var/log/ovn# ovs-vsctl --format=json --pretty get open_vswitch . external_ids:ovn-is-interconn
ovs-vsctl: no key "ovn-is-interconn" in Open_vSwitch record "." column external_ids

I have rebooted the servers, restarted the services, deleted the OVN databases, reinstalled, rechecked, and compared the configuration against my working incus-deploy cluster, and I’m not sure what else to look at. Any pointers on what to check next would be appreciated.