OVN networks stop working

Some of my containers suddenly lose their internet connection. I'm using OVN networks and LXD projects.

It seems to be an OVN issue, but I can't figure out the cause. After a reboot, some networks start working while others that worked before stop.

Here is the output from two containers, plus the network configuration:

root@xxx-playground-1:~# lxc shell c1 --project=8856
root@c1:~# ping -c 1 google.com
PING google.com (142.250.191.110) 56(84) bytes of data.
64 bytes from ord38s28-in-f14.1e100.net (142.250.191.110): icmp_seq=1 ttl=44 time=18.0 ms

--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 17.973/17.973/17.973/0.000 ms
root@c1:~# traceroute -m 3 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 3 hops max, 60 byte packets
 1  xxx-playground-1 (172.32.0.1)  0.960 ms  0.719 ms  0.782 ms
 2  _gateway.lxd (10.0.0.1)  0.578 ms  0.561 ms  0.535 ms
 3  * * *
root@c1:~# logout
root@xxx-playground-1:~# lxc shell c2 --project=8856
root@c2:~# ping -c 1 google.com
ping: google.com: Temporary failure in name resolution
root@c2:~# traceroute -m 3 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 3 hops max, 60 byte packets
 1  172.32.0.1 (172.32.0.1)  0.833 ms  0.632 ms  0.761 ms
 2  * * *
 3  * * *
root@c2:~# logout
root@xxx-playground-1:~# lxc network list --project=8856
+------------------+------+---------+---------------+------+-------------+---------+---------+
|       NAME       | TYPE | MANAGED |     IPV4      | IPV6 | DESCRIPTION | USED BY |  STATE  |
+------------------+------+---------+---------------+------+-------------+---------+---------+
| b-dunx2ryqeemzax | ovn  | YES     | 172.32.0.1/24 | none |             | 2       | CREATED |
+------------------+------+---------+---------------+------+-------------+---------+---------+
| b-tpxa1tfesa0y9p | ovn  | YES     | 172.32.0.1/24 | none |             | 1       | CREATED |
+------------------+------+---------+---------------+------+-------------+---------+---------+
| b-yi1ikajgyibw5j | ovn  | YES     | 172.32.0.1/24 | none |             | 3       | CREATED |
+------------------+------+---------+---------------+------+-------------+---------+---------+
root@xxx-playground-1:~# lxc network show b-tpxa1tfesa0y9p --project=8856
config:
  bridge.mtu: "1500"
  dns.domain: xxx.internal
  ipv4.address: 172.32.0.1/24
  ipv4.nat: "true"
  ipv6.address: none
  network: b-uplink
  volatile.network.ipv4.address: 10.0.1.19
description: ""
name: b-tpxa1tfesa0y9p
type: ovn
used_by:
- /1.0/instances/c1?project=8856
managed: true
status: Created
locations:
- xxx-playground-1
root@xxx-playground-1:~# lxc network show b-yi1ikajgyibw5j --project=8856
config:
  bridge.mtu: "1500"
  dns.domain: xxx.internal
  ipv4.address: 172.32.0.1/24
  ipv4.nat: "true"
  ipv6.address: none
  network: b-uplink
  volatile.network.ipv4.address: 10.0.0.14
description: ""
name: b-yi1ikajgyibw5j
type: ovn
used_by:
- /1.0/instances/c2?project=8856
- /1.0/instances/docker?project=8856
- /1.0/instances/martin-apache-mysql?project=8856
managed: true
status: Created
locations:
- xxx-playground-1
root@xxx-playground-1:~# lxc network show b-uplink
config:
  ipv4.address: 10.0.0.1/8
  ipv4.dhcp.ranges: 10.0.0.2-10.0.0.2
  ipv4.nat: "true"
  ipv4.ovn.ranges: 10.0.0.3-10.255.255.254
  ipv6.address: none
description: ""
name: b-uplink
type: bridge
used_by:
- /1.0/networks/b-a07z9pusv1v9ta
- /1.0/networks/b-a07z9pusv1v9ta?project=147269
- /1.0/networks/b-a1tgywhnvs4m8p?project=148084
- /1.0/networks/b-a866m5ijdn6ygj?project=146033
...
- /1.0/networks/b-ze24jnpzeb9yja?project=148142
- /1.0/networks/b-zoisght76r5ull?project=148032
- /1.0/networks/b-zqs6oxruy957et?project=145814
managed: true
status: Created
locations:
- xxx-playground-1
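For reference, LXD takes each OVN router's external address (`volatile.network.ipv4.address`) from the uplink's `ipv4.ovn.ranges`. A quick sanity check that the two router addresses above (10.0.1.19 for b-tpxa1tfesa0y9p, 10.0.0.14 for b-yi1ikajgyibw5j) sit inside that pool and don't clash with `ipv4.dhcp.ranges` could look like the minimal bash sketch below. The range values are copied by hand from the output above, and `ip2int`/`in_range` are made-up helper names.

```bash
#!/usr/bin/env bash
# Sketch: check that OVN router external IPs sit inside the uplink's
# ipv4.ovn.ranges and outside its ipv4.dhcp.ranges. Values are copied by hand
# from `lxc network show` output; ip2int/in_range are illustrative helpers.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip2int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if $1 lies within the inclusive range $2..$3.
in_range() {
  local ip start end
  ip=$(ip2int "$1"); start=$(ip2int "$2"); end=$(ip2int "$3")
  (( ip >= start && ip <= end ))
}

OVN_START=10.0.0.3;  OVN_END=10.255.255.254   # b-uplink ipv4.ovn.ranges
DHCP_START=10.0.0.2; DHCP_END=10.0.0.2        # b-uplink ipv4.dhcp.ranges

# volatile.network.ipv4.address of the two project networks shown above
for ip in 10.0.1.19 10.0.0.14; do
  if in_range "$ip" "$OVN_START" "$OVN_END" && ! in_range "$ip" "$DHCP_START" "$DHCP_END"; then
    echo "$ip: inside ipv4.ovn.ranges"
  else
    echo "$ip: outside ipv4.ovn.ranges or clashes with DHCP"
  fi
done
```

With the values from this thread, both addresses report as inside `ipv4.ovn.ranges`, so address allocation on the uplink doesn't look like the problem.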

When this occurs, can you test whether the instances can ping the OVN router's internal IP, 172.32.0.1?

And whether the OVN router’s external IP 10.0.1.19 is reachable from the uplink network?

Also, can the instances still ping each other inside the OVN network?
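The three checks above can be scripted roughly as follows. This is a hedged bash sketch, not an official tool. The defaults target c2 in project 8856. `ROUTER_EXT` defaults to 10.0.0.14, the `volatile.network.ipv4.address` of b-yi1ikajgyibw5j, which is the network c2 is on (10.0.1.19 belongs to c1's network, b-tpxa1tfesa0y9p). The host sits on b-uplink (10.0.0.1), so pinging from the host tests reachability from the uplink side. `PEER_IP` is a placeholder for another instance's address on the same OVN network, and the `--run` flag is just a convention of this sketch.

```bash
#!/usr/bin/env bash
# Sketch: run the suggested OVN connectivity checks for one instance.
# Defaults come from this thread; PEER_IP is a hypothetical placeholder.
set -u

PROJECT="${PROJECT:-8856}"
INSTANCE="${INSTANCE:-c2}"
ROUTER_INT="${ROUTER_INT:-172.32.0.1}"   # OVN router, internal side
ROUTER_EXT="${ROUTER_EXT:-10.0.0.14}"    # OVN router, uplink side (c2's network)
PEER_IP="${PEER_IP:-}"                   # another instance on the same OVN network

# Run a command quietly; print PASS/FAIL with a label and return its status.
check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

main() {
  local failed=0
  check "$INSTANCE -> router internal $ROUTER_INT" \
    lxc exec "$INSTANCE" --project="$PROJECT" -- ping -c 1 -W 2 "$ROUTER_INT" || failed=1
  check "host (uplink) -> router external $ROUTER_EXT" \
    ping -c 1 -W 2 "$ROUTER_EXT" || failed=1
  if [ -n "$PEER_IP" ]; then
    check "$INSTANCE -> peer $PEER_IP" \
      lxc exec "$INSTANCE" --project="$PROJECT" -- ping -c 1 -W 2 "$PEER_IP" || failed=1
  fi
  return "$failed"
}

# Only run the checks when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
  main
fi
```

For example, saved under any file name such as `ovn-check.sh`: `INSTANCE=c2 PEER_IP=<address> bash ovn-check.sh --run`. Running it once while the outage is happening and once after it recovers gives a direct comparison.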

Upgrading OVN to 22.03 has fixed the issue so far.
As for your questions, I think the instances could not ping each other or the gateway.


The problem persists. I figured out that it happens every time LXD gets upgraded.
Another clue might be that we have a 2-node cluster. I read somewhere that the minimum is 3. Might that be the cause?
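The usual reasoning behind "minimum 3" is Raft majority quorum. LXD's dqlite database uses Raft, and so do clustered OVN databases (the thread doesn't say whether the OVN databases here are clustered across both nodes). A cluster of n voters needs floor(n/2)+1 of them to agree. The bash snippet below only works through that arithmetic; it doesn't establish that the cluster size caused this outage.

```bash
#!/usr/bin/env bash
# Raft majority quorum: floor(n/2) + 1 voters must agree.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of voters that can be lost while a majority remains.
tolerated_failures() {
  echo $(( $1 - $(quorum "$1") ))
}

for n in 1 2 3 4 5; do
  echo "$n voter(s): quorum $(quorum "$n"), tolerates $(tolerated_failures "$n") failure(s)"
done
```

The 2-voter line shows quorum 2 with zero tolerated failures, the same as a single node, while 3 voters tolerate one failure.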