Ok, so this happened to me yesterday and I managed to fix it by wiping the node’s OVN databases and redeploying … but as I mentioned on another thread I suspected it would be back. TaDa!
So, this is a stand-alone Incus, running on a local OVN, connected to a remote cluster. It can be set up and runs fine, but after rebooting, OVN just doesn’t do any DHCP. It appears to set up all the ports and switches properly, it just doesn’t send any DHCP to Incus, which results in the instances not getting any IPv4 addresses. I say IPv4 because it would appear that they ARE getting IPv6 addresses …
Looking at the OVN logs, when it’s working I see DHCPOFFER/ACK entries in the ovn-controller log; after rebooting, these no longer appear. Clearing the OVN databases and re-deploying brings it all back again …
So it would appear DHCP is happy to start and work on a blank database, and on a database populated by Incus, however it doesn’t then seem happy to ‘restart’ with the Incus-populated database. I can’t see any failure messages, it just doesn’t respond to DHCP.
I will continue to debug, but if anyone has any ideas as to what might be causing a problem like this, I’m all ears. DHCP in the cluster at the other end of the IC link seems fine … so it’s something very specific to this node …
Ok, so it kind of looks like Incus is starting instances on the wrong switch port, hence not getting the DHCP packets … at least that seems to be the observed effect. Everything else in the logs seems to be going to plan …
But apparently not applied. I’m not sure how Incus gets leases for the network leases table, but I suspect it may be via a different route from the one instances use, which seem to rely on DHCP OFFER/ACK packets, which don’t seem to be forthcoming. At this point, a wipe and restart is now failing to fix the issue for me, I seem stuck with no IPs … I guess wiping the Incus config is next …
Notably, despite being issued with IPv6 addresses, the two nodes can’t ping each other’s IPv6 address, so it looks like they’re getting the assignment, but not the associated connectivity … as if the container isn’t on the port the addresses were issued against (?)
bridge not found for localnet port 'incus-net28-ls-ext-lsp-provider' with network name 'UPLINK'; skipping
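As far as I understand it, ovn-controller logs that message when it can't find an OVS bridge mapped to the network name UPLINK at the moment it looks. A minimal sketch for checking the mapping, assuming the mappings live in external_ids:ovn-bridge-mappings on the Open_vSwitch table (the usual place). has_mapping is a made-up helper name, and incusovn1 in the comment is only an example bridge name:

```shell
#!/bin/sh
# has_mapping NETWORK MAPPINGS
# MAPPINGS is a string like "net1:br1,net2:br2" (quotes optional).
# Prints the bridge mapped to NETWORK and returns 0, or returns 1 if absent.
has_mapping() {
    net=$1
    # ovs-vsctl prints the value wrapped in double quotes; strip them.
    mappings=$(printf '%s' "$2" | tr -d '"')
    old_ifs=$IFS
    IFS=,
    for pair in $mappings; do
        if [ "${pair%%:*}" = "$net" ]; then
            IFS=$old_ifs
            printf '%s\n' "${pair#*:}"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example (as root; output would be something like "incusovn1"):
#   has_mapping UPLINK "$(ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings)"
```

If this prints nothing right after a reboot but prints a bridge name a little later, that would be consistent with ovn-controller having looked before the mapping was in place.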
It’s a startup sequence issue, something there seems to be a bunch of with Debian at the moment. (the other biggie is that I need to manually “incus admin shutdown” when I reboot or it hangs)
If I manually restart ovn-controller following a reboot, DHCP springs back to life. As a random guess, Incus is creating “UPLINK” and ovn-controller is starting up first … will check the deps …
So first, I’ve managed to solve the rebooting timing issue with the following. This yields an almost instant reboot rather than 6 minutes; the Timeout probably isn’t needed.
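A quick way to check the deps, sketched under the assumption that the units are called incus.service and ovn-controller.service (names may differ depending on distro and packaging). ordered_after is a hypothetical helper that just looks for a unit in the After= list systemd reports:

```shell
#!/bin/sh
# ordered_after UNIT LIST
# Returns 0 if UNIT appears in the space-separated LIST, 1 otherwise.
ordered_after() {
    for u in $2; do
        if [ "$u" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Example (unit names are assumptions, adjust to your system):
#   deps=$(systemctl show -p After --value ovn-controller.service)
#   if ordered_after incus.service "$deps"; then
#       echo "ovn-controller is ordered after incus"
#   else
#       echo "no ordering between them"
#   fi
```

Note that After= only orders unit start-up; it says nothing about whether Incus has finished creating its networks by the time the other unit starts.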
(reboot)
# incus ls -c ns4ts
+------+---------+------+-----------------+---------+
| NAME | STATE | IPV4 | TYPE | STATE |
+------+---------+------+-----------------+---------+
| demo | RUNNING | | CONTAINER | RUNNING |
+------+---------+------+-----------------+---------+
| npm | STOPPED | | CONTAINER (APP) | STOPPED |
+------+---------+------+-----------------+---------+
| temp | RUNNING | | CONTAINER | RUNNING |
+------+---------+------+-----------------+---------+
# service openvswitch-switch restart
.. wait a few secs ..
# incus ls -c ns4ts
+------+---------+-------------------+-----------------+---------+
| NAME | STATE | IPV4 | TYPE | STATE |
+------+---------+-------------------+-----------------+---------+
| demo | RUNNING | 10.103.0.3 (eth0) | CONTAINER | RUNNING |
+------+---------+-------------------+-----------------+---------+
| npm | STOPPED | | CONTAINER (APP) | STOPPED |
+------+---------+-------------------+-----------------+---------+
| temp | RUNNING | 10.103.0.4 (eth0) | CONTAINER | RUNNING |
+------+---------+-------------------+-----------------+---------+
So we appear to have a dependency hell where Incus creates “UPLINK”, openvswitch-switch then enables the DHCP service, then Incus starts up instances which use the DHCP service … but they’re not synchronised … so at the moment openvswitch-switch is failing to initialise properly after a reboot, which results in the DHCP service not being available.
I’m guessing the proper fix needs some adjustment in Incus re: timing and sync, however in the meantime I have the following:
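The manual restart can be scripted along these lines. This is only a sketch under stated assumptions, not a tested fix: wait_for and uplink_ready are hypothetical helper names, and the condition to wait on (an OVS bridge mapping for UPLINK showing up in external_ids:ovn-bridge-mappings) is an assumption about what needs to exist before the restart is useful.

```shell
#!/bin/sh
# wait_for TIMEOUT CMD [ARGS...]
# Runs CMD once per second until it succeeds (return 0) or until
# TIMEOUT seconds have elapsed (return 1).
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (as root, after boot; the readiness check is an assumption):
#   uplink_ready() {
#       ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings 2>/dev/null \
#           | grep -q 'UPLINK:'
#   }
#   wait_for 60 uplink_ready && service openvswitch-switch restart
```

Polling with a timeout avoids a fixed sleep, which is either too short on a slow boot or needlessly long on a fast one.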
(Apologies, this is a complete hack but it seems to work for me for now)