Not sure what’s happened, but I have three ongoing operations against an instance, two restarts and one update, all showing as running, and none can be cancelled. I’ve tried a reboot to no avail. Everything else “seems” to be OK, but I can’t “do” anything to this instance.
Any ideas how I can recover from this?
I could tolerate deleting the instance as I have backups… but I suspect, as nothing else is working, a delete is going to freeze too.
Nothing of note in the logs so far as I can see… cluster, OVN, Raft etc. all showing 100%.
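For anyone hitting the same symptom: the stuck operations can at least be inspected, and a cancel attempted, from the CLI. A sketch using the standard incus client (the UUID placeholder is just that, a placeholder):

```shell
# List all ongoing operations visible to this cluster member
incus operation list

# Inspect one operation in detail (substitute the real UUID)
incus operation show <operation-uuid>

# Try to cancel it -- in the situation described here this may
# itself hang, since the operation is wedged server-side
incus operation delete <operation-uuid>
```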
??
OK, found it. It looks like another node was experiencing an issue in the background (the hung-task problem reported elsewhere), which was causing “some” operations on my node to get stuck. On rebooting the node with the stuck kernel task, my operations cleared.
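For anyone chasing the same thing: the hung kernel task shows up in the affected node’s kernel log, so it’s quick to check each cluster member in turn (assumes shell access and dmesg/journalctl):

```shell
# Hung-task reports look like:
#   INFO: task <name>:<pid> blocked for more than 120 seconds.
dmesg | grep -i "blocked for more than"

# Or, via the journal's kernel messages:
journalctl -k | grep -i "blocked for more than"
```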
So a problem on one node seems able to lock up another node doing unrelated things (?)
> So a problem on one node seems able to lock up another node doing unrelated things (?)
Yes. Welcome to clustering.
The alternative (which I use) is to run standalone incus instances without clustering. You can add each one as a separate remote, and still have the ability to manage and move instances across all nodes.
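For reference, the standalone-remotes setup looks roughly like this; node names and addresses here are made up:

```shell
# On each standalone server, enable the HTTPS listener once:
incus config set core.https_address :8443

# From the machine you manage everything on, add each server as a
# remote (you'll be asked to trust its certificate or supply a token):
incus remote add node1 https://node1.example.net:8443
incus remote add node2 https://node2.example.net:8443

# Then manage and move instances by prefixing the remote name:
incus list node1:
incus move node1:web1 node2:web1
```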
However, if you have a shared storage backend, and you want to do live migration of VMs, then I think you’ll still need clustering.
Mmm, no, I’m all containers … the reason I want clustering is so that I can use OVN to move traffic transparently between various points, rather than having to use fixed tunnels. So I can have a reverse proxy on my edge (in the cloud) that points directly to an address in the cluster that provides the associated service. When you have hundreds of containers, not having to worry about where they all are becomes a big issue, especially when they migrate between nodes … with OVN they always keep the same address … I can get half-way there with bridging, and indeed I may go back to that, but OVN feels like the “right” way to do it …
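For context, the OVN side of this in incus looks roughly like the below. The uplink, network, and instance names are invented, and the uplink network itself needs configuring first (see the incus OVN docs); this is just a sketch of why instances keep their address:

```shell
# Create an OVN network across the cluster, attached to a
# pre-configured uplink network:
incus network create my-ovn --type=ovn network=my-uplink

# Launch a container on it; its address stays with it even when the
# container migrates between cluster members:
incus launch images:debian/12 web1 --network my-ovn
```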
Looks vaguely like it might be fixed from 6.13-ish onwards… I’m currently trying 6.6, as it’s not something I saw prior to recent upgrades. (I’m currently on 6.12, which is the latest stable for the RPi.)