While trying to add a new cluster member, a failed init seemed to add the Ceph pool as an Incus storage pool. The join failed, so I tried to reset and start over. I noticed the Ceph pool and ran incus storage delete remote.
Later, I noticed that the pool was missing from ceph osd pool ls and that the instances in my cluster had become storage zombies. Is there a safer way to clean up a failed cluster join?
Hmm, that would point towards missing validation on the Ceph storage driver.
I know that a few of the storage drivers will confirm that there is no leftover data on the underlying pool (the OSD pool here) before allowing it to be deleted.
This won't really solve the problem of a bad join needing cleanup, but it will resolve the more problematic data loss side of things.
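For anyone who wants to check by hand before deleting a remote pool, the kind of validation described above could be sketched roughly as follows. This is only an illustration and not what the Ceph driver does: it shells out to rbd ls, treats any RBD image in the OSD pool as leftover data, and refuses on that basis. The function names are made up for this example, and the pool name passed in is assumed to be the OSD pool name, which may differ from the Incus pool name.

```python
import subprocess


def parse_rbd_ls(output: str) -> list[str]:
    """Split plain `rbd ls` output (one image name per line) into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_rbd_images(osd_pool: str) -> list[str]:
    """Return the RBD image names in the given OSD pool via the rbd CLI."""
    result = subprocess.run(
        ["rbd", "ls", "--pool", osd_pool],
        check=True,           # raise if rbd exits non-zero
        capture_output=True,
        text=True,
    )
    return parse_rbd_ls(result.stdout)


def safe_to_delete(images: list[str]) -> bool:
    """Conservative rule (an assumption of this sketch): only an empty
    pool is considered safe to delete."""
    return len(images) == 0


def check_pool(osd_pool: str) -> bool:
    """Print what was found and return True only if the pool is empty."""
    images = list_rbd_images(osd_pool)
    if safe_to_delete(images):
        print(f"{osd_pool}: no RBD images found")
        return True
    print(f"{osd_pool}: {len(images)} RBD image(s) present, do not delete:")
    for name in images:
        print(f"  {name}")
    return False
```

Calling check_pool with the OSD pool name on a machine that has Ceph client access would list whatever is still stored there. The rule is deliberately strict, so a pool that holds nothing but bookkeeping images would also be reported as not safe.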
In many cases of a failed join, your approach is the right one. Remote storage is the outlier here where deleting can be a problem and you'd instead be better off cleaning it up from the DB.
The other pretty reliable alternative is to reboot the machine, wipe /var/lib/incus and then try again. That's effectively what we do with Operations Center and IncusOS these days where we perform a factory reset of the Incus application for such cases.