Did I just delete my storage pool?

While I was trying to add a new cluster member, a failed init seemed to add the Ceph pool as an Incus storage pool. The join failed, and I tried to reset so I could try again. I noticed the Ceph pool and ran incus storage delete remote.

Later, I noticed that the pool was missing from ceph osd pool ls and that the instances in my cluster were storage zombies. Is there a safer way to clean up a failed cluster join?

client incus: 1:7.0-ubuntu22.04-202605061506

server incus: 1:6.21-ubuntu22.04-202602110127

Thank you.

Hmm, that would point towards missing validation on the Ceph storage driver.
I know that a few of the storage drivers will confirm that there is no leftover data on the underlying pool (osd pool here) before allowing it to be deleted.
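
Until such a check exists in the driver, a similar guard can be run by hand before deleting a remote pool. The following is only a rough sketch, not how Incus implements it: it assumes the rbd CLI is installed and can reach the cluster, that you pass the name of the underlying OSD pool (which may differ from the Incus pool name), and the function names leftover_rbd_images and safe_to_delete are made up for illustration.

```python
import subprocess


def leftover_rbd_images(osd_pool, run=subprocess.run):
    """Return the names of RBD images still present in the given OSD pool.

    Assumption: `rbd ls <pool>` prints one image name per line.
    The `run` parameter exists only so the command can be faked in tests.
    """
    result = run(
        ["rbd", "ls", osd_pool],
        capture_output=True,
        text=True,
        check=True,  # raise if rbd itself fails, rather than report "empty"
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def safe_to_delete(osd_pool, run=subprocess.run):
    """True only when the OSD pool contains no RBD images at all."""
    return not leftover_rbd_images(osd_pool, run=run)
```

If safe_to_delete returns False for the OSD pool behind the storage pool, other cluster members are probably still using it, and incus storage delete should not be run from the half-joined machine.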

Could you file an issue for that on GitHub?

This won’t really solve the problem of a bad join needing cleanup, but it will resolve the more problematic data loss side of things.

In many cases of a failed join, your approach is the right one. Remote storage is the outlier here: deleting the pool can destroy data that the rest of the cluster still uses, so you’d be better off cleaning it up from the DB instead.

The other pretty reliable alternative is to reboot the machine, wipe /var/lib/incus and then try again. That’s effectively what we do with Operations Center and IncusOS these days where we perform a factory reset of the Incus application for such cases.
