Restore instances from Ceph storage after removing a cluster node

I was surprised to see that all instances attached to the node that was removed from the cluster disappeared from the cluster. I had to force the node removal after a problem with the system storage on that node.
All instances in the cluster are located on shared Ceph storage, and all instance volumes, including those of the disappeared instances, are still listed in the cluster.
What is the correct way to recover these instances?

incus admin recover should be able to find them
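
For reference, it is an interactive tool; it scans the configured storage pools for volumes that have no matching database record and offers to re-import them:

$ incus admin recover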

Thank you, Stephane, for the quick reply.

I have tried it already. It stops with the message: Error: Failed validation request: Failed checking volumes on pool "remote": Instance "<instance_name>" in project "<project_name>" already has storage DB record.
As I wrote earlier, the volumes are on the Ceph storage and they are already registered in the Incus cluster.

You probably want to look at incus admin sql global "SELECT * FROM instances" to confirm that the instance records are gone.

Then look at incus admin sql global "SELECT * FROM storage_volumes" to check for volume entries that were left behind. Those entries will need to be deleted from the database before the recovery can succeed.
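
For example, something along these lines; the volume name and project id are placeholders here and the exact WHERE clause depends on what the SELECT returns:

$ incus admin sql global "SELECT id, name, project_id FROM storage_volumes"
$ incus admin sql global "DELETE FROM storage_volumes WHERE name = '<instance_name>' AND project_id = <project_id>"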

Thank you for the suggestion.

I think I need to back up the volumes first.

P.S.
How can I check what other entity is using this volume (an instance root fs)?

$ incus storage volume ls <storage> -f simple name=dns-03 
+-----------+--------+-------------+--------------+---------+----------+
|   TYPE    |  NAME  | DESCRIPTION | CONTENT-TYPE | USED BY | LOCATION |
+-----------+--------+-------------+--------------+---------+----------+
| container | dns-03 |             | filesystem   | 1       |          |
+-----------+--------+-------------+--------------+---------+----------+
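
(One thing I can think of trying, assuming incus storage volume show also lists a used_by field for the volume:)

$ incus storage volume show <storage> container/dns-03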

After deleting the orphan entries, the recovery process went further, but stopped with: Error: Failed import request: Failed importing instance "dc-03" in project "infra": Invalid option for volume "<volume_name>" option "volatile.uuid".

Ah, that’s a former LXD install that was migrated before we tweaked our tooling to strip that volatile.uuid stuff…

That’s going to be a bit annoying to deal with…

Basically what you need to do is manually rbd map the Ceph volume, then mount it, and finally edit the backup.yaml file to remove that volatile.uuid entry. Then unmount and unmap it and try the recovery again.
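
Roughly along these lines; the image name is a guess based on the usual container_<project>_<instance> naming in the Ceph pool, and /dev/rbd0 is whatever rbd map prints:

$ sudo rbd map <ceph_pool>/container_infra_dc-03   # may need --id/--cluster matching the pool's ceph.user.name / ceph.cluster_name
/dev/rbd0
$ sudo mkdir -p /mnt/recover
$ sudo mount /dev/rbd0 /mnt/recover
$ sudo vi /mnt/recover/backup.yaml   # delete the volatile.uuid line(s)
$ sudo umount /mnt/recover
$ sudo rbd unmap /dev/rbd0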

That’s the plan for me now!
Thanks a lot for the assistance!

P.S.
Actually, I can rebuild all of the instances except one, as those instances are already scripted and their data is stored on separate volumes. However, I'm aware I need to delete the root fs volumes first. Right now I can't do that with incus commands, as I cleared the references in the DB.
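
(If I end up going that route, my rough plan is to drop the leftover root volumes directly with the Ceph tools; the container_<project_name>_<instance_name> naming below is my assumption from how the images appear in the pool:)

$ rbd ls <ceph_pool> | grep container_
$ rbd snap purge <ceph_pool>/container_<project_name>_<instance_name>   # only needed if the image still has snapshots
$ rbd rm <ceph_pool>/container_<project_name>_<instance_name>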

Everything has been recovered.
Much appreciated!
