I was surprised to see that all instances attached to the node that was removed from the cluster disappeared from the cluster. I had to force the node removal after a problem with the system storage on that node.
All instances in the cluster are located on shared Ceph storage, and all of the instances' volumes, including those of the removed instances, are still listed in the cluster.
What is the correct way to recover these instances?
incus admin recover should be able to find them.
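A minimal sketch of that run, for reference (the tool is interactive and scans each configured storage pool for volumes that are missing from the database):

$ incus admin recover
# confirm the pools to scan (e.g. the "remote" Ceph pool mentioned below),
# review the list of unknown volumes it reports, then let it re-create the
# database records for them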
Thank you, Stephane, for the quick reply.
I have tried it already. It stops with the message: Error: Failed validation request: Failed checking volumes on pool "remote": Instance "<instance_name>" in project "<project_name>" already has storage DB record.
As I wrote earlier, the volumes are on the Ceph storage and they are already registered in the Incus cluster.
You probably want to look at incus admin sql global "SELECT * FROM instances" to confirm that the instance records are gone.
Then look at incus admin sql global "SELECT * FROM storage_volumes" to check for volume entries that were left behind. Those entries will need to be deleted from the database before the recovery can succeed.
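A rough sketch of those checks plus the cleanup, assuming the leftover record can be matched by its volume name (the dns-03 name and the WHERE clause are examples only; inspect the SELECT output first to build the right condition):

$ incus admin sql global "SELECT * FROM instances"
$ incus admin sql global "SELECT * FROM storage_volumes"
# example only: drop a leftover root-volume record by name
$ incus admin sql global "DELETE FROM storage_volumes WHERE name = 'dns-03'"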
Thank you for the suggestion.
I think I need to make a backup of the volumes first.
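One way to do that at the Ceph level, since the instances themselves are no longer in the database (pool and image names below are placeholders; the exact image names can be confirmed with rbd ls):

$ rbd snap create <osd_pool>/container_<project>_<instance>@pre-recovery
# or copy the whole image out of the cluster
$ rbd export <osd_pool>/container_<project>_<instance> <instance>.img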
P.S.
How can I check what other entity uses this volume (an instance root fs):
$ incus storage volume ls <storage> -f simple name=dns-03
+-----------+--------+-------------+--------------+---------+----------+
| TYPE | NAME | DESCRIPTION | CONTENT-TYPE | USED BY | LOCATION |
+-----------+--------+-------------+--------------+---------+----------+
| container | dns-03 | | filesystem | 1 | |
+-----------+--------+-------------+--------------+---------+----------+
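(For reference, the entities behind that USED BY count are listed in the volume's used_by field:)

$ incus storage volume show <storage> container/dns-03
# the used_by: section of the YAML output lists the API URLs of whatever
# still references the volume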
After deleting the orphan entries, the recovery process went further, but stopped with: Error: Failed import request: Failed importing instance "dc-03" in project "infra": Invalid option for volume "<volume_name>" option "volatile.uuid".
Ah, that's a former LXD install that was migrated before we tweaked our tooling to strip that volatile.uuid stuff…
That’s going to be a bit annoying to deal with…
Basically what you need to do is manually rbd map the Ceph volume, then mount it, and finally edit the backup.yaml file to remove that volatile.uuid entry. Then unmount and unmap it and try the recovery again.
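A sketch of those steps, assuming the RBD image follows the usual Incus naming (container_<project>_<instance> inside the OSD pool backing the "remote" storage pool; rbd ls <osd_pool> will show the exact name) and using /mnt/recover as a scratch mount point:

$ rbd map <osd_pool>/container_infra_dc-03   # prints the mapped device, e.g. /dev/rbd0
$ mount /dev/rbd0 /mnt/recover
$ vi /mnt/recover/backup.yaml                # delete the volatile.uuid line(s)
$ umount /mnt/recover
$ rbd unmap /dev/rbd0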
That’s the plan for me now!
Thanks a lot for the assistance!
P.S.
Actually, I can rebuild all of the instances except one, as those instances are already scripted and their data is stored on separate volumes. However, I'm aware I need to delete the root fs volumes first. Right now I can't do that with incus commands, as I have cleared the references in the DB.
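If needed, those leftover root volumes can still be removed directly at the Ceph level (placeholders below; double-check the image names with rbd ls before deleting, since rbd rm is irreversible):

$ rbd ls <osd_pool> | grep '^container_'
$ rbd rm <osd_pool>/container_<project>_<instance>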
Everything has been recovered.
Much appreciated!