How to re-add "config trust" server certificates after the tokens were deleted

I was cleaning up our "config trust" client tokens and accidentally deleted the server tokens for our three-node storage cluster. This broke the cluster: the members can no longer communicate with each other, and our instances are listed in ERROR state. If we run "incus list --all-projects", the only instances shown as RUNNING are on the node that queries the database directly.
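
For context, this is how the breakage shows up from the CLI (both are stock Incus commands):

incus cluster list            # the other members show as unreachable
incus list --all-projects     # instances on those members are in ERROR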

time="2026-05-10T07:26:06Z" level=warning msg="Failed adding member event listener client" err="websocket: bad handshake\nnot authorized\nnot authorized" local="127.0.0.16:8443" remote="127.0.0.15:8443"
time="2026-05-10T07:26:06Z" level=warning msg="Rejecting request from untrusted client" ip="172.20.40.15:55014"
time="2026-05-10T07:26:13Z" level=warning msg="Failed adding member event listener client" err="websocket: bad handshake\nnot authorized\nnot authorized" local="127.0.0.16:8443" remote="127.0.0.15:8443"

The cluster nodes are now offline and I have not yet been able to rebuild new tokens. I have gone through and regenerated the certs and copied the new certificates to each node. No matter what I do, I cannot regenerate new server tokens.
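
For completeness, this is roughly how I regenerated the member certificates (a sketch assuming the default /var/lib/incus paths; Incus recreates the keypair on startup if the files are missing):

systemctl stop incus
mv /var/lib/incus/server.crt /var/lib/incus/server.crt.bak   # back up first
mv /var/lib/incus/server.key /var/lib/incus/server.key.bak
systemctl start incus   # the daemon generates a fresh server.crt/server.key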

I have tried many different approaches, but the cluster members still can't talk to each other.

incus cluster update-certificate server.crt server.key
Error: not authorized
incus cluster add 3k
Error: The cluster already has a member with name: 3k
incus admin cluster list-database
+-------------------+
|      Address      |
+-------------------+
| 127.0.0.14:8443   |
+-------------------+
| 127.0.0.15:8443   |
+-------------------+
| 127.0.0.16:8443   |
+-------------------+
incus admin sql local "SELECT * FROM raft_nodes"
+----+-------------------+------+---------------------+
| id |      address      | role |        name         |
+----+-------------------+------+---------------------+
| 1  | 127.0.0.14:8443   | 0    | 3k                  |
+----+-------------------+------+---------------------+
| 2  | 127.0.0.15:8443   | 0    | 3m                  |
+----+-------------------+------+---------------------+
| 3  | 127.0.0.16:8443   | 0    | 3l                  |
+----+-------------------+------+---------------------+

Everything I've researched says "do not panic" and explains the fix in many different ways, but none of them have gotten our storage cluster back online.

I ran incus admin cluster recover-from-quorum-loss. It appears we are still having certificate issues with our cluster.

When starting the incus daemon back up, we received this:

"Failed adding member event listener client" err="websocket: bad handshake\nnot authorized\nnot authorized"

msg="Rejecting request from untrusted client"
level=warning msg="No local trusted server certificates found, falling back to trusting network certificate" fingerprint=69ccaf070652b28b1bef17327fba9617fad464f>

msg="Failed initializing instance" err="Failed getting root disk: No root device could be found"

double free or corruption (out)

incus.service: Main process exited, code=dumped, status=6/ABRT

For comparison, this is from one of our healthy Incus clusters:

incus config trust list
+---------------------+--------+-------------+--------------+----------------------+
|        NAME         |  TYPE  | DESCRIPTION | FINGERPRINT  |     EXPIRY DATE      |
+---------------------+--------+-------------+--------------+----------------------+
|                  pc | server |             | 14aa7d8914aa | 2036/04/07 17:31 UTC |
+---------------------+--------+-------------+--------------+----------------------+
|                  pd | server |             | 72a6ce4beef0 | 2036/04/07 17:31 UTC |
+---------------------+--------+-------------+--------------+----------------------+
|                  pf | server |             | 05e47bc5a8e3 | 2036/04/07 17:31 UTC |
+---------------------+--------+-------------+--------------+----------------------+

These are the entries that were accidentally removed on the cluster having issues. It appears they are required, but we can't re-add them.

All the cluster members within that list are shown as healthy.

Hmm, this is a case where recovering right after you deleted those certificates probably wouldn't have been too difficult. But now that you've run recover-from-quorum-loss, it's going to be pretty hard to figure out exactly what's going on with the database and get it back to normal…

After your mistake, I suspect that incus admin sql global "SELECT * FROM certificates" would probably still have worked, and so would an incus admin sql global "INSERT INTO certificates …", but your recovery attempts have made that impossible now :frowning:
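
To make that concrete for anyone reading along: a minimal sketch, assuming the certificates table has the usual fingerprint/type/name/certificate columns and that type 2 denotes a server certificate (verify both against a healthy cluster before inserting anything):

# On a healthy cluster, see what the server entries look like:
incus admin sql global "SELECT fingerprint, type, name FROM certificates"

# On the broken cluster, re-insert a missing server entry; the
# placeholders are the full SHA-256 fingerprint and PEM contents of
# that member's server.crt, and the type value is an assumption:
incus admin sql global "INSERT INTO certificates (fingerprint, type, name, certificate) VALUES ('<fingerprint>', 2, '<member-name>', '<PEM>')"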

Running recover-from-quorum-loss made the server you ran it on assume that it's the only surviving database node. That's obviously not true, and the other servers are now trying to reconnect to the database and are getting a bunch of auth errors.

On the upside, you now have access to the database again on that one server. The downside of having run that command is that there is now no way for the servers to talk to each other, so the one server with a working database can't get any state info from the other two, and the other two are unable to access the database.

I don't think you're in an unrecoverable situation quite yet, but you're definitely in a much worse spot than you were right after deleting those certificates. I've never done that kind of recovery myself, but I suspect you should be able to do the following (a rough command sketch follows the list):

  • Manually re-add the certificate entries to the global database on the server that has the working database
  • Make sure that the nodes table looks right
  • Stop Incus on all servers
  • Make sure that the raft_nodes table in the local database looks right (as well as the certificates entries) on all servers
  • Use incus admin cluster edit on the server with the working database and re-configure the raft cluster to be as it should be (3 voters)
  • Transfer the database/global content from the working server to the other two
  • Start Incus on all three servers and see if they talk to each other again
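
Very roughly, and with the caveat that I haven't run this exact sequence myself (member names come from your output above, and the database path assumes a default /var/lib/incus install):

# On the server with the working database (daemon still running):
incus admin sql global "SELECT id, name, address FROM nodes"
incus admin sql local "SELECT * FROM raft_nodes"

# Then stop Incus on all three servers:
systemctl stop incus

# On the working server, with the daemon stopped, re-declare the
# raft cluster so all three members are voters:
incus admin cluster edit

# Copy the global database over to the other two members:
rsync -a /var/lib/incus/database/global/ root@3m:/var/lib/incus/database/global/
rsync -a /var/lib/incus/database/global/ root@3l:/var/lib/incus/database/global/

# Start Incus everywhere and watch the logs:
systemctl start incus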

Stephane,

Thank you for your reply. I had to laugh; if only I had taken a step back before panicking. An AI/ML search did say "Don't panic," and I should have listened. :open_mouth: After I was able to get some rest, I found some comments saying I would have been better off had I not spent the time trying to recover. Sadly, the panic rose, and I was quite nervous running recover-from-quorum-loss.

Let me see what I can do tonight. I am thinking I need to add the cluster certs for sure. Do I need to add the server certs as well?

Again, thank you.

Stephane,

That worked. Everything else I checked was fine. The trick was both regenerating the server tokens and changing the token type, then copying the DB over to the other nodes. I can't believe how simple it turned out to be to recreate the server tokens; if only I had just done that correctly in the first place.
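
For anyone who lands here later, the token-type change was along these lines (the 1=client / 2=server mapping is an assumption on my part; compare against a healthy cluster's certificates table before changing anything):

# Re-add each member's server certificate to the trust store, then
# flip its type from client to server in the global database:
incus config trust add-certificate 3m-server.crt
incus admin sql global "SELECT name, type, fingerprint FROM certificates"
incus admin sql global "UPDATE certificates SET type = 2 WHERE name = '3m'"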

Thank you so much. :slight_smile: