"Failed to join cluster: server address already in use"

I removed a host from my Incus cluster and tried to rejoin it, but the join keeps failing with this message:

Failed to join cluster: server address already in use (1)

I can’t find any reference to the IP address 192.168.1.34 anywhere, so I’m unsure how to debug this or how to proceed.

Any hints as to where to go from here?

Version

incus version
Client version: 6.0.0
Server version: 6.0.0

Cluster Information

incus cluster list
+----------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
|   NAME   |            URL            |      ROLES      | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+----------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| incus-01 | https://192.168.1.31:8443 | database        | x86_64       | default        |             | ONLINE | Fully operational |
+----------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| incus-02 | https://192.168.1.9:8443  | database-leader | x86_64       | default        |             | ONLINE | Fully operational |
|          |                           | database        |              |                |             |        |                   |
+----------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+

Cluster Join Information

config: {}
networks: []
storage_pools: []
profiles: []
projects: []
cluster:
  server_name: incus-04
  enabled: true
  member_config:
  - entity: network
    name: eno1.30
    key: parent
    value: eno1.30
    description: '"parent" property for network "eno1.30"'
  - entity: storage-pool
    name: default
    key: lvm.thinpool_name
    value: ""
    description: '"lvm.thinpool_name" property for storage pool "default"'
  - entity: storage-pool
    name: default
    key: lvm.vg_name
    value: ""
    description: '"lvm.vg_name" property for storage pool "default"'
  - entity: storage-pool
    name: default
    key: source
    value: ""
    description: '"source" property for storage pool "default"'
  cluster_address: 192.168.1.9:8443
  cluster_certificate: |
    -----BEGIN CERTIFICATE-----
    MIICAzCCAYm[REDACTED]bqpQgQcAfzfy5EAQctL
    -----END CERTIFICATE-----
  server_address: 192.168.1.34:8443
  cluster_token: ""
  cluster_certificate_path: ""

Error: Failed to join cluster: Failed to join cluster: server address already in use (1)

Can you show the output of incus admin cluster show from one of the existing servers?
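For anyone comparing outputs by hand, here is a rough sketch of how that check could be scripted. It only greps captured text and makes no assumptions about the exact output layout. The helper name stale_address_report and the file names are made up for illustration, and the --format csv flag on incus cluster list is assumed to be available in your version.

```shell
#!/bin/sh
# Sketch: classify an address using captured command output.
# stale_address_report is a hypothetical helper, not part of incus.
#
# Usage: stale_address_report ADDRESS MEMBER_LIST_FILE RAFT_DUMP_FILE
# Prints "member", "stale-raft" or "absent".
stale_address_report() {
    addr=$1 members=$2 raft=$3
    # -w avoids 192.168.1.3 matching 192.168.1.31; -F treats dots literally.
    if grep -qwF -- "$addr" "$members"; then
        echo "member"        # still listed as a cluster member
    elif grep -qwF -- "$addr" "$raft"; then
        echo "stale-raft"    # not a listed member, but still known to raft
    else
        echo "absent"        # not found in either capture
    fi
}

# Example capture on an existing member (run manually):
#   incus cluster list --format csv > members.csv
#   incus admin cluster show > raft.yaml
#   stale_address_report 192.168.1.34:8443 members.csv raft.yaml
```

A result of "stale-raft" would match the situation in this thread: the host is gone from incus cluster list but the address is still in the raft configuration.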

I’m sorry I didn’t get this back to you sooner.

I will mention that the logs kept reporting this host as part of raft.

I ended up doing the following out of desperation:

sudo incus admin sql local "DELETE FROM nodes WHERE name='incus-04';"

There was still an entry for the host in that table.

After that I no longer got the error posted above but then the incus-04 node would just hang on cluster join.


I noticed that the logs on the other two servers showed raft still trying to reach 192.168.1.34:8443.

I tried this, with no luck, because the record would instantly come back (sorry about my moment of desperation :p):

while true; do incus admin sql local "DELETE FROM raft_nodes WHERE address='192.168.1.34:8443';"; sleep 10; done 
Rows affected: 1
Rows affected: 1
Rows affected: 1
...
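The row coming back every time suggests the local raft_nodes table is being refilled from the live raft configuration rather than being the source of it. That is a reading of the behaviour above, not something confirmed in this thread. If so, watching the table is more informative than deleting from it. A minimal sketch of a bounded, read-only alternative to the loop, where poll_cmd is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: run a read-only command a fixed number of times with a pause in
# between, instead of looping forever. poll_cmd is a hypothetical helper.
#
# Usage: poll_cmd TRIES DELAY_SECONDS COMMAND [ARGS...]
poll_cmd() {
    tries=$1 delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        echo "--- check $i of $tries"
        "$@"
        # Skip the pause after the final check.
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
}

# Example (run manually on a cluster member):
#   poll_cmd 3 10 incus admin sql local "SELECT * FROM raft_nodes;"
```

If the stale address shows up in every check, the entry has to be removed from raft itself, not from the local table.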

I finally found this in the documentation on removing raft members:

sudo /usr/libexec/incus/incusd cluster remove-raft-node 192.168.1.34
You should run this command only if you ended up in an
inconsistent state where a node has been uncleanly removed (i.e. it doesn't show
up in "incus cluster list" but it's still in the raft configuration).

Do you want to proceed? (yes/no): yes
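For reference, a sketch of how the removal and a fresh join attempt might be sequenced. Only the first command comes from this thread. The follow-up steps (checking incus admin cluster show, then issuing a new token with incus cluster add) are an assumed order, and cleanup_stale_member and run are hypothetical helper names. It prints the commands by default and only executes them when DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of one possible cleanup order. With DRY_RUN=1 (the default here)
# commands are only printed, so nothing touches the cluster.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# cleanup_stale_member is a hypothetical wrapper, not an incus command.
# Usage: cleanup_stale_member ADDRESS MEMBER_NAME
cleanup_stale_member() {
    addr=$1 name=$2
    # 1. On an existing member: drop the stale raft entry
    #    (path and form as used in this thread).
    run sudo /usr/libexec/incus/incusd cluster remove-raft-node "$addr"
    # 2. Assumed step: confirm the address is gone from the raft view.
    run sudo incus admin cluster show
    # 3. Assumed step: issue a fresh join token for the returning host.
    run incus cluster add "$name"
}

# Example (prints the three commands without running them):
#   cleanup_stale_member 192.168.1.34 incus-04
```

The joining host would then go through the usual join procedure with the new token.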

(P.S. Many of the documented cluster management commands seem to be incorrect. I had to call incusd directly, and even then I had to modify the command.)