Help! LXD Cluster - Rejected when trying to add a node

I’m attempting to build a cluster from Raspberry Pis. Each Raspberry Pi runs Void Linux as the host OS, with LXD 4.14 (I also tried 4.15) installed natively on the system (not from the snap). When I try to join a new node to the cluster with a password, I get a client rejected response. I get the same response with a join token, and with a preseed file. Any idea what is going on?

Cluster (confirm version, set password and generate join token)

[void@51ead745 ~]$ lxc cluster list
+---------------+----------------------------+----------+--------------+----------------+-------------+--------+-------------------+
|     NAME      |            URL             | DATABASE | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+---------------+----------------------------+----------+--------------+----------------+-------------+--------+-------------------+
| rpi4-51ead745 | https://192.168.10.27:8443 | YES      | aarch64      | default        |             | ONLINE | Fully operational |
+---------------+----------------------------+----------+--------------+----------------+-------------+--------+-------------------+
[void@51ead745 ~]$ lxd --version
4.14
[void@51ead745 ~]$ lxc config set core.trust_password abc123
[void@51ead745 ~]$ lxc config get core.trust_password
true
[void@51ead745 ~]$ lxc cluster add rpi3-74f50651
Member rpi3-74f50651 join token: eyJzZXJ2ZXJfbmFtZSI6InJwaTMtNzRmNTA2NTEiLCJmaW5nZXJwcmludCI6IjZiODA0NzI0M2YzZGMxZjgwM2I2ZmE1MWRkZWMyMDZhMzllZGJlNDY5YTJhOGE1ODUxYzA5NGJhNzdiZTIzMzMiLCJhZGRyZXNzZXMiOlsiMTkyLjE2OC4xMC4yNzo4NDQzIl0sInNlY3JldCI6IjhjODczMjI1MzhjMGFiNmI5OTlhZDJlYjU0OWQzYWU2N2Q3MjQ2YmViMmNkOTE2YzI3ZDBkMjU1NTRhYTYzY2MifQ==
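
When debugging a rejected join, it can help to look inside the token. The token printed above appears to be plain base64-encoded JSON, so a small helper can show which cluster address and certificate fingerprint the joining node will use. This is only a sketch: `decode_join_token` is a made-up name, not an LXD command, and the token format is inferred from this thread rather than from any documented guarantee.

```shell
#!/bin/sh
# Sketch: decode an LXD join token for inspection.
# Assumption: the token is base64-encoded JSON, as the one in this thread is.
# decode_join_token is a hypothetical helper name, not part of LXD.
decode_join_token() {
    printf '%s' "$1" | base64 -d
    echo
}

# Usage: decode_join_token "<token printed by lxc cluster add>"
```

For the token above, the decoded JSON has the fields server_name, fingerprint, addresses and secret, and the fingerprint is the same one shown during the password-based join. The secret field is sensitive, so avoid pasting decoded tokens in public.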

Node (attempt join with token)

[void@74f50651 ~]$ doas lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=74f50651]: rpi3-74f50651
What IP address or DNS name should be used to reach this node? [default=192.168.10.136]:
Are you joining an existing cluster? (yes/no) [default=no]: yes
Do you have a join token? (yes/no) [default=no]: yes
Please provide join token: eyJzZXJ2ZXJfbmFtZSI6InJwaTMtNzRmNTA2NTEiLCJmaW5nZXJwcmludCI6IjZiODA0NzI0M2YzZGMxZjgwM2I2ZmE1MWRkZWMyMDZhMzllZGJlNDY5YTJhOGE1ODUxYzA5NGJhNzdiZTIzMzMiLCJhZGRyZXNzZXMiOlsiMTkyLjE2OC4xMC4yNzo4NDQzIl0sInNlY3JldCI6IjhjODczMjI1MzhjMGFiNmI5OTlhZDJlYjU0OWQzYWU2N2Q3MjQ2YmViMmNkOTE2YzI3ZDBkMjU1NTRhYTYzY2MifQ==
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Error: Failed to retrieve cluster information: not authorized

Node (attempt join with password):

Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=74f50651]: rpi3-74f50651
What IP address or DNS name should be used to reach this node? [default=192.168.10.136]:
Are you joining an existing cluster? (yes/no) [default=no]: yes
Do you have a join token? (yes/no) [default=no]: no
IP address or FQDN of an existing cluster node: 192.168.10.27
Cluster fingerprint: 6b8047243f3dc1f803b6fa51ddec206a39edbe469a2a8a5851c094ba77be2333
You can validate this fingerprint by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password:
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Error: Failed to retrieve cluster information: not authorized

Cluster LXD Daemon’s debug output:

WARN[06-17|04:42:01] Rejecting request from untrusted client  ip=192.168.10.136:36536
WARN[06-17|04:50:44] Rejecting request from untrusted client  ip=192.168.10.136:36546

Update:
I also tried a preseed file, same result:

[void@74f50651 ~]$ cat seed.yml
cluster:
  enabled: true
  server_name: rpi3-74f50651
  server_address: 192.168.10.136:8443
  cluster_address: 192.168.10.27:8443
  cluster_certificate: "-----BEGIN CERTIFICATE-----

MIICCjCCAY+gAwIBAgIQW2sH5gsc895aY2XVHf2rgjAKBggqhkjOPQQDAzA2MRww

GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRYwFAYDVQQDDA1yb290QDUxZWFk

NzQ1MB4XDTIxMDYxNzAzNTE0M1oXDTMxMDYxNTAzNTE0M1owNjEcMBoGA1UEChMT

bGludXhjb250YWluZXJzLm9yZzEWMBQGA1UEAwwNcm9vdEA1MWVhZDc0NTB2MBAG

ByqGSM49AgEGBSuBBAAiA2IABM6Btr/pDLjv3s/loRHBZm+UDOeZqEgEajdLSP40

lPCBBKpFU+e+BMpP5QsauR96e/YGLr3AHnHiS/VCvtGCc0wtPPL4oUkeJCBqDFdQ

yCveMhURxTK+lEmWjYMafKzKAaNiMGAwDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQM

MAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwKwYDVR0RBCQwIoIINTFlYWQ3NDWH

BH8AAAGHEAAAAAAAAAAAAAAAAAAAAAEwCgYIKoZIzj0EAwMDaQAwZgIxALG+tOAW

LZqKHfS5M7+mlyJ0/vb1f33v1gFvzsPz9BOYvjIs//gGWFTnNp9H9+jbgAIxAOH8

tNq348FbhzIHrXvEOK5HFeujnWGEs3Bjj9/dJ5lo0q8JuUv+5R2UC081Y2tMjA==

-----END CERTIFICATE-----
"
  cluster_password: abc123
  member_config:
  - entity: storage-pool
    name: default
    key: source
    value: ""
[void@74f50651 ~]$ cat seed.yml | lxd init --preseed
Error: Failed to join cluster: Failed to update cluster trust: Failed getting existing certificate: not authorized

## on the node's daemon:
[...]
DBUG[06-17|05:19:47] Sending request to LXD                   method=GET url=https://192.168.10.27:8443/1.0/certificates/4fd40a4fa70182713c9d334f2fbd8d3f5c8f47873863512dead0b00344d7a91f etag=
DBUG[06-17|05:19:47] Failure for task operation: 4b5008e1-6ece-4375-9ddf-c52446d1ac80: Failed to update cluster trust: Failed getting existing certificate: not authorized

## on the cluster daemon: 
WARN[06-17|05:19:47] Rejecting request from untrusted client  ip=192.168.10.136:36584

Any help is appreciated.

jsav0

Oh, maybe this is related to the issue (notice the issue/expiry date)

void@51ead745 ~> lxc config trust list
+--------------+---------------+------------------------------+-------------------------------+
| FINGERPRINT  |  COMMON NAME  |          ISSUE DATE          |          EXPIRY DATE          |
+--------------+---------------+------------------------------+-------------------------------+
| 4fd40a4fa701 | root@74f50651 | Jan 1, 1970 at 12:00am (UTC) | Dec 30, 1979 at 12:00am (UTC) |
+--------------+---------------+------------------------------+-------------------------------+
| 13365771a61c | root@51ead745 | Jun 17, 2021 at 3:32am (UTC) | Jun 15, 2031 at 3:32am (UTC)  |
+--------------+---------------+------------------------------+-------------------------------+

Not sure how that happened.

Update: If I remove the trust entry, it is re-added with an incorrect timestamp when I attempt to join again.

# cluster removes the trust
void@51ead745 ~> lxc config trust remove 4fd40a4fa701

# a new node attempts to join
[void@74f50651 ~]$ cat seed.yml | doas lxd init --preseed
Error: Failed to join cluster: Failed to update cluster trust: Failed getting existing certificate: not authorized

# cluster adds the trust again with incorrect timestamp. 
void@51ead745 ~> lxc config trust list
+--------------+---------------+------------------------------+-------------------------------+
| FINGERPRINT  |  COMMON NAME  |          ISSUE DATE          |          EXPIRY DATE          |
+--------------+---------------+------------------------------+-------------------------------+
| 4fd40a4fa701 | root@74f50651 | Jan 1, 1970 at 12:00am (UTC) | Dec 30, 1979 at 12:00am (UTC) |
+--------------+---------------+------------------------------+-------------------------------+
| 13365771a61c | root@51ead745 | Jun 17, 2021 at 3:32am (UTC) | Jun 15, 2031 at 3:32am (UTC)  |
+--------------+---------------+------------------------------+-------------------------------+

# the date is correct on the established cluster and future nodes
void@51ead745 ~> date
Thu Jun 17 05:44:32 UTC 2021
 
[void@74f50651 ~]$ date
Thu Jun 17 05:44:43 UTC 2021

Solved. It must have been because I installed LXD from within a chroot environment, before the system had ever booted and picked up the correct time from the NTP daemon. Uninstalling LXD, removing /var/lib/lxd and installing LXD again fixed it, eventually. Now the trust entry is added with valid timestamps, and all methods of joining (password, join token and preseed) work.
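
A quick way to spot this kind of problem before attempting a join is to look at the validity dates of the node's own server certificate. The trust list earlier shows the node's certificate expiring in 1979, so it was already expired, which would presumably explain the rejection. The following is a minimal sketch that assumes openssl is installed; `cert_not_expired` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check whether a certificate's validity window looks sane.
# Assumes openssl is installed; cert_not_expired is a hypothetical helper name.
cert_not_expired() {
    cert="$1"
    # Print the notBefore/notAfter dates for a human to read.
    openssl x509 -in "$cert" -noout -dates
    # -checkend 0 exits non-zero if the certificate has already expired.
    openssl x509 -in "$cert" -noout -checkend 0 >/dev/null
}

# Usage on a non-snap install (path taken from this thread):
#   cert_not_expired /var/lib/lxd/server.crt || echo "certificate is expired"
```

Note that this only checks the expiry side of the window. A notBefore date in 1970 in the printed output is the other sign that the certificate was generated before the clock was set.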

Glad you got it resolved. The time issue has come up before. I’d like to find a way to report it more clearly than just “not authorized”, so there is a better indication of what is wrong.

Hi @tomp, is there a way to fix it without removing and reinstalling?
Is it possible to regenerate the certs after installation to fix the timestamps?

So you can remove a trusted cert from the LXD cluster trust store using:

lxc config trust remove <fingerprint>

This will allow you to remove a problem certificate from the cluster trust store.

Then to get LXD to regenerate its certificate you can do:

rm /var/snap/lxd/common/lxd/{server.crt,server.key}
sudo systemctl reload snap.lxd.daemon

The problem at the moment though is that LXD won’t re-add its newly generated server cert to the cluster’s trust store, which means it would not be able to communicate with the other members.

I’ll see if we can easily get LXD to add itself to the trust store (if not already added) when it starts up.

I don’t use snap, but I see what you mean. It sounds like reloading the daemon generates the cert if it doesn’t exist in the LXD data directory.

Yeah in that case it would be:

rm /var/lib/lxd/{server.key,server.crt}

And then reload LXD.

The problem then is getting the new cert into the trust store with a type of “server” so that it’s used for intra-cluster communication.

That works. Though I’m not quite sure what you mean about the last bit. If I delete the fingerprint from the cluster’s trust store, then regenerate the cert on the prospective node, attempting to join the cluster populates the cluster’s store with the new cert as type “server” and it is authorized and joins as expected.
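
The sequence that worked here for a node that has not joined yet can be sketched as follows, for a non-snap install. `set_aside_server_cert` is a hypothetical helper name, and moving the files aside instead of deleting them is an extra precaution that was not part of the steps in this thread.

```shell
#!/bin/sh
# Sketch of the recovery sequence described above, for a non-snap install.
# set_aside_server_cert is a hypothetical helper name. It moves the files
# aside instead of deleting them, which is an extra precaution not taken
# in the thread.
set_aside_server_cert() {
    lxd_dir="${1:-/var/lib/lxd}"
    for f in server.crt server.key; do
        if [ -e "$lxd_dir/$f" ]; then
            mv "$lxd_dir/$f" "$lxd_dir/$f.bak"
        fi
    done
}

# 1. On an existing cluster member, drop the stale trust entry:
#      lxc config trust remove <fingerprint>
# 2. On the prospective node (as root), set the old certificate aside:
#      set_aside_server_cert /var/lib/lxd
# 3. Reload or restart LXD with your service manager so it generates a new pair.
# 4. Join again, for example:
#      cat seed.yml | lxd init --preseed
```

This applies to a node that is joining for the first time. It does not cover replacing the certificate of a member that is already part of the cluster.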

Ah yeah if you’re joining a new member then that would work indeed (as the joining node adds itself to the cluster trust store using the join token when it joins).
I was coming at it from the perspective of replacing an expired cert on an already joined member.

Ohhh, that makes sense indeed! Thanks