Fail to add cluster node with "Mismatching config for network lxdbr0: different values for keys: volatile.bridge.hwaddr"

The cluster nodes are now using LXD 4.4

Today I wanted to add a new node to the cluster; I did a fresh install of Ubuntu 20.04 on that node yesterday. I'm following the same procedure as for the two recent nodes.

$ sudo lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=roer]: 
What IP address or DNS name should be used to reach this node? [default=172.16.16.33]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: ijssel.ghs.nl
Cluster fingerprint: f3a7079038205003c4806208104f643ade069877304ac647019c3455320d92a6
You can validate this fingerprint by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "lvm.vg_name" property for storage pool "local": 
Choose "source" property for storage pool "local": /dev/md1
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes
config: {}
networks: []
storage_pools: []
profiles: []
cluster:
  server_name: roer
  enabled: true
  member_config:
  - entity: storage-pool
    name: local
    key: lvm.vg_name
    value: ""
    description: '"lvm.vg_name" property for storage pool "local"'
  - entity: storage-pool
    name: local
    key: source
    value: /dev/md1
    description: '"source" property for storage pool "local"'
  cluster_address: ijssel.ghs.nl:8443
  cluster_certificate: |
    -----BEGIN CERTIFICATE-----
    MIICAzCCAYmgAwIBAgIQNCGwcCf7Asx45h/1jwyJZDAKBggqhkjOPQQDAzA0MRww
...
    -----END CERTIFICATE-----
  server_address: 172.16.16.33:8443
  cluster_password: ...

Error: Failed to join cluster: Failed request to add member: Mismatching config for network lxdbr0: different values for keys: volatile.bridge.hwaddr

After this error I installed bridge-utils. Could that have had an influence? Anyway, I'll try again.
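As an aside: if I understand the preseed mechanism correctly, the YAML printed above could also be saved and fed back in to repeat the join non-interactively on a freshly wiped node (the file name here is just an example):

$ cat join-roer.yaml | sudo lxd init --preseed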

Can you show output from the other nodes of:

lxc network show lxdbr0
lxc network show lxdbr0 --target=<your node1>
lxc network show lxdbr0 --target=<your node2>

The new node is called roer.

root@roer:~# lxc network show lxdbr0
config:
  ipv4.address: 10.189.232.1/24
  ipv4.dhcp.ranges: 10.189.232.100-10.189.232.249
  ipv4.nat: "true"
  ipv6.address: none
  volatile.bridge.hwaddr: 00:16:3e:d1:c2:2a
description: ""
name: lxdbr0
type: bridge
used_by: []
managed: true
status: Created
locations:
- none

From ijssel

root@ijssel:~# lxc network show lxdbr0
config:
  ipv4.address: 10.189.232.1/24
  ipv4.dhcp.ranges: 10.189.232.100-10.189.232.249
  ipv4.nat: "true"
  ipv6.address: none
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/jenkins-template
- /1.0/profiles/default
managed: true
status: Created
locations:
- ijssel
- luts
- rijn

root@ijssel:~# lxc network show lxdbr0 --target rijn
config:
  ipv4.address: 10.189.232.1/24
  ipv4.dhcp.ranges: 10.189.232.100-10.189.232.249
  ipv4.nat: "true"
  ipv6.address: none
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/jenkins-template
- /1.0/profiles/default
managed: true
status: Created
locations:
- ijssel
- luts
- rijn

root@ijssel:~# lxc network show lxdbr0 --target luts
config:
  ipv4.address: 10.189.232.1/24
  ipv4.dhcp.ranges: 10.189.232.100-10.189.232.249
  ipv4.nat: "true"
  ipv6.address: none
description: ""
name: lxdbr0
type: bridge
used_by:
- /1.0/instances/jenkins-template
- /1.0/profiles/default
managed: true
status: Created
locations:
- ijssel
- luts
- rijn

Out of interest, were the original nodes added to the cluster before LXD 4.4?

I have a similar issue. In my case it seems that the preexisting nodes in the cluster have not set the

volatile.bridge.hwaddr

value, whereas the one I'm trying to add does put a value there automatically.
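A quick way to compare the members (a sketch; substitute your own member names) is:

for n in node1 node2 node3; do
  echo "== $n =="
  lxc network show lxdbr0 --target "$n" | grep volatile.bridge.hwaddr || echo "(not set)"
done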

Yes, they all were using LXD 4.3

In my case, yes, the preexisting cluster was created a few weeks ago.

OK, so this looks like a bug introduced by LXD 4.4, which adds a stable MAC address shared across all nodes in a cluster. To avoid disrupting live nodes we only apply the new stable MAC on a network config edit, but that appears to break adding new nodes, where the new MAC shouldn't be generated. I'll look into it.

@keesbghs

I've got a workaround for you in the meantime. However, please do not use it if you're using the fan networking mode, as that might also have a separate issue I need to look into.

Before adding the new node, ensure existing nodes are upgraded to LXD 4.4 and then on one of the nodes run:

lxc network edit lxdbr0

Then exit out of the edit window without making any changes.

Your network will briefly drop as the lxdbr0 interface restarts, and you should then see a volatile.bridge.hwaddr key in the output of lxc network show lxdbr0.
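A plain grep over the show output makes that check quick:

lxc network show lxdbr0 | grep volatile.bridge.hwaddr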

At this point you should be able to add the new node.

Thanks!

I am using fan network though :frowning:

OK, hold off for now; I've got fixes coming for both issues.

First part was OK, I now have a volatile.bridge.hwaddr on the cluster.

Then I did a reinstall (doing the whole dance of cleaning up the LVM volume group, uninstalling lxd, rebooting, installing lxd). Doing the same lxd init now gave me this error.

Error: Failed to join cluster: Failed to initialize member: Failed to initialize storage pools and networks: Failed to create storage pool 'local': Custom loop file locations are not supported

It's something with my /dev/md1, which did not come back after the reboot. I'll figure this out.

OK, a few reboots later … /dev/md1 is available again.
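In case it helps anyone else: the usual mdadm housekeeping to make an array assemble reliably at boot on Ubuntu is roughly the following (a sketch; device names and config paths may differ on your system):

sudo mdadm --assemble --scan                                    # assemble any known arrays now
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf  # record the array in the config
sudo update-initramfs -u                                        # so it is assembled at boot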

@tomp The workaround was successful. Thanks.

I’ve got four nodes in my cluster now.

$ sudo lxc cluster list
+--------+---------------------------+----------+--------+-------------------+--------------+----------------+
|  NAME  |            URL            | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE | FAILURE DOMAIN |
+--------+---------------------------+----------+--------+-------------------+--------------+----------------+
| ijssel | https://172.16.16.54:8443 | YES      | ONLINE | fully operational | x86_64       | default        |
+--------+---------------------------+----------+--------+-------------------+--------------+----------------+
| luts   | https://172.16.16.45:8443 | YES      | ONLINE | fully operational | x86_64       | default        |
+--------+---------------------------+----------+--------+-------------------+--------------+----------------+
| rijn   | https://172.16.16.59:8443 | YES      | ONLINE | fully operational | x86_64       | default        |
+--------+---------------------------+----------+--------+-------------------+--------------+----------------+
| roer   | https://172.16.16.33:8443 | NO       | ONLINE | fully operational | x86_64       | default        |
+--------+---------------------------+----------+--------+-------------------+--------------+----------------+
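If I remember right, there is also a per-member view if you want the details behind the DATABASE column, e.g.:

$ sudo lxc cluster show roer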

Good stuff. I’ve got a PR up that will sort out the issue for fan bridges too:

And I'll follow up with a fix so that LXD 4.4 nodes can be added to a network that was defined pre-4.4 without needing the workaround.

Thanks!

Any clue when it will be available for upgrade in the snap?

PR to fix the situation for new nodes without the workaround above:

@stgraber can advise once it has been added to the snap.

The PR to fix this has been merged now, so it should be in the snap soon.
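Once a new build is published it can be picked up with a normal snap refresh, e.g.:

sudo snap refresh lxd

(snap info lxd shows which revision each channel currently carries.)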