Failed to join cluster due to missing "config" part in "lxc profile show"

OK so I’ve recreated your issue by removing the key manually:

lxd sql global 'delete from storage_pools_config where value="LXDThinPool"'
root@cluster-v3:~# lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=cluster-v3]: 
What IP address or DNS name should be used to reach this node? [default=10.109.89.60]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: 10.109.89.20
Cluster fingerprint: 4f7cefc7b40d0d525d11cc6b05a30bcbb24ff3cd0564944fb270582fdaeffaae
You can validate this fingerprint by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "lvm.vg_name" property for storage pool "local": 
Choose "size" property for storage pool "local": 
Choose "source" property for storage pool "local": 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 
Error: Failed to join cluster: Failed request to add member: Mismatching config for storage pool local: different values for keys: lvm.thinpool_name

I don’t know how it happened in your case; probably something to do with mixing older and newer versions of the code and trying to join an older node. However, I’ve not been able to re-create that issue with LXD 4.3.

To fix it, here is what I did:

First wipe the new node you’re trying to add to the cluster so there is no existing LVM config or LXD config:

snap remove lxd
vgremove local
pvremove <PV device>   # use pvs to find the device(s) backing the "local" VG
reboot

Now let’s fix the missing config key on your first node:

lxc shell cluster-v1
lxd sql global 'insert into storage_pools_config(storage_pool_id,key,value) VALUES(1,"lvm.thinpool_name","LXDThinPool")'
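
To double-check the key is now present (assuming storage_pool_id 1, as in the insert above):

lxd sql global 'select * from storage_pools_config where storage_pool_id=1'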

Now let’s reinstall LXD on your new node:

lxc shell cluster-v3
snap install lxd
lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=cluster-v3]: 
What IP address or DNS name should be used to reach this node? [default=10.109.89.60]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: 10.109.89.20
Cluster fingerprint: 4f7cefc7b40d0d525d11cc6b05a30bcbb24ff3cd0564944fb270582fdaeffaae
You can validate this fingerprint by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "size" property for storage pool "local": 
Choose "source" property for storage pool "local": 
Choose "lvm.vg_name" property for storage pool "local": 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

This works for me.
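
Once the join completes you can confirm the new member from any node with:

lxc cluster list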

However, please note that this still results in your first node apparently using a loopback file for LVM.

To confirm this, run pvs on your first node; if you see something like /dev/loop3 rather than /dev/md1, then your first node isn’t actually using /dev/md1.

In that case, if possible, I would suggest blowing away node 1 (make sure you have backups of any containers on there first) and starting again with a fresh cluster, as that also means you won’t have to apply the fix above.

If that’s not possible, then you’d probably need to look at creating a new storage pool manually, moving your containers across to the new pool, then updating your profiles to use the new pool and removing the old one. At that point you can add your second node without needing the fix above either.
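
As a rough, untested sketch (names like "local2" and "c1" are placeholders, and the exact flags may vary by LXD version):

# Create a new pool directly on the physical device.
lxc storage create local2 lvm source=/dev/md1
# Containers must be stopped; moving between pools needs a temporary rename.
lxc move c1 c1-tmp --storage local2
lxc move c1-tmp c1
# Point the default profile's root disk at the new pool.
lxc profile device set default root pool local2
# Once nothing references the old pool any more, it can be removed.
lxc storage delete local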

Thanks, I’ll give that a try.

BTW, my pvs output looks quite normal:

root@ijssel:~# pvs
  PV         VG    Fmt  Attr PSize   PFree 
  /dev/md0   vg0   lvm2 a--  199,87g 49,87g
  /dev/md1   local lvm2 a--    3,41t     0 

How can I see whether this local.img is a loopback mount? Although I have been working with UNIX/Linux for ages (30+ years), I have a hard time understanding how to analyze snap environments.

Don’t we need to set the node_id as well? I tried the command above and the query now gives:

$ sudo lxd sql global 'select * from storage_pools_config'
+----+-----------------+---------+-------------------+------------------------------------------+
| id | storage_pool_id | node_id |        key        |                  value                   |
+----+-----------------+---------+-------------------+------------------------------------------+
| 3  | 1               | 1       | source            | /var/snap/lxd/common/lxd/disks/local.img |
| 4  | 1               | 1       | lvm.vg_name       | local                                    |
| 5  | 1               | <nil>   | lvm.thinpool_name | LXDThinPool                              |
+----+-----------------+---------+-------------------+------------------------------------------+

No node_id, as that key isn’t node-specific (that’s the issue: it’s missing, and other nodes need it).

losetup -a should list all the loopback sources.

sudo pvdisplay -m should show you what physical devices back your volume group.
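
For example, to check whether the local.img from your config is actually attached to a loop device (path taken from your query output above):

sudo losetup -j /var/snap/lxd/common/lxd/disks/local.img

No output means the file isn’t attached anywhere.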

OK. I’ll continue with this then.

But I noticed that on my very first cluster attempt (Ubuntu 18.04, LXD 3.0.3) the query gives:

$ sudo lxd sql global 'select * from storage_pools_config'
+----+-----------------+---------+-------------------------+-------------+
| id | storage_pool_id | node_id |           key           |    value    |
+----+-----------------+---------+-------------------------+-------------+
| 30 | 4               | 1       | source                  | local       |
| 31 | 4               | 1       | volatile.initial_source | /dev/md1    |
| 33 | 4               | 1       | lvm.thinpool_name       | LXDThinPool |
| 34 | 4               | 1       | lvm.vg_name             | local       |
+----+-----------------+---------+-------------------------+-------------+

This looks quite normal; nothing with loopback mounts:

pvdisplay -m
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               local
  PV Size               3,41 TiB / not usable 4,00 MiB
  Allocatable           yes (but full)
  PE Size               4,00 MiB
  Total PE              894180
  Free PE               0
  Allocated PE          894180
  PV UUID               VAFIq7-EMB4-mAND-6Ezp-W7Yu-9BXx-y1c1uj
   
  --- Physical Segments ---
  Physical extent 0 to 255:
    Logical volume	/dev/local/lvol0_pmspare
    Logical extents	0 to 255
  Physical extent 256 to 893923:
    Logical volume	/dev/local/LXDThinPool_tdata
    Logical extents	0 to 893667
  Physical extent 893924 to 894179:
    Logical volume	/dev/local/LXDThinPool_tmeta
    Logical extents	0 to 255
   
  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               vg0
  PV Size               199,87 GiB / not usable 3,00 MiB
  Allocatable           yes 
  PE Size               4,00 MiB
  Total PE              51167
  Free PE               12767
  Allocated PE          38400
  PV UUID               AgFNEY-81xR-2xHn-oRIw-Qvj0-tTXB-0oQ4gn
   
  --- Physical Segments ---
  Physical extent 0 to 25599:
    Logical volume	/dev/vg0/root
    Logical extents	0 to 25599
  Physical extent 25600 to 38399:
    Logical volume	/dev/vg0/home
    Logical extents	0 to 12799
  Physical extent 38400 to 51166:
    FREE

In driver_lvm.go I found this piece:

// Create creates the storage pool on the storage device.
func (d *lvm) Create() error {
	d.config["volatile.initial_source"] = d.config["source"]

	defaultSource := loopFilePath(d.name)
	var err error
	var pvExists, vgExists bool
	var pvName string
	var vgTags []string

	revert := revert.New()
	defer revert.Fail()

	if d.config["source"] == "" || d.config["source"] == defaultSource {
		// We are using an LXD internal loopback file.
		d.config["source"] = defaultSource
		if d.config["lvm.vg_name"] == "" {
			d.config["lvm.vg_name"] = d.name
		}

loopFilePath is what creates the file name we see:

// loopFilePath returns the loop file path for a storage pool.
func loopFilePath(poolName string) string {
	return filepath.Join(shared.VarPath("disks"), fmt.Sprintf("%s.img", poolName))
}

I don’t see a loopback mount for that /var/snap/lxd/common/lxd/disks/local.img:

$ sudo losetup -a | grep -v delete
/dev/loop15: [64768]:5113996 (/var/lib/snapd/snaps/core_9436.snap)
/dev/loop4: [64768]:5115209 (/var/lib/snapd/snaps/core_9665.snap)
/dev/loop21: [64768]:5112538 (/var/lib/snapd/snaps/lxd_16100.snap)
/dev/loop5: [64768]:5115357 (/var/lib/snapd/snaps/core18_1754.snap)
/dev/loop22: [64768]:5114957 (/var/lib/snapd/snaps/core18_1880.snap)
/dev/loop20: [64768]:5115147 (/var/lib/snapd/snaps/lxd_16044.snap)

OK good. Because the volume group defined in lvm.vg_name already exists in /dev/, LXD isn’t trying to mount the loopback file specified in your config, which is why it’s working, even though it appears you have some old config left on your node from previous iterations of trying to set this up.
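
You can confirm that the volume group is visible to LVM (and hence to LXD) with, e.g.:

sudo vgs local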

Now, do you think I should change the source value to /dev/md1?

Yes, it’s probably worth changing it to the VG name for consistency.
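
Using the values from your earlier query (storage_pool_id 1 and the "source" key), something like this should do it:

sudo lxd sql global 'update storage_pools_config set value="local" where storage_pool_id=1 and key="source"'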

The second node is now part of the cluster, yippee 🙂

Thanks @tomp, not only for finding a solution, but also for the nice mini tutorial on setting up a virtual cluster (see above).
