Failed to join cluster due to missing "config" part in "lxc profile show"

I’m also a bit confused as to why your 2nd node is using a dedicated block device (/dev/md1) while your first node is using a local loopback image.

It’s not wrong per se, but loopback images are really only suitable for development, and the fact that you’re trying to use /dev/md1 on the 2nd node suggests this isn’t for development. So I just want to flag that the first node is not using the same type of block device and won’t be as performant as the 2nd node.
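
If you want to double-check what each node’s pool is actually backed by, you can look at the pool config (a quick check, assuming the pool is named local as in your output; in a cluster you may need --target <node> to see a specific node’s values):

# "source" will be either a block device such as /dev/md1 or a loopback image path.
lxc storage show local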

lvs output on the first node

root@ijssel:~# lvs
  LV                                                                      VG    Attr       LSize   Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  LXDThinPool                                                             local twi-aotz--  <3,41t                    7,89   4,74                            
  containers_artifac                                                      local Vwi---tz-k  <9,32g LXDThinPool                                               
  containers_gitlab--ci--01                                               local Vwi-aotz-k <51,23g LXDThinPool        35,69                                  
  containers_gitlab--ci--03                                               local Vwi-aotz-k <18,63g LXDThinPool        37,62                                  
  containers_jenkins--master1                                             local Vwi-aotz-k <18,63g LXDThinPool        20,11                                  
  containers_jenkins--slave001                                            local Vwi-aotz-k <18,63g LXDThinPool        12,32                                  
  images_4e15a9bde9a8d5d5e96b722c32b047f78aa0bd686a2755b2d428bd665c6a37de local Vwi---tz-k  <9,32g LXDThinPool                                               
  home                                                                    vg0   -wi-ao----  50,00g                                                           
  root                                                                    vg0   -wi-ao---- 100,00g                                                           

OK let me try and craft a DB query to fix this.

I saw that too. I am not sure how or why that happened. For sure it was not intentional.

OK so I’ve recreated your issue by removing the key manually:

lxd sql global 'delete from storage_pools_config where value="LXDThinPool"'
root@cluster-v3:~# lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=cluster-v3]: 
What IP address or DNS name should be used to reach this node? [default=10.109.89.60]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: 10.109.89.20
Cluster fingerprint: 4f7cefc7b40d0d525d11cc6b05a30bcbb24ff3cd0564944fb270582fdaeffaae
You can validate this fingerprint by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "lvm.vg_name" property for storage pool "local": 
Choose "size" property for storage pool "local": 
Choose "source" property for storage pool "local": 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 
Error: Failed to join cluster: Failed request to add member: Mismatching config for storage pool local: different values for keys: lvm.thinpool_name
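
If you want to confirm which side is missing the key before fixing anything, the same table can be inspected directly (a sketch; table and key names taken from the error above and the queries elsewhere in this thread):

# On the existing node: no row for lvm.thinpool_name means the join check will fail.
lxd sql global 'select * from storage_pools_config where key="lvm.thinpool_name"'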

I don’t know how it happened in your case; probably something to do with mixing older and newer versions of the code and trying to join an older node. However, I’ve not been able to re-create that issue with LXD 4.3.

To fix it, here is what I did:

First wipe the new node you’re trying to add to the cluster so there is no existing LVM config or LXD config:

snap remove lxd
vgremove local
pvremove /dev/md1    # the physical volume backing the "local" VG on that node
reboot

Now let’s fix the missing config key on your first node:

lxc shell cluster-v1
lxd sql global 'insert into storage_pools_config(storage_pool_id,key,value) VALUES(1,"lvm.thinpool_name","LXDThinPool")'
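
To verify the insert took, you can query the table again (assuming your pool has id 1, as used in the insert above):

# The new lvm.thinpool_name row should show up with node_id NULL,
# since that key is not node-specific.
lxd sql global 'select * from storage_pools_config where storage_pool_id=1'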

Now let’s reinstall LXD on your new node:

lxc shell cluster-v3
snap install lxd
lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=cluster-v3]: 
What IP address or DNS name should be used to reach this node? [default=10.109.89.60]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: 10.109.89.20
Cluster fingerprint: 4f7cefc7b40d0d525d11cc6b05a30bcbb24ff3cd0564944fb270582fdaeffaae
You can validate this fingerprint by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose "size" property for storage pool "local": 
Choose "source" property for storage pool "local": 
Choose "lvm.vg_name" property for storage pool "local": 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

This works for me.

However, please note that this still leaves your first node apparently using a loopback file for LVM.

To confirm this, run pvs on your first node: if you see something like /dev/loop3 rather than /dev/md1, then your first node isn’t using /dev/md1.

In that case, if possible, I would suggest blowing away node 1 (making sure you have backups of any containers on there first) and starting again with a fresh cluster, as that also means you won’t have to apply the fix above.

If that’s not possible, then you’d probably need to look at creating a new storage pool manually, moving your containers across to the new pool, then updating your profiles to use the new pool and removing the old one (see the sketch below). At that point you can add your 2nd node without needing the fix above either.
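
Roughly, that migration could look something like the following. These are only illustrative commands with hypothetical names (a new pool local2 backed by /dev/md1, a container c1, the default profile) and assume the block device is free on that node; adjust to your setup:

# Hypothetical names: local2 and c1. Create the new LVM-backed pool on the block device.
lxc storage create local2 lvm source=/dev/md1
# Move each (stopped) container onto the new pool; moving to another pool
# needs a rename there and back to keep the original name.
lxc move c1 c1-tmp --storage local2
lxc move c1-tmp c1
# Point the default profile's root disk at the new pool
# (change the root device's "pool" entry to local2).
lxc profile edit default
# Once nothing references the old pool any more, remove it.
lxc storage delete local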

Thanks, I’ll give that a try.

BTW, my pvs output looks quite normal:

root@ijssel:~# pvs
  PV         VG    Fmt  Attr PSize   PFree 
  /dev/md0   vg0   lvm2 a--  199,87g 49,87g
  /dev/md1   local lvm2 a--    3,41t     0 

How would I be able to see if this local.img is a loopback mount? Although I have been working with UNIX/Linux for ages (30+ years) I have a hard time understanding how to analyze snap environments.

Don’t we need to set the node_id as well? I tried the command above and the query now gives

$ sudo lxd sql global 'select * from storage_pools_config'
+----+-----------------+---------+-------------------+------------------------------------------+
| id | storage_pool_id | node_id |        key        |                  value                   |
+----+-----------------+---------+-------------------+------------------------------------------+
| 3  | 1               | 1       | source            | /var/snap/lxd/common/lxd/disks/local.img |
| 4  | 1               | 1       | lvm.vg_name       | local                                    |
| 5  | 1               | <nil>   | lvm.thinpool_name | LXDThinPool                              |
+----+-----------------+---------+-------------------+------------------------------------------+

No node_id, as that key isn’t node-specific (that’s the issue: it’s missing, and other nodes need it).

losetup -a should list all the loopback sources.

sudo pvdisplay -m should show you what physical devices back your volume group.
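
To check specifically whether that image file is attached to a loop device, losetup can also filter on the backing file (path taken from your query output above):

# Prints nothing if the file isn't attached to any loop device.
sudo losetup -j /var/snap/lxd/common/lxd/disks/local.img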

OK. I’ll continue with this then.

But I noticed that on my very first cluster attempt (Ubuntu 18.04, LXD 3.0.3) the query gives:

$ sudo lxd sql global 'select * from storage_pools_config'
+----+-----------------+---------+-------------------------+-------------+
| id | storage_pool_id | node_id |           key           |    value    |
+----+-----------------+---------+-------------------------+-------------+
| 30 | 4               | 1       | source                  | local       |
| 31 | 4               | 1       | volatile.initial_source | /dev/md1    |
| 33 | 4               | 1       | lvm.thinpool_name       | LXDThinPool |
| 34 | 4               | 1       | lvm.vg_name             | local       |
+----+-----------------+---------+-------------------------+-------------+

This looks quite normal. Nothing with loopback mounts

pvdisplay -m
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               local
  PV Size               3,41 TiB / not usable 4,00 MiB
  Allocatable           yes (but full)
  PE Size               4,00 MiB
  Total PE              894180
  Free PE               0
  Allocated PE          894180
  PV UUID               VAFIq7-EMB4-mAND-6Ezp-W7Yu-9BXx-y1c1uj
   
  --- Physical Segments ---
  Physical extent 0 to 255:
    Logical volume	/dev/local/lvol0_pmspare
    Logical extents	0 to 255
  Physical extent 256 to 893923:
    Logical volume	/dev/local/LXDThinPool_tdata
    Logical extents	0 to 893667
  Physical extent 893924 to 894179:
    Logical volume	/dev/local/LXDThinPool_tmeta
    Logical extents	0 to 255
   
  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               vg0
  PV Size               199,87 GiB / not usable 3,00 MiB
  Allocatable           yes 
  PE Size               4,00 MiB
  Total PE              51167
  Free PE               12767
  Allocated PE          38400
  PV UUID               AgFNEY-81xR-2xHn-oRIw-Qvj0-tTXB-0oQ4gn
   
  --- Physical Segments ---
  Physical extent 0 to 25599:
    Logical volume	/dev/vg0/root
    Logical extents	0 to 25599
  Physical extent 25600 to 38399:
    Logical volume	/dev/vg0/home
    Logical extents	0 to 12799
  Physical extent 38400 to 51166:
    FREE

In driver_lvm.go I found this piece

// Create creates the storage pool on the storage device.
func (d *lvm) Create() error {
	d.config["volatile.initial_source"] = d.config["source"]

	defaultSource := loopFilePath(d.name)
	var err error
	var pvExists, vgExists bool
	var pvName string
	var vgTags []string

	revert := revert.New()
	defer revert.Fail()

	if d.config["source"] == "" || d.config["source"] == defaultSource {
		// We are using an LXD internal loopback file.
		d.config["source"] = defaultSource
		if d.config["lvm.vg_name"] == "" {
			d.config["lvm.vg_name"] = d.name
		}

loopFilePath creates the file name that we see.

// loopFilePath returns the loop file path for a storage pool.
func loopFilePath(poolName string) string {
	return filepath.Join(shared.VarPath("disks"), fmt.Sprintf("%s.img", poolName))
}
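
So for a pool named local under the snap, that works out to the /var/snap/lxd/common/lxd/disks/local.img path seen in the database. A simple way to see whether such a file even exists on disk:

ls -l /var/snap/lxd/common/lxd/disks/local.img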

I don’t see a loopback mount for /var/snap/lxd/common/lxd/disks/local.img:

$ sudo losetup -a | grep -v delete
/dev/loop15: [64768]:5113996 (/var/lib/snapd/snaps/core_9436.snap)
/dev/loop4: [64768]:5115209 (/var/lib/snapd/snaps/core_9665.snap)
/dev/loop21: [64768]:5112538 (/var/lib/snapd/snaps/lxd_16100.snap)
/dev/loop5: [64768]:5115357 (/var/lib/snapd/snaps/core18_1754.snap)
/dev/loop22: [64768]:5114957 (/var/lib/snapd/snaps/core18_1880.snap)
/dev/loop20: [64768]:5115147 (/var/lib/snapd/snaps/lxd_16044.snap)

OK good. So because the volume group defined in lvm.vg_name already exists in /dev/, LXD isn’t trying to mount the loopback file specified in your config. That is why it’s working, even though it appears you have some old config left on your node from previous iterations of trying to set this up.

Now, do you think I should change the source value to /dev/md1?

Yes, it’s probably worth changing it to the VG name for consistency.
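
For example, something along these lines would do it (only a sketch, assuming the row with id 3 from your earlier query output holds the source key; run a select first to double-check):

# Replaces the loopback image path with the VG name, matching what
# the LXD 3.0.3 cluster's config looked like.
lxd sql global 'update storage_pools_config set value="local" where id=3'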

The second node is now part of the cluster, yippee 🙂

Thanks @tomp, not only for finding a solution, but also for the nice mini tutorial on setting up a virtual cluster (see above).
