Hi, I’ve been using Incus for quite some time and have a bunch of machines all running happily, so today I thought I’d finally try to cluster a few. Tearing my hair out now, any help would be much appreciated.
Two clean servers.
Server #1: ZFS, pool=“local”, network=“host”, all set up; the Web UI shows “fully operational”, “database leader”, etc. All happy.
Server #2: I’ve been retrying this to no avail. At the last hurdle I end up with:
Error: Failed to join cluster: Failed to initialize member: Failed to initialize storage pools and networks: Failed to update storage pool "local": Config key "source" is cluster member specific
My process is two passes at “incus admin init”:
On pass #1 I create the storage pool (local) and the network (host); this works fine.
On pass #2 I opt for clustering, join the cluster, and enter the token. I’ve tried entering all manner of things for “local source” and “zfs.pool_name”, including blanks, all to no avail.
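For what it’s worth, the non-interactive equivalent of my pass #2 would be a join preseed along these lines. I was answering interactively, so the address and token here are placeholders, and member_config is where (as I understand it) the per-member “source” is supposed to be supplied:

# Hypothetical join preseed for the second server (pass #2).
# server_address and the token are placeholders.
cluster:
  enabled: true
  server_address: 10.0.0.2:8443
  cluster_token: "<token from the first server>"
  member_config:
  - entity: storage-pool
    name: local
    key: source
    value: local   # the zpool on this member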
In between each try I clear down everything and reinstall Incus on the second server, including clearing any trust certificates on the first server.
reset.sh:
#!/bin/sh
# Stop the daemon, wipe all Incus state, destroy the pool, purge the package.
service incus stop
rm -rf /var/lib/incus
rm -rf /etc/incus
zpool destroy local
apt remove --purge incus
apt autoremove
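And on the first server, the cleanup looks roughly like this; the member name and fingerprint are placeholders, and I’m assuming the join got far enough to register the member at all:

# On server #1: drop the half-joined member and its trust certificate.
incus cluster list
incus cluster remove --force server2
incus config trust list
incus config trust remove <fingerprint>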
I’ve read through all the forum posts and docs I can find, but nothing seems to be making a dent. Anyone got any ideas about what I’m doing wrong? It “feels” like I’m not clearing something from a previous failure, but I can’t see what…
Ok, so after looking at the source code, it would appear that my issue stems from the storage “config” being different on the first node from the second. On the primary it reads:
# incus storage show local
config: {}
description: ""
name: local
driver: zfs
On the node being added:
config:
  source: local
  volatile.initial_source: local
  zfs.pool_name: local
description: ""
name: local
driver: zfs
It appears that however I create the “local” storage on the new node, I always end up with details in “config”, which seems to trigger the problem. I don’t know how I managed to get an empty config on the primary node; indeed, I don’t know how it works without knowing the ZFS pool name…
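If I understand it right, member-specific keys like “source” are stored against each cluster member rather than in the cluster-wide config, which would explain the empty config on the primary. Assuming --target works the way I think it does, you can see both views:

# Cluster-wide view: member-specific keys like "source" are not shown.
incus storage show local
# Per-member view: should include source and zfs.pool_name for that member.
incus storage show local --target server1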
Ok, so I now have a fully working cluster. For me, the trick turns out to be having a ZFS pool configured and ready to go with the right name, but specifying no networks or storage pools when running “incus admin init”. Then on the first node, add a storage pool and network device using the UI; for subsequent nodes, again init with no networking or storage, and the devices are created automatically as required when the cluster is joined.
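If you’d rather use the CLI than the UI for the first-node step, I believe the way to create a pool across a cluster is to stage it per member and then finalise it; the member names here are placeholders:

# Stage the pool on each member with its member-specific source.
incus storage create local zfs source=local --target server1
incus storage create local zfs source=local --target server2
# Finalise: commits the pool cluster-wide (no member-specific keys here).
incus storage create local zfs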
Maybe I’m reading the docs wrong, but it doesn’t seem clear that this is the way to proceed when setting up a cluster. You can get set up relatively easily if you’re using ZFS on local storage as a loopback device, but if you’re trying to use ZFS on a disk partition, things go south very quickly.
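For completeness, the pool itself I create ahead of time, before ever running “incus admin init”; the partition is obviously machine-specific:

# Create the zpool on a real partition, named to match the storage pool.
zpool create local /dev/sdb2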