Unattended clustering setup and minimum node count

Hi,

I’m migrating to 3.0 and hoping to set up clustering afterwards, in the hope of simplifying backup/redundancy instead of having to maintain my own monitoring, ZFS syncing, etc.

Two questions:

  1. The docs say: “It is recommended that the number of nodes in the cluster be at least three, so the cluster can survive the loss of at least one node and still be able to establish quorum for its distributed state”. Does that mean I cannot use only two nodes? “Recommended” doesn’t sound mandatory, but then I don’t see how it isn’t mandatory if the cluster can’t keep its state with two nodes. Right now I have a main server and a backup one, and hope to cluster the two. Does that not make sense?

  2. Reading the docs, there’s a way to set up clustering using lxd init or a preseed file. Could someone explain what those do, so that I can add equivalent commands to my automation? Right now I’m using Ansible to set up the LXD node, and I’d like to add whatever I need to set up clustering.

thanks,

Spike

Hi Spike,

  1. For any operation involving the database (so pretty much everything), LXD clustering requires a majority of the nodes to be online. The majority of 2 is 2, so if you have a cluster of 2 nodes and one of them goes offline, you can’t operate the cluster for as long as that node stays offline (“offline” covers everything from a simple reboot to a hardware failure). If that’s acceptable to you, sure, but it seems suboptimal for most use cases.

  2. What exactly is unclear about the documentation? It has examples of both interactive and automated setup.

@freeekanayaka can you clarify what you mean by “operate” when you say “LXD clustering requires a majority of the nodes to be online”? Do you mean that I would not be able to start new containers etc., or that the running containers would stop working? I’m OK with the former, but obviously not with the latter. If it’s the latter, I guess I’ll have to stay with ZFS snapshot syncing and Ansible setting up the same containers on both servers.

About 2: I use Ansible to set up my LXD server and containers, and it just works using lxc commands and the built-in lxd_container plugin. The “manual” way outlined in the docs uses a preseed file. If there’s no other way I guess I’ll rework my playbooks to use that, but I’d very much appreciate an lxc cluster command to do it, if available. Also, to clarify: I’m hoping to upgrade an older LXD server to 3.0 and turn it into a cluster. I noted that in a previous response on March 26 you said:

Regarding turning an existing LXD instance with running containers into a new node of a cluster, your reading is correct: it’s not currently possible. You can however create a brand new LXD instance, join a cluster as an empty node, and use “lxc copy” or “lxc move” to migrate your existing containers to the new node.

Is that still the case? I’m not asking to be spoonfed, but I’ve been unable to find a clear list of commands to do the above and manage clustering via the usual lxc xyz commands that have been the standard so far.

thanks,

You need a majority of your database nodes online for the API to be accessible.
By default LXD will pick the first 3 servers to be database servers, additional servers after that are only running containers.

This allows one database server to go down for updates while keeping the API online, but if two go down, any API call will freeze until you’re back to two database servers being online.

The containers themselves are fine and all processes keep running inside of them, but you won’t be able to list them, spawn new ones, reconfigure them, … effectively anything which involves the lxc command will just freeze until a majority of the database nodes are online again.
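To make the arithmetic concrete, here is a small sketch (not part of LXD itself, just the majority rule described above): with n voting database nodes, floor(n/2) + 1 of them must be online for the API to respond.

```python
# Quorum arithmetic for a distributed database: a cluster with n voting
# (database) nodes needs a strict majority of them online.

def majority(n: int) -> int:
    """Smallest number of nodes that constitutes a majority of n."""
    return n // 2 + 1

for n in (2, 3, 5):
    m = majority(n)
    # tolerated failures = nodes that can go offline while quorum holds
    print(f"nodes={n} majority={m} tolerated_failures={n - m}")
```

This is why two nodes buy no fault tolerance (majority of 2 is 2, so zero failures are tolerated), while three nodes tolerate one failure.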

@spike regarding what “LXD clustering requires a majority of the nodes to be online” means, see @stgraber’s reply.

About 2: there is a “lxc cluster enable” command which lets you turn a non-clustered LXD instance (with or without existing configuration, containers, storage pools, etc.) into the first node of a new LXD cluster. However, there is currently no “lxc cluster join” command for joining further nodes; the only way is to use “lxd init”, either interactively or non-interactively by passing --preseed.

The difficulty here is that at the very moment a new node requests to join a cluster, it needs to have exactly the same storage pools and networks as the other nodes in the cluster, possibly with some configuration that is specific to the joining node (for instance, if there’s a ZFS pool defined, the joining node might want to specify its own backing disk device for it). This is described in the documentation. Since there can be any number of networks and storage pools, and since there are several per-node config keys, coming up with a simple “lxc cluster join” command line is not straightforward, though not impossible.

Also, I don’t see a “lxc cluster join” command adding much value compared to “lxd init --preseed < your-conf.yaml”; it would just be a different syntax (i.e. the YAML contained in “your-conf.yaml” turned into command-line flags). Is there a reason why “lxc cluster join” would make your life easier? It seems to me that both “lxd init --preseed” and a possible “lxc cluster join” alternative work fine with Ansible. @stgraber do we want to plan adding such a command?
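For automation, the join step can be driven entirely from a preseed file piped into lxd init. The fragment below is a sketch following the preseed shape shown in the LXD clustering documentation; the node names, addresses, password, and the truncated certificate are placeholders you would replace with your own values:

```yaml
# join.yaml — run on the new, empty node:  lxd init --preseed < join.yaml
cluster:
  enabled: true
  server_name: node2                 # placeholder: name of the joining node
  server_address: 10.55.60.2:8443    # placeholder: this node's address
  cluster_address: 10.55.60.1:8443   # placeholder: address of an existing node
  cluster_certificate: "-----BEGIN CERTIFICATE-----...-----END CERTIFICATE-----"
  cluster_password: sekret           # placeholder: the cluster trust password
  member_config:                     # node-specific values for shared entities
  - entity: storage-pool
    name: default
    key: source
    value: ""                        # e.g. this node's own backing device
```

On the existing server, “lxc cluster enable node1” turns it into the first node; a preseed like the one above is then used on each additional (clean) node.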

Regarding joining nodes with existing containers, yes, that’s still the case, it’s not possible to have them join a cluster. Only the first node of the cluster can have existing configuration, new nodes must be “clean”.

I think there is room to improve the documentation about all this, although every bit I mentioned should already be in there; please correct me if I am wrong and we’ll amend the docs.

@freeekanayaka lxc cluster join doesn’t make quite that much sense when the joining node must be completely empty to join a cluster. It’d make sense if there was some way to merge the existing content into the cluster, but that’s probably more problems than we want to deal with.

@freeekanayaka thanks for getting back to me. Understood about the creation/joining process: I can enable my current master and then wipe+join the current slave. Having the same network/storage etc. is not a problem, since the servers are identically built with Ansible.

About the preseed/unattended part of my question, the problem I have is, I guess, more related to init… Can I pass just the cluster parameters to preseed? Will all existing settings be retained in that case? Generally speaking, while there’s value in preseed-like configuration methods, I tend to prefer dconf-like approaches that allow setting individual variables. So in this case, for example, something like lxc config cluster would be my preferred way, but that’s indeed just a preference and maybe it makes no sense for others.

Last but not least, documentation: this may very well be clear enough already, so don’t take my lack of understanding as a lack of clarity in the documentation. That said, I don’t see where the docs mention the behavior when only two nodes exist and one fails. I think that’s an important detail; the way I read it, there’s only a recommendation to have at least three nodes, with no explanation of the side effects of having two. But maybe I missed it.

best,

Spike