LXD cluster limitations and usage scenario

In which cases is it good (or better) to use an LXD cluster?

For example, I have a few servers, each with a different number of disks (all of different sizes). I have to use local drives because of a specific production load and a 1 Gb network. I use nictype=routed for all containers with public IPv4. I often migrate containers between physical nodes to rebalance CPU/memory/disk load, so I have to run lxc remote add ... from each server to every other one, which requires manual work or some external scripting/orchestration.
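For illustration, the manual workflow currently looks roughly like this (the server and container names here are just placeholders):

lxc remote add server2 https://server2.example:8443   # repeated for every pair of servers
lxc move mycontainer server2:mycontainer               # then migrate containers as needed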

Would this scenario benefit from setting up an LXD cluster? Is it possible at all with a different number of disks per server? From How to configure storage for a cluster - LXD documentation: “All members of a cluster must have identical storage pools”, which is only possible for me if I make a ZFS stripe across all disks in each node, which is bad in terms of data loss.

Are there any other ways to optimize an LXD setup for the described scenario?

Hi Nick

I believe they just all have to have the same storage pools, but those pools can be backed by devices unique to each server.

I had the same issue with a storage server that had a completely different storage layout and would run some instances, but in the end there was no real reason for it to be part of the cluster. I only ended up with some additional work around LXC client trusts and LXD profile configuration updates. I did recently bump into this issue, which had worked before, when copying a custom volume from a cluster member to the stand-alone server with lxc storage volume copy {storagepool/volume} {standaloneserverremote:storagepool/volume} - as far as I’m aware it only affects custom storage volumes, not actual instance transfers.

My original thought for getting around storage pools on the storage server not being present on the other cluster members was to use loopback- or directory-backed storage pools that simply wouldn’t get used on the main cluster members lacking the necessary physical devices. It’s a bit of a bad idea, but it gets around the requirement. You just need to remember not to use those pools on the main cluster members, so establishing a good naming convention would help make sure you don’t start using them on servers that don’t have appropriate storage devices. There is also LVM, which you could use to provide chunks of storage to back pools that would otherwise not exist.

Those are two ideas, but I’d avoid the second one. Hopefully someone can provide better ones. It also depends on what you need, and the above could be useless to you.

The thing I was not able to find on Google or in GitHub issues is why the LXD cluster was designed the way it is now.

Maybe there is a good reason for this and I’m just using it the wrong way?

Or maybe it’s hard to implement a flexible LXD cluster for all scenarios, the current design simply covers the most common use, and I should build my own solution for a different scenario?


I thought about making fake, minimal-size storage pools for the servers that have fewer disks than the others, but it looks like an ugly solution and it will be hard to maintain, as disks can be added to servers at random.

See Different storages in the cluster? Distinct pool for one node?

But why was it made this way?

It was before my time so I don’t know.

But I’m not really understanding the problem.

When creating a storage pool on a cluster, yes, each member must have a pool of the same name and type. But the source property used can be different on each member.

So I am assuming you want a ZFS pool as you mentioned ZFS. LXD supports using an existing zpool or ZFS dataset for its source. So you can manually create the zpool/top-level dataset however you wish first and then instruct LXD to use it.
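As a rough sketch (the pool, dataset, and member names are just examples), the per-member sources could look like this:

lxc storage create pool1 zfs source=tank1/lxd --target=server1   # existing dataset on server1
lxc storage create pool1 zfs source=tank2/lxd --target=server2   # a different dataset on server2
lxc storage create pool1 zfs                                      # finalises the pool cluster-wide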

But I have a different number of disks and a different number of ZFS pools on each server.

Right now I create a new LXD storage pool per zvol and manually set which pool should be used for each container. I want to do the same with an LXD cluster too.
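Roughly like this (the container, pool, and remote names are just examples, and flag availability may depend on the LXD version):

lxc launch images:ubuntu/22.04 c1 --storage pool2   # pick the pool at creation time
lxc move c1 server2:c1 --storage pool1              # pick the pool on the destination server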

If you have a different number of zpools on each member, then LXD clustering isn’t appropriate for your use case.

I thought you only had different numbers of disks per zpool, which would have been fine.

I actually need almost all of the LXD cluster features (and some of them are especially nice, like DNS in fan mode), but I want to manage storage manually.

Does that mean I literally have to implement all of these features myself from scratch?

Maybe some hack exists for the native cluster to make it work while managing storage pools manually?

What is the reason for needing different zpools per machine, rather than having a common set of zpools with varying numbers of disks in them?

The servers are very different and come from different years. They all have different numbers of different disks.

In my case I cannot replace those servers with something new, and I can’t expect new servers to be the same (and sometimes they get upgrades, like more memory or more disks in the empty slots).

OK, yes, it sounds like this scenario is too ad hoc for LXD clustering.

If I always manually select the storage pool when moving a container with lxc move, and given that I have a different number of disks (and storage pools) per server - is it OK to give the first pools the same names on every server and add “fake”, near-zero-size, loop-based pools just to match the requirements of LXD clustering?

Example:

server1:
  storage1: /dev/...
  storage2: # minimal sized loop-based
  storage3: # minimal sized loop-based

server2:
  storage1: /dev/...
  storage2: /dev/...
  storage3: # minimal sized loop-based
  
server3:
  storage1: /dev/...
  storage2: /dev/...
  storage3: /dev/...

Will this cheat lead to problems in usage, or is it totally OK?

Yes, that should work fine, as LXD understands that different cluster members can have different-sized pools.

What should I do if I get a new server with even more disks? Can I add the new “fake” storage pool on all cluster members online, or do I need to stop the cluster somehow first?

Yes you can create new storage pools like this:

lxc storage create <pool> <driver> [member options, e.g. size=10GiB] --target=<member1>
...
lxc storage create <pool> <driver> [member options, e.g. size=1GiB] --target=<memberN>
lxc storage create <pool> <driver> [global options] # Finalises pool on all pending members

We have documentation on this too:
