LXD 3.0.1 - Cluster - Storage create fails - btrfs

I have four LXD 3.0.1 servers running on Ubuntu 18.04 - one named LXD-3-Manager and three named LXD-Server{1,2,3}. Each server has a local boot drive plus an additional drive to host the containers. Once the cluster was up and running, I tried adding a new storage pool with the commands below:

root@LXD-3-Manager:~# lxc storage create SDB btrfs source=/dev/sdb --target LXD-3-Manager
Storage pool SDB pending on member LXD-3-Manager

root@LXD-3-Manager:~# lxc storage create SDB btrfs source=/dev/sdb --target LXD-Server1
Storage pool SDB pending on member LXD-Server1

root@LXD-3-Manager:~# lxc storage create SDB btrfs source=/dev/sdb --target LXD-Server2
Storage pool SDB pending on member LXD-Server2

root@LXD-3-Manager:~# lxc storage create SDB btrfs source=/dev/sdb --target LXD-Server3
Storage pool SDB pending on member LXD-Server3


root@LXD-3-Manager:~# lxc storage list
+-------+-------------+--------+---------+---------+
| NAME  | DESCRIPTION | DRIVER |  STATE  | USED BY |
+-------+-------------+--------+---------+---------+
| SDB   |             | btrfs  | PENDING | 0       |
+-------+-------------+--------+---------+---------+
| local |             | btrfs  | CREATED | 2       |
+-------+-------------+--------+---------+---------+

Next, I want to finalize the new pool and bring it online:

root@LXD-3-Manager:~# lxc storage create SDB  btrfs
Error: Failed to create the BTRFS pool: /dev/sdb appears to contain an existing filesystem (btrfs).
Use the -f option to force overwrite.
btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.

I have reinitialized the /dev/sdb disks on all servers, created a fresh disk label, etc., but I still can't get the new pool created on the cluster. Judging from the "partprobe" output, the btrfs filesystem was created successfully on two of the four servers. Unfortunately, the error message above does not tell me which server is failing.

Any hints?

@freeekanayaka

@rkelleyrtp I've not run into this situation before, but I wonder if you need to use lxc storage delete SDB --target ... for all servers, then re-create it for all servers and then add it to the cluster again?

And yeah, when an operation like this is done on multiple cluster members (mostly storage pools and networks), indicating which node the error occurred on would be useful.
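In the meantime, one way to narrow it down might be to check /dev/sdb on each member directly. Assuming blkid and wipefs are available (both ship with Ubuntu 18.04), something like:

blkid /dev/sdb     # a leftover filesystem shows up as TYPE="btrfs"
wipefs /dev/sdb    # with no options this only lists signatures, it does not erase anything

Whichever node still reports a btrfs signature is presumably the one the create is failing on.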

Thanks, but the "--target" option is not valid:

root@LXD-3-Manager:~# lxc storage delete SDB --target LXD-Server1
Error: unknown flag: --target

Description:
  Delete storage pools

Usage:
  lxc storage delete [<remote>:]<pool> [flags]

Aliases:
  delete, rm

Global Flags:
      --debug         Show all debug messages
      --force-local   Force using the local unix socket
  -h, --help          Print help
  -v, --verbose       Show all information messages
      --version       Print version number

Also, it seems the "lxc storage delete" command does not work when specifying a remote server. For example:

root@LXD-3-Manager:~# lxc storage delete LXD-Server3:SDB
Error: The remote "LXD-Server3" doesn't exist

I know for certain the remote server “LXD-Server3” exists as per this output:

root@LXD-3-Manager:~# lxc cluster list
+---------------+--------------------------+----------+--------+-------------------+
|     NAME      |           URL            | DATABASE | STATE  |      MESSAGE      |
+---------------+--------------------------+----------+--------+-------------------+
| LXD-3-Manager | https://10.30.50.60:8443 | YES      | ONLINE | fully operational |
+---------------+--------------------------+----------+--------+-------------------+
| LXD-Server1   | https://10.30.50.61:8443 | YES      | ONLINE | fully operational |
+---------------+--------------------------+----------+--------+-------------------+
| LXD-Server2   | https://10.30.50.62:8443 | YES      | ONLINE | fully operational |
+---------------+--------------------------+----------+--------+-------------------+
| LXD-Server3   | https://10.30.50.63:8443 | NO       | ONLINE | fully operational |
+---------------+--------------------------+----------+--------+-------------------+

It seems that in cluster mode you can only refer to the storage pool by name - you can't specify the host.
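If I understand it correctly, the <remote>: prefix refers to client-side remotes configured with lxc remote add, which are separate from cluster membership, so a member name can't be used as a remote unless it is added explicitly:

lxc remote list                                       # shows only client-side remotes, not cluster members
lxc remote add LXD-Server3 https://10.30.50.63:8443   # would add the member as a remote, though the pool definition itself stays cluster-wide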

Ok, so in your current state, what do lxc storage list and lxc storage show SDB get you?
And what happens if you try the last create again?

Listing the current cluster status:

root@LXD-3-Manager:~# lxc cluster list
+---------------+--------------------------+----------+--------+-------------------+
|     NAME      |           URL            | DATABASE | STATE  |      MESSAGE      |
+---------------+--------------------------+----------+--------+-------------------+
| LXD-3-Manager | https://10.30.50.60:8443 | YES      | ONLINE | fully operational |
+---------------+--------------------------+----------+--------+-------------------+
| LXD-Server1   | https://10.30.50.61:8443 | YES      | ONLINE | fully operational |
+---------------+--------------------------+----------+--------+-------------------+
| LXD-Server2   | https://10.30.50.62:8443 | YES      | ONLINE | fully operational |
+---------------+--------------------------+----------+--------+-------------------+
| LXD-Server3   | https://10.30.50.63:8443 | NO       | ONLINE | fully operational |
+---------------+--------------------------+----------+--------+-------------------+

Listing the storage pools:

root@LXD-3-Manager:~# lxc storage list
+-------+-------------+--------+---------+---------+
| NAME  | DESCRIPTION | DRIVER |  STATE  | USED BY |
+-------+-------------+--------+---------+---------+
| SDB   |             | btrfs  | PENDING | 0       |
+-------+-------------+--------+---------+---------+
| local |             | btrfs  | CREATED | 2       |
+-------+-------------+--------+---------+---------+
root@LXD-3-Manager:~#

Showing the storage pool:

root@LXD-3-Manager:~# lxc storage show SDB
config: {}
description: ""
name: SDB
driver: btrfs
used_by: []
status: Pending
locations:
- LXD-3-Manager
- LXD-Server1
- LXD-Server2
- LXD-Server3
root@LXD-3-Manager:~#
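Worth noting: config shows up empty here because source=/dev/sdb was set per member with --target, and those member-specific keys are kept separately while the pool is pending. If the client accepts --target for show (it rejected it for delete above, so this may not work on 3.0.1), they should be visible with something like:

lxc storage show SDB --target LXD-Server1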

Trying to re-create the pool:

root@LXD-3-Manager:~# lxc storage create SDB  btrfs
Error: Failed to create the BTRFS pool: /dev/sdb appears to contain an existing filesystem (btrfs).
Use the -f option to force overwrite.
btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.

It's pretty annoying that it won't tell you which drive still contains the btrfs header…

Can you try running dd if=/dev/zero of=/dev/sdb bs=4M count=10 on each of your systems, then try the create operation again? That really should wipe any leftover header (and I don't believe btrfs uses a magic number at the end of the partition/disk).
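For what it's worth, the btrfs primary superblock sits at a 64 KiB offset, which is outside the area that parted mklabel rewrites, so relabelling the disk alone won't remove it; zeroing the start of the disk (or wipefs -a /dev/sdb) should. Assuming root SSH access from the manager to each member, something like this loop would cover all four disks:

for h in LXD-3-Manager LXD-Server1 LXD-Server2 LXD-Server3; do
    ssh root@$h 'dd if=/dev/zero of=/dev/sdb bs=4M count=10'
done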

It seems wiping the start of the disk fixed the issue. This is really odd, since I specifically recreated the disk label (parted mklabel gpt) while debugging the issue. A suggestion: add a "--force" CLI option to forcefully (re)create the new pool.

root@LXD-3-Manager:~# lxc storage create SDB  btrfs
Storage pool SDB created

The storage creation took about 3 seconds to run across all 4 nodes. Very nice.
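As a quick sanity check, a container can be pointed at the new pool with the --storage flag (assuming an image is reachable from the default image remote), e.g.:

lxc launch ubuntu:18.04 test1 -s SDB
lxc list

and the cluster places it on one of the members automatically.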

As for the overall storage pool setup with LXD clustering, this feature seems rather fragile. Some operations fail without details (as shown above), some of the CLI syntax seems inconsistent, etc. Debugging these issues is difficult without some sort of detailed log file.

I would also like the ability to give a manager server different resources (network, storage, etc.) from the worker nodes, so that management tasks can be separated from the workers. For example, in my case I had to add an additional drive to the manager server just so the storage creation would work properly.

Yes, UX and error handling need to be improved in this area. To recap, we’ll probably need to:

  1. report more clearly what went wrong and where, in case of partial failure of a cluster-wide operation (most notably storage and network creation or deletion)

  2. add a way to get you unstuck when things are broken due to a partial failure (something along the lines of the --force option you suggest, details to be defined)

Regarding the ability to specify a "Manager", I probably don't have enough context here and I'm not entirely sure what you mean. Other LXD team members might provide more insight on this.

Yeah, the force option will likely be a node-specific property on the storage pool indicating that creation should be forced, likely block.force-format, which, if set to true, will have us pass -f to mkfs.
@brauner what do you think of that part?
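If we go that way, usage would presumably look something like this (block.force-format doesn't exist yet; this is purely illustrative):

lxc storage create SDB btrfs source=/dev/sdb block.force-format=true --target LXD-Server1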

For the manager server part, that’s not in our plans right now.

Having storage pools or networks only visible on some of the nodes isn’t well suited for our current API, would break backward compatibility with any non-cluster-aware clients and would be pretty tricky to manage.

If you have a need for a specific host to be a unique snowflake as far as storage and network, making the containers running on it entirely tied to that machine and not able to move to any of the other nodes, then I’d argue that this machine should be kept as a standalone LXD server and not be put in the cluster.

Thanks Stephane. I understand and can appreciate the viewpoint about the “snowflake” machine given the current API and underpinnings of LXD.

That said, I ask you to consider a much larger viewpoint for LXD - namely a data center with many LXD servers providing compute resources. In my past experience, enterprise customers want separation of duties - management vs. workload (think vCenter vs. ESX server). The management server is not a snowflake machine; its job is to provision the compute servers with the right resources (network, storage, etc.). And each server may or may not have an identical configuration (different storage pools, network setups, etc.). Finally, customers want availability zones to separate workloads for HA purposes or for use in multi-tenant environments (separate compute clusters).

I definitely think the new cluster option in LXD 3 is a great step forward. I sincerely hope you will consider the above suggestions when planning out the roadmap for the next version of LXD.