LXD 3.1 storage problem with a nested LXD cluster in an LXD container

Hello LXD crowd, I’ve been playing around with the new clustering and it’s awesome! The fan network implementation is also impressive. GREAT WORK !!

For dev / training purposes, I would like to run an LXD cluster inside a single cloud instance (or VM). So I have set up LXD 3.1 on Ubuntu 18.04 and deployed 3 Ubuntu 18.04 containers on it to act as my 3 nested LXD cluster hosts (from what I have read this should work; correct me if I'm wrong).

My host LXD setup is working fine:

lxc list
+---------+---------+-------------------+------+------------+-----------+
|  NAME   |  STATE  |       IPV4        | IPV6 |    TYPE    | SNAPSHOTS |
+---------+---------+-------------------+------+------------+-----------+
| lxd-az1 | RUNNING | 10.1.0.101 (eth0) |      | PERSISTENT | 0         |
+---------+---------+-------------------+------+------------+-----------+
| lxd-az2 | RUNNING | 10.1.0.102 (eth0) |      | PERSISTENT | 0         |
+---------+---------+-------------------+------+------------+-----------+
| lxd-az3 | RUNNING | 10.1.0.103 (eth0) |      | PERSISTENT | 0         |
+---------+---------+-------------------+------+------------+-----------+

lxc network list
+---------+----------+---------+-------------+---------+
|  NAME   |   TYPE   | MANAGED | DESCRIPTION | USED BY |
+---------+----------+---------+-------------+---------+
| ens33   | physical | NO      |             | 0       |
+---------+----------+---------+-------------+---------+
| hostbr0 | bridge   | YES     |             | 3       |
+---------+----------+---------+-------------+---------+

lxc network show hostbr0
config:
  ipv4.address: 10.1.0.1/24
  ipv4.nat: "true"
  ipv6.address: none
description: ""
name: hostbr0
type: bridge
used_by:
- /1.0/containers/lxd-az1
- /1.0/containers/lxd-az2
- /1.0/containers/lxd-az3
managed: true
status: Created
locations:
- none

ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:50:56:23:1d:e6 brd ff:ff:ff:ff:ff:ff
    inet 192.168.168.154/24 brd 192.168.168.255 scope global dynamic ens33
       valid_lft 1555sec preferred_lft 1555sec
    inet6 fe80::250:56ff:fe23:1de6/64 scope link 
       valid_lft forever preferred_lft forever
3: hostbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fe:fa:43:f0:8e:fb brd ff:ff:ff:ff:ff:ff
    inet 10.1.0.1/24 scope global hostbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::c8fd:f6ff:fe06:e84/64 scope link 
       valid_lft forever preferred_lft forever
5: vethR4DISA@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master hostbr0 state UP group default qlen 1000
    link/ether fe:fa:43:f0:8e:fb brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::fcfa:43ff:fef0:8efb/64 scope link 
       valid_lft forever preferred_lft forever
7: vethP3EPSE@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master hostbr0 state UP group default qlen 1000
    link/ether fe:fc:66:51:23:f9 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::fcfc:66ff:fe51:23f9/64 scope link 
       valid_lft forever preferred_lft forever
9: vethI18TQI@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master hostbr0 state UP group default qlen 1000
    link/ether fe:fd:5b:32:a3:00 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::fcfd:5bff:fe32:a300/64 scope link 
       valid_lft forever preferred_lft forever

lxc storage list
+---------+-------------+--------+--------------------------------------------+---------+
|  NAME   | DESCRIPTION | DRIVER |                   SOURCE                   | USED BY |
+---------+-------------+--------+--------------------------------------------+---------+
| default |             | btrfs  | /var/snap/lxd/common/lxd/disks/default.img | 6       |
+---------+-------------+--------+--------------------------------------------+---------+

lxc storage info default
info:
  description: ""
  driver: btrfs
  name: default
  space used: 1.79GB
  total space: 20.00GB
used by:
  containers:
  - lxd-az1
  - lxd-az2
  - lxd-az3
  images:
  - 75068be14e2a11620a7963b920a082ec335fd5b354a348b99ec807bd9cc8363e
  - db24c55f847e24dbe60fd665cf8f894ca1ace2d6419ba742f274966b94d4ca30
  profiles:
  - default

lxc storage volume list default
+-----------+------------------------------------------------------------------+-------------+---------+
|   TYPE    |                               NAME                               | DESCRIPTION | USED BY |
+-----------+------------------------------------------------------------------+-------------+---------+
| container | lxd-az1                                                          |             | 1       |
+-----------+------------------------------------------------------------------+-------------+---------+
| container | lxd-az2                                                          |             | 1       |
+-----------+------------------------------------------------------------------+-------------+---------+
| container | lxd-az3                                                          |             | 1       |
+-----------+------------------------------------------------------------------+-------------+---------+
| image     | 75068be14e2a11620a7963b920a082ec335fd5b354a348b99ec807bd9cc8363e |             | 1       |
+-----------+------------------------------------------------------------------+-------------+---------+
| image     | db24c55f847e24dbe60fd665cf8f894ca1ace2d6419ba742f274966b94d4ca30 |             | 1       |
+-----------+------------------------------------------------------------------+-------------+---------+

I then started building my cluster, beginning with my first container (lxd-az1), which was created with nesting enabled (security.nesting=true):

lxc init ubuntu:18.04 lxd-az1 -c security.nesting=true
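Before running lxd init I started the container, went inside and made sure LXD 3.1 from the snap was what I would be initialising. From memory it was roughly the following (the 18.04 image ships a deb-packaged LXD 3.0 by default, so the exact steps to get onto the snap may have differed slightly):

lxc start lxd-az1
lxc exec lxd-az1 -- bash
snap install lxd    # run inside lxd-az1; this is what gives the /var/snap/lxd/... paths seen below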

Once inside the container, I started the cluster creation and everything seemed fine for the first node:

lxd init

Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=lxd-az1]: 
What IP address or DNS name should be used to reach this node? [default=10.1.0.101]: 
Are you joining an existing cluster? (yes/no) [default=no]: 
Setup password authentication on the cluster? (yes/no) [default=yes]: 
Trust password for new clients: 
Again: 
Do you want to configure a new local storage pool? (yes/no) [default=yes]: 
Name of the storage backend to use (btrfs, dir) [default=btrfs]: 
Would you like to create a new btrfs subvolume under /var/snap/lxd/common/lxd? (yes/no) [default=yes]: 
Do you want to configure a new remote storage pool? (yes/no) [default=no]: 
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: 
Would you like to create a new Fan overlay network? (yes/no) [default=yes]
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes
config:
  core.https_address: 10.1.0.101:8443
  core.trust_password: **REDACTED**
cluster:
  server_name: lxd-az1
  enabled: true
  cluster_address: ""
  cluster_certificate: ""
  cluster_password: ""
networks:
- config:
    bridge.mode: fan
  description: ""
  managed: false
  name: lxdfan0
  type: ""
storage_pools:
- config:
    source: /var/snap/lxd/common/lxd/storage-pools/local
  description: ""
  name: local
  driver: btrfs
profiles:
- config: {}
  description: ""
  devices:
    eth0:
      name: eth0
      nictype: bridged
      parent: lxdfan0
      type: nic
    root:
      path: /
      pool: local
      type: disk
  name: default

lxc cluster list
+---------+-------------------------+----------+--------+-------------------+
|  NAME   |           URL           | DATABASE | STATE  |      MESSAGE      |
+---------+-------------------------+----------+--------+-------------------+
| lxd-az1 | https://10.1.0.101:8443 | YES      | ONLINE | fully operational |
+---------+-------------------------+----------+--------+-------------------+
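For completeness, the fan network that lxd init created can be inspected from the first node with:

lxc network show lxdfan0

which should report bridge.mode: fan and the 10.1.0.0/24 underlay (the same values that show up in the join preseed further down).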

But then when I tried to set up the second node, something went wrong with the storage pool creation:

lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=lxd-az2]: 
What IP address or DNS name should be used to reach this node? [default=10.1.0.102]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: 10.1.0.101
Cluster fingerprint: **REDACTED**
You can validate this fingerpring by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Choose the local disk or dataset for storage pool "local" (empty for loop disk): 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes
config:
  core.https_address: 10.1.0.102:8443
cluster:
  server_name: lxd-az2
  enabled: true
  cluster_address: 10.1.0.101:8443
  cluster_certificate: |
    -----BEGIN CERTIFICATE-----
    **REDACTED**
    -----END CERTIFICATE-----
  cluster_password: **REDACTED**
networks:
- config:
    bridge.mode: fan
    fan.underlay_subnet: 10.1.0.0/24
  description: ""
  managed: true
  name: lxdfan0
  type: bridge
storage_pools:
- config:
    source: ""
  description: ""
  name: local
  driver: btrfs
profiles:
- config: {}
  description: ""
  devices: {}
  name: default

Error: Failed to create storage pool 'local': failed to prepare loop device: bad file descriptor

It seems LXD is unhappy that /var/snap/lxd/common/lxd/storage-pools/local already exists?

I'm kind of stuck at this point; I don't have this problem when setting up the same cluster outside of LXD (i.e. without nesting).

Any thoughts?

Thanks again for the great work!

OK, I've made some progress. Can someone tell me if this solution makes sense and whether it is sustainable going forward?

I started over creating the nested LXD cluster.

First node:

lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=lxd-az1]: 
What IP address or DNS name should be used to reach this node? [default=10.1.0.101]: 
Are you joining an existing cluster? (yes/no) [default=no]: no
Setup password authentication on the cluster? (yes/no) [default=yes]: 
Trust password for new clients: 
Again: 
Do you want to configure a new local storage pool? (yes/no) [default=yes]: no
Do you want to configure a new remote storage pool? (yes/no) [default=no]: no
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: 
Would you like to create a new Fan overlay network? (yes/no) [default=yes]
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

As you can see, I did not create a storage pool at this point.

On the second and third nodes:

lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=lxd-az2]: 
What IP address or DNS name should be used to reach this node? [default=10.1.0.102]: 
Are you joining an existing cluster? (yes/no) [default=no]: yes
IP address or FQDN of an existing cluster node: 10.1.0.101
Cluster fingerprint: 6166005b23d74c7f3dd22dfb3848b3c05b65c50b80fa04630626671d49fa376b
You can validate this fingerpring by running "lxc info" locally on an existing node.
Is this the correct fingerprint? (yes/no) [default=no]: yes
Cluster trust password: 
All existing data is lost when joining a cluster, continue? (yes/no) [default=no] yes
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

Here again, no storage pool was created at this point.

Now the cluster is up and running:

lxc cluster list
+---------+-------------------------+----------+--------+-------------------+
|  NAME   |           URL           | DATABASE | STATE  |      MESSAGE      |
+---------+-------------------------+----------+--------+-------------------+
| lxd-az1 | https://10.1.0.101:8443 | YES      | ONLINE | fully operational |
+---------+-------------------------+----------+--------+-------------------+
| lxd-az2 | https://10.1.0.102:8443 | YES      | ONLINE | fully operational |
+---------+-------------------------+----------+--------+-------------------+
| lxd-az3 | https://10.1.0.103:8443 | YES      | ONLINE | fully operational |
+---------+-------------------------+----------+--------+-------------------+

So I then prepared a new storage pool on each node:

lxc storage create clusterpool btrfs --target lxd-az1 source=/var/snap/lxd/common/lxd/storage-pools/clusterpool
lxc storage create clusterpool btrfs --target lxd-az2 source=/var/snap/lxd/common/lxd/storage-pools/clusterpool
lxc storage create clusterpool btrfs --target lxd-az3 source=/var/snap/lxd/common/lxd/storage-pools/clusterpool
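The same thing as a small loop, in case anyone wants to script it (same pool name, driver and source path as above):

for node in lxd-az1 lxd-az2 lxd-az3; do
  lxc storage create clusterpool btrfs --target "${node}" source=/var/snap/lxd/common/lxd/storage-pools/clusterpool
done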

At which point the storage pool showed up in the pending state:

lxc storage list
+-------------+-------------+--------+---------+---------+
|    NAME     | DESCRIPTION | DRIVER |  STATE  | USED BY |
+-------------+-------------+--------+---------+---------+
| clusterpool |             | btrfs  | PENDING | 0       |
+-------------+-------------+--------+---------+---------+

Then I ran the cluster-wide create, which actually created the storage pool:

lxc storage create clusterpool btrfs
lxc storage list
+-------------+-------------+--------+---------+---------+
|    NAME     | DESCRIPTION | DRIVER |  STATE  | USED BY |
+-------------+-------------+--------+---------+---------+
| clusterpool |             | btrfs  | CREATED | 0       |
+-------------+-------------+--------+---------+---------+

And finally I added the newly created storage pool as the root disk in the default profile:

lxc profile device add default root disk path=/ pool=clusterpool
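If I read the earlier preseed correctly, lxc profile show default should now report more or less the following (eth0 comes from the fan network created during init, root from the command above; the exact output may differ slightly):

config: {}
description: ""
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdfan0
    type: nic
  root:
    path: /
    pool: clusterpool
    type: disk
name: default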

At which point I was able to create a first container on my cluster:

lxc init images:alpine/edge alp1
Creating alp1

lxc list
+------+---------+------+------+------------+-----------+----------+
| NAME |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS | LOCATION |
+------+---------+------+------+------------+-----------+----------+
| alp1 | STOPPED |      |      | PERSISTENT | 0         | lxd-az1  |
+------+---------+------+------+------------+-----------+----------+

And then a few more:

for i in $(seq 2 7); do lxc launch images:alpine/edge "alp${i}"; done

lxc list
+------+---------+----------------------+------+------------+-----------+----------+
| NAME |  STATE  |         IPV4         | IPV6 |    TYPE    | SNAPSHOTS | LOCATION |
+------+---------+----------------------+------+------------+-----------+----------+
| alp1 | RUNNING | 240.101.0.139 (eth0) |      | PERSISTENT | 0         | lxd-az1  |
+------+---------+----------------------+------+------------+-----------+----------+
| alp2 | RUNNING | 240.102.0.138 (eth0) |      | PERSISTENT | 0         | lxd-az2  |
+------+---------+----------------------+------+------------+-----------+----------+
| alp3 | RUNNING | 240.103.0.72 (eth0)  |      | PERSISTENT | 0         | lxd-az3  |
+------+---------+----------------------+------+------------+-----------+----------+
| alp4 | RUNNING | 240.101.0.245 (eth0) |      | PERSISTENT | 0         | lxd-az1  |
+------+---------+----------------------+------+------------+-----------+----------+
| alp5 | RUNNING | 240.102.0.30 (eth0)  |      | PERSISTENT | 0         | lxd-az2  |
+------+---------+----------------------+------+------------+-----------+----------+
| alp6 | RUNNING | 240.103.0.84 (eth0)  |      | PERSISTENT | 0         | lxd-az3  |
+------+---------+----------------------+------+------------+-----------+----------+
| alp7 | RUNNING | 240.101.0.188 (eth0) |      | PERSISTENT | 0         | lxd-az1  |
+------+---------+----------------------+------+------------+-----------+----------+

Now, looking at the volumes that were created, they are consistent with the containers and their locations:

lxc storage volume list clusterpool
+-----------+------------------------------------------------------------------+-------------+---------+----------+
|   TYPE    |                               NAME                               | DESCRIPTION | USED BY | LOCATION |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| container | alp1                                                             |             | 1       | lxd-az1  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| container | alp2                                                             |             | 1       | lxd-az2  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| container | alp3                                                             |             | 1       | lxd-az3  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| container | alp4                                                             |             | 1       | lxd-az1  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| container | alp5                                                             |             | 1       | lxd-az2  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| container | alp6                                                             |             | 1       | lxd-az3  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| container | alp7                                                             |             | 1       | lxd-az1  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| image     | 57245ee93b0df8fcf9c37b7a12f858ea12e89896c246064e6953139fdb939674 |             | 1       | lxd-az1  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| image     | 57245ee93b0df8fcf9c37b7a12f858ea12e89896c246064e6953139fdb939674 |             | 1       | lxd-az2  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+
| image     | 57245ee93b0df8fcf9c37b7a12f858ea12e89896c246064e6953139fdb939674 |             | 1       | lxd-az3  |
+-----------+------------------------------------------------------------------+-------------+---------+----------+

I'm really looking forward to some input on this setup. Is this a state-of-the-art way to handle nested cluster storage?

Thanks in advance for any feedback.

Your second solution seems fine.

As for the original problem, setting the source of the storage pool to /var/snap/lxd/common/lxd/storage-pools/local would likely have avoided it.

The error you got is because the source was set to “” and so LXD attempted to set up a loop file and format it with btrfs, rather than just creating a directory.
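In other words, with an empty source LXD does roughly the moral equivalent of the following before mounting the pool, and it is the loop-device step that cannot work from inside the nested container (this is an approximation, not the literal internal commands):

truncate -s 15GB /var/snap/lxd/common/lxd/disks/local.img                # create a sparse image file (size illustrative)
LOOPDEV=$(losetup --show -f /var/snap/lxd/common/lxd/disks/local.img)    # attach it to a loop device: this is what fails here
mkfs.btrfs "${LOOPDEV}"                                                  # format the loop device with btrfs

With a path as source it just uses that location on the existing filesystem instead, which is why the first node went through.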

OK, thanks for this clarification. I see it now in the YAML outputs. On the first node, the default storage pool source configuration ended up being:

- config:
    source: /var/snap/lxd/common/lxd/storage-pools/local
  description: ""
  name: local
  driver: btrfs

while on the second one it was an empty string and thus a loop disk:

- config:
    source: ""
  description: ""
  name: local
  driver: btrfs

So when creating the additional cluster nodes, at the prompt

Choose the local disk or dataset for storage pool "local" (empty for loop disk):

I should have entered the same setting as on node 1, which was /var/snap/lxd/common/lxd/storage-pools/local.
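For the record, the corresponding storage section of the join preseed would then have looked like the one on node 1, i.e. something like:

storage_pools:
- config:
    source: /var/snap/lxd/common/lxd/storage-pools/local
  description: ""
  name: local
  driver: btrfs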

Again, amazing work; the fan networking along with the clustering functionality is really impressive.