LXD stuck when creating Ceph storage pool

yang · April 20, 2021, 10:55am

I have been trying to create a Ceph storage pool with LXD, but the following command gets stuck:

lxc storage create lxd-pool ceph ceph.cluster_name=ceph ceph.osd.pool_name=lxd-storage source=lxd-storage --debug

The specific debug message it gets stuck at is the following:

DBUG[04-20|10:23:55] Sending request to LXD                   method=POST url=http://unix.socket/1.0/storage-pools etag=
DBUG[04-20|10:23:55] 
	{
		"config": {
			"ceph.cluster_name": "ceph",
			"ceph.osd.pool_name": "lxd-storage",
			"source": "lxd-storage"
		},
		"description": "",
		"name": "lxd-pool",
		"driver": "ceph"
	}

Since this is a test setup, the cluster only has 1 node and 1 OSD disk:

$ ceph -s
  cluster:
    id:     11c69d00-a1bf-11eb-96bb-f9f1c82ae2a0
    health: HEALTH_WARN
            client is using insecure global_id reclaim
            mon is allowing insecure global_id reclaim
            Reduced data availability: 33 pgs inactive
            Degraded data redundancy: 33 pgs undersized
            OSD count 1 < osd_pool_default_size 3
 
  services:
    mon: 1 daemons, quorum node3 (age 45m)
    mgr: node3.jwbkmg(active, since 44m)
    osd: 1 osds: 1 up (since 11m), 1 in (since 11m)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 0 objects, 0 B
    usage:   1.0 GiB used, 345 GiB / 346 GiB avail
    pgs:     100.000% pgs not active
             33 undersized+peered

$ ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
ssd    346 GiB  345 GiB  120 KiB   1.0 GiB       0.29
TOTAL  346 GiB  345 GiB  120 KiB   1.0 GiB       0.29
 
--- POOLS ---
POOL                   ID  PGS  STORED  OBJECTS  USED  %USED  MAX AVAIL
lxd-storage             1   32     0 B        0   0 B      0    109 GiB
device_health_metrics   2    1     0 B        0   0 B      0    109 GiB

Any tips are appreciated.

stgraber · April 20, 2021, 1:02pm

My guess is that your Ceph is running a very recent release and our older client is getting a bit confused. If that’s the case, this may help:

snap set lxd ceph.external=true
systemctl reload snap.lxd.daemon

Or it may just be that Ceph wasn’t configured to allow fewer than the standard 3 replicas. Since you only have a single OSD, it’s impossible for any write to complete.

I believe there is a ceph.conf config key to set the default number of replicas (likely the osd_pool_default_size mentioned in your ceph -s warning), which in your case should probably be set to 1 to avoid issues.
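A minimal sketch of how that could look on a throwaway single-OSD test cluster. The osd_pool_default_size key is the one named in the ceph -s warning above, and lxd-storage is the pool from the original post. The mon_allow_pool_size_one option and the --yes-i-really-mean-it flag are assumptions about recent Ceph releases, which guard against single-replica pools, so check them against your version. The function name set_single_replica is made up for this example.

```shell
# Hypothetical helper: drop an existing pool to a single replica.
# Only sensible on a disposable test cluster, since one replica means no redundancy.
set_single_replica() {
    pool="$1"
    # Default replica count for pools created from now on.
    ceph config set global osd_pool_default_size 1
    # Assumption: recent releases refuse size=1 unless this guard is lifted.
    ceph config set global mon_allow_pool_size_one true
    # Shrink the existing pool; the extra flag confirms the unsafe setting.
    ceph osd pool set "$pool" size 1 --yes-i-really-mean-it
}
```

Usage would be set_single_replica lxd-storage. If the replica count is the cause, the PGs of that pool should then stop being reported as undersized in ceph -s.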

yang · April 20, 2021, 1:13pm

Setting ceph.external solved the issue. Thank you!

yang · May 28, 2021, 10:04am

I have tried setting up a cluster again, this time with 3 nodes. I have set ceph.external and reloaded the daemon on all 3 nodes. The Ceph cluster looks as follows:

$ ceph -s
  cluster:
    id:     92031cf6-bf96-11eb-a07c-5b3f8f9b90b4
    health: HEALTH_WARN
            Degraded data redundancy: 10 pgs undersized
 
  services:
    mon: 3 daemons, quorum node1,node2,node3 (age 6m)
    mgr: node1.yvpsor(active, since 32m), standbys: node2.zsvbgi
    osd: 3 osds: 3 up (since 6m), 3 in (since 6m)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 11 TiB / 11 TiB avail
    pgs:     23 active+clean
             10 active+undersized

$ sudo ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP  META   AVAIL    %USE  VAR    PGS  STATUS
 0    hdd  7.17670   1.00000  7.2 TiB  1.0 GiB    3 MiB   0 B  1 GiB  7.2 TiB  0.01   0.51   33      up
 1    hdd  3.53419   1.00000  3.5 TiB  1.0 GiB    3 MiB   0 B  1 GiB  3.5 TiB  0.03   1.04   33      up
 2    ssd  0.33800   1.00000  346 GiB  1.0 GiB  192 KiB   0 B  1 GiB  345 GiB  0.29  10.88   23      up
                       TOTAL   11 TiB  3.0 GiB  6.2 MiB   0 B  3 GiB   11 TiB  0.03                    
MIN/MAX VAR: 0.51/10.88  STDDEV: 0.15

$ sudo ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd     11 TiB   11 TiB  6.1 MiB   2.0 GiB       0.02
ssd    346 GiB  345 GiB  200 KiB   1.0 GiB       0.29
TOTAL   11 TiB   11 TiB  6.3 MiB   3.0 GiB       0.03
 
--- POOLS ---
POOL                   ID  PGS  STORED  OBJECTS  USED  %USED  MAX AVAIL
device_health_metrics   1    1     0 B        0   0 B      0    3.5 TiB
lxd-storage             2   32     0 B        0   0 B      0    3.5 TiB

I have tried running lxd init both with a preseed file and through the CLI prompt, and both get stuck when attempting to initialize the cluster.

The following is the input given to lxd init:

Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=node1]: 
What IP address or DNS name should be used to reach this node? [default=192.168.6.10]: 192.168.1.110
Are you joining an existing cluster? (yes/no) [default=no]: 
Setup password authentication on the cluster? (yes/no) [default=yes]: 
Trust password for new clients: 
Again: 
Do you want to configure a new local storage pool? (yes/no) [default=yes]: no
Do you want to configure a new remote storage pool? (yes/no) [default=no]: yes
Name of the storage backend to use (ceph, cephfs) [default=ceph]: ceph
Create a new CEPH pool? (yes/no) [default=yes]: no
Name of the existing CEPH cluster [default=ceph]: 
Name of the existing OSD storage pool [default=lxd]: lxd-storage
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: 
Would you like to create a new Fan overlay network? (yes/no) [default=yes]: no
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes
config:
  core.https_address: 192.168.1.110:8443
  core.trust_password: secret
networks: []
storage_pools:
- config:
    ceph.cluster_name: ceph
    ceph.osd.pool_name: lxd-storage
    source: lxd-storage
  description: ""
  name: remote
  driver: ceph
profiles:
- config: {}
  description: ""
  devices:
    root:
      path: /
      pool: remote
      type: disk
  name: default
projects: []
cluster:
  server_name: node1
  enabled: true
  member_config: []
  cluster_address: ""
  cluster_certificate: ""
  server_address: ""
  cluster_password: ""

### lxd blocks here and does not return ###

Has anyone else experienced this behaviour?

yang · May 28, 2021, 10:27am

I have tried reloading the daemon yet again on the bootstrap node and noticed the following error message when executing lxd init:

Error: Failed to create storage pool 'default': Storage pool directory "/var/snap/lxd/common/lxd/storage-pools/default" already exists

Removing the directory was not sufficient as the init process would still get stuck. I then realized LXD was already listening on port 8443 so I had to unset the core.https_address:

$ sudo netstat -atulpen | grep lxd
tcp        0      0 192.168.1.110:8443      0.0.0.0:*               LISTEN      0          210086     20984/lxd

$ lxd init

### init process blocks like the previous attempt ###
$ lxc config unset core.https_address
$ lxd init

### init runs fine this time around ###

$ lxc cluster list
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| NAME  |            URL             | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE | FAILURE DOMAIN |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| node1 | https://192.168.1.110:8443 | YES      | ONLINE | Fully operational | x86_64       | default        |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+

This, however, does not seem like an optimal solution. Any ideas as to what could cause lxd init to block on the first attempt are appreciated.
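When scripting the bootstrap, the manual workaround above could be wrapped in a small pre-check. This is a minimal sketch that uses the same lxc config calls as the transcript above; the function name clear_https_address is hypothetical, and it does not address why the first init blocked.

```shell
# Hypothetical helper: clear a leftover core.https_address before "lxd init".
clear_https_address() {
    # Prints the configured address, or an empty line if none is set.
    current="$(lxc config get core.https_address)"
    if [ -n "$current" ]; then
        echo "LXD is already bound to $current, unsetting core.https_address"
        lxc config unset core.https_address
    else
        echo "core.https_address is not set, nothing to do"
    fi
}
```

Calling clear_https_address right before lxd init mirrors the sequence that worked above.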

yang · May 28, 2021, 1:28pm

It seems the successful cluster creation might have been a false positive. While the storage pool is created on LXD, it seems to be unusable as an instance creation command has been stuck for about 3 hours:

$ lxc cluster list
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| NAME  |            URL             | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE | FAILURE DOMAIN |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| node1 | https://192.168.1.110:8443 | YES      | ONLINE | Fully operational | x86_64       | default        |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| node2 | https://192.168.1.111:8443 | YES      | ONLINE | Fully operational | x86_64       | default        |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+
| node3 | https://192.168.1.112:8443 | YES      | ONLINE | Fully operational | x86_64       | default        |
+-------+----------------------------+----------+--------+-------------------+--------------+----------------+

$ lxc operation show 032e5d0b-394c-4c62-8fee-e22c209f102a
id: 032e5d0b-394c-4c62-8fee-e22c209f102a
class: task
description: Creating instance
created_at: 2021-05-28T10:36:43.146917735Z
updated_at: 2021-05-28T10:37:48.412303475Z
status: Running
status_code: 103
resources:
  containers:
  - /1.0/containers/selected-chigger
  instances:
  - /1.0/instances/selected-chigger
metadata:
  download_progress: 'rootfs: 100% (5.88MB/s)'
may_cancel: false
err: ""
location: node1

$ timedatectl
               Local time: Fri 2021-05-28 13:25:58 UTC
           Universal time: Fri 2021-05-28 13:25:58 UTC
                 RTC time: Fri 2021-05-28 13:25:58    
                Time zone: Etc/UTC (UTC, +0000)       
System clock synchronized: yes                        
              NTP service: n/a                        
          RTC in local TZ: no

stgraber · May 28, 2021, 2:12pm

Hmm, can you try snap set lxd ceph.external=true followed by systemctl reload snap.lxd.daemon to restart the snap (or if completely stuck, reboot the system maybe?)

This will make the snap use the same version of the ceph tools as your system. We’ve found this to be required in some environments depending on the version of ceph running on the server side.
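Since the setting has to be applied on every cluster member, it can help to read it back before retrying. A minimal sketch, assuming snap get is used to read the option set above; the function name check_ceph_external is made up for this example.

```shell
# Hypothetical helper: succeed only if the LXD snap is set to use the host's Ceph tools.
check_ceph_external() {
    # "snap get" fails if the option was never set, so treat that as "not enabled".
    value="$(snap get lxd ceph.external 2>/dev/null || true)"
    if [ "$value" = "true" ]; then
        echo "ceph.external is enabled"
    else
        echo "ceph.external is not enabled, run: snap set lxd ceph.external=true"
        return 1
    fi
}
```

Running check_ceph_external on each node gives a quick yes/no answer before the daemon is reloaded.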

yang · May 28, 2021, 4:00pm

After setting ceph.external, reloading the daemon, and rebooting all nodes, I tried to create a new instance, which failed:

$ lxc init ubuntu:20.04 container-test -p container-private -s default
Creating container-test
Error: Failed instance creation: Failed creating instance from image: Failed to create mount directory "/var/snap/lxd/common/lxd/storage-pools/default/images/52c9bf12cbd3b06d591c5f56f8d9a185aca4a9a7da4d6e9f26f0ba44f68867b7": mkdir /var/snap/lxd/common/lxd/storage-pools/default/images/52c9bf12cbd3b06d591c5f56f8d9a185aca4a9a7da4d6e9f26f0ba44f68867b7: no such file or directory

The image however seems to be present:

$ lxc image list
+-------+--------------+--------+---------------------------------------------+--------------+-----------+----------+-------------------------------+
| ALIAS | FINGERPRINT  | PUBLIC |                 DESCRIPTION                 | ARCHITECTURE |   TYPE    |   SIZE   |          UPLOAD DATE          |
+-------+--------------+--------+---------------------------------------------+--------------+-----------+----------+-------------------------------+
|       | 52c9bf12cbd3 | no     | ubuntu 20.04 LTS amd64 (release) (20210510) | x86_64       | CONTAINER | 358.34MB | May 28, 2021 at 10:37am (UTC) |
+-------+--------------+--------+---------------------------------------------+--------------+-----------+----------+-------------------------------+

The weird thing is that I have both the LXD and Ceph clusters set up through Puppet, and there were times when the setup worked and times when the LXD cluster bootstrap phase blocked my Puppet run, though the latter is more common.

stgraber · May 28, 2021, 4:17pm

That seems to suggest that the previous failure prevented /var/snap/lxd/common/lxd/storage-pools/default or its sub-directories (images, containers, containers-snapshots, virtual-machines, virtual-machines-snapshots, custom and custom-snapshots) from getting created.

Manually creating them may fix it.
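A minimal sketch of that, assuming the pool path from the error message above. The sub-directory names are the ones listed above; the function name create_pool_dirs is made up, and ownership and permissions are left at the defaults of whoever runs it, so compare them with a working pool if LXD still complains.

```shell
# Hypothetical helper: recreate the directory layout LXD expects under a storage pool.
create_pool_dirs() {
    pool_dir="$1"
    for sub in images containers containers-snapshots \
               virtual-machines virtual-machines-snapshots \
               custom custom-snapshots; do
        # -p also creates the pool directory itself and ignores existing directories.
        mkdir -p "$pool_dir/$sub"
    done
}

# Path taken from the error message; pass a scratch directory to try it out safely.
# create_pool_dirs /var/snap/lxd/common/lxd/storage-pools/default
```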

yang · May 28, 2021, 4:20pm

I went ahead and nuked my setup to reinstall both clusters from scratch.
I will attempt to set it up again and report any findings. Thank you!

yang · May 28, 2021, 5:08pm

After reconfiguring the cluster, the bootstrap completed without issues.

The following is the monitor output, which I guess is what a normal cluster bootstrap looks like:

$ lxc monitor --pretty --type=logging
INFO[05-28|17:04:56] Update network address 
INFO[05-28|17:04:56]  - binding TCP socket                    socket=192.168.6.10:8443
INFO[05-28|17:04:56] Update cluster address 
DBUG[05-28|17:04:56] Handling                                 ip=@ method=GET protocol=unix url=/1.0/storage-pools username=root
DBUG[05-28|17:04:56] Handling                                 ip=@ method=POST protocol=unix url=/1.0/storage-pools username=root
DBUG[05-28|17:04:59] create started                           clientType=normal config="map[ceph.cluster_name:ceph ceph.osd.pool_name:lxd-storage ceph.user.name:admin source:lxd-storage]" description= driver=ceph pool=default
DBUG[05-28|17:05:04] create finished                          clientType=normal config="map[ceph.cluster_name:ceph ceph.osd.pg_num:32 ceph.osd.pool_name:lxd-storage ceph.user.name:admin source:lxd-storage volatile.initial_source:lxd-storage volatile.pool.pristine:false]" description= driver=ceph pool=default
DBUG[05-28|17:05:04] Mount finished                           driver=ceph pool=default
DBUG[05-28|17:05:04] Mount started                            driver=ceph pool=default
DBUG[05-28|17:05:04] Marked storage pool local status as created pool=default
DBUG[05-28|17:05:04] Handling                                 method=GET protocol=unix url=/1.0/cluster username=root ip=@
DBUG[05-28|17:05:04] Handling                                 username=root ip=@ method=GET protocol=unix url=/1.0/events
DBUG[05-28|17:05:04] New event listener: 275099a9-d954-40bf-86bf-fe0c46c4ca2e 
DBUG[05-28|17:05:04] Handling                                 ip=@ method=PUT protocol=unix url=/1.0/cluster username=root
DBUG[05-28|17:05:04] New task Operation: 6f841599-14d0-4d0d-9c46-83c8c6fad2d8 
DBUG[05-28|17:05:04] Started task operation: 6f841599-14d0-4d0d-9c46-83c8c6fad2d8 
EROR[05-28|17:05:04] Failed to get leader node address: Node is not clustered 
DBUG[05-28|17:05:04] Acquiring exclusive lock on cluster db 
INFO[05-28|17:05:04] Stop database gateway 
DBUG[05-28|17:05:04] Handling                                 ip=@ method=GET protocol=unix url=/1.0/operations/6f841599-14d0-4d0d-9c46-83c8c6fad2d8 username=root
DBUG[05-28|17:05:04] Initializing database gateway 
DBUG[05-28|17:05:04] Start database node                      address=192.168.6.10:8443 id=1 role=voter
DBUG[05-28|17:05:04] Bootstrap database gateway ID:1 Address:192.168.6.10:8443 
DBUG[05-28|17:05:04] Releasing exclusive lock on cluster db 
DBUG[05-28|17:05:04] Dqlite: network connection lost: write unix @->@00016: write: broken pipe 

( ... multiple Dqlite messages identical to the one above )

DBUG[05-28|17:05:04] Found cert                               name=0
DBUG[05-28|17:05:04] Triggering an out of schedule hearbeat   address=192.168.6.10:8443
DBUG[05-28|17:05:04] Starting heartbeat round (full update) 
DBUG[05-28|17:05:04] Dqlite: attempt 0: server 192.168.6.10:8443: connected 
DBUG[05-28|17:05:04] Heartbeat updating local raft nodes to [{ID:1 Address:192.168.6.10:8443 Role:voter}] 
DBUG[05-28|17:05:04] Success for task operation: 6f841599-14d0-4d0d-9c46-83c8c6fad2d8 
DBUG[05-28|17:05:04] Event listener finished: 275099a9-d954-40bf-86bf-fe0c46c4ca2e 
DBUG[05-28|17:05:04] Disconnected event listener: 275099a9-d954-40bf-86bf-fe0c46c4ca2e 
DBUG[05-28|17:05:04] Completed heartbeat round 
DBUG[05-28|17:05:04] Cluster node is up-to-date 
DBUG[05-28|17:05:14] Heartbeat updating local raft nodes to [{ID:1 Address:192.168.6.10:8443 Role:voter}] 
DBUG[05-28|17:05:14] Starting heartbeat round 
DBUG[05-28|17:05:14] Completed heartbeat round 
DBUG[05-28|17:05:24] Starting heartbeat round 
DBUG[05-28|17:05:24] Heartbeat updating local raft nodes to [{ID:1 Address:192.168.6.10:8443 Role:voter}] 
DBUG[05-28|17:05:24] Completed heartbeat round 
DBUG[05-28|17:05:34] Starting heartbeat round 
DBUG[05-28|17:05:34] Heartbeat updating local raft nodes to [{ID:1 Address:192.168.6.10:8443 Role:voter}] 
DBUG[05-28|17:05:34] Completed heartbeat round

Next time I’ll remember to monitor again when the bootstrap locks up.
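One way to make sure the log is always captured is to start the monitor before the bootstrap and write its output to a file. A minimal sketch, assuming the bootstrap is driven by a preseed file as in the earlier attempt; the function name bootstrap_with_log and both file arguments are hypothetical.

```shell
# Hypothetical helper: run "lxd init --preseed" while recording the LXD log to a file.
bootstrap_with_log() {
    preseed="$1"
    log="$2"
    status=0
    # Same monitor command as above, started in the background and redirected to the log.
    lxc monitor --pretty --type=logging > "$log" 2>&1 &
    monitor_pid=$!
    # Feed the preseed YAML on stdin and remember the exit status.
    lxd init --preseed < "$preseed" || status=$?
    # Stop the monitor once init has returned.
    kill "$monitor_pid" 2>/dev/null || true
    wait "$monitor_pid" 2>/dev/null || true
    return "$status"
}

# Example call with placeholder file names:
# bootstrap_with_log preseed.yaml lxd-bootstrap.log
```

If lxd init blocks, the log file still holds everything logged up to that point.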

stgraber · May 28, 2021, 7:00pm

Yeah, that looks good.