Storage pool is unavailable on this server: Placeholder volume does not exist

terryng · December 24, 2022, 3:38am

I have an LXD cluster using Ceph for backend storage. The host OS is Ubuntu 20.04. It has been running smoothly, and the host OS has been updated without problems. After the most recent update and reboot, it is no longer possible to start the containers. The error reported was:

Error: Storage pool "remote" unavailable on this server

remote is the name of the storage pool.

lxc storage list

showed

+----------+--------+-------------+---------+---------+
|   NAME   | DRIVER | DESCRIPTION | USED BY |  STATE  |
+----------+--------+-------------+---------+---------+
| local    | zfs    |             | 1       | CREATED |
+----------+--------+-------------+---------+---------+
| remote   | ceph   |             | 72      | CREATED |
+----------+--------+-------------+---------+---------+
| remotefs | cephfs |             | 0       | CREATED |
+----------+--------+-------------+---------+---------+

Content of /var/snap/lxd/common/lxd/logs/lxd.log

time="2022-12-24T10:43:19+08:00" level=warning msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
time="2022-12-24T10:43:19+08:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
time="2022-12-24T10:43:22+08:00" level=warning msg="Dqlite: attempt 1: server 192.168.1.16:8443: no known leader"
time="2022-12-24T10:43:22+08:00" level=warning msg="Dqlite: attempt 1: server 192.168.1.17:8443: no known leader"
time="2022-12-24T10:43:32+08:00" level=error msg="Failed mounting storage pool" err="Placeholder volume does not exist" pool=remote
time="2022-12-24T10:43:34+08:00" level=warning msg="Failed to initialize fanotify, falling back on inotify" err="Failed to initialize fanotify: invalid argument"
time="2022-12-24T10:43:35+08:00" level=error msg="Error getting disk usage" err="Storage pool is unavailable on this server" instance=papercut instanceType=container project=default
time="2022-12-24T10:43:35+08:00" level=error msg="Error getting disk usage" err="Storage pool is unavailable on this server" instance=class instanceType=container project=default
time="2022-12-24T10:43:35+08:00" level=error msg="Error getting disk usage" err="Storage pool is unavailable on this server" instance=ldap-server instanceType=container project=default
time="2022-12-24T10:43:35+08:00" level=error msg="Error getting disk usage" err="Storage pool is unavailable on this server" instance=mrbs instanceType=container project=default
time="2022-12-24T10:43:35+08:00" level=error msg="Error getting disk usage" err="Storage pool is unavailable on this server" instance=registryDB instanceType=container project=default
time="2022-12-24T10:44:33+08:00" level=error msg="Failed mounting storage pool" err="Placeholder volume does not exist" pool=remote
time="2022-12-24T10:45:33+08:00" level=error msg="Failed mounting storage pool" err="Placeholder volume does not exist" pool=remote
.....

I noticed this line:

time="2022-12-24T10:43:32+08:00" level=error msg="Failed mounting storage pool" err="Placeholder volume does not exist" pool=remote

But I'm afraid I have no idea what it means.
Is the mount point missing?
How can I fix it, please?

Merry Christmas to everyone.

tomp · December 24, 2022, 9:41pm

LXD 5.9 added a pool mount check for the placeholder volume that is (supposed to be) present in the storage pool, which indicates that LXD is using the pool.

Depending on the age of the pool, it may have been created before LXD started creating the placeholder volume (although that is certainly not a recent change), or perhaps the volume was accidentally deleted at some point.

Anyway, the command it runs to check that the volume exists is:

"rbd",
"--id", d.config["ceph.user.name"],
"--cluster", d.config["ceph.cluster_name"],
"--pool", d.config["ceph.osd.pool_name"],
"info",
rbdVolumeName,

where config comes from the storage pool's config (lxc storage show <pool>) and rbdVolumeName is the name of the placeholder rbd image.

This should be lxd_ followed by the OSD pool name, i.e. "lxd_" + d.config["ceph.osd.pool_name"].

So if you create an empty rbd image with that name, that should get it working.
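The check and the fix described above can be sketched as a small shell helper. This is a minimal sketch, assuming it runs on a cluster member that has both the `lxc` and `rbd` clients plus a Ceph keyring for the configured user, that the placeholder name is `lxd_` followed by `ceph.osd.pool_name` as described above, and that `--size 0B` is accepted (it is what is used later in this thread). `placeholder_name`, `run` and `ensure_placeholder` are hypothetical helpers, not part of LXD. With `DRY_RUN=1` the read-only `rbd info` check still runs, but the `rbd create` command is only printed.

```shell
#!/usr/bin/env bash
# Sketch only: placeholder_name, run and ensure_placeholder are
# hypothetical helpers, not part of LXD.

# Placeholder image name LXD checks for: "lxd_" + ceph.osd.pool_name.
placeholder_name() {
    printf 'lxd_%s\n' "$1"
}

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# ensure_placeholder <lxd-pool>
# Reads the pool's Ceph settings from LXD, runs the same "rbd info"
# check LXD does, and creates an empty placeholder image if that fails.
ensure_placeholder() {
    local pool="$1" user cluster osd_pool image
    user=$(lxc storage get "$pool" ceph.user.name)
    cluster=$(lxc storage get "$pool" ceph.cluster_name)
    osd_pool=$(lxc storage get "$pool" ceph.osd.pool_name)
    user=${user:-admin}        # LXD's default ceph.user.name
    cluster=${cluster:-ceph}   # LXD's default ceph.cluster_name
    if [ -z "$osd_pool" ]; then
        echo "ceph.osd.pool_name is not set for pool $pool" >&2
        return 1
    fi
    image=$(placeholder_name "$osd_pool")

    # Any rbd info failure (not only a missing image) leads to the create
    # step; if the real problem is e.g. authentication, create fails too.
    if rbd --id "$user" --cluster "$cluster" --pool "$osd_pool" info "$image" >/dev/null 2>&1; then
        echo "placeholder $osd_pool/$image already exists"
        return 0
    fi
    run rbd --id "$user" --cluster "$cluster" --pool "$osd_pool" create --size 0B "$image"
}
```

On the setup shown in this thread, `DRY_RUN=1 ensure_placeholder remote` would be expected to print `rbd --id admin --cluster ceph --pool lxd-ceph create --size 0B lxd_lxd-ceph` when the placeholder is missing; running it without `DRY_RUN` executes that command.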

tomp · December 24, 2022, 9:45pm

One way to see the specific rbd image name that it creates is to create a new temporary ceph pool using lxc storage create and then find the placeholder volume and then re-create it in your existing pool.

terryng · December 27, 2022, 4:47am

Hi Thomas,

Thank you for spending time on this.

I don’t understand your last message:

One way to see the specific rbd image name that it creates is to create a new temporary ceph pool using lxc storage create and then find the placeholder volume and then re-create it in your existing pool

Much appreciated if you could kindly elaborate a bit more please.

FYI, I believe that I have already created a storage pool. Here is the output of lxc storage show remote

where “remote” is the name of the pool:

config:
  ceph.cluster_name: ceph
  ceph.osd.pg_num: "250"
  ceph.osd.pool_name: lxd-ceph
  ceph.user.name: admin
  volatile.pool.pristine: "false"
description: ""
name: remote
driver: ceph
used_by:
- /1.0/instances/admission
.......

I noticed that ceph.osd.pool_name: lxd-ceph is different from the lxc storage pool name: remote.

I tried rbd ls lxd-ceph to list all the block devices (or volumes???) in the pool “lxd-ceph”, e.g.:

container_admission
container_class
container_db01
.....

When I try to show the volume details with
lxc storage volume show lxd-ceph container_admission

I got Error: Storage pool not found

I then tried lxc storage volume show remote container_admission

I got Error: Storage pool volume not found

So,

the pool name “remote” exists in lxc
the pool name “lxd-ceph” exists in rbd

Perhaps I am confusing the pool names in the lxc and rbd namespaces. Anyway, what about the placeholder? Sigh!

I am not sure whether unifying the pool names will help. As the LXD cluster is in production, I need to be extremely cautious about this.

Thank you in advance.

terryng · December 27, 2022, 8:50am

lxc storage show remote

gave me the following:

config:
  ceph.cluster_name: ceph
  ceph.osd.pg_num: "250"
  ceph.osd.pool_name: lxd-ceph
  ceph.user.name: admin
  volatile.pool.pristine: "false"
description: ""
name: remote
driver: ceph

rbd --id admin --cluster ceph --pool lxd-ceph info lxd-ceph

gave me an error complaining that the image lxd-ceph is missing:

rbd: error opening image lxd-ceph: (2) No such file or directory

This was expected, since LXD has a problem detecting the placeholder volume image.

How can I configure the storage pool “remote” to use the correct placeholder volume location, please?

In your message, you suggested creating a new temporary Ceph pool using lxc storage create. Actually, since I have a cluster, I think that I should be doing something like:

lxc storage create temp_pool ceph source=lxd-ceph --target server01
lxc storage create temp_pool ceph source=lxd-ceph --target server02
lxc storage create temp_pool ceph source=lxd-ceph --target server03
lxc storage create temp_pool ceph source=lxd-ceph --target server04
lxc storage create temp_pool ceph

What should I do to get the rbdVolumeName?
What should I do with it please?

Thanks!

terryng · December 29, 2022, 11:05am

I think that I have found the rbdVolumeName of one of the containers.

lxc storage volume list remote

gave me a listing of all containers:

+----------------------+----------------------+-------------+--------------+---------+----------+
|         TYPE         |         NAME         | DESCRIPTION | CONTENT-TYPE | USED BY | LOCATION |
+----------------------+----------------------+-------------+--------------+---------+----------+
| container            | mrbs                 |             | filesystem   | 1       |          |
+----------------------+----------------------+-------------+--------------+---------+----------+

rbd --id admin --cluster ceph --pool lxd-ceph info container_mrbs

rbd image 'container_mrbs':
	size 9.3 GiB in 2385 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 14bfb6b8b4567
	block_name_prefix: rbd_data.14bfb6b8b4567
	format: 2
	features: layering
	op_features: 
	flags: 
	create_timestamp: Tue Dec  1 20:37:14 2020
	access_timestamp: Tue Dec  1 20:37:14 2020
	modify_timestamp: Tue Dec  1 20:37:14 2020

The rbdVolumeName (placeholder volume image) of container mrbs is container_mrbs. I suppose that it should be added to, or edited in, the container config.

lxc config edit mrbs

showed me the YAML config file.

I could not find any key that is related to placeholder volume image in the YAML file.

Could some experts please shed some light on this?

Thank you in advance!

terryng · December 29, 2022, 1:39pm

This issue was resolved by creating a new pool and launching a container in the newly created pool.

First, create a pending pool on all servers:

lxc storage create temp_pool ceph source=lxd-ceph --target server01
…
lxc storage create temp_pool ceph

The last command changes the state of temp_pool to CREATED.
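The per-member `--target` steps followed by the final `lxc storage create` can be written as a loop. This is a minimal sketch, assuming you pass the cluster member names yourself (server01 and so on are the names used in this thread) and that the OSD pool given as `source` is the one LXD should use. `run` and `create_cluster_pool` are hypothetical helpers, and `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: run and create_cluster_pool are hypothetical helpers.

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# create_cluster_pool <lxd-pool> <osd-pool> <member>...
# Defines the pool as pending on each member (source is given per member,
# as in the commands above), then runs the final create without --target.
create_cluster_pool() {
    if [ "$#" -lt 3 ]; then
        echo "usage: create_cluster_pool <lxd-pool> <osd-pool> <member>..." >&2
        return 1
    fi
    local pool="$1" osd_pool="$2" member
    shift 2
    for member in "$@"; do
        run lxc storage create "$pool" ceph "source=$osd_pool" --target "$member" || return 1
    done
    run lxc storage create "$pool" ceph
}
```

For example, `DRY_RUN=1 create_cluster_pool temp_pool lxd-ceph server01 server02 server03 server04` prints one `--target` command per member followed by `lxc storage create temp_pool ceph`.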

Next, create an OSD pool:

ceph osd pool create temp_pool

Initialize the OSD pool:

rbd pool init temp_pool

See if the newly created block device pool is functioning by creating a block device image:

rbd create --size 10 temp_pool/temp

Launch a new container in the newly created storage pool:

lxc launch ubuntu:20.04 test --storage temp_pool

I believe this updated the placeholder volume setting. I then tried to start the container that could not be started before:

lxc start mrbs

where mrbs is in the original storage pool.

Reference documents:
https://docs.ceph.com/en/quincy/rbd/rados-rbd-cmds/#creating-a-block-device-image

PS: I finally understand what Thomas has hinted me to do. Thanks!

tomp · January 3, 2023, 10:47am

I’ve now got access to a ceph cluster to show you what I meant.

So, first let's set up a Ceph storage pool so we can break it and reproduce the error you’re experiencing.

Create storage pool:

lxc storage create ceph ceph

Check for the placeholder volume:

rbd list --pool ceph
lxd_ceph

Delete the placeholder volume from ceph:

rbd remove lxd_ceph --pool ceph
Removing image: 100% complete...done.

Restart LXD and check the error logs and pool status:

DEBUG  [2023-01-03T10:45:05Z] Initializing storage pool                     pool=ceph
DEBUG  [2023-01-03T10:45:05Z] Mount started                                 driver=ceph pool=ceph
DEBUG  [2023-01-03T10:45:05Z] Mount finished                                driver=ceph pool=ceph
ERROR  [2023-01-03T10:45:05Z] Failed mounting storage pool                  err="Placeholder volume does not exist" pool=ceph

lxc storage ls
+---------+--------+------------------------------------+-------------+---------+-------------+
|  NAME   | DRIVER |               SOURCE               | DESCRIPTION | USED BY |    STATE    |
+---------+--------+------------------------------------+-------------+---------+-------------+
| ceph    | ceph   | ceph                               |             | 0       | UNAVAILABLE |
+---------+--------+------------------------------------+-------------+---------+-------------+

Restore placeholder volume and wait for LXD to detect it:

rbd create lxd_ceph --pool ceph --size 0B

DEBUG  [2023-01-03T10:47:05Z] Initializing storage pool                     pool=ceph
DEBUG  [2023-01-03T10:47:05Z] Mount started                                 driver=ceph pool=ceph
DEBUG  [2023-01-03T10:47:05Z] Mount finished                                driver=ceph pool=ceph
INFO   [2023-01-03T10:47:05Z] Initialized storage pool                      pool=ceph
INFO   [2023-01-03T10:47:05Z] All storage pools initialized
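LXD retries mounting an unavailable pool in the background; in the log from the first post the attempts are about a minute apart. A minimal polling sketch for the "wait" step, assuming `lxc storage show <pool>` prints a top-level `status:` field that reads `Created` once the pool is usable (the same value shown in the STATE column of `lxc storage ls`), and assuming that in a cluster it reflects the member you run it on. `pool_status` and `wait_for_pool` are hypothetical helpers, and the timeout defaults are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: pool_status and wait_for_pool are hypothetical helpers.

# Print the value of the top-level "status:" key from YAML on stdin.
pool_status() {
    sed -n 's/^status: *//p'
}

# wait_for_pool <lxd-pool> [timeout-seconds] [interval-seconds]
# Polls "lxc storage show" until the status is Created or time runs out.
wait_for_pool() {
    local pool="$1" timeout="${2:-300}" interval="${3:-10}" waited=0 status
    while [ "$waited" -lt "$timeout" ]; do
        status=$(lxc storage show "$pool" 2>/dev/null | pool_status)
        if [ "$status" = "Created" ]; then
            echo "pool $pool is available"
            return 0
        fi
        echo "pool $pool status: ${status:-unknown}, waiting ${interval}s"
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "timed out after ${timeout}s waiting for pool $pool" >&2
    return 1
}
```

For the demo above, `wait_for_pool ceph` would be run right after the `rbd create` step.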

jgraichen · January 9, 2023, 9:27am

We were hit by the same issue just today. We have an LXD cluster on 22.04 with many containers using Ceph as the storage pool. Since the reboot yesterday, none of these containers start anymore because the storage pool is “unavailable on this server”. There has been no recent change except whatever snap is auto-updating.

root@control2a:~# lxc storage show remote
config:
  ceph.cluster_name: ceph
  ceph.osd.force_reuse: "true"
  ceph.osd.pool_name: cloud.core.volumes
  ceph.user.name: cloud.core.lxd
  volatile.pool.pristine: "false"
description: ""
name: remote
driver: ceph

The mentioned placeholder volume did not exist, and, to my knowledge, never existed. Luckily, creating it with rbd did work:

rbd create --pool cloud.core.volumes --size 0B lxd_cloud.core.volumes

Maybe the migration didn’t run whenever it was introduced?

tomp · January 9, 2023, 9:34am

Good to hear it works. Yes, I am not sure why some Ceph pools don’t have this placeholder volume.
How old are these storage pools do you know?

jgraichen · January 9, 2023, 9:42am

How old are these storage pools do you know?

It should be around 2.5 years.

tomp · January 9, 2023, 9:48am

I see. Perhaps, as you say, older pools didn’t have these placeholder volumes for some reason.