Ceph storage with existing Ceph pool

Hello all,

I already have a Ceph storage pool set up, because my use case needs an erasure-coded pool. That works fine, and I was once able to configure LXD to create a storage pool using an existing Ceph pool. Since this is an LXD cluster, I primed it on all members with:

lxc storage create $storage_name ceph source=$ceph_pool_name --target=$node_name

Which seems to create the storage as pending as is expected:

+---------+--------+-------------+---------+---------+
| NAME    | DRIVER | DESCRIPTION | USED BY | STATE   |
+---------+--------+-------------+---------+---------+
| $NAME   | ceph   |             | 0       | PENDING |
+---------+--------+-------------+---------+---------+

The first time I was able to then run:

lxc storage create $storage_name ceph source=$ceph_pool_name 

And that went ahead and created the storage without issues.

Now, however, even though I ran through the exact same steps and see the storage as pending on the members, when I run the final create I get:

Error: Config key "source" is node-specific.

I’m unsure of what I am missing here though or what further logs I can check. The daemon’s logs show generic chatter between the cluster members, but no specific errors.

The docs (Storage pools | LXD) suggest using exactly these commands, and I can’t find any way to set up a pool with erasure coding for Ceph from the LXD side, so I figured I had to go about it this way.

Hi Lachezar,
Run this command once for each cluster member:
lxc storage create --target <host> cpool ceph source=cpool
Once that has been done for every member, the storage pool shows up as pending. Then run the final command:
lxc storage create cpool ceph
If everything is fine on the Ceph side, this completes the storage definition.
Please have a look at this link:
https://github.com/lxc/lxd/blob/master/doc/clustering.md
Regards.
P.S.
The Ceph pool in my example is cpool, created like this: ceph osd pool create cpool 64
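
The two-phase flow described in this reply can be sketched as a small dry-run helper. It only prints the commands instead of running them; the function name plan_cluster_pool and the member names are made up for illustration, while the lxc commands are the ones quoted above.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands for a clustered Ceph pool.
# Usage: plan_cluster_pool <lxd_pool> <ceph_pool> <member>...
# plan_cluster_pool is a hypothetical helper, not part of LXD.
plan_cluster_pool() {
    lxd_pool=$1
    ceph_pool=$2
    shift 2
    # Phase 1: one pending definition per cluster member.
    for member in "$@"; do
        echo "lxc storage create --target $member $lxd_pool ceph source=$ceph_pool"
    done
    # Phase 2: the cluster-wide create, without member-specific keys.
    echo "lxc storage create $lxd_pool ceph"
}

plan_cluster_pool cpool cpool node01 node02 node03
```

Reviewing the printed commands before running them by hand makes it easier to spot a member that was skipped in the first phase.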

Hello,

Thank you for the reply. When I try that it would originally create a whole new pool in ceph. Now though it just hangs on the create and shows “ERRORED” with:

root@$NODE_NAME:~# lxc storage ls
+--------------+--------+-------------+---------+---------+
| NAME         | DRIVER | DESCRIPTION | USED BY | STATE   |
+--------------+--------+-------------+---------+---------+
| $TORAGE_NAME | ceph   |             | 0       | ERRORED |
+--------------+--------+-------------+---------+---------+

Error: OSD pool missing in df output

shown if I run lxc storage info on a target node.

My pool does show up in ceph df on all members:

root@$NODE_NAME:~# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    5.5 TiB  5.5 TiB  5.0 GiB   5.0 GiB       0.09
TOTAL  5.5 TiB  5.5 TiB  5.0 GiB   5.0 GiB       0.09

--- POOLS ---
POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
device_health_metrics   1  128  2.6 MiB        6  7.8 MiB      0    1.7 TiB
$CEPH_POOL_NAME_IN_LXD  3  128      0 B        0      0 B      0    3.5 TiB

Hi,
Can you post the output of lxc cluster list? Also, what is $TORAGE_NAME? It could be a simple pool name, right?
And for all the LXD cluster members, can you post the output of ls -alh /etc/ceph?
Regards.

Yes, small typo, $STORAGE_NAME is the same as the ceph pool in this case, though I tried it with a different one and the effects are the same. Here are the contents of /etc/ceph

drwxr-xr-x  2 root root 4.0K Nov  9 04:28 .
drwxr-xr-x 84 root root 4.0K Nov 17 08:50 ..
-rw-------  1 root root  151 Nov  9 03:41 ceph.client.admin.keyring
-rw-r--r--  1 root root  538 Nov  9 04:28 ceph.conf
-rw-r--r--  1 root root   92 Sep 16 08:15 rbdmaps

The cluster list is as follows:

+--------+--------------------------------+----------+--------------+----------------+-------------+--------+-------------------+
| NAME   | URL                            | ROLES    | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  | MESSAGE           |
+--------+--------------------------------+----------+--------------+----------------+-------------+--------+-------------------+
| node01 | https://$IP_OF_NODE_ONE:8443   | database | x86_64       | default        |             | ONLINE | Fully operational |
+--------+--------------------------------+----------+--------------+----------------+-------------+--------+-------------------+
| node02 | https://$IP_OF_NODE_TWO:8443   | database | x86_64       | default        |             | ONLINE | Fully operational |
+--------+--------------------------------+----------+--------------+----------------+-------------+--------+-------------------+
| node03 | https://$IP_OF_NODE_THREE:8443 | database | x86_64       | default        |             | ONLINE | Fully operational |
+--------+--------------------------------+----------+--------------+----------------+-------------+--------+-------------------+

So, you have completed the lxc storage create --target <host> cpool ceph source=cpool part on each LXD cluster member, right?
Please delete cpool on the Ceph side if you created it before (ceph osd pool rm cpool cpool --yes-i-really-really-mean-it), and delete the LXD storage with lxc storage delete cpool.
Then execute the lxc storage create cpool ceph command.
If you get any error, please post it along with the step where it occurred.
Regards.

I did complete it yes, however, what I’m trying to do is use a pool that is already existing in Ceph. Apologies if I didn’t make that clear enough originally. The steps you described will create a new one, which is not what I need, since I can’t see the required options for using erasure coding being available when creating it from the LXD side. So I need help with either:

A) Setting up a storage backed with a Ceph pool that already exists.

B) Options to configure a storage backed with a Ceph pool with erasure coding when setting it up from LXD.

Have you tried using just:

lxc storage create $storage_name ceph

As the final step, after specifying the “per-node” settings (even though for Ceph, source isn’t really per-node, which looks like a bug to me)?

Each time the final step fails and leaves the pool in an ERRORED state, you need to delete the storage pool and start again.
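
As a rough illustration of that retry rule, the sketch below reads pool rows in CSV form and prints a suggested next command. It assumes the rows come from lxc storage list --format csv and that the state is the last column, as in the tables earlier in this thread; pool_state and next_step are hypothetical helper names, and a description containing commas would confuse this simple parser.

```shell
#!/bin/sh
# pool_state <name>: read CSV rows on stdin, print the last column for <name>.
# Assumes the state (PENDING, CREATED, ERRORED) is the last column.
pool_state() {
    awk -F, -v name="$1" '$1 == name { print $NF }'
}

# next_step <name>: print a suggested command based on the state on stdin.
# PENDING is only ready for the final create once every member is defined.
next_step() {
    state=$(pool_state "$1")
    case "$state" in
        ERRORED) echo "lxc storage delete $1" ;;
        PENDING) echo "lxc storage create $1 ceph" ;;
        CREATED) echo "# $1 is already created" ;;
        *)       echo "# $1 not found; start with the per-member creates" ;;
    esac
}

# Hand-written sample row; on a cluster the input would instead come from:
#   lxc storage list --format csv
printf 'default,ceph,,0,ERRORED\n' | next_step default
```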

Please can you log your reproducer steps as an issue at https://github.com/lxc/lxd/issues

LXD does consume pre-existing empty pools just fine, I believe that’s what we actually rely on for all Ceph based deployments using the LXD charm.

@sdeziel

It certainly appears to have support for it in the driver:

But the validation of member-specific config might have an issue.

It didn’t the last time we tried it, a few weeks ago at least. I seem to recall @sdeziel and I specifically testing this as part of a cluster.

Mind you, there was a slight issue in that LXD wouldn’t properly record the pool as “pristine” in that case, but I got that fixed in master immediately afterwards (and it didn’t affect consuming the pool in the first place; LXD just considered it dirty when it shouldn’t have).

That is correct, the LXD charm consumes existing empty Ceph pools and the fix @stgraber refers to was released in LXD 4.20.

I did try that, however, what happens then is either:

  1. The request is sent to the API, judging by “lxd monitor --debug”, but the daemon sends no response, so the command just hangs without any feedback. I am unsure how long the timeout would be in such a case, but I usually give up after 10 minutes. No storage is created. Restarting snap.lxd.daemon.service fixes this.

  2. If 1) isn’t the case and there is no existing storage, I run the configure step on the targets and then the storage create, which hangs, though this time a storage is created with a status of “ERRORED” and the “OSD pool missing” message I provided previously. Again, the pool does exist in Ceph. I noticed that the driver runs a query against Ceph to get this information; running it manually does return results:

root@node02:~# ceph --name client.lxd --cluster ceph df -f json | jq .
{
  "stats": {
    "total_bytes": 6001218551808,
    "total_avail_bytes": 5995789131776,
    "total_used_bytes": 5429420032,
    "total_used_raw_bytes": 5429420032,
    "total_used_raw_ratio": 0.000904719578102231,
    "num_osds": 6,
    "num_per_pool_osds": 6,
    "num_per_pool_omap_osds": 6
  },
  "stats_by_class": {
    "ssd": {
      "total_bytes": 6001218551808,
      "total_avail_bytes": 5995789131776,
      "total_used_bytes": 5429420032,
      "total_used_raw_bytes": 5429420032,
      "total_used_raw_ratio": 0.000904719578102231
    }
  },
  "pools": [
    {
      "name": "device_health_metrics",
      "id": 1,
      "stats": {
        "stored": 2726661,
        "objects": 6,
        "kb_used": 7989,
        "bytes_used": 8179983,
        "percent_used": 1.4365576816999237e-06,
        "max_avail": 1898049044480
      }
    },
    {
      "name": "lxd-pool",
      "id": 3,
      "stats": {
        "stored": 0,
        "objects": 0,
        "kb_used": 0,
        "bytes_used": 0,
        "percent_used": 0,
        "max_avail": 3796098088960
      }
    },
    {
      "name": "lxd-pool2",
      "id": 7,
      "stats": {
        "stored": 0,
        "objects": 0,
        "kb_used": 0,
        "bytes_used": 0,
        "percent_used": 0,
        "max_avail": 1898049044480
      }
    },
    {
      "name": "lxdtest1",
      "id": 8,
      "stats": {
        "stored": 36,
        "objects": 4,
        "kb_used": 24,
        "bytes_used": 24576,
        "percent_used": 4.31601065997711e-09,
        "max_avail": 1898049044480
      }
    },
    {
      "name": "lxdalextest1",
      "id": 9,
      "stats": {
        "stored": 0,
        "objects": 0,
        "kb_used": 0,
        "bytes_used": 0,
        "percent_used": 0,
        "max_avail": 1898049044480
      }
    }
  ]
}

Where the pool we want is lxd-pool.
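
To repeat that check on each member without reading the whole JSON by eye, a minimal sketch could look like the following. The helper name has_pool is hypothetical, and it uses a plain grep on the "name" field rather than a real JSON parser, so it assumes pool names without regex metacharacters.

```shell
#!/bin/sh
# has_pool <name>: succeed if the JSON on stdin contains "name": "<name>".
# Rough text match only; the closing quote keeps lxd-pool from matching
# lxd-pool2.
has_pool() {
    grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Trimmed-down sample in the shape of the output above; on a member the
# input would instead come from:
#   ceph --name client.lxd --cluster ceph df -f json
sample='{"pools":[{"name":"lxd-pool","id":3},{"name":"lxd-pool2","id":7}]}'

if printf '%s\n' "$sample" | has_pool lxd-pool; then
    echo "lxd-pool: present"
else
    echo "lxd-pool: missing"
fi
```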

It seems to me there are three things here, so I would like to clarify which one I should open a bug for: source not being allowed in the global config, the storage create command hanging until the lxd.daemon is restarted (which produces no logs for some reason), or the “OSD pool missing” error? Or three separate ones?

  1. The source property is always per server, so that’s expected (though in ceph’s case it will almost always be the same value)
  2. The command hanging is likely the real issue. To me, this suggests that one of your LXD servers is itself stuck, either on a ceph command or in the kernel. We’d need a ps fauxww from all servers at the time the command is hanging.
  3. This is probably the result of whatever is stuck in 2) timing out with an empty response.
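
When going through the ps fauxww output from each server, a filter along these lines can narrow it down to rbd and ceph client invocations. This is only a sketch: ceph_client_procs is a made-up name, it assumes the usual ten columns before the command, and the sample rows are hand-written rather than taken from this cluster.

```shell
#!/bin/sh
# ceph_client_procs: from `ps fauxww` output on stdin, print rows whose
# command (or the script run by an interpreter) is named rbd or ceph.
ceph_client_procs() {
    awk '{
        cmd = $0
        # Drop the ten columns before COMMAND (USER through TIME).
        for (i = 1; i <= 10; i++) sub(/^[ \t]*[^ \t]+/, "", cmd)
        # Drop leading blanks and forest decoration such as " \_ ".
        sub(/^[^A-Za-z0-9.\/]+/, "", cmd)
        n = split(cmd, words, " ")
        # Check the first two words so "python3 /usr/bin/ceph" also matches.
        for (w = 1; w <= 2 && w <= n; w++) {
            m = split(words[w], parts, "/")
            if (parts[m] == "rbd" || parts[m] == "ceph") { print; next }
        }
    }'
}

# Hand-written sample rows; on a server the input would be: ps fauxww
sample='root 101 0.0 0.1 1000 200 ? Sl 11:19 0:00  \_ rbd --id admin --cluster ceph --pool lxd-pool info lxd_lxd-pool
ceph 202 0.5 1.0 9000 800 ? Ssl 09:00 0:30 /usr/bin/ceph-osd -f --cluster ceph --id 0
root 303 0.0 0.0 500 100 ? S 11:20 0:00 sleep 60'

printf '%s\n' "$sample" | ceph_client_procs
```

Matching on the command name rather than on the whole line avoids false hits from the ceph user column and from arguments such as --cluster ceph.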

While hanging:

From the first node (the one where we run the create command):

Second node:

Third node:

It seems to me there is nothing special running overall. The storage does show up on the other nodes though as “errored” as I mentioned so it is hitting the API but seems to not be getting a response back? This is with a --debug flag on the create:

root@node01:/# lxc storage create default ceph --debug
DBUG[11-26|11:19:49] Connecting to a local LXD over a Unix socket
DBUG[11-26|11:19:49] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
DBUG[11-26|11:19:49] Got response struct from LXD
DBUG[11-26|11:19:49]
{
  "config": {
    "cluster.https_address": "172.16.222.9:8443",
    "core.https_address": "172.16.222.9:8443",
    "core.trust_password": true,
    "images.auto_update_interval": "0"
  },
  "api_extensions": [
    "storage_zfs_remove_snapshots",
    "container_host_shutdown_timeout",
    "container_stop_priority",
    "container_syscall_filtering",
    "auth_pki",
    "container_last_used_at",
    "etag",
    "patch",
    "usb_devices",
    "https_allowed_credentials",
    "image_compression_algorithm",
    "directory_manipulation",
    "container_cpu_time",
    "storage_zfs_use_refquota",
    "storage_lvm_mount_options",
    "network",
    "profile_usedby",
    "container_push",
    "container_exec_recording",
    "certificate_update",
    "container_exec_signal_handling",
    "gpu_devices",
    "container_image_properties",
    "migration_progress",
    "id_map",
    "network_firewall_filtering",
    "network_routes",
    "storage",
    "file_delete",
    "file_append",
    "network_dhcp_expiry",
    "storage_lvm_vg_rename",
    "storage_lvm_thinpool_rename",
    "network_vlan",
    "image_create_aliases",
    "container_stateless_copy",
    "container_only_migration",
    "storage_zfs_clone_copy",
    "unix_device_rename",
    "storage_lvm_use_thinpool",
    "storage_rsync_bwlimit",
    "network_vxlan_interface",
    "storage_btrfs_mount_options",
    "entity_description",
    "image_force_refresh",
    "storage_lvm_lv_resizing",
    "id_map_base",
    "file_symlinks",
    "container_push_target",
    "network_vlan_physical",
    "storage_images_delete",
    "container_edit_metadata",
    "container_snapshot_stateful_migration",
    "storage_driver_ceph",
    "storage_ceph_user_name",
    "resource_limits",
    "storage_volatile_initial_source",
    "storage_ceph_force_osd_reuse",
    "storage_block_filesystem_btrfs",
    "resources",
    "kernel_limits",
    "storage_api_volume_rename",
    "macaroon_authentication",
    "network_sriov",
    "console",
    "restrict_devlxd",
    "migration_pre_copy",
    "infiniband",
    "maas_network",
    "devlxd_events",
    "proxy",
    "network_dhcp_gateway",
    "file_get_symlink",
    "network_leases",
    "unix_device_hotplug",
    "storage_api_local_volume_handling",
    "operation_description",
    "clustering",
    "event_lifecycle",
    "storage_api_remote_volume_handling",
    "nvidia_runtime",
    "container_mount_propagation",
    "container_backup",
    "devlxd_images",
    "container_local_cross_pool_handling",
    "proxy_unix",
    "proxy_udp",
    "clustering_join",
    "proxy_tcp_udp_multi_port_handling",
    "network_state",
    "proxy_unix_dac_properties",
    "container_protection_delete",
    "unix_priv_drop",
    "pprof_http",
    "proxy_haproxy_protocol",
    "network_hwaddr",
    "proxy_nat",
    "network_nat_order",
    "container_full",
    "candid_authentication",
    "backup_compression",
    "candid_config",
    "nvidia_runtime_config",
    "storage_api_volume_snapshots",
    "storage_unmapped",
    "projects",
    "candid_config_key",
    "network_vxlan_ttl",
    "container_incremental_copy",
    "usb_optional_vendorid",
    "snapshot_scheduling",
    "snapshot_schedule_aliases",
    "container_copy_project",
    "clustering_server_address",
    "clustering_image_replication",
    "container_protection_shift",
    "snapshot_expiry",
    "container_backup_override_pool",
    "snapshot_expiry_creation",
    "network_leases_location",
    "resources_cpu_socket",
    "resources_gpu",
    "resources_numa",
    "kernel_features",
    "id_map_current",
    "event_location",
    "storage_api_remote_volume_snapshots",
    "network_nat_address",
    "container_nic_routes",
    "rbac",
    "cluster_internal_copy",
    "seccomp_notify",
    "lxc_features",
    "container_nic_ipvlan",
    "network_vlan_sriov",
    "storage_cephfs",
    "container_nic_ipfilter",
    "resources_v2",
    "container_exec_user_group_cwd",
    "container_syscall_intercept",
    "container_disk_shift",
    "storage_shifted",
    "resources_infiniband",
    "daemon_storage",
    "instances",
    "image_types",
    "resources_disk_sata",
    "clustering_roles",
    "images_expiry",
    "resources_network_firmware",
    "backup_compression_algorithm",
    "ceph_data_pool_name",
    "container_syscall_intercept_mount",
    "compression_squashfs",
    "container_raw_mount",
    "container_nic_routed",
    "container_syscall_intercept_mount_fuse",
    "container_disk_ceph",
    "virtual-machines",
    "image_profiles",
    "clustering_architecture",
    "resources_disk_id",
    "storage_lvm_stripes",
    "vm_boot_priority",
    "unix_hotplug_devices",
    "api_filtering",
    "instance_nic_network",
    "clustering_sizing",
    "firewall_driver",
    "projects_limits",
    "container_syscall_intercept_hugetlbfs",
    "limits_hugepages",
    "container_nic_routed_gateway",
    "projects_restrictions",
    "custom_volume_snapshot_expiry",
    "volume_snapshot_scheduling",
    "trust_ca_certificates",
    "snapshot_disk_usage",
    "clustering_edit_roles",
    "container_nic_routed_host_address",
    "container_nic_ipvlan_gateway",
    "resources_usb_pci",
    "resources_cpu_threads_numa",
    "resources_cpu_core_die",
    "api_os",
    "container_nic_routed_host_table",
    "container_nic_ipvlan_host_table",
    "container_nic_ipvlan_mode",
    "resources_system",
    "images_push_relay",
    "network_dns_search",
    "container_nic_routed_limits",
    "instance_nic_bridged_vlan",
    "network_state_bond_bridge",
    "usedby_consistency",
    "custom_block_volumes",
    "clustering_failure_domains",
    "resources_gpu_mdev",
    "console_vga_type",
    "projects_limits_disk",
    "network_type_macvlan",
    "network_type_sriov",
    "container_syscall_intercept_bpf_devices",
    "network_type_ovn",
    "projects_networks",
    "projects_networks_restricted_uplinks",
    "custom_volume_backup",
    "backup_override_name",
    "storage_rsync_compression",
    "network_type_physical",
    "network_ovn_external_subnets",
    "network_ovn_nat",
    "network_ovn_external_routes_remove",
    "tpm_device_type",
    "storage_zfs_clone_copy_rebase",
    "gpu_mdev",
    "resources_pci_iommu",
    "resources_network_usb",
    "resources_disk_address",
    "network_physical_ovn_ingress_mode",
    "network_ovn_dhcp",
    "network_physical_routes_anycast",
    "projects_limits_instances",
    "network_state_vlan",
    "instance_nic_bridged_port_isolation",
    "instance_bulk_state_change",
    "network_gvrp",
    "instance_pool_move",
    "gpu_sriov",
    "pci_device_type",
    "storage_volume_state",
    "network_acl",
    "migration_stateful",
    "disk_state_quota",
    "storage_ceph_features",
    "projects_compression",
    "projects_images_remote_cache_expiry",
    "certificate_project",
    "network_ovn_acl",
    "projects_images_auto_update",
    "projects_restricted_cluster_target",
    "images_default_architecture",
    "network_ovn_acl_defaults",
    "gpu_mig",
    "project_usage",
    "network_bridge_acl",
    "warnings",
    "projects_restricted_backups_and_snapshots",
    "clustering_join_token",
    "clustering_description",
    "server_trusted_proxy",
    "clustering_update_cert",
    "storage_api_project",
    "server_instance_driver_operational",
    "server_supported_storage_drivers",
    "event_lifecycle_requestor_address",
    "resources_gpu_usb",
    "clustering_evacuation",
    "network_ovn_nat_address",
    "network_bgp",
    "network_forward",
    "custom_volume_refresh",
    "network_counters_errors_dropped",
    "metrics",
    "image_source_project",
    "clustering_config",
    "network_peer",
    "linux_sysctl",
    "network_dns",
    "ovn_nic_acceleration"
  ],
  "api_status": "stable",
  "api_version": "1.0",
  "auth": "trusted",
  "public": false,
  "auth_methods": [
    "tls"
  ],
  "environment": {
    "addresses": [
      "172.16.222.9:8443"
    ],
    "architectures": [
      "x86_64",
      "i686"
    ],
    "certificate": REDACTED
    "certificate_fingerprint": "REDACTED",
    "driver": "lxc | qemu",
    "driver_version": "4.0.11 | 6.1.0",
    "firewall": "nftables",
    "kernel": "Linux",
    "kernel_architecture": "x86_64",
    "kernel_features": {
      "netnsid_getifaddrs": "true",
      "seccomp_listener": "true",
      "seccomp_listener_continue": "true",
      "shiftfs": "false",
      "uevent_injection": "true",
      "unpriv_fscaps": "true"
    },
    "kernel_version": "5.10.0-9-amd64",
    "lxc_features": {
      "cgroup2": "true",
      "core_scheduling": "true",
      "devpts_fd": "true",
      "idmapped_mounts_v2": "true",
      "mount_injection_file": "true",
      "network_gateway_device_route": "true",
      "network_ipvlan": "true",
      "network_l2proxy": "true",
      "network_phys_macvlan_mtu": "true",
      "network_veth_router": "true",
      "pidfd": "true",
      "seccomp_allow_deny_syntax": "true",
      "seccomp_notify": "true",
      "seccomp_proxy_send_notify_fd": "true"
    },
    "os_name": "Debian GNU/Linux",
    "os_version": "11",
    "project": "default",
    "server": "lxd",
    "server_clustered": true,
    "server_name": "node01",
    "server_pid": 179350,
    "server_version": "4.20",
    "storage": "",
    "storage_version": "",
    "storage_supported_drivers": [
      {
        "Name": "btrfs",
        "Version": "5.4.1",
        "Remote": false
      },
      {
        "Name": "cephfs",
        "Version": "15.2.14",
        "Remote": true
      },
      {
        "Name": "dir",
        "Version": "1",
        "Remote": false
      },
      {
        "Name": "lvm",
        "Version": "2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.43.0",
        "Remote": false
      },
      {
        "Name": "ceph",
        "Version": "15.2.14",
        "Remote": true
      }
    ]
  }
}
DBUG[11-26|11:19:49] Sending request to LXD method=POST url=http://unix.socket/1.0/storage-pools etag=
DBUG[11-26|11:19:49]
{
  "config": {},
  "description": "",
  "name": "default",
  "driver": "ceph"
}

And it just sits there. LXC monitor does catch the request but nothing else.

Well, you have quite a lot of stuck rbd --id admin --cluster ceph --pool lxd-pool info lxd_lxd-pool, so that’s what LXD is blocked on.

You said that running the same directly works fine for you, so it may be some kind of ceph version mismatch thing going on. You could try doing snap set lxd ceph.external=true and then reboot all systems for good measure (will get rid of anything that was stuck).

Thank you for the details. The pool was indeed in a state other than active in Ceph. That is addressed now, though I get the error below at the final step:

Error: Failed to run: ceph --name client.admin --cluster ceph osd pool create lxd-pool 32: Traceback (most recent call last):
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 1423, in send_command
    ret, outbuf, outs = run_in_thread(
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 1343, in run_in_thread
    raise t.exception
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 1309, in run
    self.retval = self.func(*self.args, **self.kwargs)
TypeError: Argument 'cmd' has incorrect type (expected str, got list)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 1492, in json_command
    ret, outbuf, outs = send_command_retry(cluster,
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 1351, in send_command_retry
    return send_command(*args, **kwargs)
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 1450, in send_command
    raise RuntimeError('"{0}": exception {1}'.format(cmd, e))
RuntimeError: "['{"prefix": "get_command_descriptions"}']": exception Argument 'cmd' has incorrect type (expected str, got list)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 1310, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1221, in main
    ret, outbuf, outs = json_command(cluster_handle, target=target,
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 1498, in json_command
    raise RuntimeError('"{0}": exception {1}'.format(argdict, e))
RuntimeError: "None": exception "['{"prefix": "get_command_descriptions"}']": exception Argument 'cmd' has incorrect type (expected str, got list)

I did try setting ceph.external back to false, but that didn’t seem to help.

Okay so we can disregard my last update, reinstalling the snap fixed it. What I am seeing now however is:

root@node01:~# lxc storage create default ceph
Error: Failed to run: rbd --id admin --cluster ceph --pool lxd-pool --image-feature layering --size 0B create lxd_lxd-pool: 2021-11-30T06:06:48.362-0800 7f623effd700 -1 librbd::image::CreateRequest: 0x55aade2fe110 handle_add_image_to_directory: error adding image to directory: (95) Operation not supported
rbd: create error: (95) Operation not supported

This is on an erasure-coded pool with EC overwrites enabled. Is there any other setting that should be turned on on the Ceph side?
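
One possible direction, offered as an unverified sketch rather than a confirmed fix: error (95) when adding the image to the directory is what RBD typically reports when image metadata is placed on an erasure-coded pool, because such pools do not support the omap operations the RBD directory relies on. The usual Ceph layout keeps metadata in a small replicated pool and puts only the data on the EC pool. LXD exposes a ceph.osd.data_pool_name key for this (the ceph_data_pool_name API extension appears in the debug output earlier in this thread). The helper below only prints the commands; plan_ec_backed_pool and the pool name lxd-meta are hypothetical, and passing ceph.osd.data_pool_name on the final cluster-wide create is an assumption.

```shell
#!/bin/sh
# Dry run: print (do not execute) a replicated-metadata + EC-data layout.
# Usage: plan_ec_backed_pool <lxd_pool> <metadata_pool> <ec_data_pool> <member>...
# plan_ec_backed_pool is a hypothetical helper, not part of LXD or Ceph.
plan_ec_backed_pool() {
    lxd_pool=$1
    meta_pool=$2
    data_pool=$3
    shift 3
    # RBD needs overwrites enabled on the erasure-coded data pool.
    echo "ceph osd pool set $data_pool allow_ec_overwrites true"
    # Small replicated pool for RBD metadata (32 PGs is just an example).
    echo "ceph osd pool create $meta_pool 32"
    # Per-member pending definitions, pointing at the replicated pool.
    for member in "$@"; do
        echo "lxc storage create --target $member $lxd_pool ceph source=$meta_pool"
    done
    # Assumption: the data pool key is cluster-wide, so it goes on the final create.
    echo "lxc storage create $lxd_pool ceph ceph.osd.data_pool_name=$data_pool"
}

plan_ec_backed_pool default lxd-meta lxd-pool node01 node02 node03
```

If this layout is used, the existing lxd-pool stays the erasure-coded data pool and only the new replicated pool is handed to LXD as source.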