Unable to connect an LXD (5.4) instance to a Ceph 17.2.3 (Quincy) cluster

I have been playing with Ceph + LXD for the past couple of days and wanted to build a proof of concept on a VPS to showcase something to a colleague.

I installed a Ceph instance on a VPS.
The instance is up and is only complaining about the lack of OSD nodes (not ideal, but this is just a quick POC; a single node will do).
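In case it helps anyone reproducing this setup: on a single-node POC cluster you can quiet the OSD/replication warnings by lowering the default pool size. This is a sketch based on standard Ceph config options, not something done in this thread, so treat it as an assumption:

# Single-node POC tweaks (assumed, not from this thread): allow 1x replication
# and let CRUSH place replicas per OSD instead of per host.
ceph config set global mon_allow_pool_size_one true
ceph config set global osd_pool_default_size 1
ceph config set global osd_crush_chooseleaf_type 0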

Then I tried installing LXD on another VPS, only for lxd init to hang
and lxc storage info remote to give me this error:

Error: Failed to run: ceph --name client. --cluster ceph df -f json: 2022-08-11T15:14:26.623+0000 7ff494a95700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
[errno 13] RADOS permission denied (error connecting to the cluster)

Note that simply running ceph status on the LXD machine connects to the Ceph node without any issues.
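For anyone else debugging this, it can also help to rerun the exact command from the error message rather than plain ceph status. Here I'm assuming the default admin user, since the error above shows an empty client name:

# Reproduce the command LXD runs internally (taken from the error above),
# with an explicit client name; "client.admin" is an assumption.
ceph --name client.admin --cluster ceph df -f json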

I feel like LXD is using a different Ceph client from the one installed system-wide, but that's just a guess.

Any help would be appreciated!
Thanks.

Setup information:

LXD version is 5.4 (snap)
ceph-common is 17.2.3 (from apt)

This might also be relevant:

root@lxd-s-2vcpu-2gb-fra1-01:~# snap get lxd ceph
Key            Value
ceph.builtin   false
ceph.external  true

And here is what I was trying to feed to lxd init:

config:
  core.https_address: <public_ip>:8443
networks:
- config:
    bridge.mode: fan
    fan.underlay_subnet: 10.222.79.1/24
  description: ""
  name: lxdfan0
  type: ""
  project: default
storage_pools:
- config:
    ceph.cluster_name: ceph
    ceph.osd.pg_num: "32"
    ceph.osd.pool_name: lxd
  description: ""
  name: remote
  driver: ceph
profiles:
- config: {}
  description: ""
  devices:
    eth0:
      name: eth0
      network: lxdfan0
      type: nic
    root:
      path: /
      pool: remote
      type: disk
  name: default
projects: []
cluster:
  server_name: lxd-s-2vcpu-2gb-fra1-01
  enabled: true
  member_config: []
  cluster_address: ""
  cluster_certificate: ""
  server_address: ""
  cluster_password: ""
  cluster_certificate_path: ""
  cluster_token: ""
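(For reference, this is applied non-interactively; the file name below is just a placeholder:)

# Feed the preseed to lxd init on stdin; preseed.yaml is a placeholder name.
lxd init --preseed < preseed.yaml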

Okay, so after turning on debug mode I discovered something:

"storage_supported_drivers": [
				...
				{
					"Name": "ceph",
					"Version": "15.2.16",
					"Remote": true
				},

LXD is saying it only supports Ceph 15.2.16, not the 17.2.3 I have installed.

Does LXD 5.4 support Ceph 17.2.3 in the first place?
And if so, how can I enable that support?

The snap package includes the Ceph client; it doesn't use your system's Ceph by default.
You can change that with snap set lxd ceph.external=true followed by a systemctl reload snap.lxd.daemon.
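In full:

# Tell the snap to use the system's Ceph client, then reload the daemon.
snap set lxd ceph.external=true
systemctl reload snap.lxd.daemon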


OK, this fixed the first issue; LXD can now connect to Ceph without problems.
But now I get this error:

Error: Failed to create storage pool "default": Failed to run: ceph --name client.admin --cluster ceph osd pool create lxd 32: Traceback (most recent call last):
  File "/usr/bin/ceph", line 1326, in <module>
    retval = main()
  File "/usr/bin/ceph", line 1246, in main
    sigdict = parse_json_funcsigs(outbuf.decode('utf-8'), 'cli')
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 847, in parse_json_funcsigs
    cmd['sig'] = parse_funcsig(cmd['sig'])
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 793, in parse_funcsig
    newsig.append(argdesc(t,
  File "/snap/lxd/current/lib/python3/dist-packages/ceph_argparse.py", line 673, in __init__
    self.instance = self.t(**self.typeargs)
TypeError: __init__() got an unexpected keyword argument 'positional'

I do realize this is technically a Ceph issue,
but running that same command by hand creates a new pool with no problems.
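Looking at the traceback, /usr/bin/ceph (17.2.3) is importing ceph_argparse from the snap's bundled, older Python modules, which would explain why it only fails when LXD invokes it. As a guess at the mechanism (not confirmed in this thread), you should be able to reproduce the failure outside LXD like this:

# Hypothetical repro: force the system ceph CLI to import the snap's bundled
# (older) Python modules, which should trigger the same TypeError.
PYTHONPATH=/snap/lxd/current/lib/python3/dist-packages ceph osd pool ls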

snapcraft: Clear PYTHONPATH in run-host · lxc/lxd-pkg-snap@00ddce3 · GitHub may help with that.

I'm having the same problem again, and setting ceph.external=true isn't helping.

Environment

  • ceph 17.2.3
  • lxd edge (git-ae54a13) and lxd stable (5.4)
  • lxd.ceph.external=true

Issue

root@nyc3-lxd:~# lxc --debug storage info remote  
          ...
			"storage": "ceph",
			"storage_version": "17.2.3",
			"storage_supported_drivers": [
				{
					"Name": "ceph",
					"Version": "17.2.3",
					"Remote": true
				},
				{
					"Name": "cephfs",
					"Version": "17.2.3",
					"Remote": true
				},
				{
					"Name": "cephobject",
					"Version": "17.2.3",
					"Remote": true
				},
                ...
 			]
		}
	} 
DEBUG  [2022-08-18T13:07:09Z] Sending request to LXD                        etag= method=GET url="http://unix.socket/1.0/storage-pools/remote"
DEBUG  [2022-08-18T13:07:09Z] Got response struct from LXD                 
DEBUG  [2022-08-18T13:07:09Z] 
	{
		"config": {
			"source": "lxd-ny"
		},
		"description": "",
		"name": "remote",
		"driver": "ceph",
		"used_by": null,
		"status": "Pending",
		"locations": [
			"none"
		]
	} 
DEBUG  [2022-08-18T13:07:09Z] Sending request to LXD                        etag= method=GET url="http://unix.socket/1.0/storage-pools/remote/resources"

Error: Failed to run: ceph --name client. --cluster  df -f json: exit status 13 (2022-08-18T13:07:09.528+0000 7f978effd700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]
[errno 13] RADOS permission denied (error connecting to the cluster))

And all LXD Ceph operations fail.
I am unable to create a new storage pool or list the existing PENDING Ceph ones.

So what gives?
I tried purging LXD and reinstalling it, and that didn't work.

Also, a side note that might help narrow down this issue:

The Ceph client I have installed supports reading from /etc/ceph/ceph.client.admin.keyring,
but I get this error message when trying to do a Ceph operation from LXD:

root@nyc3-lxd:~# lxc storage info remote  

Error: Failed to run: ceph --name client. --cluster  df -f json: exit status 13 (2022-08-18T13:13:59.796+0000 7f74c749a700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client..keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory

This is fixed by copying /etc/ceph/ceph.client.admin.keyring to /etc/ceph/ceph.keyring.
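In other words:

# Workaround: copy the admin keyring to a path the client actually searches.
cp /etc/ceph/ceph.client.admin.keyring /etc/ceph/ceph.keyring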

Any help would be appreciated.
@stgraber
@tomp

Please show the output of lxc storage show <ceph pool>.

➜  ~ lxc storage show remote
config:
  source: lxd-ny
description: ""
name: remote
driver: ceph
used_by: []
status: Pending
locations:
- none

@tomp

OK, so it looks like you've not set up the Ceph pool correctly yet.
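For completeness: the empty client./--cluster values in the earlier error line up with the pool config only containing source. A minimal sketch of recreating the pool with explicit settings; the cluster/user names below mirror the defaults from the first post and are assumptions on my part:

# Sketch: recreate the pool with explicit cluster/user/pool settings.
lxc storage create remote ceph \
    ceph.cluster_name=ceph \
    ceph.user.name=admin \
    ceph.osd.pool_name=lxd \
    ceph.osd.pg_num=32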

OK, so it turns out the issue was that my Ceph mgr and monitor nodes were undersized.
Increasing their size fixed everything.
Thanks!
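For anyone landing here later, the standard health commands are a quick way to confirm the mons/mgr are happy after resizing:

# Verify cluster health after resizing the mon/mgr nodes.
ceph status
ceph health detail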
