LXD CephFS 'Not implemented' error

I have a fresh 3 node Ceph cluster setup with CephFS created. Following are the cluster details:

$ sudo ceph -s
  cluster:
    id:     250f2880-9203-11eb-b6c2-013097c225cb
    health: HEALTH_WARN
            Degraded data redundancy: 7/66 objects degraded (10.606%), 4 pgs degraded, 16 pgs undersized

  services:
    mon: 3 daemons, quorum node1,node2,node3 (age 40m)
    mgr: node1.vtgvss(active, since 31m), standbys: node2.yweyet
    mds: lxd-storage:1 {0=lxd-storage.node3.vuxquu=up:active} 1 up:standby
    osd: 3 osds: 3 up (since 34m), 3 in (since 34m)

  data:
    pools:   3 pools, 65 pgs
    objects: 22 objects, 7.9 KiB
    usage:   3.0 GiB used, 11 TiB / 11 TiB avail
    pgs:     7/66 objects degraded (10.606%)
             49 active+clean
             12 active+undersized
             4  active+undersized+degraded

$ sudo ceph df
--- RAW STORAGE ---
CLASS    SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    11 TiB   11 TiB  8.6 MiB   2.0 GiB       0.02
ssd   346 GiB  345 GiB  284 KiB   1.0 GiB       0.29
TOTAL  11 TiB   11 TiB  8.9 MiB   3.0 GiB       0.03
--- POOLS ---
POOL                     ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics     1    1      0 B        0      0 B      0    3.5 TiB
cephfs.lxd-storage.meta   2   32  6.2 KiB       22  1.0 MiB      0    3.9 TiB
cephfs.lxd-storage.data   3   32      0 B        0      0 B      0    3.5 TiB
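
For cross-checking, the CephFS filesystem backing those `.meta`/`.data` pools can be listed as follows; the output line here is illustrative, reconstructed from the pool names above rather than captured from this cluster:

```shell
# Confirm the CephFS filesystem name; this is the name LXD expects as the
# "existing CEPHFS pool" during lxd init.
sudo ceph fs ls
# e.g.: name: lxd-storage, metadata pool: cephfs.lxd-storage.meta, data pools: [cephfs.lxd-storage.data ]
```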

All nodes are using LXD 4.11.

I created the remote LXD cluster storage pool through lxd init as follows:

$ sudo lxd init

Would you like to use LXD clustering? (yes/no) [default=no]: yes
What name should be used to identify this node in the cluster? [default=node1]: 
What IP address or DNS name should be used to reach this node? [default=]: 
Are you joining an existing cluster? (yes/no) [default=no]: 
Setup password authentication on the cluster? (yes/no) [default=yes]: 
Trust password for new clients:
Do you want to configure a new local storage pool? (yes/no) [default=yes]: no
Do you want to configure a new remote storage pool? (yes/no) [default=no]: yes
Name of the storage backend to use (ceph, cephfs) [default=ceph]: cephfs
Create a new CEPHFS pool? (yes/no) [default=yes]: no
Name of the existing CEPHFS pool or dataset: lxd-storage
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: 
Would you like to create a new Fan overlay network? (yes/no) [default=yes]: 
What subnet should be used as the Fan underlay? [default=auto]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes] 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 
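
For reference, the remote storage pool portion of that interactive run corresponds roughly to this preseed fragment (a sketch, not printed by this run; `remote` is the default name for the remote pool, and `source` carries the existing CephFS filesystem name):

```yaml
# Assumed equivalent preseed section for the storage pool above
storage_pools:
- name: remote
  driver: cephfs
  config:
    source: lxd-storage
```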

The LXD cluster looks like the following:

$ lxc cluster list
| NAME  |            URL             | DATABASE | STATE  |      MESSAGE      | ARCHITECTURE | FAILURE DOMAIN |
| node1 | | YES      | ONLINE | fully operational | x86_64       | default        |
| node2 | | YES      | ONLINE | fully operational | x86_64       | default        |
| node3 | | YES      | ONLINE | fully operational | x86_64       | default        |

The CephFS storage pool seems to have been created correctly:

$ lxc storage show remote
config:
  cephfs.cluster_name: ceph
  cephfs.path: lxd-storage
  cephfs.user.name: admin
description: ""
name: remote
driver: cephfs
used_by:
- /1.0/instances/test-container
- /1.0/instances/ubuntu-container
- /1.0/profiles/default
- /1.0/profiles/vm
status: Created
locations:
- node1
- node2
- node3

However when creating an instance, the following error occurs:

$ lxc init ubuntu:20.04 test-container
Creating test-container
Error: Failed instance creation: Load instance storage pool: Not implemented

The instance is, however, still listed under lxc list:

$ lxc list
|       NAME       |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
| test-container   | STOPPED |      |      | CONTAINER | 0         | node2    |
| ubuntu-container | STOPPED |      |      | CONTAINER | 0         | node1    |

I am unable to delete the storage pool to reconfigure it, as it is in use; however, I am also unable to delete the instances:

$ lxc delete test-container
Error: Not implemented

Any tips are appreciated.

Are you able to capture the output of lxc monitor --type=logging --pretty in one window while you try to init a new container on that storage pool, please?

Ah, I know the issue: currently the cephfs storage driver doesn’t support hosting instances, only custom volumes.
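
For example, the cephfs pool can still host custom volumes and share them between instances, along these lines (the volume and instance names here are made up):

```shell
# Create a custom volume on the cephfs pool...
lxc storage volume create remote shared-data
# ...and attach it to an instance as a disk device
lxc config device add mycontainer shared-disk disk pool=remote source=shared-data path=/mnt/shared
```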

However it certainly shouldn’t leave instance DB records left over.

Can you upgrade to 4.12 and let me know if that’s still happening, please?

Here’s the output for the init:

DBUG[03-31|11:15:05] Handling                                 username=yeyang ip=@ method=GET protocol=unix url=/1.0
DBUG[03-31|11:15:05] Handling                                 ip=@ method=GET protocol=unix url=/1.0/events username=yeyang
DBUG[03-31|11:15:05] New event listener: f7cebea4-affa-4288-91c6-38b2ab12db03 
DBUG[03-31|11:15:05] Handling                                 url=/1.0/instances username=yeyang ip=@ method=POST protocol=unix
DBUG[03-31|11:15:05] Responding to instance create 
DBUG[03-31|11:15:05] Connecting to a remote simplestreams server 
DBUG[03-31|11:15:08] Heartbeat updating local raft nodes to [{ID:1 Address: Role:voter} {ID:2 Address: Role:voter} {ID:3 Address: Role:voter}] 
DBUG[03-31|11:15:08] Starting heartbeat round 
DBUG[03-31|11:15:08] New task Operation: 69dc3e4a-5f05-42e1-b136-1238bf23fb30 
DBUG[03-31|11:15:08] Started task operation: 69dc3e4a-5f05-42e1-b136-1238bf23fb30 
DBUG[03-31|11:15:08] Connecting to a remote simplestreams server 
DBUG[03-31|11:15:08] Handling                                 ip=@ method=GET protocol=unix url=/1.0/operations/69dc3e4a-5f05-42e1-b136-1238bf23fb30 username=yeyang
DBUG[03-31|11:15:09] Transferring image "46701fa2d99c72583f858c50a25f9f965f06a266b997be7a57a8e66c72b5175b" from node "" 
DBUG[03-31|11:15:09] Connecting to a remote LXD over HTTPs 
DBUG[03-31|11:15:12] Sending heartbeat to 
DBUG[03-31|11:15:12] Sending heartbeat request to 
DBUG[03-31|11:15:12] Successful heartbeat for 
DBUG[03-31|11:15:13] Sending heartbeat to 
DBUG[03-31|11:15:13] Sending heartbeat request to 
DBUG[03-31|11:15:13] Image already exists in the DB           fingerprint=46701fa2d99c72583f858c50a25f9f965f06a266b997be7a57a8e66c72b5175b
DBUG[03-31|11:15:13] Successful heartbeat for 
INFO[03-31|11:15:13] Creating container                       ephemeral=false instance=test-container3 instanceType=container project=default
DBUG[03-31|11:15:13] Completed heartbeat round 
DBUG[03-31|11:15:14] FillInstanceConfig finished              project=default driver=cephfs instance=test-container3 pool=remote
DBUG[03-31|11:15:14] FillInstanceConfig started               pool=remote project=default driver=cephfs instance=test-container3
INFO[03-31|11:15:14] Created container                        ephemeral=false instance=test-container3 instanceType=container project=default
INFO[03-31|11:15:14] Deleting container                       used="1970-01-01 00:00:00 +0000 UTC" created="2021-03-31 11:15:13.634068072 +0000 UTC" ephemeral=false instance=test-container3 instanceType=container project=default
DBUG[03-31|11:15:14] Failure for task operation: 69dc3e4a-5f05-42e1-b136-1238bf23fb30: Load instance storage pool: Not implemented 
DBUG[03-31|11:15:14] Event listener finished: f7cebea4-affa-4288-91c6-38b2ab12db03 
DBUG[03-31|11:15:14] Disconnected event listener: f7cebea4-affa-4288-91c6-38b2ab12db03 

I have tried creating a simple OSD pool and it seems to be working correctly.
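
For anyone following along, creating a clustered ceph (RBD) pool is typically a two-phase operation: once per member with --target, then a final call to instantiate it. A sketch using the node names from this thread (the pool name is made up):

```shell
# Phase 1: define the pool as pending on each cluster member
lxc storage create remote-rbd ceph --target node1
lxc storage create remote-rbd ceph --target node2
lxc storage create remote-rbd ceph --target node3
# Phase 2: instantiate the pool across the whole cluster
lxc storage create remote-rbd ceph
```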

Ok, I will report back as soon as I have upgraded, thank you.


For interest, the supported volume types are specified here:

Ok, I have updated to LXD 4.12 on all cluster nodes, however the instance deletion issue still persists.

Does it leave records behind for new instances too, though (as opposed to just not allowing removal of the original ones you created)?

Unfortunately yes :confused:

$ lxc init ubuntu:20.04 test-container2
Creating test-container2
Error: Failed instance creation: Load instance storage pool: Not implemented

$ lxc delete test-container2
Error: Not implemented

OK thanks I’ll try and reproduce and fix.


Ah yeah, sounds like we need earlier detection and possibly a more user-friendly error (backend doesn’t support storing instances)?

Ideally it should have errored when setting the disk device to that pool in the first place.

This PR fixes it:

You can clear up the orphaned DB records by running:

sudo lxd sql global 'DELETE FROM instances WHERE name = "<instance name>"'
sudo lxd sql global 'DELETE FROM storage_volumes WHERE name = "<instance name>"'

This assumes you don’t have any instances with the same name as the orphaned containers in other LXD projects.
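
Before running the deletes, the affected projects can be checked with a query along these lines (the join columns are assumptions about LXD’s global schema, i.e. that instances.project_id references projects.id):

```shell
# List which projects contain an instance with the orphaned name
sudo lxd sql global "SELECT projects.name, instances.name FROM instances JOIN projects ON projects.id = instances.project_id WHERE instances.name = '<instance name>'"
```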
