Introducing MicroCeph

tomp · December 3, 2022, 5:29pm

Yes, that’s right. A lightweight daemon that is part of the snap package uses a dqlite clustered database for controlling and configuring the Ceph components.

tomp · December 8, 2022, 9:59am

If you want to connect an external LXD that isn’t running on the same host(s) as MicroCeph, you need to copy the ceph.conf and ceph.client.admin.keyring files from /var/snap/microceph/current/conf on one of the MicroCeph hosts into /etc/ceph on your LXD system(s).
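
A minimal sketch of the install half of that copy step, assuming the two files have already been fetched from a MicroCeph host (for example with scp) into a local staging directory. The helper name install_ceph_conf and the idea of a staging directory are hypothetical; only the two file names and the /etc/ceph destination come from the post above.

```shell
#!/bin/sh
# Sketch: install MicroCeph client config files for an external LXD.
# install_ceph_conf is a hypothetical helper, not part of MicroCeph or LXD.
# Usage: install_ceph_conf SRC_DIR [DEST_DIR]   (DEST_DIR defaults to /etc/ceph)
install_ceph_conf() {
    src="$1"
    dest="${2:-/etc/ceph}"

    # Refuse to do a partial install: both files must be present first.
    for f in ceph.conf ceph.client.admin.keyring; do
        if [ ! -f "$src/$f" ]; then
            echo "missing: $src/$f" >&2
            return 1
        fi
    done

    mkdir -p "$dest" || return 1
    for f in ceph.conf ceph.client.admin.keyring; do
        cp "$src/$f" "$dest/$f" || return 1
    done

    # The keyring is a credential, so restrict it to its owner.
    chmod 600 "$dest/ceph.client.admin.keyring"
    chmod 644 "$dest/ceph.conf"
}
```

Run as root, install_ceph_conf ./staging would place both files in /etc/ceph; the second argument exists mainly so the helper can be tried against a scratch directory first.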

This isn’t required when using the LXD snap package on the same host(s) as the MicroCeph installation because the LXD snap has detection for MicroCeph built-in:

https://github.com/lxc/lxd-pkg-snap/blob/9e8d7de5820986a3a3f31f057b6946a831c5ad6c/snapcraft/commands/daemon.start#L298-L299

    elif [ -e "/var/snap/microceph" ]; then
        ln -s /var/snap/microceph/current/conf/ /etc/ceph

For LXD to operate Ceph storage pools on MicroCeph, you also need to run the following command on one of the MicroCeph hosts:

microceph.ceph config set mon mon_allow_pool_delete true

I’ve found MicroCeph useful when running the LXD test suite locally to test the ceph storage driver.

kees · December 8, 2022, 3:28pm

Is it possible at all to change an IP address of a ceph node after completing the init? I have a playfield installation with three (old) PCs. I forgot to give all three a fixed IP address.

Is there a way to change the IP address in the ceph config?

If not, how can I redo the installation? Perhaps just snap remove --purge?

stgraber · December 10, 2022, 6:05am

Currently there’s no way to handle re-addressing, so you’d indeed need to go the purge route for the time being.

I think we’ll eventually want to allow it, but it’s a bit tricky as we need to reconfigure both microceph itself and the ceph deployment too, including all of its clients.
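
A rough sketch of the purge route, to be run on each node that needs a new address. It only prints the commands unless DRY_RUN=0 is set, because --purge discards the snap’s data without saving a snapshot. The run and reinstall_microceph helpers are hypothetical names, and the sketch assumes the default snap channel; the cluster still has to be set up again afterwards exactly as it was the first time.

```shell
#!/bin/sh
# Sketch: wipe and reinstall the MicroCeph snap on one node.
# Prints the commands by default; set DRY_RUN=0 to really execute them.
# WARNING: "snap remove --purge" deletes the snap's data without a snapshot.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: remove the snap with its data, then install it fresh.
# Assumes the default channel; add --channel to the install if you tracked another.
reinstall_microceph() {
    run snap remove --purge microceph
    run snap install microceph
}
```

Calling reinstall_microceph with no environment changes shows the two commands it would run, which makes it easy to review before executing it for real as root.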

jonny_peace · December 11, 2022, 1:25pm

Love what you guys are doing

I’ve encountered an issue: after I rebooted each host, one of the OSDs is down. I can find a way to stop OSDs, but not to bring them back up. Rebooting has not helped, and I tried reloading the microceph snap service as well. I’m not hugely familiar with Ceph, so it might be obvious to someone else.

I’ve found this in my journalctl

microceph.daemon[5691]: time="2022-12-11T12:46:30Z" level=error msg="Failed to send database upgrade request" error="Patch \"https://10.10.101.43:7000/cluster/internal/database\": Unable to connect to: \"10.10.101.43:7000\""

I don’t have any firewalls active on these hosts, so shouldn’t be something as simple as this. It’s just a 3 pool cluster, 3 drives, and 2 of them are healthy.

tomp · December 11, 2022, 7:15pm

Ah, glad I’m not the only one

github.com/canonical/microceph

ceph osd won't start after forceful stop - where are the logs?

opened 01:15PM - 08 Dec 22 UTC

tomponline

I am running a 3 member ceph cluster in separate VMs, each with a 10GB block disk passed in for the OSDs. All was well until suddenly, during the LXD ceph test suite run, all VMs hung and consumed 100% CPU. So I had to forcefully stop them and now one of the VM's OSD won't come online.

```
root 1917 4.3 2.2 1342264 32472 ? Ssl 13:10 0:16 microcephd --state-dir /var/snap/microceph/common/state
root 1922 0.0 1.8 693148 27048 ? Ssl 13:10 0:00 ceph-mds -f --cluster ceph --id ceph1
root 1927 4.2 19.0 1193448 278760 ? Ssl 13:10 0:15 ceph-mgr -f --cluster ceph --id ceph1
root 1935 2.8 4.3 783600 63848 ? Ssl 13:10 0:10 ceph-mon -f --cluster ceph --id ceph1
root 1944 0.0 0.0 2888 1056 ? Ss 13:10 0:00 /bin/sh /snap/microceph/35/commands/osd.start
root 2023 0.0 0.0 2788 1004 ? S 13:10 0:00 sleep infinity
root 2166 0.0 0.0 8368 1012 pts/0 S 13:10 0:00 sleep infinity
```

The `/bin/sh /snap/microceph/35/commands/osd.start` sleeping for infinity is a concern. The problem is I can't diagnose this as there doesn't appear to be any logs. I've looked in `/var/snap/microceph/common/logs` but it is empty.

```
microceph.ceph status
  cluster:
    id:     4bb5c238-1fef-461b-8bc3-cfd06f2c6011
    health: HEALTH_WARN
            Reduced data availability: 65 pgs inactive

  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 6m)
    mgr: ceph1(active, since 6m), standbys: ceph2, ceph3
    osd: 3 osds: 2 up (since 109m), 2 in (since 12m)

  data:
    pools:   3 pools, 65 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             65 unknown
```

Please could you post more details there to help figure out what’s going on?

jonny_peace · December 11, 2022, 8:45pm

No problem, I shall see you on the other side

rocket · December 12, 2022, 12:31am

I assume it’s on the roadmap, but having a preseed option for this tool would be great for a lab environment.

kees · December 12, 2022, 3:43pm

What does it mean that storage is PENDING after adding ceph storage?

I didn’t quite start with a totally new LXD installation; I already had one to play with. I set up MicroCeph and it is running. Now I wanted to add a ceph storage pool as follows:

$ sudo lxc storage create ceph ceph
Error: Pool not pending on any node (use --target <node> first)

So I selected one of the three nodes.

$ sudo lxc storage create ceph ceph --target kwistbeek
Storage pool ceph pending on member kwistbeek

Now the ceph storage is in state PENDING. What do I have to do next?

cemzafer · December 14, 2022, 6:32pm

Hi @kees ,
Take a look at those links.
https://www.youtube.com/watch?v=PX1n3ZAWuAU&t=391s
https://linuxcontainers.org/lxd/docs/master/howto/cluster_config_storage/

Regards.
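
The workflow in the linked LXD documentation can be sketched as follows: the pool is first defined on every cluster member with --target, which leaves it PENDING, and is then created once without --target to instantiate it across the cluster. The run and create_cluster_pool helpers are hypothetical, and the sketch only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: create a storage pool across all members of an LXD cluster.
# Prints the commands by default; set DRY_RUN=0 to really execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper.
# Usage: create_cluster_pool POOL DRIVER MEMBER...
create_cluster_pool() {
    pool="$1"
    driver="$2"
    shift 2

    # Step 1: define the pool on each member; it stays PENDING after this.
    for member in "$@"; do
        run lxc storage create "$pool" "$driver" --target "$member"
    done

    # Step 2: create it once without --target to instantiate it cluster-wide.
    run lxc storage create "$pool" "$driver"
}
```

For the cluster discussed above this would be called as create_cluster_pool ceph ceph kwistbeek followed by the names of the other two members, which are not given in the thread.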

jonny_peace · December 26, 2022, 2:44pm

Hope you’ve all had a Merry Xmas

I haven’t been able to add it to an existing LXD cluster either. I always ended up in a pending or error state (while guessing) as well. I tried the method above and also reached out to Stéphane on the YouTube video, but LXD doesn’t automatically pick up the cluster with this command…

systemctl reload snap.lxd.daemon

I’ve tried this…

lxc storage create ceph1 ceph --target uby2
lxc storage create ceph1 ceph --target uby1
lxc storage create ceph1 ceph --target uby3

sudo microceph.ceph osd pool ls

output:
.mgr

lxc storage create ceph1 ceph ceph.osd.pool_name=.mgr

output:

Error: Failed to run: rbd --id admin --cluster ceph --pool .mgr --image-feature layering --size 0B create lxd_.mgr: exit status 95 (2022-12-26T14:25:07.411+0000 7f34ee7fc700 -1 librbd::image::CreateRequest: 0x55650799f480 handle_add_image_to_directory: error adding image to directory: (95) Operation not supported
rbd: create error: (95) Operation not supported)

I am obviously guessing with the commands above to try to make it work. I also tried the id value from microceph.ceph status, but I get the same error. So it’s not obvious from the documentation how to add MicroCeph to an existing LXD cluster, probably more so with my limited Ceph skill set.

I am also experiencing issue 71 on the GitHub repo, which is probably not related, but I thought I’d mention it just in case.

Enjoy the rest of your festivities

kriszos · December 27, 2022, 12:26am

I think there is a bug in the latest microceph snap. We have the same issues here: Introducing MicroCloud - #16 by stgraber

stgraber · January 5, 2023, 12:43am

Some early issues with MicroCeph have been fixed today and both MicroCloud and MicroCeph have been updated.

If you’ve had any issues with either of the projects, please give them another try!

Specifically, this fixes issues with rbd create as well as the reported I/O error in microceph.ceph status which was related. Basically there was an issue with module loading within the OSD daemon which would prevent the creation of RBD images but would not otherwise prevent Ceph from starting up.

kriszos · January 5, 2023, 4:17pm

I checked today and can confirm that the issue with rbd create has been resolved, thank you very much. I am looking forward to further testing.

jonny_peace · January 6, 2023, 5:18pm

I can verify this works for me as well, thank you

intrepidsilence · January 24, 2023, 5:04pm

Following along with the specific instructions in the video, I now get this when trying to initialize LXD after getting microceph running:

Error: Failed to create storage pool "remote": Failed to run: ceph --name client.admin --cluster ceph osd pool create lxd 32: exit status 1 (Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)'))

Is it because ceph-common is not installed as part of the microceph snap?

root@lab1:~# microceph.ceph status
  cluster:
    id:     fd84322b-2715-4975-8edf-cc4248e04f45
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum lab1,lab2,lab3 (age 28m)
    mgr: lab1(active, since 30m), standbys: lab2, lab3
    osd: 3 osds: 3 up (since 28m), 3 in (since 28m)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   65 MiB used, 750 GiB / 750 GiB avail
    pgs:     1 active+clean
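
The conf_read_file part of that error suggests the ceph client started by LXD could not read its configuration file, rather than a missing package. A possible first check, assuming the paths mentioned earlier in the thread, is whether ceph.conf and the admin keyring are readable where LXD expects them. The check_ceph_conf helper below is a hypothetical name, and this is only a sketch of that check, not a confirmed diagnosis.

```shell
#!/bin/sh
# Sketch: report whether a Ceph client config directory looks usable.
# check_ceph_conf is a hypothetical helper, not part of MicroCeph or LXD.
# Usage: check_ceph_conf [DIR]   (DIR defaults to /etc/ceph)
check_ceph_conf() {
    dir="${1:-/etc/ceph}"
    status=0
    for f in ceph.conf ceph.client.admin.keyring; do
        if [ -r "$dir/$f" ]; then
            echo "ok:      $dir/$f"
        else
            echo "missing: $dir/$f"
            status=1
        fi
    done
    # Non-zero if at least one of the two files could not be read.
    return "$status"
}
```

With MicroCeph on the same host, check_ceph_conf /var/snap/microceph/current/conf looks at the directory the LXD snap links to; for an external LXD, calling it with no argument checks /etc/ceph.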

tonysmithio · April 10, 2023, 1:12pm

@stgraber

It would appear that you can’t use a raw unformatted disk. It seems that the path to the disk to be used for the OSD can only be from the “/dev/disk/by-id/” path. If I provide the simple “/dev/disk” path, then the OSD cannot be added. Raw unformatted disks do not have an “ID” that can be referenced in the path of “/dev/disk/by-id/”. Is there a reason that you can’t just use a “/dev/disk” path? Why only allow the specific path of “/dev/disk/by-id/”?
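
As a side note on the by-id question: on many systems udev derives the /dev/disk/by-id symlinks from hardware identifiers such as the serial number rather than from any filesystem, so an unformatted disk may still have one, with virtual disks that expose no serial being a common exception. A small sketch for finding such aliases for a device follows; by_id_aliases is a hypothetical helper and it assumes a readlink that supports -f, as GNU coreutils does.

```shell
#!/bin/sh
# Sketch: list the /dev/disk/by-id symlinks that resolve to a given device.
# by_id_aliases is a hypothetical helper; it assumes "readlink -f" is available.
# Usage: by_id_aliases DEVICE [BY_ID_DIR]   (BY_ID_DIR defaults to /dev/disk/by-id)
by_id_aliases() {
    dev=$(readlink -f "$1") || return 1
    dir="${2:-/dev/disk/by-id}"
    found=1
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            echo "$link"
            found=0
        fi
    done
    # Returns 0 if at least one alias was printed, 1 otherwise.
    return "$found"
}
```

For example, by_id_aliases /dev/sdb prints every by-id path pointing at that disk, or prints nothing and returns non-zero if the disk has no such alias.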

stgraber · April 11, 2023, 12:53am

Hmm, I thought we fixed that particular bug a couple weeks ago. Could you try with the edge channel of microceph?

We need to promote a full suite of microceph, microovn and microcloud snaps as soon as the new snapd is finally in stable…

vipulbhatt2003 · April 11, 2023, 6:29am

Hello Stéphane,
Before starting the MicroCloud setup, do we need to add all the nodes to an LXD cluster? Is that a prerequisite for MicroCloud (and LXD) to detect the servers to be added to the MicroCloud cluster?

tomp · April 11, 2023, 7:02am

No, MicroCloud will configure LXD into a cluster.