As you may have seen in some of our tutorials and videos, setting up a Ceph cluster can be a bit tricky and time-consuming, especially if it’s just for testing or a small home lab.
To make this much easier, we’ve been spending a bit of time creating something called microceph.
It’s available as a snap package, currently only tested on Ubuntu but likely to work on other distros too. The snap uses a small management daemon that shares a lot of the clustering logic that LXD also uses. This makes it very easy to cluster multiple systems together, which, combined with an easy bootstrap process, lets you set up a Ceph cluster in just a few minutes.
If you’d like to try it out, you can just run snap install microceph followed by microceph init.
This will run you through the setup process interactively. On the first system it will create a new Ceph cluster, then let you add additional systems and disks.
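For reference, that boils down to something like this on the first system (the init step is interactive, so the exact prompts may vary between versions):

sudo snap install microceph
sudo microceph init
# the interactive init on the first system bootstraps the cluster,
# then offers to add the additional systems and their disks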
You can use this on a single system with at least 3 disks or partitions, though you’ll need to tweak the replication a bit on your pools for this to work properly.
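To give a rough idea of the kind of tweak that means (a sketch only; the rule and pool names below are placeholders): with a single host, the default CRUSH rule can’t place replicas on separate hosts, so you’d switch the failure domain to osd:

# create a replicated rule whose failure domain is osd instead of host
microceph.ceph osd crush rule create-replicated single-node default osd
# point your pool at it (replace mypool with the actual pool name)
microceph.ceph osd pool set mypool crush_rule single-node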
For a more standard setup, you’d want 3 systems each with at least 1 disk or partition.
All my development and testing was done by running microceph inside 3 LXD virtual machines, each with an additional disk attached for use by Ceph.
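If you want to reproduce that kind of test environment, something along these lines should do it (the VM names, image and volume size are just examples):

# create a VM and attach a dedicated block volume for Ceph to consume
lxc launch ubuntu:22.04 ceph1 --vm
lxc storage volume create default ceph1-osd --type=block size=10GiB
lxc config device add ceph1 osd disk pool=default source=ceph1-osd
# repeat for ceph2 and ceph3, then install microceph inside each VM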
Once everything is configured, you can run microceph.ceph status to make sure it’s all good.
The resulting Ceph configuration and keyring can be found at /var/snap/microceph/current/conf/ and can be copied over to LXD or any other application supporting Ceph.
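For example, to use the regular ceph CLI on one of the nodes, you could copy those files into /etc/ceph; this is just a sketch which assumes the keyring under conf/ is named ceph.keyring and that ceph-common is installed, so adjust the file names if yours differ:

sudo mkdir -p /etc/ceph
sudo cp /var/snap/microceph/current/conf/ceph.conf /etc/ceph/
# the ceph tools expect the admin keyring at this path
sudo cp /var/snap/microceph/current/conf/ceph.keyring /etc/ceph/ceph.client.admin.keyring
ceph -s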
There’s not a ton of configuration to it at this point. microceph init will get it up and running, at which point you’ve got Ceph running with mon, mds, mgr and osd services, and those can be configured as normal through microceph.ceph.
These days it’s not recommended to edit the ceph.conf file directly; instead, you can use the config command, which lets you set configuration for specific daemons, machines, locations, …
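As a sketch of what that looks like (the option and value here are just an illustration, not a recommendation):

# set an option for all OSDs rather than editing ceph.conf
microceph.ceph config set osd osd_memory_target 2G
# review what is currently set
microceph.ceph config dump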
Deploying MicroCeph is a lot easier than doing a traditional production deployment.
MicroCeph takes care of the initial service placement for HA, so you don’t really have to think about that.
As for differences, one internal one is that MicroCeph doesn’t use LVM to label the disks.
Instead, the disks are recorded in the MicroCeph database and have the OSD spawned directly on them. This saves us from having to also drive LVM and makes things a bit tidier.
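Disks can also be handled outside of the interactive init; roughly (the device path is a placeholder):

# hand a whole block device over to MicroCeph, which spawns an OSD directly on it
microceph disk add /dev/sdb
# show the database-backed view of what has been consumed
microceph disk list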
In general the goal is for MicroCeph to be usable in production as a way to have a Ceph cluster that can very easily be set up across any number of machines. The versions of the various Ceph daemons are identical to what you’d get through Ubuntu 22.04 LTS as we’re actually consuming those packages.
I think we’ll want to wait for more users to play with this in home labs and report back any obvious issues we didn’t see in our own use and testing so far before we feel confident telling folks to use this for small production sites.
So I’d probably want to give it another month at this point.
Hi,
I’m not sure whether to post this here or create a new topic. Anyway, I set up a simple MicroCeph environment but I can’t figure out this error; maybe someone can explain how to resolve it.
Regards.
root@cephnode1:~# ceph -s
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')
Hmm, I think I found the exact problem: on the server cephnode1, snap.microceph.daemon shows as active (running) but the logs print errors.
root@cephnode1:/var/log# systemctl status snap.microceph.daemon
● snap.microceph.daemon.service - Service for snap application microceph.daemon
     Loaded: loaded (/etc/systemd/system/snap.microceph.daemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-10-28 15:37:10 +03; 1s ago
   Main PID: 2176 (microcephd)
      Tasks: 8 (limit: 1117)
     Memory: 12.9M
        CPU: 595ms
     CGroup: /system.slice/snap.microceph.daemon.service
             └─2176 microcephd --state-dir /var/snap/microceph/common/state
Oct 28 15:37:10 cephnode1 systemd[1]: Started Service for snap application microceph.daemon.
root@cephnode1:/var/log# journalctl -f -u snap.microceph.daemon
Oct 28 15:40:20 cephnode1 systemd[1]: snap.microceph.daemon.service: Failed with result 'exit-code'.
Oct 28 15:40:20 cephnode1 systemd[1]: snap.microceph.daemon.service: Scheduled restart job, restart counter is at 32.
Oct 28 15:40:20 cephnode1 systemd[1]: Stopped Service for snap application microceph.daemon.
Oct 28 15:40:20 cephnode1 systemd[1]: Started Service for snap application microceph.daemon.
Oct 28 15:40:30 cephnode1 microceph.daemon[2648]: Error: Unable to start daemon: Daemon failed to start: Failed to re-establish cluster connection: context deadline exceeded
Oct 28 15:40:30 cephnode1 systemd[1]: snap.microceph.daemon.service: Main process exited, code=exited, status=1/FAILURE
Oct 28 15:40:30 cephnode1 systemd[1]: snap.microceph.daemon.service: Failed with result 'exit-code'.
Oct 28 15:40:31 cephnode1 systemd[1]: snap.microceph.daemon.service: Scheduled restart job, restart counter is at 33.
Oct 28 15:40:31 cephnode1 systemd[1]: Stopped Service for snap application microceph.daemon.
Oct 28 15:40:31 cephnode1 systemd[1]: Started Service for snap application microceph.daemon.
I run a production LXD+Ceph cluster in a datacenter on 3 servers, where I do weekly rolling reboots for security updates. All services running on the remaining two servers keep working without any issue during the reboot of the third server.
I encountered some difficulties but completed the task and I’m impressed with the result. I installed 3 LXD VMs and added external disks to them. Here are some outputs of my setup.
Thanks for the effort, regards.
root@cephnode1:~# microceph disk list
Disks configured in MicroCeph:
+-----+-----------+----------------------------------------------------+
| OSD | LOCATION | PATH |
+-----+-----------+----------------------------------------------------+
| 0 | cephnode2 | /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_data1 |
+-----+-----------+----------------------------------------------------+
| 1 | cephnode3 | /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_data1 |
+-----+-----------+----------------------------------------------------+
| 2 | cephnode1 | /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_lxd_data1 |
+-----+-----------+----------------------------------------------------+
Available unpartitioned disks on this system:
+-------+----------+------+------+
| MODEL | CAPACITY | TYPE | PATH |
+-------+----------+------+------+
From the client point of view, I installed the ceph-common package on the client and, after the installation, copied ceph.conf and ceph.client.admin.keyring from one of the ceph nodes to the /etc/ceph directory. Here is the ceph status.
indiana@lxdserver:~$ ceph -s
  cluster:
    id:     61bfdca6-3de5-429b-a73a-fae9e912b8d9
    health: HEALTH_WARN
            3 osds down
            3 hosts (3 osds) down
            1 root (3 osds) down
            Reduced data availability: 1 pg inactive

  services:
    mon: 3 daemons, quorum cephnode1,cephnode2,cephnode3 (age 46m)
    mgr: cephnode2(active, since 47m), standbys: cephnode1, cephnode3
    osd: 3 osds: 0 up (since 16m), 3 in (since 2h)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown
I have installed the microceph snap and encountered the following warnings. Is this expected?
I am doing this on Ubuntu 22.04.1 with LXD 5.7/stable.
ubuntu@lxd-host-01:~$ sudo snap install microceph
2022-11-02T11:04:32+05:00 INFO snap "microceph" has bad plugs or slots: microceph (unknown interface "microceph")
2022-11-02T11:04:34+05:00 INFO snap "microceph" has bad plugs or slots: microceph (unknown interface "microceph")
microceph 0+git.499c15f from Canonical✓ installed
WARNING: There is 1 new warning. See 'snap warnings'.
ubuntu@lxd-host-01:~$ snap warnings
last-occurrence: today at 11:04 PKT
warning: |
snap "microceph" has bad plugs or slots: microceph (unknown interface "microceph")
We will be starting testing this pretty much immediately @stgraber
I would hope to see the Juju charms follow along with this somehow for LXD, as it adds the operational side of things for us. We are currently halfway through setting up a full-scale LXD cluster with Juju and have hit some bugs which we would like to work through with you.