Microceph vs Cephadm + Microceph partition fix

I’ve been experimenting with microceph and, more briefly, with cephadm (with services running in podman) on a 5-node “low end” cluster. These details are probably most interesting for people whose nodes have a single disk but who still want to run Ceph. TL;DR - I’m going to be using microceph, for various reasons:

  • For your cluster you want at least 2 NICs, ideally with a 10 Gb or faster internal network for ceph

  • 5-node clusters perform better than 3-node clusters.

  • microceph uses around 1.5 GB of RAM on monitor nodes & 500 MB on the other nodes

  • microceph clusters reboot much faster than cephadm nodes - less than 10 seconds on NVMe, whereas the cephadm services in podman took much longer to stop (30 to 60 seconds or so)

  • cephadm uses a lot more disk (each of its containers runs a complete CentOS 8 Stream system) - RAM usage was similar to microceph

  • cephadm will not install OSDs directly onto partitions - it requires LVM logical volumes

  • install podman, then apt install cephadm --no-install-recommends (to avoid pulling in docker) - Ubuntu 23.10 now ships podman v4 in the official repos. You probably want to use docker with cephadm though - my cephadm cluster never became healthy (the mgr kept crashing, possibly due to using podman?)

  • Rather than following the microceph docs, use microceph init to initialise every node so you can choose the specific IP address ceph runs on (otherwise it will default to your public interfaces)
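Since cephadm refuses raw partitions, one way around that is to wrap the spare partition in an LVM logical volume first and hand the LV to cephadm. A rough sketch - the device, VG and LV names below are hypothetical, adjust for your disks:

```shell
# hypothetical device / VG / LV names - adjust for your layout
pvcreate /dev/vda4
vgcreate ceph-vg /dev/vda4
lvcreate -l 100%FREE -n osd0 ceph-vg

# then hand the LV to cephadm, e.g.:
# ceph orch daemon add osd node1:/dev/ceph-vg/osd0
```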

These are the basic steps I use to get microceph running on disk partitions (the partition fix comes from the original discussion on GitHub):

Microceph notes
===============

# opening outbound destination ports
# tcp 3300 6789 6800-6850 7443 
# 3300 is the new messenger v2 port (tcp 6789 is v1)
----------------------------------------------------
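For example, with ufw on each node (the subnet below is a placeholder - use your internal ceph network):

```shell
# placeholder subnet - substitute your ceph cluster network
ufw allow proto tcp from 10.10.10.0/24 to any port 3300,6789,7443
ufw allow proto tcp from 10.10.10.0/24 to any port 6800:6850
```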

# put apparmor in complain mode so microceph init works
-------------------------------------------------------
echo -n complain > /sys/module/apparmor/parameters/mode

# installation
---------------
snap install microceph --channel reef/stable
snap refresh --hold microceph

# allow OSD to install on partitions
-------------------------------------------------------
systemctl edit snap.microceph.osd --drop-in=override

[Unit]
ExecStartPre=/path/to/osdfix
-------------------------------------------------------

# the script I use to fix the osd service
-----------------------------------------
* There are actually 3 snap profiles that mention virtio
* (see commented out $PROFILES & $FILES below)
* I found only the osd profile needs changing for partitions to work.
* Just change $ADD to a rule that makes sense for your disks
* As sed is inserting a line you don't need to escape forward slashes in $ADD

#!/bin/sh
# Insert an AppArmor rule for our OSD partitions into the microceph
# osd profile, then reload it. Idempotent: the $TAG marker ensures
# the rule is only added once.

TAG="Cephy"
ADD="/dev/vda[4-9] rwk,\t\t\t\t\t   # $TAG"
SEARCH='/dev/vd\[a-z\]'
#PROFILES="/var/lib/snapd/apparmor/profiles/snap.microceph*"
#FILES=$(grep -l "$SEARCH" $PROFILES)
FILES="/var/lib/snapd/apparmor/profiles/snap.microceph.osd"

for file in $FILES; do
        if ! grep -q "$TAG" "$file"; then
                # insert $ADD just before the first line matching $SEARCH
                line=$(grep -n "$SEARCH" "$file" | head -n 1 | cut -d : -f 1)
                sed -i "$line i $ADD" "$file"
                echo "Reloading: $file"
                apparmor_parser -r "$file"
        else
                echo "Already configured: $file"
        fi
done

exit 0

-----------------------------------------------------------------------------------
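You can dry-run the sed insertion logic against a scratch file before touching the real profile. A minimal sketch - the two-line file contents here are a made-up stand-in for the profile:

```shell
#!/bin/sh
# Dry-run of the osdfix insertion against a throwaway file
# (the file contents are a made-up stand-in for the real profile)
TAG="Cephy"
ADD="/dev/vda[4-9] rwk,   # $TAG"
SEARCH='/dev/vd\[a-z\]'

file=$(mktemp)
printf '%s\n' '/dev/vd[a-z] rwk,' '/dev/sd[a-z] rwk,' > "$file"

# find the first matching line and insert the new rule before it
line=$(grep -n "$SEARCH" "$file" | head -n 1 | cut -d : -f 1)
sed -i "$line i $ADD" "$file"

grep -c "$TAG" "$file"    # prints 1: rule inserted exactly once
```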

# on the first node, say yes to creating a cluster & add the names of the additional nodes to see their join tokens:
-----------------------------------------------------------------------------------
microceph init

# example partition paths (for a VPS)
-------------------------------------
/dev/disk/by-path/virtio-pci-0000:00:05.0-part4

# on the additional nodes, say no to creating a cluster & provide the token from the previous step
-----------------------------------------------------------------------
microceph init

------------------------------------------------------------------------------------
# my previous microceph testing - long story short:
# the defaults give good performance 
# with an internal network for ceph no need to go crazy on security

https://discuss.linuxcontainers.org/t/introducing-microceph/15457/47?u=itoffshore
------------------------------------------------------------------------------------

# useful commands
-----------------
microceph disk list
microceph cluster list
microceph cluster config list
ceph osd lspools
ceph osd tree

Hopefully these notes save people some time:

  • I found microceph completely stable running for weeks on end (until I thought it was a good idea to try cephadm :grinning:)

  • I also upgraded microceph from quincy to reef with zero problems

I will be continuing with incus & ovn-central - I also tried microovn, but it seems better suited for use with microcloud.

I got the cluster uplink network running without too much trouble - I just need to figure out how to give each chassis a memorable name (as microovn does by default)


correction to /etc/systemd/system/snap.microceph.osd.service.d/override.conf - it should be:

[Service]
ExecStartPre=/path/to/osdfix

(ExecStartPre= is a [Service] directive, not a [Unit] one)

  • microceph is tightly coupled to lxd - you probably need to either use the two together, or disable apparmor so that creating containers on microceph works under incus (or provision ceph with cephadm for use with incus)

Hi experts

I’m a newbie with MicroCeph, and I’m getting an error:

“Orchestrator not found” - in the Services and NFS tabs

Can you give me some advice?

I didn’t play around with the Web GUI in microceph - but if that is where you are seeing errors, you probably have not enabled some optional services (e.g. metrics)

After a bit more testing I also found my original fix of chattr +i /var/lib/snapd/apparmor/profiles/snap.microceph.osd was needed to stop it losing its configuration (you only need to worry about that if you run ceph on partitions)

I’ve since moved on to testing openSUSE’s MicroOS with rke2 / Longhorn (microceph on partitions doesn’t seem reliable enough - but is probably fine with multiple disks)


Thank you for replying

To be honest, I don’t understand what you are saying because of my lack of knowledge, but can I ask you one more question: does MicroCeph support NFS via Ganesha?

You are probably better off using cephadm to create the cluster if you want to use NFS. Microceph doesn’t look like it supports NFS (searching the microceph docs for “nfs” returns nothing)

This may be the problem (from nfs):

Under certain conditions, NFS access using the CephFS FSAL fails. This causes an error to be thrown that reads “Input/output error”. Under these circumstances, the application metadata must be set for the CephFS metadata and data pools.
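If that is the issue, setting the application metadata on the pools should look something like the following - the pool names here are the common defaults and may differ on your cluster (list yours with ceph osd lspools):

```shell
# hypothetical pool names - list yours with: ceph osd lspools
ceph osd pool application set cephfs_data cephfs data cephfs
ceph osd pool application set cephfs_metadata cephfs metadata cephfs
```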

I have been using cephadm with podman on Debian 12 without any issues. I got the ceph packages from the Proxmox repo. Did you manage to work out the context of this problem, or capture any error logs?

I didn’t get to the bottom of the problems with cephadm on Ubuntu 23.10:

  • part of it was probably due to a 12 GB system partition (more space is needed for all the CentOS 8 Stream systems inside the podman containers cephadm creates) - maybe I’ll have another go with a bigger system partition.
  • It could also be my sysctl settings (not enough namespaces)

If you like podman, MicroOS is worth a look (it’s an install option for non-clustered hosts) - it also has cephadm:

S | Name             | Summary                                             | Type
--+------------------+-----------------------------------------------------+--------
  | ceph-mgr-cephadm | Ceph Manager module for cephadm-based orchestration | package
  | cephadm          | Utility to bootstrap Ceph clusters                  | package

S | Name                     | Summary                                               | Type
--+--------------------------+-------------------------------------------------------+--------
  | cockpit-podman           | Cockpit component for Podman containers               | package
  | podman                   | Daemon-less container engine for managing container-> | package
  | podman-docker            | Emulate Docker CLI using podman                       | package
  | podman-remote            | Client for managing podman containers remotely        | package
  | podmansh                 | Confined login and user shell using podman            | package
  | python39-podman          | A library to interact with a Podman server            | package
  | python39-podman-compose  | A script to run docker-compose using podman           | package
  | python310-podman         | A library to interact with a Podman server            | package
  | python310-podman-compose | A script to run docker-compose using podman           | package
  | python311-podman         | A library to interact with a Podman server            | package
  | python311-podman-compose | A script to run docker-compose using podman           | package
  | python312-podman         | A library to interact with a Podman server            | package
  | python312-podman-compose | A script to run docker-compose using podman           | package

  • I’m playing around with kiwi-ng to build a custom install ISO (very nice - user passwords are stored encrypted in the configuration) - it’s simple to customise images.

  • incus available too.