Ceph OSD Fails to Start in LXD Container After Upgrading Host from Ubuntu 20.04 to 22.04

Hello,

I am currently facing an issue with my Ceph cluster, which runs on Ubuntu 20.04 plus the OpenStack Victoria Ubuntu Cloud Archive (UCA).

The Ceph OSDs are hosted within LXD containers, and everything was functioning correctly when both the hosts and containers were running Ubuntu 20.04.

I am currently using LXD 5.13 (I also tried 5.14 from the latest/candidate snap channel), installed via snap, which worked fine on Ubuntu 20.04.

However, after upgrading the host from Ubuntu 20.04 to 22.04 using the do-release-upgrade command, the Ceph OSD daemon (still unchanged on Ubuntu 20.04 + UCA Victoria) within the LXD container fails to start.

Here is a portion of the LXD profile (which works on Ubuntu 20.04):

config:
  raw.lxc: |-
    lxc.apparmor.profile = unconfined
    lxc.cgroup.devices.allow = b 253:* rwm
    lxc.mount.entry = /proc/sys/vm proc/sys/vm proc bind,rw 0 0
    lxc.mount.entry = /proc/sys/fs proc/sys/fs proc bind,rw 0 0
  security.privileged: "true"
description: osds
devices:
...
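
For completeness, the profile is attached to the OSD containers in the usual way; a minimal sketch, assuming the profile is literally named osds and the container is osd-1:

lxc profile show osds        # inspect the full profile (abbreviated above)
lxc profile add osd-1 osds   # attach it to the OSD container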

Here is a portion of the LXD container configuration for the Ceph OSD (which works on Ubuntu 20.04):

...
devices:
  mapper-control:
    path: /dev/mapper/control
    type: unix-char
  sda:
    path: /dev/sda
    source: /dev/disk/by-id/ata-Kingston_SSD_XYZ
    type: unix-block
  sdc:
    path: /dev/sdc
    source: /dev/disk/by-id/ata-Seagate_HDD_XYSA
    type: unix-block
  sdd:
    path: /dev/sdd
    source: /dev/disk/by-id/ata-Seagate_HDD_XYCZ
    type: unix-block
  sys-fs:
    path: /proc/sys/fs
    source: /proc/sys/fs
    type: disk
  sys-vm:
    path: /proc/sys/vm
    source: /proc/sys/vm
    type: disk
...
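
For reference, unix-block entries like these can also be added from the CLI; a sketch, assuming the container is named osd-1 and reusing the same by-id placeholder as above:

# Pass a host block device into the container under a stable /dev name
lxc config device add osd-1 sda unix-block \
    source=/dev/disk/by-id/ata-Kingston_SSD_XYZ path=/dev/sda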

Since the host has been upgraded, the Ceph OSD inside the container (Ubuntu 20.04 + UCA) no longer starts. The following errors are encountered:

[ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 1-<REMOVED>
[ceph_volume.process][INFO  ] Running command: /usr/sbin/ceph-volume lvm trigger 4-<REMOVED>
[ceph_volume.process][INFO  ] stderr Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-999
/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-block-<REMOVED>/osd-block-<REMOVED> --path /var/lib/ceph/osd/ceph-999 --no-mon-config
abel for /dev/ceph-block-<REMOVED>/osd-block-<REMOVED>: (1) Operation not permitted
400 <STRING> -1 bluestore(/dev/ceph-block-<REMOVED>/osd-block-<REMOVED>) _read_bdev_label failed to open /dev/ceph-block-<REMOVED>/osd-block-<REMOVED>: (1) Operation not permitted
d returned non-zero exit status: 1
[ceph_volume.process][INFO  ] stderr Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-9999
/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-block-<REMOVED>/osd-block-<REMOVED> --path /var/lib/ceph/osd/ceph-9999 --no-mon-config
abel for /dev/ceph-block-<REMOVED>/osd-block-<REMOVED>: (1) Operation not permitted
400 <STRING> -1 bluestore(/dev/ceph-block-<REMOVED>/osd-block-<REMOVED>) _read_bdev_label failed to open /dev/ceph-block-<REMOVED>/osd-block-<REMOVED>: (1) Operation not permitted
d returned non-zero exit status: 1
[systemd][WARNING] command returned non-zero exit status: 1
[systemd][WARNING] failed activating OSD, retries left: 1 
[systemd][WARNING] command returned non-zero exit status: 1
[systemd][WARNING] failed activating OSD, retries left: 1 

As a result, the /var/lib/ceph/osd/ceph-XYZ directories aren’t being mounted inside the LXD container as they were before the host was upgraded to Ubuntu 22.04, and the Ceph OSDs don’t show up as online on the Ceph monitors.

To debug, I ran:

root@osd-1:~# dd if=/dev/ceph-block-<REMOVED>/osd-block-<REMOVED> of=/tmpdata bs=1024 count=1000
dd: failed to open '/dev/ceph-block-<REMOVED>/osd-block-<REMOVED>': Operation not permitted

…which is the same error as in the ceph-volume logs. The same command works on the other 20.04-based hosts/containers.

It’s worth mentioning that the lvdisplay command works inside the Ceph OSD container (20.04), and I can also see the volumes with ls /dev/mapper. So I believe the problem lies somewhere else…

NOTE: I’m also running /sbin/lvm vgmknodes --refresh as a systemd service in the Ceph OSD container; otherwise the LVM utilities don’t work and Ceph Ansible doesn’t even deploy the OSD.
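
The unit itself is nothing fancy; a minimal sketch of how such a service can be set up inside the container (the unit name and ordering are assumptions, not a copy of my exact file):

cat > /etc/systemd/system/lvm-vgmknodes.service <<'EOF'
[Unit]
Description=Recreate LVM device nodes inside the container
After=local-fs.target
Before=ceph-osd.target

[Service]
Type=oneshot
ExecStart=/sbin/lvm vgmknodes --refresh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now lvm-vgmknodes.service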

I consulted ChatGPT, which suggested that the issue might be related to changes in the LXD security model and the introduction of “LXD Security Denials” in Ubuntu 22.04. However, I am skeptical of this suggestion; I think it’s hallucinating. Enabling nesting, which it also recommended, did not resolve the issue, nor did its other tips.

I intend to keep running my Ceph OSDs as LXD containers on Ubuntu 22.04 while ensuring they function correctly. Currently, the other nodes in the cluster (all hosts/containers running 20.04) are working as expected (Ceph OSDs inside LXD containers).

How can I resolve this problem on Ubuntu 22.04?

Given that LXD is the same SNAP package on Ubuntu 20.04 and 22.04, I expected no issues since the SNAP package itself was not modified.

If I export the Ceph OSD container with lxc export osd-1 osd-1.tar.gz, reinstall the host OS back to Ubuntu 20.04, and then run lxc import osd-1.tar.gz, everything works again! This means the data is intact on the storage devices; it’s just failing to start. This is the third time within the past year that I’ve tried to upgrade to Ubuntu 22.04. :frowning:

I kindly request your advice, as this issue prevents me from upgrading my entire infrastructure to Ubuntu 22.04 or newer.

Thank you for any assistance you can provide.

NOTE: Also tried ideas from this post: https://chris-sanders.github.io/2018-05-11-block-device-in-containers/

Cheers!
Thiago

Does sudo dmesg show any AppArmor denials?

Hi Thomas,

Thanks for your reply!

No, I’m not seeing any AppArmor denials.

Running dmesg | grep -i apparmor as root on the bare-metal host shows no denials, while the Ceph OSD daemon is in a start loop inside the LXD container, constantly trying to access /dev/ceph* (and receiving the “Operation not permitted” message).

What permissions do the /dev/ devices have compared to the host where it works? Does anything change?

I found a difference: the owner/group of the /dev/ceph* block devices seems wrong.

On the hosts that still run Ubuntu 20.04 on both the host and the containers, I can see this inside a working Ceph OSD LXD container:

root@osd-2:~# ll /dev/mapper/*
brw-rw---- 1 ceph ceph 253,   3 May 30 11:45 /dev/mapper/ceph--block--<REMOVED>-osd--block--<REMOVED>
brw-rw---- 1 ceph ceph 253,   1 May 30 11:45 /dev/mapper/ceph--block--dbs--<REMOVED>-osd--block--db--<REMOVED>
brw-rw---- 1 ceph ceph 253,   0 May 30 11:45 /dev/mapper/ceph--block--dbs--<REMOVED>-osd--block--db--<REMOVED>
brw-rw---- 1 ceph ceph 253,   2 May 30 11:45 /dev/mapper/ceph--block--<REMOVED>-osd--block--<REMOVED>
crw-rw---- 1 root root  10, 236 May 23 14:40 /dev/mapper/control

The devices belong to ceph:ceph.

Meanwhile, on the node whose host I upgraded to Ubuntu 22.04, the container is affected as follows:

root@osd-1:~# ll /dev/mapper/*
brw-rw---- 1 root disk 253,   2 May 30 11:27 /dev/mapper/ceph--block--<REMOVED>-osd--block--<REMOVED>
brw-rw---- 1 root disk 253,   3 May 30 11:27 /dev/mapper/ceph--block--<REMOVED>-osd--block--<REMOVED>
brw-rw---- 1 root disk 253,   1 May 30 11:27 /dev/mapper/ceph--block--dbs--<REMOVED>-osd--block--db--<REMOVED>
brw-rw---- 1 root disk 253,   0 May 30 11:27 /dev/mapper/ceph--block--dbs--<REMOVED>-osd--block--db--<REMOVED>
crw-rw---- 1 root root  10, 236 May 30 11:27 /dev/mapper/control

They belong to root:disk in the container where it fails! Maybe Ceph updates the owner/group when it starts normally (not 100% sure, though).

However, as you can see from my initial tests, not even root can read from it: dd if=/dev/ceph-block-<REMOVED>/osd-block-<REMOVED> ... also fails, even though I ran it as root.

NOTE: The underlying /dev/sd* device files are the same across Ubuntu 20.04 and 22.04.
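
In case it helps anyone debugging the same thing, this is the kind of comparison I ran on both containers (a minimal check with stat; nothing exotic):

# Run inside both the working (20.04 host) and the failing (22.04 host) container
stat -c '%A %U:%G %t,%T %n' /dev/mapper/*
# %t,%T print the major,minor numbers in hex; major 253 decimal shows up as fd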

It’s interesting to note that this:

root@osd-1:~# dd if=/dev/ceph-block-dbs-<REMOVED>/osd-block-db-<REMOVED> of=/tmpdata bs=1024 count=1000
dd: failed to open '/dev/ceph-block-dbs-<REMOVED>/osd-block-db-<REMOVED>': Operation not permitted

…Does not work.

However, that logical volume (/dev/ceph-block-dbs-<REMOVED>/osd-block-db-<REMOVED>) sits on top of /dev/sda (the LVM physical volume configured by Ceph Ansible back when everything ran Ubuntu 20.04, mapped from /dev/disk/by-id/ata-Kingston_SSD_XYZ), and reading /dev/sda itself works fine:

root@osd-1:~# dd if=/dev/sda of=/tmpdata bs=1024 count=1000
1000+0 records in
1000+0 records out
1024000 bytes (1.0 MB, 1000 KiB) copied, 0.0208716 s, 49.1 MB/s

I’m definitely curious about how to solve this issue! lol

Any ideas @amikhalitsyn ?

I believe that it should be:

lxc.cgroup2.devices.allow = b 253:* rwm
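
As far as I understand, Ubuntu 22.04 boots with the unified cgroup v2 hierarchy by default, so the cgroup v1 key lxc.cgroup.devices.allow no longer has any effect there; the cgroup2 variant is the one that applies. A sketch of the change, assuming your profile is named osds:

lxc profile edit osds
# in raw.lxc, replace:
#   lxc.cgroup.devices.allow = b 253:* rwm
# with:
#   lxc.cgroup2.devices.allow = b 253:* rwm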

Oh good spot! :slight_smile:

Wheee! It worked! :star_struck:
Thank you @tomp & @amikhalitsyn!
I really appreciate the help.
You folks are awesome!

And here we go again! LOL

@amikhalitsyn, only you can solve this! :smiley:

On my Ubuntu 22.04 host with the Ceph OSD running inside LXD, everything is fine with Linux 6.2. But because I have the linux-generic-hwe-22.04 meta-package, as soon as I run apt update; apt upgrade; reboot, the Ceph OSD inside the LXD container fails to start, because Canonical just released Linux 6.5 for Jammy.

I had to:

apt purge `dpkg -l | grep linux | grep "6.5" | awk '{print $2}' | xargs`
apt autoremove

Then, after rebooting into Linux 6.2, the Ceph OSD starts again.

Did CGroup infrastructure change again from Linux 6.2 to 6.5?

For reference, here’s the LXD Profile that works with 6.2 but fails with 6.5:

config:
  raw.lxc: |-
    lxc.apparmor.profile = unconfined
    lxc.cgroup2.devices.allow = b 253:* rwm
    lxc.mount.entry = /proc/sys/vm proc/sys/vm proc bind,rw 0 0
    lxc.mount.entry = /proc/sys/fs proc/sys/fs proc bind,rw 0 0
  security.privileged: "true"
...
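
In case it helps with debugging: as I understand it, the cgroup v2 device controller is enforced via BPF programs attached to the container's cgroup, so something like this on the host should show whether a device program is attached at all (a sketch; I'm assuming bpftool is installed and that grepping the tree for "device" is enough to spot it):

bpftool cgroup tree /sys/fs/cgroup | grep -B1 -A1 device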

Thanks in advance!

Hello again,

I’ve revisited the issue of my Ceph OSD not starting in the LXD container (Ubuntu 20.04) after upgrading the host to Linux 6.5 on Ubuntu 22.04 via the linux-generic-hwe-22.04 package. Unfortunately, the same problem is back. To recap: my LXD profile includes the lxc.cgroup2.devices.allow setting, which functions correctly under Linux 6.2 but fails with Linux 6.5.

I’m trying to understand precisely what changed between Linux 6.2 and 6.5 that could impact this functionality. Is there any alteration in the kernel that might affect lxc.cgroup2.devices.allow or related container permissions? Any insights or suggestions would be greatly appreciated, as this setup is critical for my environment.
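
To pin down where the EPERM actually comes from, my next step is to strace the failing read directly (a minimal sketch, reusing the placeholder LV path from above):

strace -f -e trace=openat dd if=/dev/ceph-block-<REMOVED>/osd-block-<REMOVED> of=/dev/null bs=4k count=1
# a device-cgroup denial should show up as openat(...) = -1 EPERM, rather than EACCES from file permissions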

Cheers!

No one?! :cry:

Unfortunately, each upgrade breaks Linux containers, making it difficult to use this technology meaningfully.

If it is not possible to fix this or to have it work reliably and predictably, no worries. But I want to know, so I can decide whether to reconfigure it (with help from this community) or give up (on LXD or Incus) and go back to bare metal.

It would be a shame to give up on it. I love this tech! But I see no other choice if it is unreliable and risks breaking on each kernel upgrade.

Perhaps I should contact the kernel folks somewhere?