Can no longer start VMs, ceph.conf permission denied

I first noticed this on Saturday the 25th, when I rebooted my VM and it did not come back up.

For the past few days I have been unable to reboot my VMs hosted on Ceph RBD storage:
error reading conf file /etc/ceph/ceph.conf: Permission denied

This may well have been broken for longer, as I haven’t had time to manage my infrastructure for the past few weeks.

~
root @ node3 # ls -l /etc/ceph/ceph.conf
-rw-r----- 1 root root 556 Sep 17  2020 /etc/ceph/ceph.conf

~
root @ node3 # snap list
Name    Version      Rev    Tracking       Publisher   Notes
core18  20220428     2409   latest/stable  canonical✓  base
core20  20220527     1518   latest/stable  canonical✓  base
lxd     5.2-79c3c3b  23155  latest/stable  canonical✓  in-cohort
snapd   2.56         16010  latest/stable  canonical✓  snapd

~
root @ node3 # lxc start transmission2
Error: Failed setting up device via monitor: Failed adding block device for disk device "root": Failed adding block device: error reading conf file /etc/ceph/ceph.conf: Permission denied
Try `lxc info --show-log transmission2` for more info

My LXC container guests that have their disk on Ceph RBD start just fine!

Changing the permissions of /etc/ceph and /etc/ceph/ceph.conf to 777/644 allows me to start my VMs again, but obviously that is not an acceptable fix.

Changing ownership of /etc/ceph and /etc/ceph/ceph.conf to root:lxd while keeping the original permissions also doesn’t work. What permissions do I need on this file and directory?

The timestamp on /etc/ceph says May 20; according to my unattended-upgrades log, the ceph packages were upgraded to 16.2.9-1focal that day.
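
For anyone who wants to cross-check the same thing on their own node, here is a rough sketch that lists dpkg upgrade events for a given day. The function name `ceph_upgrades_on` is made up for this post, the default log path is the usual Debian/Ubuntu location, and the package-name pattern (ceph, rbd, rados) is only my guess at what counts as a ceph package.

```shell
# Hypothetical helper: list dpkg "upgrade" events for ceph-related
# packages on a given day.
# Usage: ceph_upgrades_on YYYY-MM-DD [LOGFILE]
ceph_upgrades_on() {
    day=$1
    log=${2:-/var/log/dpkg.log}   # assumed default location
    # dpkg.log lines look like:
    #   <date> <time> upgrade <pkg>:<arch> <old-version> <new-version>
    # The package pattern below is an assumption, adjust as needed.
    awk -v day="$day" '
        $1 == day && $3 == "upgrade" && $4 ~ /ceph|rbd|rados/ {
            print $2, $4, $5, "->", $6
        }' "$log"
}
```

Something like `ceph_upgrades_on 2022-05-20 /var/log/dpkg.log.1` would be the call here. Older logs are usually rotated and may be compressed, in which case they need to be decompressed first.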

I know that @stgraber also runs on Ceph, so I am curious whether you also have this issue.

~
root @ node2 # lxc info --show-log transmission2
Name: transmission2
Status: STOPPED
Type: virtual-machine
Architecture: x86_64
Location: node2
Created: 2021/03/01 11:45 CET
Last Used: 2022/06/28 10:01 CEST

Log:

warning: tap: open vhost char device failed: Permission denied
warning: tap: open vhost char device failed: Permission denied
warning: tap: open vhost char device failed: Permission denied
[...]
~
root @ node2 # ls -la /etc/ceph
total 28
drwxr-x---   2 root root  4096 May 20 06:49 .
drwxr-xr-x 120 root root 12288 Jun 28 06:21 ..
-rw-r-----   1 root root    64 Sep 24  2020 ceph.client.admin.keyring
-rw-r-----   1 root root   556 Sep 24  2020 ceph.conf
-rw-------   1 root root    84 Sep 24  2020 ceph.keyring

~
root @ node2 # chmod 755 /etc/ceph

~
root @ node2 # lxc start transmission2

works :-)

755 on /etc/ceph works; 750 on /etc/ceph does not. That holds with both root:root and root:lxd as owner of the directory and ceph.conf.

The funny thing about this test is that I leave ceph.conf at 640 in both cases… I’m a bit confused by that.
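
To make the comparison easier to repeat, here is a small sketch that prints mode and ownership for every component of a path. Opening a file needs search (x) permission on each parent directory as well as read permission on the file itself, so the whole chain is worth looking at. The function name `show_path_perms` is made up for this post, and `stat -c` assumes GNU coreutils. It only shows the permissions; it does not tell you which user is doing the access when the VM starts.

```shell
# Hypothetical helper: print "<mode> <owner>:<group> <path>" for each
# component of an absolute path, from / down to the file itself.
show_path_perms() {
    rest=${1#/}          # drop the leading slash
    current=""
    stat -c '%a %U:%G %n' /
    while [ -n "$rest" ]; do
        part=${rest%%/*}             # next path component
        current="$current/$part"
        stat -c '%a %U:%G %n' "$current"
        case $rest in
            */*) rest=${rest#*/} ;;  # more components left
            *)   rest="" ;;          # that was the last one
        esac
    done
}
```

Running `show_path_perms /etc/ceph/ceph.conf` before and after the chmod shows the one line that changes. `namei -l /etc/ceph/ceph.conf` from util-linux gives a similar listing without any scripting.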

~
root @ node2 # stat /etc/ceph
  File: /etc/ceph
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fd03h/64771d    Inode: 525230      Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-06-28 10:02:45.973027312 +0200
Modify: 2022-05-20 06:49:19.454047867 +0200
Change: 2022-06-28 10:05:05.776382866 +0200
 Birth: -

~
root @ node2 # stat /etc/ceph/ceph.conf
  File: /etc/ceph/ceph.conf
  Size: 556             Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 525491      Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-06-28 10:03:34.532801869 +0200
Modify: 2020-09-24 10:11:02.923594501 +0200
Change: 2022-06-28 10:02:46.913022928 +0200
 Birth: -

Normal permissions for /etc/ceph/ceph.conf should be reasonably open, something like 644. What needs to be tightened down is the ceph keyring itself.

Here I have (and it looks like a pretty default setup):

  • ceph.client.admin.keyring is root:root and 600
  • ceph.conf is root:root and 644
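
As a rough sketch, the layout above could be applied like this. The function name `apply_ceph_perms` is made up for illustration, and the 755 on the directory is not part of my list: it is simply the value that was reported to work earlier in this thread. Ownership is left untouched; the files are assumed to be root:root already.

```shell
# Hypothetical helper: apply the permission layout described above.
# Usage: apply_ceph_perms [DIR]   (DIR defaults to /etc/ceph)
apply_ceph_perms() {
    dir=${1:-/etc/ceph}
    # Directory mode taken from the test earlier in this thread (assumption).
    chmod 755 "$dir"
    # The config file holds no secrets, so it can be world-readable.
    chmod 644 "$dir/ceph.conf"
    # The keyring is the sensitive part and stays owner-only.
    chmod 600 "$dir/ceph.client.admin.keyring"
}
```

Run as root with no argument it acts on /etc/ceph; passing a scratch directory lets you try it out safely first.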