As of this morning, the microceph daemon on two nodes of my three-node MicroCeph cluster keeps restarting. The error message is “Daemon failed to start: Failed to re-establish cluster connection: context deadline exceeded”. Ceph itself reports the cluster as healthy, and the LXD cluster shows all nodes as online. It all started this morning when my monitoring notified me that the Ceph cluster was in HEALTH_WARN because some OSDs appeared to bounce. This looks like the start of the trouble:
Apr 14 09:21:29 lxd01 audit[428130]: AVC apparmor="DENIED" operation="ptrace" namespace="root//lxd-nextcloud_<var-snap-lxd-common-lxd>" profile="snap.nextcloud.nextcloud-cron" >
Apr 14 09:21:29 lxd01 audit[428130]: AVC apparmor="DENIED" operation="ptrace" namespace="root//lxd-nextcloud_<var-snap-lxd-common-lxd>" profile="snap.nextcloud.nextcloud-cron" >
Apr 14 09:29:10 lxd01 snapd[3408143]: storehelpers.go:769: cannot refresh: snap has no updates available: "bpytop", "btop", "core20", "core22", "lxd", "microcloud", "snapd"
Apr 14 09:29:18 lxd01 systemd[1]: Reloading.
Apr 14 09:29:23 lxd01 systemd[1]: Mounting Mount unit for microceph, revision 318...
Apr 14 09:29:23 lxd01 kernel: loop16: detected capacity change from 0 to 181128
Apr 14 09:29:23 lxd01 systemd[1]: Mounted Mount unit for microceph, revision 318.
Apr 14 09:29:24 lxd01 systemd[1]: Stopping Service for snap application microceph.mon...
Apr 14 09:29:24 lxd01 microceph.mon[1812]: 2023-04-14T09:29:24.068-0500 7f4c37fff640 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
Apr 14 09:29:24 lxd01 microceph.mon[1812]: 2023-04-14T09:29:24.068-0500 7f4c37fff640 -1 mon.lxd01@2(peon) e4 *** Got Signal Terminated ***
Apr 14 09:29:24 lxd01 systemd[1]: snap.microceph.mon.service: Deactivated successfully.
Apr 14 09:29:24 lxd01 systemd[1]: Stopped Service for snap application microceph.mon.
Apr 14 09:29:24 lxd01 systemd[1]: snap.microceph.mon.service: Consumed 12h 53min 20.848s CPU time.
Apr 14 09:29:24 lxd01 systemd[1]: Stopping Service for snap application microceph.daemon...
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: State 'stop-sigterm' timed out. Killing.
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: Killing process 1808 (microcephd) with signal SIGKILL.
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: Killing process 3714 (microcephd) with signal SIGKILL.
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: Killing process 3743 (microcephd) with signal SIGKILL.
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: Killing process 4693 (n/a) with signal SIGKILL.
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: Killing process 8394 (n/a) with signal SIGKILL.
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: Killing process 12558 (n/a) with signal SIGKILL.
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: Main process exited, code=killed, status=9/KILL
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: Failed with result 'timeout'.
Apr 14 09:29:54 lxd01 systemd[1]: Stopped Service for snap application microceph.daemon.
Apr 14 09:29:54 lxd01 systemd[1]: snap.microceph.daemon.service: Consumed 1h 38min 5.330s CPU time.
Apr 14 09:29:54 lxd01 systemd[1]: Stopping Service for snap application microceph.mds...
Apr 14 09:29:55 lxd01 microceph.mds[1809]: 2023-04-14T09:29:54.996-0500 7f2a175e7640 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
Apr 14 09:29:55 lxd01 microceph.mds[1809]: 2023-04-14T09:29:54.996-0500 7f2a175e7640 -1 mds.lxd01 *** got signal Terminated ***
Apr 14 09:30:03 lxd01 systemd[1]: snap.microceph.mds.service: Deactivated successfully.
Apr 14 09:30:03 lxd01 systemd[1]: Stopped Service for snap application microceph.mds.
Apr 14 09:30:03 lxd01 systemd[1]: snap.microceph.mds.service: Consumed 1h 27min 59.255s CPU time.
Apr 14 09:30:03 lxd01 systemd[1]: Stopping Service for snap application microceph.mgr...
Apr 14 09:30:03 lxd01 systemd[1]: snap.microceph.mgr.service: Deactivated successfully.
Apr 14 09:30:03 lxd01 systemd[1]: Stopped Service for snap application microceph.mgr.
Apr 14 09:30:03 lxd01 systemd[1]: snap.microceph.mgr.service: Consumed 1h 1min 22.952s CPU time.
Apr 14 09:30:03 lxd01 systemd[1]: Stopping Service for snap application microceph.osd...
Apr 14 09:30:04 lxd01 kernel: libceph: osd18 (1)192.168.86.27:6827 socket closed (con state OPEN)
Apr 14 09:30:04 lxd01 kernel: libceph: osd20 (1)192.168.86.27:6843 socket closed (con state OPEN)
Apr 14 09:30:04 lxd01 kernel: libceph: osd15 (1)192.168.86.27:6803 socket closed (con state OPEN)
Apr 14 09:30:04 lxd01 kernel: libceph: osd22 (1)192.168.86.27:6859 socket closed (con state OPEN)
Apr 14 09:30:04 lxd01 kernel: libceph: osd21 (1)192.168.86.27:6851 socket closed (con state OPEN)
Apr 14 09:30:04 lxd01 kernel: libceph: osd18 (1)192.168.86.27:6827 socket closed (con state V1_BANNER)
Apr 14 09:30:04 lxd01 kernel: libceph: osd17 (1)192.168.86.27:6819 socket closed (con state OPEN)
Apr 14 09:30:04 lxd01 kernel: libceph: osd18 (1)192.168.86.27:6827 socket error on write
Apr 14 09:30:05 lxd01 kernel: libceph: osd18 (1)192.168.86.27:6827 socket error on write
Apr 14 09:30:05 lxd01 kernel: libceph: osd15 down
Apr 14 09:30:05 lxd01 kernel: libceph: osd16 down
Apr 14 09:30:05 lxd01 kernel: libceph: osd18 down
Apr 14 09:30:05 lxd01 kernel: libceph: osd19 down
Apr 14 09:30:05 lxd01 kernel: libceph: osd22 (1)192.168.86.27:6859 socket closed (con state V1_BANNER)
Apr 14 09:30:05 lxd01 kernel: libceph: osd21 (1)192.168.86.27:6851 socket closed (con state V1_BANNER)
Apr 14 09:30:05 lxd01 kernel: libceph: osd22 (1)192.168.86.27:6859 socket error on write
Apr 14 09:30:05 lxd01 kernel: libceph: osd21 (1)192.168.86.27:6851 socket error on write
Apr 14 09:30:06 lxd01 kernel: libceph: osd22 (1)192.168.86.27:6859 socket error on write
Apr 14 09:30:06 lxd01 kernel: libceph: osd20 down
Apr 14 09:30:06 lxd01 kernel: libceph: osd21 down
Apr 14 09:30:06 lxd01 kernel: libceph: osd22 down
Apr 14 09:30:07 lxd01 systemd[1]: snap.microceph.osd.service: Deactivated successfully.
Apr 14 09:30:07 lxd01 systemd[1]: Stopped Service for snap application microceph.osd.
Apr 14 09:30:07 lxd01 systemd[1]: snap.microceph.osd.service: Consumed 5d 8h 50min 32.310s CPU time.
Apr 14 09:30:07 lxd01 kernel: libceph: osd17 (1)192.168.86.27:6819 socket closed (con state V1_BANNER)
Apr 14 09:30:07 lxd01 snapd[3408143]: services.go:1090: RemoveSnapServices - disabling snap.microceph.daemon.service
Apr 14 09:30:07 lxd01 snapd[3408143]: services.go:1090: RemoveSnapServices - disabling snap.microceph.mgr.service
Apr 14 09:30:07 lxd01 snapd[3408143]: services.go:1090: RemoveSnapServices - disabling snap.microceph.mds.service
Apr 14 09:30:07 lxd01 snapd[3408143]: services.go:1090: RemoveSnapServices - disabling snap.microceph.rgw.service
Apr 14 09:30:07 lxd01 snapd[3408143]: services.go:1090: RemoveSnapServices - disabling snap.microceph.mon.service
Apr 14 09:30:07 lxd01 snapd[3408143]: services.go:1090: RemoveSnapServices - disabling snap.microceph.osd.service
Apr 14 09:30:07 lxd01 systemd[1]: Reloading.
Apr 14 09:30:07 lxd01 kernel: libceph: osd17 (1)192.168.86.27:6819 socket error on write
Apr 14 09:30:08 lxd01 kernel: libceph: osd17 (1)192.168.86.27:6819 socket error on write
Apr 14 09:30:08 lxd01 kernel: libceph: osd17 down
Apr 14 09:30:16 lxd01 audit[430964]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/snap/snapd/18596/usr>
Apr 14 09:30:16 lxd01 audit[430964]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/snap/snapd/18596/usr>
Apr 14 09:30:16 lxd01 kernel: kauditd_printk_skb: 33 callbacks suppressed
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.051:131897): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unc>
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.051:131898): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unc>
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.051:131898): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unc>
Apr 14 09:30:16 lxd01 audit[430968]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.mds" pid=430968 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.243:131899): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.mds" pid=43>
Apr 14 09:30:16 lxd01 audit[430974]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.rbd" pid=430974 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.247:131900): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.rbd" pid=43>
Apr 14 09:30:16 lxd01 audit[430973]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.microceph.radosgw-admin" pid=430973 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 audit[430971]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.mon" pid=430971 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.251:131901): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.microceph.radosgw-admin">
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.251:131902): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.mon" pid=43>
Apr 14 09:30:16 lxd01 audit[430969]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.mgr" pid=430969 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.271:131903): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.mgr" pid=43>
Apr 14 09:30:16 lxd01 audit[430970]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.microceph" pid=430970 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 audit[430967]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.daemon" pid=430967 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.279:131904): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.microceph" >
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.279:131905): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.daemon" pid>
Apr 14 09:30:16 lxd01 audit[430966]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.ceph" pid=430966 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 kernel: audit: type=1400 audit(1681482616.287:131906): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.ceph" pid=4>
Apr 14 09:30:16 lxd01 audit[430975]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.rgw" pid=430975 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 audit[430972]: AVC apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.microceph.osd" pid=430972 comm="apparmor_parser"
Apr 14 09:30:16 lxd01 audit[430978]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="snap-update-ns.microc>
Apr 14 09:30:16 lxd01 audit[430977]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="snap-update-ns.microc>
Apr 14 09:30:16 lxd01 audit[430979]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="snap.microcloud.daemo>
Apr 14 09:30:16 lxd01 audit[430980]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="snap.microcloud.micro>
Apr 14 09:30:16 lxd01 systemd[1]: Reloading.
Apr 14 09:30:17 lxd01 systemd[1]: Reloading.
Apr 14 09:30:19 lxd01 systemd[1]: Started Service for snap application microceph.daemon.
Apr 14 09:30:19 lxd01 systemd[1]: Started Service for snap application microceph.osd.
Apr 14 09:30:19 lxd01 systemd[1]: Started Service for snap application microceph.mon.
Apr 14 09:30:19 lxd01 systemd[1]: Started Service for snap application microceph.mds.
Apr 14 09:30:19 lxd01 systemd[1]: Started Service for snap application microceph.mgr.
Apr 14 09:30:19 lxd01 audit[431054]: AVC apparmor="DENIED" operation="capable" profile="/snap/snapd/18596/usr/lib/snapd/snap-confine" pid=431054 comm="snap-confine" capability=>
Apr 14 09:30:19 lxd01 audit[431054]: AVC apparmor="DENIED" operation="capable" profile="/snap/snapd/18596/usr/lib/snapd/snap-confine" pid=431054 comm="snap-confine" capability=>
Apr 14 09:30:19 lxd01 audit[431061]: AVC apparmor="DENIED" operation="capable" profile="/snap/snapd/18596/usr/lib/snapd/snap-confine" pid=431061 comm="snap-confine" capability=>
Apr 14 09:30:19 lxd01 audit[431061]: AVC apparmor="DENIED" operation="capable" profile="/snap/snapd/18596/usr/lib/snapd/snap-confine" pid=431061 comm="snap-confine" capability=>
Apr 14 09:30:19 lxd01 snapd[3408143]: storehelpers.go:769: cannot refresh snap "microceph": snap has no updates available
Apr 14 09:30:29 lxd01 audit[431092]: AVC apparmor="DENIED" operation="unlink" profile="snap.microceph.mgr" name="/var/snap/microceph/220/run/ceph-mgr.lxd01.asok" pid=431092 com>
Apr 14 09:30:29 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:29.035-0500 7f49f57dcdc0 -1 asok(0x561a692799c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen:>
Apr 14 09:30:29 lxd01 kernel: kauditd_printk_skb: 10 callbacks suppressed
Apr 14 09:30:29 lxd01 kernel: audit: type=1400 audit(1681482629.035:131917): apparmor="DENIED" operation="unlink" profile="snap.microceph.mgr" name="/var/snap/microceph/220/run>
Apr 14 09:30:29 lxd01 audit[431075]: AVC apparmor="DENIED" operation="mknod" profile="snap.microceph.mds" name="/var/snap/microceph/220/run/ceph-mds.lxd01.asok" pid=431075 comm>
Apr 14 09:30:29 lxd01 microceph.mds[431075]: 2023-04-14T09:30:29.651-0500 7f7a2be0f6c0 -1 asok(0x55a4db036f20) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen:>
Apr 14 09:30:29 lxd01 microceph.mds[431075]: starting mds.lxd01 at
Apr 14 09:30:29 lxd01 kernel: audit: type=1400 audit(1681482629.651:131918): apparmor="DENIED" operation="mknod" profile="snap.microceph.mds" name="/var/snap/microceph/220/run/>
Apr 14 09:30:31 lxd01 audit[431068]: AVC apparmor="DENIED" operation="mknod" profile="snap.microceph.mon" name="/var/snap/microceph/220/run/ceph-mon.lxd01.asok" pid=431068 comm>
Apr 14 09:30:31 lxd01 microceph.mon[431068]: 2023-04-14T09:30:31.271-0500 7f1212758980 -1 asok(0x56275fc3c8f0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen:>
Apr 14 09:30:31 lxd01 kernel: audit: type=1400 audit(1681482631.271:131919): apparmor="DENIED" operation="mknod" profile="snap.microceph.mon" name="/var/snap/microceph/220/run/>
Apr 14 09:30:33 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:33.435-0500 7f49f57dcdc0 -1 mgr[py] Module alerts has missing NOTIFY_TYPES member
Apr 14 09:30:33 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:33.623-0500 7f49f57dcdc0 -1 mgr[py] Module balancer has missing NOTIFY_TYPES member
Apr 14 09:30:33 lxd01 audit[431330]: AVC apparmor="DENIED" operation="unlink" profile="snap.microceph.osd" name="/var/snap/microceph/220/run/ceph-osd.15.asok" pid=431330 comm=">
Apr 14 09:30:33 lxd01 microceph.osd[431330]: 2023-04-14T09:30:33.719-0500 7f3f7550b5c0 -1 asok(0x561e8a509e20) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen:>
Apr 14 09:30:33 lxd01 kernel: audit: type=1400 audit(1681482633.719:131920): apparmor="DENIED" operation="unlink" profile="snap.microceph.osd" name="/var/snap/microceph/220/run>
Apr 14 09:30:33 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:33.927-0500 7f49f57dcdc0 -1 mgr[py] Module crash has missing NOTIFY_TYPES member
Apr 14 09:30:34 lxd01 microceph.osd[431330]: 2023-04-14T09:30:34.615-0500 7f3f7550b5c0 -1 Falling back to public interface
Apr 14 09:30:35 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:35.487-0500 7f49f57dcdc0 -1 mgr[py] Module devicehealth has missing NOTIFY_TYPES member
Apr 14 09:30:35 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:35.595-0500 7f49f57dcdc0 -1 mgr[py] Module influx has missing NOTIFY_TYPES member
Apr 14 09:30:35 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:35.807-0500 7f49f57dcdc0 -1 mgr[py] Module iostat has missing NOTIFY_TYPES member
Apr 14 09:30:36 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:36.371-0500 7f49f57dcdc0 -1 mgr[py] Module orchestrator has missing NOTIFY_TYPES member
Apr 14 09:30:36 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:36.587-0500 7f49f57dcdc0 -1 mgr[py] Module osd_perf_query has missing NOTIFY_TYPES member
Apr 14 09:30:36 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:36.691-0500 7f49f57dcdc0 -1 mgr[py] Module osd_support has missing NOTIFY_TYPES member
Apr 14 09:30:36 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:36.903-0500 7f49f57dcdc0 -1 mgr[py] Module pg_autoscaler has missing NOTIFY_TYPES member
Apr 14 09:30:37 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:37.011-0500 7f49f57dcdc0 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
Apr 14 09:30:37 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:37.503-0500 7f49f57dcdc0 -1 mgr[py] Module prometheus has missing NOTIFY_TYPES member
Apr 14 09:30:37 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:37.695-0500 7f49f57dcdc0 -1 mgr[py] Module rbd_support has missing NOTIFY_TYPES member
Apr 14 09:30:38 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:38.551-0500 7f49f57dcdc0 -1 mgr[py] Module selftest has missing NOTIFY_TYPES member
Apr 14 09:30:38 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:38.675-0500 7f49f57dcdc0 -1 mgr[py] Module snap_schedule has missing NOTIFY_TYPES member
Apr 14 09:30:39 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:39.003-0500 7f49f57dcdc0 -1 mgr[py] Module status has missing NOTIFY_TYPES member
Apr 14 09:30:39 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:39.111-0500 7f49f57dcdc0 -1 mgr[py] Module telegraf has missing NOTIFY_TYPES member
Apr 14 09:30:39 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:39.435-0500 7f49f57dcdc0 -1 mgr[py] Module telemetry has missing NOTIFY_TYPES member
Apr 14 09:30:39 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:39.751-0500 7f49f57dcdc0 -1 mgr[py] Module test_orchestrator has missing NOTIFY_TYPES member
Apr 14 09:30:40 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:40.147-0500 7f49f57dcdc0 -1 mgr[py] Module volumes has missing NOTIFY_TYPES member
Apr 14 09:30:40 lxd01 microceph.mgr[431092]: 2023-04-14T09:30:40.255-0500 7f49f57dcdc0 -1 mgr[py] Module zabbix has missing NOTIFY_TYPES member
Apr 14 09:31:01 lxd01 microceph.daemon[431054]: Error: Unable to start daemon: Daemon failed to start: Failed to re-establish cluster connection: context deadline exceeded
Apr 14 09:31:01 lxd01 systemd[1]: snap.microceph.daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 14 09:31:01 lxd01 systemd[1]: snap.microceph.daemon.service: Failed with result 'exit-code'.
Apr 14 09:31:01 lxd01 systemd[1]: snap.microceph.daemon.service: Consumed 4.850s CPU time.
Apr 14 09:31:02 lxd01 systemd[1]: snap.microceph.daemon.service: Scheduled restart job, restart counter is at 1.
Sorry for the large log dump, but I wanted to capture everything from init's original stop of the daemon through to the “context deadline exceeded” error.
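I’m not very familiar with snaps, but could this be caused by a snap package update? Both affected hosts are on “microceph 0+git.ec95dcb 318”, while the unaffected host is on “microceph 0+git.6208776 220”. The journal above also shows systemd mounting microceph revision 318 at 09:29:23, right before the services were stopped, and after the restart the AppArmor denials for the mgr, mds, mon and osd services all refer to admin-socket paths under /var/snap/microceph/220/run/.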
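To check whether other nodes show the same pattern, one rough approach is to compare the revision systemd reports as mounted with the revision directory that the AppArmor denials point at. The Python below is a minimal sketch of a hypothetical helper, not part of MicroCeph or snapd. Its regular expressions assume journal lines formatted like the excerpt above, and a mismatch is only a hint, not proof of the cause.

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of MicroCeph or snapd. The patterns assume
# journal lines formatted like the excerpt in this post.
MOUNT_RE = re.compile(r"Mounted Mount unit for ([^,\s]+), revision (\d+)")
# Matches AVC denials whose target path lives under /var/snap/<snap>/<revision>/
DENIED_RE = re.compile(r'apparmor="DENIED".*?name="/var/snap/([^/"]+)/(\d+)/')


def summarize(lines):
    """Return (mounted, denied) for an iterable of journal lines.

    mounted: snap name -> last revision systemd reported as mounted
    denied:  snap name -> set of revision directories named in denials
    """
    mounted = {}
    denied = defaultdict(set)
    for line in lines:
        m = MOUNT_RE.search(line)
        if m:
            mounted[m.group(1)] = m.group(2)
            continue
        d = DENIED_RE.search(line)
        if d:
            denied[d.group(1)].add(d.group(2))
    return mounted, dict(denied)


def mismatches(lines):
    """Snaps whose denials point at a revision other than the mounted one."""
    mounted, denied = summarize(lines)
    return {
        snap: (mounted[snap], sorted(revs))
        for snap, revs in denied.items()
        if snap in mounted and revs != {mounted[snap]}
    }


if __name__ == "__main__":
    # Lines adapted from the log above (trailing fields trimmed).
    sample = [
        "Apr 14 09:29:23 lxd01 systemd[1]: Mounted Mount unit for microceph, revision 318.",
        'Apr 14 09:30:29 lxd01 audit[431075]: AVC apparmor="DENIED" operation="mknod" '
        'profile="snap.microceph.mds" name="/var/snap/microceph/220/run/ceph-mds.lxd01.asok"',
        'Apr 14 09:30:33 lxd01 audit[431330]: AVC apparmor="DENIED" operation="unlink" '
        'profile="snap.microceph.osd" name="/var/snap/microceph/220/run/ceph-osd.15.asok"',
    ]
    print(mismatches(sample))  # {'microceph': ('318', ['220'])}
```

Because `summarize` accepts any iterable of lines, it could just as well be given `sys.stdin` with the output of `journalctl -b --no-pager` piped in. The unit filter (`-u`) is best avoided here, since the audit and kernel lines carrying the denials are not logged under the microceph units.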