I have an Incus host running on a Debian 12 instance on GCE. It was working fine, but now it gets stuck in emergency mode after a reboot.
The serial port output:
[  OK  ] Reached target network-online.target - Network is Online.
[  OK  ] Reached target nss-lookup.… - Host and Network Name Lookups.
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
to boot into default mode.
Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.
Looking through the serial port logs, there are no “error” logs, but an fs-related job times out. I’m guessing this is the attached ZFS disk (non-root).
[ ***] Job dev-disk-by\x2duuid-6b3277db\x2…tart running (1min 28s / 1min 30s)
[ **] Job dev-disk-by\x2duuid-6b3277db\x2…tart running (1min 29s / 1min 30s)
[ *] Job dev-disk-by\x2duuid-6b3277db\x2…tart running (1min 29s / 1min 30s)
[ **] Job dev-disk-by\x2duuid-6b3277db\x2…tart running (1min 30s / 1min 30s)
[ TIME ] Timed out waiting for device …277db-5548-43c1-8dd5-179c5c2e4dae.
[DEPEND] Dependency failed for syst…277db-5548-43c1-8dd5-179c5c2e4dae.
[DEPEND] Dependency failed for medi…ta\x2d1.mount - /media/data-1.
[DEPEND] Dependency failed for loca…s.target - Local File Systems.
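(Side note on reading the escaped unit names above: \x2d is systemd’s escaping of a hyphen, so the job that times out is the device unit for /dev/disk/by-uuid/6b3277db-5548-43c1-8dd5-179c5c2e4dae, and media-data\x2d1.mount is the mount unit for /media/data-1. You can reproduce the escaping with systemd-escape:

    systemd-escape -p --suffix=mount /media/data-1

which prints media-data\x2d1.mount.)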
The first time this happened it was due to a “shielded_vm_integrity” event;
I have since disabled the vTPM, Integrity Monitoring, and Secure Boot options. I no longer get the integrity event, but the machine still boots into emergency mode.
I would like to recover this instance if possible. Any ideas on how to address this? Even being able to connect to the console would be a step in the right direction.
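For the console part: GCE’s interactive serial console can be enabled via instance metadata, roughly like this (instance name is a placeholder):

    gcloud compute instances add-metadata INSTANCE_NAME --metadata serial-port-enable=TRUE
    gcloud compute connect-to-serial-port INSTANCE_NAME

Note that with the root account locked, sulogin will still refuse to open the emergency shell (as the boot log above says), so this mainly helps for watching the boot and, if GRUB is reachable over the serial console, for editing the kernel command line.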
There is a mention of a mount point not being available: /media/data-1. Is that where the Incus storage pool lives? Can you check the state of that mount point from the GCE environment?
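From outside the VM, you can at least check which disks are attached and under what device names, e.g. (instance name is a placeholder):

    gcloud compute instances describe INSTANCE_NAME --format='yaml(disks)'
    gcloud compute disks list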
Hmm, that’s likely the storage pool for Incus. When I configured the pool I believe I used: incus storage create pd-standard zfs source=/dev/sdb
I’ve since learned that one shouldn’t use /dev/sdX on GCE, as the device mapping may change. I can boot a recovery VM instance and mount the root disk. Is there a way I can “fix” the Incus storage pool? Or even just remove it? I hadn’t (effectively) begun using it yet.
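For reference, GCE exposes stable symlinks under /dev/disk/by-id (google-<device-name>), which survive sdX renumbering. Assuming the disk’s device name in GCE were data-1 (a placeholder), the create command would have looked something like:

    incus storage create pd-standard zfs source=/dev/disk/by-id/google-data-1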
Hi @jarrodu, beginner question: how can I delete the pool from an Incus installation on a mounted disk? Would I install Incus on the recovery VM and then somehow point it at the configuration files on the mount?
My thinking is:
boot a recovery VM
mount the “original” root disk (see the sketch after this list)
install Incus with parameters pointing to the mounted config?
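A minimal sketch of the disk-repair side, assuming the original root disk shows up on the recovery VM as /dev/sdb with the root filesystem on partition 1 (adjust to what lsblk shows):

    sudo mount /dev/sdb1 /mnt
    sudo nano /mnt/etc/fstab    # comment out the /media/data-1 line, or add the nofail option
    sudo umount /mnt

With that fstab entry commented out (or marked nofail), the boot should no longer hang on the missing device, and the Incus pool can be sorted out from a normally booted system.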
If you want to try to rescue the server, could you share your fstab and/or the relevant .mount files for the failing disks?
The error seems pretty straightforward: a disk that previously existed isn’t there anymore, so the mount job times out. If you can find out what the UUID changed to (assuming the disk is still there), you might be able to just change it.
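Once you’re in the recovery VM (or back on the instance), the current UUIDs can be listed with either of these, and the fstab entry updated to match:

    sudo blkid
    lsblk -f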