Incus host on Google Compute Engine boots to emergency mode

I have an Incus host running on a Debian 12 instance on GCE. It was working fine, but now it gets stuck in emergency mode after a reboot.

The serial port output:


[  OK  ] Reached target network-online.target - Network is Online.
[  OK  ] Reached target nss-lookup.… - Host and Network Name Lookups.

You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
to boot into default mode.

Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.


Looking through the serial port logs, there are no "error" entries, but a filesystem-related job times out. I'm guessing this is the attached ZFS disk (non-root).

[   ***] Job dev-disk-by\x2duuid-6b3277db\x2…tart running (1min 28s / 1min 30s)
[    **] Job dev-disk-by\x2duuid-6b3277db\x2…tart running (1min 29s / 1min 30s)
[     *] Job dev-disk-by\x2duuid-6b3277db\x2…tart running (1min 29s / 1min 30s)
[    **] Job dev-disk-by\x2duuid-6b3277db\x2…tart running (1min 30s / 1min 30s)
[ TIME ] Timed out waiting for device …277db-5548-43c1-8dd5-179c5c2e4dae.
[DEPEND] Dependency failed for syst…277db-5548-43c1-8dd5-179c5c2e4dae.
[DEPEND] Dependency failed for medi…ta\x2d1.mount - /media/data-1.
[DEPEND] Dependency failed for loca…s.target - Local File Systems.
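
For reference, the \x2d sequences in those DEPEND lines are just systemd's unit-name escaping; encoding the mount point by hand shows it lines up with the failing unit:

   # run anywhere with systemd; this only confirms the unit/path mapping
   systemd-escape -p --suffix=mount /media/data-1
   # prints: media-data\x2d1.mount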

The first time this happened it was due to a "shielded_vm_integrity" event:

lateBootReportEvent: {
   actualMeasurements: [11]
   policyEvaluationPassed: false
   policyMeasurements: [3]
}

I have now disabled the vTPM, Integrity Monitoring, and Secure Boot options. I no longer get the integrity event; however, the machine still drops into emergency mode.

I would like to recover this instance if possible. Any ideas on how to address this? Even connecting to the console would be a step in the right direction.
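
In case it matters, I believe interactive serial console access can be enabled with something along these lines (instance name and zone are placeholders), though with the root account locked I suspect sulogin will still refuse to open the emergency shell:

   gcloud compute instances add-metadata incus-host --zone=us-central1-a \
       --metadata serial-port-enable=TRUE
   gcloud compute connect-to-serial-port incus-host --zone=us-central1-a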

Welcome!

There is a mention of a mount point not being available, /media/data-1. Is that the storage pool for Incus? Can you check the state of that mount point from the GCE environment?
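
Something like this should at least show whether the disk backing that mount point is still attached (instance name and zone are placeholders):

   gcloud compute instances describe incus-host --zone=us-central1-a --format="yaml(disks)"
   gcloud compute disks list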

Thank you, @simos!

Hmm, that’s likely the storage pool for Incus. When I configured the pool, I believe I used:
incus storage create pd-standard zfs source=/dev/sdb

I’ve since learned that one shouldn’t use /dev/sdX on GCE, as the device mapping may change. I can boot a recovery VM instance and mount the root disk; is there a way I can “fix” the Incus storage pool? Or even just remove it? I hadn’t begun using it (effectively) yet.

Thanks!

Feel free to delete it and create a new one.
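
Once the host boots again, and assuming nothing references the pool yet, that should just be:

   incus storage list               # confirm the pool name and that it is unused
   incus storage delete pd-standard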

Hi @jarrodu, beginner question: how can I delete the pool from an Incus installation on a mounted disk? Would I install Incus on the recovery VM and then somehow point it at the configuration files on the mount?

My thinking is:

  • boot a recovery VM
  • mount the “original” root disk
  • install Incus with parameters pointing to the mounted config? (rough sketch below)
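
Roughly what I had in mind (instance, disk, and device names are guesses):

   # the broken instance has to be stopped (or its boot disk detached) first
   gcloud compute instances create recovery-vm --zone=us-central1-a
   gcloud compute instances attach-disk recovery-vm --disk=incus-host --zone=us-central1-a
   # on the recovery VM: find and mount the original root filesystem
   lsblk
   sudo mount /dev/sdb1 /mnt        # assuming the old root partition shows up as sdb1
   ls /mnt/var/lib/incus/           # where I believe the Incus state lives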

Or is there a better way?

I don’t think I understood your situation until I reread your post.

You are working in the cloud, so why not just destroy the whole thing and start over? I suspect anything else is going to get complicated.

In the cloud, instances are cheap and everything can be automated. It is not uncommon to redeploy instances on every change.

If you want to try to rescue the server, could you share your fstab and/or the relevant .mount files for the failing disks?

The error seems pretty straightforward: a disk that previously existed isn’t there anymore, so the mount job times out. If you can find out what the UUID changed to (assuming the filesystem is still there), you might be able to just change it.
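
From a recovery VM with the old root disk mounted at /mnt (paths here are assumptions), that check would look something like:

   sudo blkid                       # is the old UUID still present on any device?
   grep data-1 /mnt/etc/fstab       # the entry systemd is waiting on
   # if the filesystem is gone for good (e.g. the disk became a zpool), removing
   # the line or adding the "nofail" mount option will let the host boot again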

Thanks all, I deleted everything and started over. GCE provides stable symlinks under /dev/disk/by-id/, which I will use as the block device when creating the pool.
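
For anyone finding this later, the symlinks look roughly like this (the device name is whatever was set when the disk was created; "data-1" here is just an example):

   ls -l /dev/disk/by-id/
   # lrwxrwxrwx ... google-data-1 -> ../../sdb
   incus storage create pd-standard zfs source=/dev/disk/by-id/google-data-1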
