I have a slightly confusing issue, that seems to resolve itself, but is still buggy:
If I install IncusOS on one of my NVMe SSDs, I have no issue.
If I install IncusOS on a different NVMe SSD (in the machine at the same time, I wiped the other install), it gets stuck at “IncusOS is starting” for many minutes. Then I was watching the “IncusOS failed to start. Debug information follows.” messages go by… and noticed the boot drive appears in lsblk output with 8 partitions… but I waited longer and it seems like the disk partitioning finished in the background (?) and after a minute lsblk switched to showing all 11 partitions! It then hangs indefinitely showing debug output.
I rebooted manually, and subsequent boots work fine now.
No idea why this issue only affects one SSD (CT400T705SSD3), and not the other. I can give full hardware specs if you need them, but the above screenshots have a lot of info.
I can do some additional testing if you would like me to, but I’ll be away from this computer for a while after this week. And I actually wanted to use this 4TB disk as the root drive and don’t want to reinstall too much now that it’s actually working.
IncusOS 202605101621. Install from USB (not ISO), on flash drive, “installation” (not “operational”). Secure boot keys enrolled (I left the Microsoft keys for now, I’ll try clearing them in the future). fTPM. Provided specific target /dev/by-id/device for install on each drive.
IncusOS relies on systemd-repart to create the final three partitions on first boot after install. Partitions 9 and 10 are LUKS-encrypted and bound to the TPM, and the final one fills all remaining space for the local ZFS pool.
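If you end up with the disk attached to a Linux machine where you have a shell, a quick way to compare against the expected 11 partitions is to count them from `lsblk --json` output. This is only a minimal sketch: the device name `nvme0n1` and the sample JSON are made up for illustration, and it only looks at the `name`, `type` and `children` fields.

```python
import json


def count_partitions(lsblk_json: str, disk: str) -> int:
    """Count the partitions listed under `disk` in `lsblk --json` output."""
    data = json.loads(lsblk_json)
    for dev in data.get("blockdevices", []):
        if dev.get("name") == disk:
            children = dev.get("children", [])
            return sum(1 for child in children if child.get("type") == "part")
    raise ValueError(f"disk {disk!r} not found in lsblk output")


# Hypothetical sample: a disk that so far has only 8 partitions.
SAMPLE = json.dumps({
    "blockdevices": [{
        "name": "nvme0n1",
        "type": "disk",
        "children": [
            {"name": f"nvme0n1p{i}", "type": "part"} for i in range(1, 9)
        ],
    }]
})

if __name__ == "__main__":
    found = count_partitions(SAMPLE, "nvme0n1")
    print(f"{found} of 11 expected partitions present")
```

On a real system you would feed it the output of `lsblk --json` instead of `SAMPLE`.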
Given you eventually saw all 11 expected partitions and rebooting the system then worked, I would be very suspicious of the 4TB disk. The amount of data required to be written when creating the final three partitions is quite small, and even on spinning disks shouldn’t take more than a second or two. How long did the actual install and copying of data take?
I’m not sure if the journal logs would have been properly persisted to disk on that first boot, but if you could try to grab the systemd-repart logs there might be something useful there to help pinpoint what the issue was.
I’d also pull that disk and place it in another machine (for ease of examination), and check smartctl as well as raw read/write speeds. (The ESP partition should have plenty of space for performing write tests, just remember to clean up any temporary files.)
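For the write test, something along these lines would do as a rough check. It's a sketch, not a proper benchmark: it writes through the filesystem and calls fsync rather than hitting the raw device, the function name and sizes are my own choices, and the directory you pass in (for example wherever you mount the ESP) is up to you. The temporary file is removed afterwards.

```python
import os
import tempfile
import time


def sequential_write_mib_per_s(directory: str, size_mib: int = 256,
                               chunk_mib: int = 4) -> float:
    """Write about size_mib of random data to a temp file in `directory`,
    fsync it, delete it, and return the observed throughput in MiB/s."""
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    chunks = size_mib // chunk_mib
    fd, path = tempfile.mkstemp(dir=directory, prefix="writetest-")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually reached the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary file
    return (chunks * chunk_mib) / elapsed


if __name__ == "__main__":
    # Stand-in directory; point this at a mount on the disk under test.
    target = tempfile.gettempdir()
    rate = sequential_write_mib_per_s(target, size_mib=64)
    print(f"{rate:.0f} MiB/s")
```

A healthy NVMe drive should finish this almost instantly; a result far below what the other SSD manages would point at the disk.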
The only other thing I can think of that would slow down the initial partitioning so much is if your fTPM is super slow. But the only time we’ve encountered really slow TPMs has been in a few VMware-specific virtualization cases.
The strange part is that it boots near instantly when I install to the other SSD, so I doubt it’s the fTPM?
Also, the weird SSD is a PCIe 5.0 NVMe drive (the other is an older 4.0 drive). It’s actually very fast, and I haven’t had issues using it with Windows until now. SMART status inside incus is passed: true. I’ll do a more detailed SMART check at some point.
I don’t have a 2nd computer to test with easily, but I did remove all other drives, and the issue persists. I almost wonder if it’s like… slow to enumerate? but I’ve never seen that with PCIe before. With all of my additional tests, it actually looked like all 11 partitions were present the first time lsblk appears on screen. I think it may be only one time I’ve seen partitioning be incomplete, and I think it finished partitioning within ~1min by the 2nd time the lsblk info showed on screen.
A guess, but I wonder if the debug info triggers some kind of reenumeration that unblocks systemd-repart? I’ll look around in my BIOS settings for any PCIe settings, maybe it somehow isn’t waiting long enough before booting the OS.
This is a powerful gaming/workstation PC that I won’t be able to use while traveling for a bit, so I wanted to put something fun on it to test and access remotely. I’ll have access via a PiKVM (thus the nice screenshots), but will be physically away from it for a month+. If this problem is unique to me, I guess we can ignore it. But let me know if you want to do any further testing.
Is there any way to get logs during that first boot? The disk isn’t mounted, so it seems nothing gets written to it. The only systemd-repart logs I saw were from the 2nd boot where it said “No changes.” Can I enable console logging?
Yeah, that helps eliminate one possible issue. After that first boot issue, does IncusOS boot as quickly from the “weird” SSD?
IncusOS sets a 10 minute timeout for devices to become available. That’s way overkill for most systems. In your fourth screenshot I do see the final entry being “Timed out waiting for device dev-gpt\x2dauto\x2droot.device …”, which indicates the root partition hadn’t become available. (IncusOS relies on partition labels to automatically discover and mount the appropriate devices.) On first boot, systemd-repart will create and then automatically unlock the root partition, but that didn’t happen for some reason in your case.
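In case the unit name in that message looks cryptic: systemd escapes device paths when turning them into unit names, so `/` becomes `-` and a literal `-` becomes `\x2d`. A small sketch of reversing that (the function name is mine, and it ignores corner cases such as the root path):

```python
import re


def unit_to_path(unit: str) -> str:
    """Reverse systemd's path escaping for a unit name such as a .device unit."""
    name = unit.rsplit(".", 1)[0]   # drop the ".device" suffix
    name = name.replace("-", "/")   # unescaped dashes separate path components
    # \xNN sequences are escaped bytes; for example \x2d is a literal "-".
    raw = re.sub(
        rb"\\x([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        name.encode("utf-8"),
    )
    return "/" + raw.decode("utf-8")


if __name__ == "__main__":
    print(unit_to_path(r"dev-gpt\x2dauto\x2droot.device"))
```

So the unit that timed out corresponds to the device path `/dev/gpt-auto-root`.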
The debug info is a simple service and script that activate if the system reaches the emergency.target, for example when it is unable to mount a root file system. Running it shouldn’t affect or change any system state.
No, unfortunately in this case IncusOS locks things down pretty tight. Because we boot signed UKIs, we can’t even change the kernel command line. I was hoping that maybe when systemd-repart had succeeded other dependent services would have activated and saved the ephemeral journal contents, but I guess that didn’t happen.