For posterity: Issues I ran into and my resolution to them

Good morning, afternoon, evening all,

I just finished setting up three nodes with IncusOS, and I figured I would go through some of the challenges I ran into and the “process” that seemed to give me the fewest issues.

For reference, I have 2x Minisforum UM560XT (AMD) and 1x Minisforum MS-01 (Intel). I have them sitting on a Unifi network stack, on a VLAN for just my servers. I have DHCP on that network in the x.x.x.200-253 range, mainly so I can set static IPs in the Unifi console.


The issues I ran into:

  1. So, obviously, TPM/Secure Boot issues. I’m not sure why IncusOS is so much pickier than standard Debian. I was able to plug in a USB and install Debian without an issue, but IncusOS kept giving me that big, red “SecureBoot Violation” error.
  • I tried manually adding the keys, I tried researching how TPMs worked, I tried everything, but I couldn’t figure out the right keys to add or where to add them. There are a few different “types” of keys, and the key-management menu options aren’t the greatest; most of the time I just got “Failed to add key”.
  • Eventually, I learned that there was a setting called “automatically provision factory keys”, which I needed to Disable. Then I could enter Setup Mode.
  • When I entered setup mode, it would try to restart without saving, so I had to hit “cancel”. I verified all of the keys were deleted, then I was able to “Save and exit”.
  • If I didn’t disable the “Automatically provision keys”, when the system rebooted, all factory keys would come back before IncusOS had the chance to set their own keys.
  • Sometimes, I would still get that red warning, so I’d hit “enter” to get through it, it would load back to the BIOS, I’d verify the keys were still empty, and then “save and exit”, and it usually worked the second or third time.
  2. When IncusOS tried to install, it would “hang” on one of the partitions. I’m not sure if one of the partitions was still active, if there’s an issue iterating through LVM partitions, or if having more than 3 partitions caused the problem.
  3. I rebooted the endpoint and tried to install IncusOS again, but I got some other weird issues. I don’t remember exactly what they were, but one time I got into Incus and the mode was “unsafe”, and another time I got “Starting Incus.... Enter the drive recovery key”, which I didn’t have at that point.
  • For mode=unsafe, I think it was because IncusOS might have created/updated an ESP partition but never “fully” set up, and somehow the TPM keys were still wrong.
  • For the drive recovery key, I was trying to reinstall IncusOS from the USB with the unsafe version still installed. Instead of just wiping the drive and reinstalling, like I expected it to, it was trying to “recover” the unsafe version, which never made a network connection, so I would never have been able to get the recovery keys in the first place.
  • For issues 2 and 3, the fix was to delete all of the partitions on the drive and reinstall onto a “clean” system.
  4. I wasn’t able to manage IncusOS because the Incus client on my Windows PC was unable to use incus admin commands.
  • I discovered that I needed to update the Incus client, so I reran the winget command and then it worked.

The process I used on my other two nodes that worked without a problem:

  1. Boot into a live USB (I used MX Linux 25 with xfce)
  2. Use GParted to delete all partitions on the endpoint’s drive.
  3. At this point, I went into the Unifi web console and made sure these endpoints had DHCP-reserved IPs (x.x.x.2, x.x.x.3, x.x.x.4), since I didn’t give IncusOS a network config when I downloaded the ISO.
  4. Turn off the endpoint, swap USBs, and turn it back on (it’s worth noting that I have my boot order set for USB first, then system drives).
  5. Wait for the red error banner to pop up, and tap Enter 4-5 times until it throws me into the BIOS. (It seems like I had three errors “stacked” and it wanted me to acknowledge each one.)
  6. Disable “Provision Factory Keys”, choose “Enter Setup Mode”, cancel the reboot without saving, then “Save and exit”.
  7. If the error persists, verify there are no SecureBoot keys and reboot the system again. One time it took three reboots (two failed, one successful) for it to allow IncusOS to update the keys.
  8. Once IncusOS reboots, pull out the flash drive (because of my boot order), and then connect to it from my Windows PC.
  9. incus remote add [name] [ip], incus remote switch [name], incus admin os system security show >> Incus-[name].txt
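For step 2, here is a terminal alternative to GParted. This is just a sketch: it assumes the target disk is /dev/nvme0n1 (verify yours with lsblk first), and the destructive commands are deliberately commented out so you have to opt in.

```shell
# Assumption: the IncusOS target disk is /dev/nvme0n1 -- check with `lsblk`
# before doing anything, since these commands destroy all data on the disk.
DISK="/dev/nvme0n1"

echo "This would wipe all partition tables and signatures on ${DISK}"
# sgdisk --zap-all "$DISK"   # destroy the GPT and MBR partition structures
# wipefs --all "$DISK"       # clear any remaining filesystem/LVM signatures
```

Either command on its own is usually enough to make the installer treat the disk as clean; running both covers leftover LVM/ZFS signatures too.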

The goal now is to figure out how to turn these three remote nodes into one cluster, which the IncusOS docs don’t seem to cover. I think I’ll start with a nice incus admin init and see if I break stuff. At least now I know how to reinstall IncusOS.

Then I need to figure out how to create Incus projects, each with a dedicated OVN network, which I struggled with when running Incus on top of Debian. The Incus docs aren’t the best for profiles/networking, but ChatGPT found some random context on forums that I wasn’t able to find myself, and with some trial and error I worked through most of my OVN issues. I got it down to one last hurdle: the host networking configuration was controlling the NIC, so Incus wasn’t able to use it for the OVN router IP. I’m hoping IncusOS naturally just “resolves” that for me.


Hopefully someone finds this weird, convoluted process helpful.

Cheers.


For my “Creating a cluster” adventure, I was able to follow the bootstrapping guide for regular Incus to create a cluster of one node with:

> incus remote switch Incus-04
> incus config set core.https_address x.x.x.4:8443
> incus cluster enable Incus-04 [this is member name, which is different than remote name, I'm just lazy]
Clustering enabled
> incus config get core.https_address
Error: Get "https://x.x.x.4:8443/1.0": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, ::1, not x.x.x.4
> incus remote switch Incus-02
> incus remote delete Incus-04
> incus remote add Incus-04 x.x.x.4
Certificate Fingerprint: [fingerprint] ok (y/n/[fingerprint])? 
> y
> incus remote switch Incus-04
> incus config get core.https_address
x.x.x.4:8443

I guess setting the core HTTPS address and enabling clustering updated the server certificate behind the API endpoint, which invalidated the fingerprint my client had stored, so deleting the remote and re-adding it fixed that.

I got the join token from incus cluster add [member name], and flipped through some of the options and looked at a handful of configs in a few different locations on one of the other Incus hosts; so far, nothing really points to joining an existing cluster. The docs for regular Incus only show incus admin init and preseed files, but neither of those options seems to work with IncusOS.

I’ll do a bit more digging later tonight and update this comment with what I figure out, for posterity’s sake. I’m gonna look at the rest of the config options inside of incus remote, incus cluster, incus config, incus admin os, and potentially others. If nothing seems plausible, I might resort to throwing an Incus Operations Center onto a cheap ThinkCentre M93p Mini that I have lying around and see if that gives me the option.

Yeah, it’s definitely possible, the API supports it and that’s how we assemble clusters with Operations Center, but the CLI is currently lacking a standalone incus cluster join.

Thanks for the tip. Later tonight, I’ll just stand up an Operations Center box and build a cluster with that.

I’ve watched some of your(?) videos on the Operations Center, and it’s a little overkill for my home lab (one three-node cluster, for now), but I have some old Lenovo minis lying around for single-use purposes like this that would work great.

Haha, yeah, definitely a bit overkill just to get a cluster going, but the alternative right now is to directly poke the API until we get to add that CLI command.

Is there any documentation on the API already available?

I’ll likely just use the Ops Center, I’m just curious at this point

It’s possible to trigger the join by getting a token from the existing server, as you’ve done before, and then using incus query against a clean server (apply_defaults=false, “Apply default configuration” ticked off on the download site) to have it join.

Getting the incus query -X PUT remote:/1.0/cluster -d JSON right may take a few tries, as passing certificates around that way is not the most fun.

But it’s basically:

{
  "cluster_address": "EXISTING-CLUSTER-SERVER-ADDRESS-AND-PORT",
  "cluster_certificate": "PUBLIC-X509-PEM-CERTIFICATE-OF-CLUSTER",
  "cluster_token": "JOIN-TOKEN",
  "enabled": true,
  "member_config": [],
  "server_address": "JOINING-SERVER-ADDRESS-AND-PORT",
  "server_name": "JOINING-SERVER-NAME"
}

The member_config part you can figure out by running incus query /1.0/cluster against the existing cluster to see what server-specific keys need to be provided.

With the default IncusOS config, that will most likely be specifying the ZFS source and pool name for the local pool (both values should be local/incus).
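Putting the pieces above together, here is a rough shell sketch of building that join payload. Every value is a placeholder to substitute, the member_config entries are my guess at the shape for the default ZFS local pool (confirm against what incus query /1.0/cluster on the existing cluster reports), and the final incus query call is left commented out since it needs a live IncusOS server. Note the PEM certificate must be JSON-escaped (newlines as \n) before it goes into the payload.

```shell
# All values below are placeholders -- substitute your own.
CLUSTER_ADDR="x.x.x.2:8443"         # an existing cluster member
SERVER_ADDR="x.x.x.4:8443"          # the server that is joining
SERVER_NAME="Incus-04"
TOKEN="JOIN-TOKEN"                  # from: incus cluster add Incus-04 (on the existing member)
CERT="PUBLIC-X509-PEM-CERTIFICATE"  # cluster cert, with newlines escaped as \n for JSON

# Build the PUT /1.0/cluster body described above.
payload=$(cat <<EOF
{
  "cluster_address": "${CLUSTER_ADDR}",
  "cluster_certificate": "${CERT}",
  "cluster_token": "${TOKEN}",
  "enabled": true,
  "member_config": [
    {"entity": "storage-pool", "name": "local", "key": "source", "value": "local/incus"},
    {"entity": "storage-pool", "name": "local", "key": "zfs.pool_name", "value": "local/incus"}
  ],
  "server_address": "${SERVER_ADDR}",
  "server_name": "${SERVER_NAME}"
}
EOF
)

echo "$payload"
# incus query -X PUT joining-server:/1.0/cluster -d "$payload"
```

The "joining-server" remote name is hypothetical; point it at the clean, not-yet-clustered node.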
