Details on this week's IncusOS updates and steps for updating to the current stable release

Hey everyone,

This past week was a bit bumpy with IncusOS adding support for a new Incus 7.0 LTS application, which exposed a couple of bugs that resulted in some error states on currently running IncusOS systems. Now that we’ve got things sorted, we want to share with the community what happened, how to get your IncusOS systems back into a good state, and what we’re doing to prevent this sort of thing from happening again.

Don’t panic

First, as long as you have your encryption recovery key(s), there’s no danger of any data loss. Each IncusOS system automatically generates a random encryption recovery key on first boot, which can be retrieved by running incus admin os system security show.

Second, unless your IncusOS system reports that it is running version 202605281810 or later, don’t reboot until you have read this post in full.

What happened

With the release of Incus 7.0 LTS earlier this month, we wanted to add the option for IncusOS to install either a version of Incus that tracks the monthly feature releases (the existing incus application), or choose to follow the Incus 7.0 LTS series of releases (the new incus-lts-7.0 application). The new application was added in Add an `incus-lts-7.0` application by gibmat · Pull Request #1118 · lxc/incus-os · GitHub, and was first included in IncusOS 202605270119. Also included in that PR were a few more checks against having more than one primary application installed, which will come up again shortly. This version was available in the stable update channel for approximately 10 hours on May 27th.

Unfortunately, a bug in the existing update logic caused IncusOS systems that had the Incus application installed to also download the Incus 7.0 LTS application. This was fixed by incus-osd/providers: Fix selection of proper application for download by gibmat · Pull Request #1120 · lxc/incus-os · GitHub, and included in IncusOS versions 202605272150 and later. It wasn’t caught in development testing because a simpler local update provider was used.

Also included in IncusOS version 202605272150 was an attempt to clean up the accidental incus-lts-7.0 application. The initial attempt was a bit of a facepalm, since it properly removed the 202605270119 version, but not the fresh 202605272150. :flushed_face: This exposed a secondary bug where some startup code was incorrectly assuming a symlink would always exist for any installed application. Those issues were fixed by incus-osd: Make the cleanup logic for accidental incus LTS more robust by gibmat · Pull Request #1125 · lxc/incus-os · GitHub and incus-osd/update: Ensure a proper symlink exists for a newly installed application by gibmat · Pull Request #1128 · lxc/incus-os · GitHub, both included in IncusOS version 202605281810.

These two bugs resulted in either boot/update check errors about “more than one primary application present” or a boot-time lstat error.

Getting back to a good state

There are a few different states your IncusOS system may be in. The following sections will provide instructions on how to get your IncusOS system to the current stable release.

IncusOS is currently running and accessible via network API:

First, we need to determine what version of IncusOS is currently installed, and if any updates are pending by running incus admin os show. Check the “os_version” and “os_version_next” fields.

os_version >= 202605281810:

Your system is running the latest stable version and no further action is required.

202604070253 < os_version <= 202605181246:

Your IncusOS system is running an older release that should update cleanly to the current stable version. If the os_version_next value isn’t already 202605281810 or later, trigger an update by running incus admin os system update check. Your system will download the update, and once you see the message “IncusOS has been updated to version 202605281810” you can safely reboot the system.

os_version <= 202604070253:

Very old IncusOS systems still running a version from the first week of April (or earlier) may require two update cycles to fully update. If updated and rebooted into IncusOS version 202605281810 through 202605311846 (inclusive), the system will start but installed applications will not be started. The internal state is consistent, but due to a format change, rebooting into the prior installed version of IncusOS will likely fail. When the next stable version of IncusOS is published, reboot the IncusOS system and it should then automatically fetch and apply the update, restoring full functionality. If an automatic update check runs before rebooting, you may see an error about an existing symlink; this can be safely ignored.

If the old IncusOS system hasn’t yet been rebooted into the current IncusOS update, wait to reboot until a version > 202605311846 is available.

It is strongly advised to apply IncusOS updates, being sure to reboot the system, on a regular basis. Security updates are routinely applied to the components of IncusOS, but some fixes (such as the recent kernel LPEs) require a reboot of the system. Running old versions of IncusOS can leave you open to known exploits.

os_version is 202605270119 or 202605272150:

If your IncusOS is configured for the stable update channel (the default), this means your system grabbed the problematic update version 202605270119 while it was in the stable channel and was then rebooted. Luckily, the system should still have the stable release that was previously running available as the backup boot option, which can then fetch the current stable release. Two important things must be verified before reboot: whether there are any pending updates, and what the prior installed version of IncusOS was.

  • Verify that the os_version_next value is 202605270119, indicating that no subsequent update was downloaded and applied. (If this is somehow reported as version 202605281810 or later, it should be safe to reboot your system to the current stable release.) If version 202605272150 is reported, you will need to follow the instructions below as if the system were on the testing update channel.
  • Check what prior IncusOS version is installed. Because there’s not currently a direct way to get this information, we’ll have to check boot logs. The current IncusOS version can be retrieved by running
    incus admin os debug log -b 0 -u incus-osd | grep "System is ready"
    which will return a log entry that should end with “INFO System is ready release=202605270119”. We then need to step back through prior boot logs by changing the -b 0 argument to -b -1, -b -2, etc. Once you see the reported release version change, as long as it’s 202605181246 or older, rebooting your system and selecting that prior version should boot properly, trigger an update check, and then automatically reboot into the current stable release.
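
For anyone who would rather not eyeball each boot log, here is a minimal sketch of the same walk-back. The two function names are made up for illustration; the only real command used is the incus admin os debug log invocation shown above. It accepts both “release=” and “version=” in the log line, since both spellings appear in this thread.

```shell
# Pull the 12-digit release out of an incus-osd "System is ready" log line.
# Reads log lines on stdin, prints one release per matching line.
extract_release() {
    sed -nE 's/.*System is ready (release|version)=([0-9]{12}).*/\2/p'
}

# Hypothetical helper: print the release reported by each of the last N boots
# (offsets -0, -1, -2, ...). Needs a working incus client, so it is only
# defined here, not called.
list_boot_releases() (
    n=${1:-5}
    i=0
    while [ "$i" -lt "$n" ]; do
        rel=$(incus admin os debug log -b "-$i" -u incus-osd | extract_release | tail -n 1)
        printf 'boot -%s: %s\n' "$i" "${rel:-no match}"
        i=$((i + 1))
    done
)
```

Running list_boot_releases 4 should make it easy to spot the point where the reported release changes. Offsets are negative on purpose, so that the walk goes from the most recent boot backwards.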

If your IncusOS is configured for the testing update channel, and has already updated to or rebooted into version 202605272150, some manual intervention will be required. Make sure you have a copy of your encryption recovery key(s)! With IncusOS still running, it is possible to remove the accidental copy of incus-lts-7.0 by hand and then trigger an update. This is considered a very advanced task, so if you find yourself in this case, please DM either @stgraber or @gibmat and we’ll be happy to walk you through it. Alternatively, you can follow the steps in the next section to access your system with an encryption recovery key and remove the incus-lts-7.0 application that way.
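
As a quick reference, the version ranges in this section can be sketched as a small classifier. This is only an illustration of the cases listed in this post: the function name and the category labels are invented here, and any version that falls between the listed ranges is reported as not covered rather than guessed at.

```shell
# Map an os_version value (12-digit timestamp) onto the cases in this post.
classify_os_version() (
    v=$1
    # Reject anything that is not exactly 12 digits.
    case $v in
        ''|*[!0-9]*) echo "invalid"; exit 1 ;;
    esac
    [ "${#v}" -eq 12 ] || { echo "invalid"; exit 1; }

    if [ "$v" -ge 202605281810 ]; then
        echo "current"        # latest stable or newer, nothing to do
    elif [ "$v" -eq 202605270119 ] || [ "$v" -eq 202605272150 ]; then
        echo "affected"       # one of the two problematic releases
    elif [ "$v" -le 202604070253 ]; then
        echo "very-old"       # may need two update cycles
    elif [ "$v" -le 202605181246 ]; then
        echo "older-clean"    # should update cleanly
    else
        echo "not-covered"    # not one of the cases described in this post
    fi
)
```

Feed it the os_version value from incus admin os show, for example classify_os_version 202605181246.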

IncusOS fails to start with error lstat /var/lib/extensions/incus-lts-7.0.raw:

As mentioned above, a secondary bug was revealed that causes an lstat error on startup. The first thing to try is rebooting IncusOS into the prior installed version. If it’s 202605181246 or older, the new update should download and update automatically.

If both installed versions of IncusOS encounter an lstat error, you will need to use an encryption recovery key to manually unlock the system partition and clean out the incus-lts-7.0 application image(s).

Begin by following the steps outlined in Emergency Procedure for a Lost Client Certificate - IncusOS documentation to decrypt the LUKS root partition; there’s no need to make changes to the client certificate. Once mounted, run the following commands:

rm -f /mnt/incusos/var/lib/extensions/incus-lts-7.0.raw
rm -f /mnt/incusos/var/lib/incus-os-extensions/202605270119/incus-lts-7.0.raw
rm -f /mnt/incusos/var/lib/incus-os-extensions/202605272150/incus-lts-7.0.raw

That will remove the incus-lts-7.0 application from the system, and upon rebooting the update check should proceed normally.
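
If you want to see what is actually present before deleting anything, a non-destructive check along these lines may help. It is a sketch only: the function name is made up, and it looks at exactly the three paths listed above under a mount root that defaults to /mnt/incusos.

```shell
# Print whichever of the three known incus-lts-7.0 image paths exist under
# the given mount root. Nothing is modified.
list_lts_images() (
    root=${1:-/mnt/incusos}
    for p in \
        "$root/var/lib/extensions/incus-lts-7.0.raw" \
        "$root/var/lib/incus-os-extensions/202605270119/incus-lts-7.0.raw" \
        "$root/var/lib/incus-os-extensions/202605272150/incus-lts-7.0.raw"
    do
        # -L also catches a dangling symlink, which -e alone would miss.
        if [ -e "$p" ] || [ -L "$p" ]; then
            printf '%s\n' "$p"
        fi
    done
)
```

Running list_lts_images again after the rm commands should print nothing.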

Preventing similar issues in the future

The introduction of the incus-lts-7.0 application is the first time IncusOS has implemented the concept of equivalent applications, where two or more applications (Incus and Incus LTS) can provide the same functionality. The update metadata refers to this as a component, and the update code has been fixed to look at the exact filename when downloading application updates, so that it properly differentiates between available equivalent applications.
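
To illustrate why the component alone is ambiguous, here is a toy sketch. It is not the incus-osd code and not the real metadata format: the “component filename” line layout, the component name incus, and the filename incus.raw are all assumptions made up for the example (only incus-lts-7.0.raw appears in this post).

```shell
# Toy metadata on stdin: one "component filename" pair per line (hypothetical).

# Selecting by component returns every equivalent application.
select_by_component() {
    awk -v c="$1" '$1 == c { print $2 }'
}

# Selecting by exact filename returns only the requested application.
select_by_filename() {
    awk -v f="$1" '$2 == f { print $2 }'
}
```

With two entries sharing a component, select_by_component incus prints both filenames, while select_by_filename incus.raw prints just the one.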

The handling of application updates now also ensures an initial symlink is always created, if it doesn’t already exist. The logic for determining if an application is installed now also checks for that symlink’s existence.

Additionally, we realized our existing hook to run a recovery script was running too late in the startup code. Had it been available earlier, a signed update could have been provided to enable an automatic cleanup of the incus-lts-7.0 application, regardless of the IncusOS system state. The hook has been moved to the earliest possible moment, right as the daemon begins its startup tasks.

Edited 6/1 to add details about very old IncusOS systems updating.

Thanks for the detailed background. Do you plan any changes to your testing strategy to try to avoid a recurrence? For example, it seems like your testing environment was using a “simpler” update mechanism.

It’s definitely a tricky one to catch as it only affected downloads from an actual image server from systems already running the broken image. Effectively needing a set of 3 images for this to show up and get picked up.

With each image build taking almost 45 minutes, it’s not really realistic to fit that in the normal test flow for pull requests. And it’s obviously tricky to do in the daily tests too as those are meant to validate the public images sitting in the testing channel.

But I think I’d at least want us to move away from the local provider and instead run a local web server during testing so we can exercise as much of that code as possible.

It would look like:

  • Switch our test code to use an HTTP image server as described above, effectively doing away with the test-only “local” image source.
  • Add an internal API route to force updating the system to its current image. Effectively allowing us to exercise the image server logic present in the new code without needing to build an extra image.

This may have caught this particular issue and would certainly help catch other issues with the images provider which we otherwise only notice in pre-release manual tests.

@gibmat

Makes sense, thanks for the thoughtful and detailed response.

I didn’t mean the question to sound critical, I know there’s a limit to what can be tested. I’m looking at my deployment automation and I was curious to see if you thought this was such a corner case it wasn’t worth putting effort into, or if it had exposed a gap you need to close.

Hi folks. Just calling out a potential wrinkle (though I imagine it’ll work itself out with the next update if it’s not already intentional).

Was in the impacted but ideal state (stable channel, active installation on 202605270119, backup on 202605181246). Rebooted into the backup installation which ran the update to 202605311846. Notably, threw two errors in the process:

ERROR: Failed to parse existing state: no changes will be written to disk
WARN: Failed to parse state field 'Applications.Incus.State.FriendlyVersion', skipping

... snip ... (usual update output)

ERROR: Refusing to save state because we previously failed to properly load the existing state

Rebooted into 202605311846 but now logging:

WARN: Successfully downloaded application update, but not auto-updating while running from backup image.

Versions info:

$ incus admin os show
WARNING: The IncusOS API and configuration is subject to change

environment:
  hostname: [REDACT]
  machine_id: [REDACT]
  os_name: IncusOS
  os_version: "202605311846"
  os_version_next: "202605270119"
  system_is_ready: true
  uptime: 120

$ incus admin os application show incus
config:
  lxcfs:
    cpu_shares: false
    load_average: false
state:
  available_versions:
    - "202605272150"
    - "202605311846"
  friendly_version: 7.0.0 [202605272150]
  initialized: true
  version: "202605272150"

Attempting to switch versions results in:

$ incus admin os application switch-version incus
Are you sure you want to change the running version of the application? (yes/no) [default=no]: yes
Error: cannot rollback application as no earlier version is available locally

Editing to add: my installation is not mission critical, runs on somewhat exotic hardware, and is fully reproducible (iac, custom tpm root cert, etc). I’d be happy to opt into debug telemetry if it would help with any early detection or what have you. I know you have a super robust setup but figured I’d offer. Least I could do to give back in some minuscule way to a lovely set of projects.

I had a similar thought as @stgraber regarding how our development (the local) provider is used. When getting started we didn’t have a nice way to run a locally-hosted fully-featured image server, but the pieces do exist now. We’re planning to rename the local provider to debug; it will still see some usage, but most development work and testing will transition to relying on a locally-hosted image server which will exercise the same codepaths as IncusOS systems receiving updates from the public image server. This would have caught the initial update bug last week, and will make it much more likely that any similar issue is noticed sooner in the future.

No worries, not taken in a negative manner. We’re always trying to make IncusOS better, and appreciate feedback/questions from the community. :slightly_smiling_face:

So, I think you hit an edge case here, where the prior-installed version of IncusOS on your system didn’t recognize the FriendlyVersion state field, and therefore refused to save its state. (This safeguard is in place to prevent an older version of IncusOS accidentally corrupting the on-disk state that’s using new fields/formats it doesn’t recognize.) Because the state wasn’t saved, when the update to 202605311846 was applied, the system didn’t record it in the os_version_next field which still reports 202605270119. IncusOS uses that field to detect when it reboots into an older backup version, resulting in that warning you’re seeing.

The Incus application reports itself as version 202605272150, the oldest version available on your system, which is why it cannot automatically roll back any further. But you can switch to the new version by running incus admin os application switch-version incus -d '{"version":"202605311846"}'.

When the next stable release of IncusOS is published, your system should pick it up automatically, and the lingering reference to 202605270119 in os_version_next will be replaced.
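
If you would rather not hand-type the payload, the following sketch picks the newest entry from a list of versions and builds the JSON for the -d argument shown above. Both function names are made up for illustration; the list of versions is assumed to be supplied one per line, for example copied from the available_versions field.

```shell
# Read version strings (one per line) on stdin and print the newest one.
# Versions are 12-digit timestamps, so a numeric sort is enough.
newest_version() {
    sort -n | tail -n 1
}

# Build the JSON payload for switch-version from a single version string.
switch_payload() {
    printf '{"version":"%s"}\n' "$1"
}
```

For example, incus admin os application switch-version incus -d "$(switch_payload 202605311846)" is the same call as the one above.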

As a general statement, we purposefully don’t have any sort of remote telemetry API in IncusOS. Some metadata is reported when Incus (the application) refreshes/fetches container/VM images from the public image server, but other than that there’s no ability for an IncusOS system to “phone home”. Privacy is important. :slightly_smiling_face:

Thanks for the info—figured it would self-heal. Glad to know!

100%. I agree and respect that decision-making. Frankly, I personally opt out of telemetry in nearly every case. I only mentioned it here because the project maintainers (namely, you and Stéphane) have been nothing but respectful and have shown real character, so I wanted to extend the single data point of “I would” for IncusOS (if helpful). But I completely get it and appreciate it.

I had a cluster of 4 nodes running 202605272150 that was affected by the issue. Unfortunately, the backup version 202605270119 was also affected, so I had to fix the problem using the emergency procedure, which worked well.

However, before that, I thought I could just evacuate and remove nodes one by one from the cluster and apply a factory reset. Unfortunately, this did not work: because of a mismatch between the installed applications, I was not able to join the node back to the cluster after the factory reset.

Also, this command did not show me the previous version:

With -b 1 I did not obtain any matches, and with -b 2 I got some very old version:

[2026/04/07 21:53:51 CEST] incus-osd: 2026-04-07 21:53:51 INFO System is ready version=202604070253

Probably that was the version when I installed IncusOS for the first time on this node. Interestingly incrementing -b 3, -b 4, etc., gave me newer versions, not older.

Interestingly incrementing -b 3, -b 4, etc., gave me newer versions, not older.

I ran into this problem with an earlier issue. If you put a hyphen before the number ( -b -3 , -b -4 ) you’ll get the expected results:

incus admin os debug log -b -0 -u incus-osd | grep "System is ready"

and

incus admin os debug log -b -1 -u incus-osd | grep "System is ready"

Ah, yes, sorry that was a typo. IncusOS passes that argument directly to journalctl’s --boot argument, so a negative value is correct to look at the most recent boots while a positive value will start with the oldest boots.