Incus 0.5.1 has been released

Introduction

The Incus team is pleased to announce the release of Incus 0.5.1!

This is an unusual release as we normally do not issue point releases on top of the monthly feature releases. But we felt this was needed this time due to some pretty important bugfixes and a minor feature addition needed to accommodate those running CentOS/Alma/Rocky virtual machines.

Most changes are on the server side, so if you’re only using the command line client, there is no strong reason to upgrade from 0.5 to 0.5.1.


As usual, you can try it for yourself online: Linux Containers - Incus - Try it online

Enjoy!

Highlights

Alternative way to get the VM agent

With Incus 0.5, the distribution mechanism for the Incus VM agent changed a bit.
In the past, we had a single share named config which would include both the instance-specific agent configuration and the incus-agent binary.

This was a bit wasteful, requiring a copy of the 15-20MB incus-agent binary for every VM, but it was still somewhat manageable. This share was also exposed over both 9p and virtiofs, leading to two processes running on the host system for every Incus VM.

With support for multiple agent binaries, copying them for every VM really wasn’t an option anymore, so a separate share was introduced just for the binaries. As we really didn’t want to end up with another two processes running on the host per VM, we made the decision to only make those internal shares available over 9p.

Testing on a variety of images, including CentOS 7, showed that this would be fine.
9p offers lower performance than virtiofs, but as those shares are only used for a couple of seconds on every VM boot, that really wasn’t a concern. User-defined shares would still be exposed over virtiofs, so those would still get the high-performance option.
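
For reference, inside a guest both transports are mounted against the same kind of mount tag. A minimal sketch (the config tag matches the share name mentioned above; exact mount options can vary by kernel):

# 9p over virtio, as now used for the internal agent shares
mount -t 9p -o trans=virtio,version=9p2000.L config /mnt

# virtiofs, as still used for user-defined shares
mount -t virtiofs config /mnt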

What we failed to notice is that, for some reason, CentOS 8-Stream, CentOS 9-Stream and other distributions derived from RHEL 8/9 do not ship the 9p kernel driver at all…

This means that those instances no longer have a way to fetch an agent, leading to broken incus exec and incus file.
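
A quick way to check whether a guest kernel ships the driver is to look for the 9p module from inside the VM (a sketch; note that modinfo only sees the driver when it’s built as a module):

# Fails on RHEL 8/9 derivatives, which don't ship the driver
modinfo 9p || echo "no 9p kernel driver available"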

We still don’t feel like running 4 host processes for every single Incus VM just to make things work on those few images. Instead, what we’re introducing with Incus 0.5.1 is a new agent drive, effectively an extra disk which can be attached to those specific VMs, providing those files through what looks like a CD-ROM drive rather than having them retrieved over a networked filesystem.

So to run CentOS 9-Stream, one now needs to do:

incus create images:centos/9-Stream centos --vm
incus config device add centos agent disk source=agent:config
incus start centos
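
Once the VM has booted, a quick way to confirm that the agent came up through the new drive is to open a shell in it, the same check used further down this thread:

# If the agent is running, this drops you into a shell inside the VM
incus exec centos -- /bin/sh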

If you run many such VMs, a better option is likely to create a profile for it:

incus profile create vm-agent
incus profile device add vm-agent agent disk source=agent:config

At which point you can do:

incus launch images:centos/9-Stream centos --vm -p default -p vm-agent

This is obviously not ideal and adds a few more steps when creating VMs for those distributions, but this new mechanism now offers a way to get the agent up and running in just about any environment.

NOTE: We’re not considering always providing that extra device, as it takes some resources to generate the CD-ROM image and uses some extra disk space on the host. So it’s best added only when needed.
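
Since the drive is only needed until the agent is installed inside the guest, it can also be detached again afterwards, for example:

# Remove the agent drive once it's no longer needed
incus config device remove centos agent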

Fixed handling of stopped instances during evacuation

A bug introduced with Incus 0.5 was causing stopped instances to get relocated to other systems during evacuation, even if the instance was configured to remain where it was.

This has now been corrected and instances using stop, force-stop or stateful-stop are now guaranteed to remain on their current server.
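
The evacuation behavior is controlled per instance through the cluster.evacuate configuration key. A minimal sketch, assuming an instance named c1 and a cluster member named server1:

# Have c1 stay (stopped) on its current server during evacuation
incus config set c1 cluster.evacuate=stop

# Evacuate the member; c1 is no longer relocated
incus cluster evacuate server1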

Database performance fixes

Database improvements in Incus 0.5 accidentally caused some nested database transactions to occur when fetching network information details for a large number of instances.

This would only really become visible when using an Incus cluster that also serves DNS zones and has its metrics scraped by Prometheus. This combination would cause large spikes in API requests every 15s or so, which would then start triggering timeouts and retries, eventually leading to other API requests piling up and timing out.
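
For context, that combination roughly corresponds to a cluster with both the built-in DNS server and the metrics endpoint enabled, along these lines (a sketch; the listen addresses are placeholders):

# Built-in DNS server used to serve network zones
incus config set core.dns_address :8853

# Metrics endpoint that Prometheus scrapes
incus config set core.metrics_address :8444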

The logic has now been changed to remove such nested transactions, and further optimizations were made to save some database interactions during very common API interactions like executing commands inside of instances.

Complete changelog

Here is a complete list of all changes in this release:

Full commit list
  • Translated using Weblate (German)
  • Translated using Weblate (Dutch)
  • incus/action: Fix resume
  • Translated using Weblate (Japanese)
  • Translated using Weblate (Japanese)
  • Translated using Weblate (Japanese)
  • doc: Remove net_prio
  • incusd/cgroup: Fully remove net_prio
  • incusd/warningtype: Remove net_prio
  • incusd/cgroup: Look for full cgroup controllers list at the root
  • incusd/dns: Serialize DNS queries
  • incusd/network: Optimize UsedByInstanceDevices
  • incusd/backups: Simplify missing backup errors
  • tests: Update for current backup errors
  • incusd/cluster: Optimize ConnectIfInstanceIsRemote
  • incusd/instance/qemu/agent-loader: Fix to work with busybox
  • doc/installing.md: add a gentoo-wiki link under Gentoo section
  • Translated using Weblate (French)
  • Translated using Weblate (Dutch)
  • incusd/device/disk: Better cleanup cloud-init ISO
  • incusd/instance/qemu/qmp: Add Eject command
  • incusd/instance/qemu/qmp: Handle eject requests
  • api: agent_config_drive
  • doc/devices/disk: Add agent:config drive
  • incusd/device/disk: Add agent config drive
  • incusd/project: Add support for agent config drive
  • incusd/instance/qemu/agent-loader: Handle agent drive
  • incusd/db/warningtype: gofmt
  • incusd/loki: Sort lifecycle context keys
  • incusd/instance/qemu/agent-loader: Don’t hardcode paths
  • incusd/cluster: Fix evacuation of stopped instances

Documentation

The Incus documentation can be found at: https://linuxcontainers.org/incus/docs/main/

Packages

There are no official Incus packages as Incus upstream only releases regular release tarballs. Below are some available options to get Incus up and running.

Installing the Incus server on Linux

Incus is available for most common Linux distributions. You’ll find detailed installation instructions in our documentation.

Homebrew package for the Incus client

The client tool is available through Homebrew for both Linux and macOS.
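
For example (assuming the formula keeps its current incus name):

brew install incus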

Chocolatey package for the Incus client

The client tool is available through Chocolatey for Windows users.

Winget package for the Incus client

The client tool is also available through Winget for Windows users.

Support

At this early stage, each Incus release will only be supported up until the next release comes out. This will change in a few months as we are planning an LTS release to coincide with the LTS releases of LXC and LXCFS.

Community support is provided at: https://discuss.linuxcontainers.org
Commercial support is available through: Zabbly - Incus services
Bugs can be reported at: Issues · lxc/incus · GitHub


Guys, what t.f. is going on with your quality control? How could you add the nonexistent package sgdisk as a dependency of incus-base and mark it stable??

Sorry, the past two sleepless nights spent sorting out some of the issues with Incus 0.5 and releasing 0.5.1 meant that I somehow ended up with pending packaging changes going through stable faster than through daily (where all the testing and validation is normally done).

A corrected build for this has been in progress ever since the publication actually happened and should be live in the repository in the next two hours. Can’t fix this any faster unfortunately.


That’s all right, take more rest, @stgraber!

But it looks like version 0.5.x is still buggy. One of my virtual machines became completely unreachable via incus shell and its IPs are no longer listed in incus ls, yet inside it everything is running fine and it still has all the IPs it had before. It’s a Debian sid / cloud image. The problem is that I hadn’t started sshd in there as I didn’t need it before, so now I have a sort of a ghost VM :)

I tried adding incus config device add debian agent disk source=agent:config to make sure that wasn’t the reason, but it did not help.

incus shell is an alias that execs su -l. What exactly do you get by running incus exec myinstance -- /bin/sh?

Sorry, I indeed forgot the main thing, this is what I get:

Error: VM agent isn't currently running

The same VM was working fine on Incus 0.4.x.

In case it is helpful to others, I saw this too with one VM but not another. The one that wasn’t working was imported from a qemu image, Ubuntu 20.04 and had lxd-agent-loader installed. I:

  1. removed lxd-agent-loader
  2. ran incus config device add <name> agent disk source=agent:config on the incus host
  3. ran sudo mount -t 9p config /mnt in the guest
  4. ran sudo sh -c 'cd /mnt ; ./install.sh' in the guest
  5. rebooted the guest

Now the IP address shows up in incus list and incus shell works again. Once it was working, I removed the agent:config with incus config device remove <name> agent.


I’ve managed to reproduce the lxd-agent-loader issue. It’s something we can fix on our side, so I’m testing a fix now.

It’s basically an issue with our logic that pretends to be the lxd-agent, which worked fine until we also added support for multiple agent binaries. It causes the correct agent to be copied to the wrong location, resulting in a permission issue.


I just tested that on an Incus VM which I then manually converted back to the lxd-agent-loader script. I managed to reproduce the behavior described by @jdstrand and confirmed that the fix above makes things behave again.


I expect to have this reviewed and merged in the next couple of hours with an update hitting the stable package in the Zabbly repo in the next 4-5 hours.


Package is building now, should be rolled out in the next 2 hours.


I can confirm it fixed access to my debian/sid VM - thank you!


But something is still wrong. The load average on the server went from the usual 6.0 to 14.0 a while after boot. I am unable to identify the reason; the only change was Incus 0.4.x -> 0.5.x.

now on 0.5.1-202401301903-debian12

Edit: maybe it’s related? I just noticed I’m unable to access one of the containers. It’s also a Debian sid, but a container this time, not a VM. It was also converted to Incus from LXD.

After incus exec d10 -- /bin/sh the prompt just hangs and it’s even impossible to ctrl-c it.

This container also doesn’t show its IP in the incus ls list.

Hmm, try incus restart -f d10

If that also gets stuck, then look at sudo dmesg as that may be the sign of a kernel issue.


The funny thing is that it helped and the load went back to normal values, but then I tried to restart the server and after booting the situation repeats, until I run incus restart -f d10 manually.

But it’s progress, thanks. I will try to analyse the logs and find out the reason now that I have a working server back. It never happened before 0.5.x though.

There is nothing in dmesg btw.

This kind of behavior can happen if you have some resource limits applied to the container and the container is currently exceeding them. That effectively prevents Incus from starting any new task in there during exec and can similarly cause a high load average as new tasks can’t be scheduled inside the container either.

As a reminder, the load average is just the current number of processes wanting to be scheduled.
When resource limits are in place, it can go to extremely high values simply because you have containers that would like to schedule work but aren’t allowed to due to limits.
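
One way to check whether that’s what is happening is to compare the limits applied to the container against what it’s currently doing; a quick sketch using the d10 container from this thread:

# Show the effective limits.* keys on the container
incus config show d10 --expanded | grep limits.

# Compare against the number of processes currently running inside
incus info d10 | grep -i processes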


Thanks for the hint, there are indeed resource limits applied to that container. I did not expect they might result in such behaviour, it’s very important advice!