Hosting Incus for Automated Test Runner

kimbo · February 25, 2025, 12:14pm

I’ve been using Incus (previously LXD) for about six years as a platform for automated testing of various Linux software systems. Initially, I needed to host a test system comprising numerous CentOS 7 machines, three different database servers, and the ability to quickly spin them up for each test suite and tear them down afterward. LXD was a perfect fit.

I originally developed everything to run on my local Dell Optiplex PC, running Ubuntu 20.04 and the LXD Snap (I’ve since upgraded to Debian 12 with kernels and Incus from Zabbly repos). To run my tests in CI, I asked our IT department for a similar machine hosted in the data center, which I could connect to Jenkins as a node. They provided an Ubuntu VM with eight virtual cores, hosted on an ESXi cluster with approximately 116 other VMs. For a while, this mostly worked; the tests took slightly longer in CI, but that was acceptable.

However, the testing scope expanded to include extremely CPU-intensive tests and other tests that are both CPU and I/O intensive. These now take almost twice as long to run in CI on the VM. We tolerated this, sometimes waiting up to five days for a test cycle to complete.

Now, the testing scope has increased again, and I need to test Windows application software that interacts with a Linux-hosted backend. On my Optiplex, I’ve developed this using Incus to spin up Windows VMs and Linux containers for each test, and it works well. But as soon as I tried running it in CI on the VM, I encountered strange issues with Windows. I can’t seem to complete the Windows installation and image creation on the VMs; it frequently times out and reboots. So, I create the Windows images on my Optiplex and copy them to the VMs—not ideal. Furthermore, when the Windows VMs run on the Ubuntu VM (nested virtualization), performance is much slower, and other test components time out, producing errors I don’t get on my PC.
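A quick way to confirm whether the CI VM is exposing hardware virtualization to Incus at all is to look for the `vmx` (Intel) or `svm` (AMD) CPU flag and for the `/dev/kvm` device inside the guest. The snippet below is only a rough sketch, assuming an x86 Linux guest; the function names are made up for illustration and are not part of Incus. It shows whether nested virtualization is available, not how well it will perform.

```python
import os


def virtualization_flags(cpuinfo_text):
    """Return the hardware-virtualization flags found in /proc/cpuinfo-style text.

    Assumes the x86 layout, where each CPU has a "flags : ..." line.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # vmx = Intel VT-x, svm = AMD-V
            found |= {"vmx", "svm"} & set(value.split())
    return found


def kvm_available(dev_path="/dev/kvm"):
    """True if the KVM device node exists on this machine."""
    return os.path.exists(dev_path)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = virtualization_flags(f.read())
    except OSError:
        flags = set()
    print("virtualization flags:", sorted(flags) or "none")
    print("/dev/kvm present:", kvm_available())
```

If neither flag shows up inside the Ubuntu VM, the ESXi host is probably not passing hardware virtualization through to the guest, which would be worth raising with IT.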

It seems we’ve reached the limit with the slow Incus VM on an overallocated hypervisor, especially with nested virtualization now involved.

I’m looking for suggestions on what to request from IT. Does anyone else run similar workloads that only run for a few hours at a time, requiring significant performance during those periods but remaining idle the rest of the time? A bare metal Incus machine would be magic, but I don’t think they will agree to that.

Perhaps there is a way to host Incus on a cloud provider’s VM rather than on our own on-prem hardware, so that I can turn on a very powerful VM only when a test needs to run and turn it off the rest of the time. But I guess at that point, I don’t really need Incus anymore and could use something like Terraform.

Maybe just connecting my Optiplex PC to Jenkins as a node is the best solution.

jarrodu · February 25, 2025, 4:06pm

It sounds like you could use AWS EC2 Spot instances. As you mentioned, at that point you don’t really need Incus. But there is at least one advantage to still using it. You can develop your images locally then push them to an Incus server running on EC2. You don’t need a cluster. You just need a way to push the images and then run them.
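As a rough sketch of that workflow, the helper below only builds the Incus CLI commands for registering a remote server and copying a locally built image to it. The remote name, URL, and image alias are placeholders invented for this example, and it assumes the remote Incus server is already listening on the network and will accept the client.

```python
import shlex


def image_push_commands(remote, url, image_alias):
    """Build the CLI commands to register a remote and copy a local image to it.

    All three arguments are hypothetical placeholders supplied by the caller.
    """
    return [
        # Register the remote Incus server under a short name.
        ["incus", "remote", "add", remote, url],
        # Copy the locally built image to the remote, keeping the same alias.
        ["incus", "image", "copy", f"local:{image_alias}", f"{remote}:",
         "--alias", image_alias],
    ]


if __name__ == "__main__":
    # Example values only; replace with your own server and image.
    for cmd in image_push_commands("ec2", "https://203.0.113.10:8443", "win11-test"):
        print(shlex.join(cmd))
```

The commands are printed rather than executed so they can be reviewed first; in a Jenkins job they could be passed to `subprocess.run` once the remote has been trusted.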

osch · February 26, 2025, 2:07am

It would also be worth looking at hosting providers where you can get a root (bare metal) server for a decent price. I’m aware of a few blog posts where community members have tested on Hetzner Cloud and others. In the long run they are cheaper than the big providers or offer better hardware.

Running Incus inside a VM as you described will always suffer, particularly if you need peak resources for your tests.

kimbo · March 1, 2025, 4:29am

Thanks for those suggestions. Spot instances I was not aware of, nor had I heard of bare metal cloud providers. I will be looking into both.