Wait for incus agent

candlerb · February 24, 2025, 12:12pm

$ incus start outer   # this was created with --vm
$ incus shell outer
Error while executing alias expansion: incus exec outer -- su -l
Error: VM agent isn't currently running

I have to keep repeating the command until it works.

It would be really nice to have a way to wait until the incus agent is running; perhaps a --wait flag to incus start and incus launch. Or am I missing some feature that already exists? I’m using 6.0.3 LTS

Cheers,

Brian.

Andrew_Wilson · February 24, 2025, 3:02pm

I second this. I actually use a little script to achieve this, but --wait would be MUCH better.

Here’s me starting a stopped vm, and gaining a shell. Once you exit, it takes you back to your prompt as usual. Simple. Not elegant. But it stops me guessing and spamming my terminal when I am impatient.

andrew@Yoda:~$ incus list vm
+------+---------+------+------+-----------------+-----------+
| NAME |  STATE  | IPV4 | IPV6 |      TYPE       | SNAPSHOTS |
+------+---------+------+------+-----------------+-----------+
| vm   | STOPPED |      |      | VIRTUAL-MACHINE | 0         |
+------+---------+------+------+-----------------+-----------+
andrew@Yoda:~$ launch vm
vm
Waiting for incus agent to become active...

root@vm:~# # Use your shell as normal
root@vm:~# exit
logout

Here’s my simple script:

#!/bin/bash

incus start "$1"
echo "Waiting for incus agent to become active..."
# Retry a no-op command until the agent answers.
until incus exec "$1" -- echo 2>/dev/null; do
	sleep 0.5
done
sleep 1

incus shell "$1"

The loop checks for shell access every half second and won’t exit until it works.

V/R

Andrew

stgraber · February 24, 2025, 3:47pm

I think I mentioned this a few times on the forum as this is a pretty common question, but basically Incus itself has no idea whether the agent is running or not.

There is no persistent connection between the agent and Incus. Otherwise we’d have scalability issues on systems running thousands or tens of thousands of VMs, as that would lead to a LOT of goroutines, open files, …

So basically whenever you ask for data that involves the Incus agent, Incus tries to connect to it at that point. It otherwise doesn’t touch the agent, and so can’t give you a convenient “wait” type endpoint (other than by doing the loop itself internally).
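Since the waiting has to be a loop somewhere, here is a minimal client-side sketch of that loop with an upper bound on the wait. `wait_for_agent` is a hypothetical helper name, not an Incus command, and the 60-second default is an arbitrary assumption.

```shell
#!/bin/bash
# Sketch: poll for the VM agent, giving up after a timeout.
# wait_for_agent is a hypothetical helper, not part of Incus.

wait_for_agent() {
    local instance="$1"
    local timeout="${2:-60}"    # seconds to wait (assumed default)
    local deadline
    deadline=$(( $(date +%s) + timeout ))

    # "incus exec" only succeeds once the agent inside the VM answers.
    until incus exec "$instance" -- true 2>/dev/null; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "Timed out waiting for the agent in $instance" >&2
            return 1
        fi
        sleep 0.5
    done
}
```

It could be used as `incus start outer && wait_for_agent outer 120 && incus shell outer`.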

candlerb · February 24, 2025, 3:56pm

I think that’s what we’re asking for

Andrew_Wilson · February 24, 2025, 6:33pm

So because it’s raining in Florida today, I played at my terminal and modified my lazy script to be potentially more useful. This is it in action:

andrew@Yoda:~$ launch images:debian/12 vz --vm -c limits.cpu=2 -c limits.memory=4GiB --profile=br0
Waiting for incus agent to become active in Instance vz......................
root@vz:~# exit
logout

And with my limited testing it seems to pull the parameters over properly:

andrew@Yoda:~$ incus config show vz
architecture: x86_64
config:
  image.architecture: amd64
  image.description: Debian bookworm amd64 (20250224_05:24)
  image.os: Debian
  image.release: bookworm
  image.serial: "20250224_05:24"
  image.type: disk-kvm.img
  image.variant: default
  limits.cpu: "2"
  limits.memory: 4GiB
  volatile.base_image: 33459dcfec716681ec94e4c5068dba3c9163cb4735c508db3eed80e92b23cae6
  volatile.cloud-init.instance-id: ac541576-7316-413c-a403-4622dd48ffaa
  volatile.eth0.host_name: tap2adb79d0
  volatile.eth0.hwaddr: 00:16:3e:38:e7:4e
  volatile.last_state.power: RUNNING
  volatile.uuid: 77dacc82-031e-4b29-b356-b90e6c08fbfc
  volatile.uuid.generation: 77dacc82-031e-4b29-b356-b90e6c08fbfc
  volatile.vsock_id: "2730131791"
devices: {}
ephemeral: false
profiles:
- br0
stateful: false
description: ""

The script is shown below. You can save it as a bash script or add it to your .bashrc file as a function (which is what I have done). It’s still not as good as a --wait feature, but I think Stephane does a lot for us, so this is maybe at least somewhat useful in the interim and allows him to focus on the big ticket items.

#!/bin/bash
# No warranty!
#
output=$(incus launch "$@")
instance=$(echo "$output" | grep -o "Launching.*" | cut -d " " -f 2)

echo -n "Waiting for incus agent to become active in Instance $instance..."

# Retry a no-op command until the agent answers, printing a dot per attempt.
until incus exec "$instance" -- echo 2>/dev/null; do
        echo -n "."
        sleep 0.5
done

incus shell "$instance"

V/R

Andrew

tregubovav · March 5, 2025, 12:33am

I have read several similar requests on this forum in the past. This suggests the problem matters to more than one person and probably needs more attention from the Incus team.

Being unable to know the status of the VM forces users to build client-side workarounds, which increases client-side complexity and decreases supportability.
However, adding an agent-status object to the /1.0/instances/{name}/state API call, incus-agent status output to the incus info <instance> command, and an agent-status field to the Incus Terraform resource "incus_instance" "<instance>" could help solve such problems more simply.

stgraber · March 5, 2025, 4:08am

The common way to get the agent status from /1.0/instances/NAME/state is to look for the processes field, if it’s > 0 then the agent is running.

That’s the same logic that’s used by the Terraform provider and others to check on the agent.
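From the CLI, that check could look roughly like the sketch below. `agent_running` is a hypothetical helper name; it assumes `incus query` prints the state JSON for the instance, and it pulls the number out with `sed` only to avoid extra dependencies (a JSON tool such as jq would be more robust).

```shell
#!/bin/bash
# Sketch: treat a positive "processes" count in the instance state as
# "the agent is running". agent_running is a hypothetical helper.

agent_running() {
    local instance="$1"
    local processes

    # Extract the integer value of the "processes" field from the JSON
    # printed by "incus query".
    processes=$(incus query "/1.0/instances/${instance}/state" 2>/dev/null |
        sed -n 's/.*"processes":[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p' |
        head -n 1)

    # No value (query failed) or a value <= 0 means the agent is not up.
    [ "${processes:-0}" -gt 0 ]
}
```

For example, `until agent_running instance1; do sleep 1; done` would block until the agent reports in.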

tregubovav · March 5, 2025, 6:09am

Thank you, Stephane, for confirming that the processes field can be used as an incus-agent readiness check with the REST API and CLI calls.
However, the Terraform provider does not yet check communication with the incus-agent. It reports Running status when QEMU reports that the VM is running, or when any interface receives an IP address if the wait_for_network argument is true.

Steps to reproduce:

Prepare a configuration for Terraform or OpenTofu:

`main.tf`

terraform {
  required_providers {
    incus = {
      source = "lxc/incus"
      version = "0.2.0"
    }
  }
}

provider "incus" {
  # Configuration options
}

resource "incus_instance" "instance1" {
  name             = "instance1"
  image            = "images:alpine/edge/amd64"
  profiles         = ["default"]
  type             = "virtual-machine"
  wait_for_network = true

  config = {
    "security.secureboot" = "false"
    "user.access_interface" = "eth0"
  }
}

Apply the configuration using the terraform apply or tofu apply command. The apply completes once an IPv4 address is assigned to the eth0 interface (I use the default settings for the incusbr0 network).
Disable the incus-agent and incus-agent-setup services at startup in the VM using incus exec instance1 -- sh -c "rc-update del incus-agent; rc-update del incus-agent-setup", then stop the VM. (Note: access to the VM will be lost after the reboot. Set a root password beforehand if you need access to the VM using incus console instance1.)
Re-apply the configuration using the terraform apply or tofu apply command. The apply completes once an IPv4 address is assigned to the eth0 interface (even though the incus-agent is not running).
Check whether the incus-agent is running using incus info instance1.

P.S.
Terraform is currently less usable than the REST API or CLI, as it still does not support an exec configuration attribute (Allow executing commands following creation in incus_instance).

stgraber · March 5, 2025, 6:40am

running = true in Terraform uses isInstanceOperational which does check for processes > 0.

But anyway, the entire waiting mechanism in Terraform is being reworked now ahead of the 1.0 of the provider and will have the ability to specifically wait for the agent (as well as network on specific interfaces and more).

tregubovav · March 5, 2025, 6:36pm

I’m glad to hear you are heading toward a stable release of the Terraform provider! It would be nice to get working features like:

commands execution in the containers/VMs
managing files in offline VMs and Storage Volumes (with ability to populate files and symlinks before instance starts)
etc.

Unfortunately, incus_instance.<instance>.running always returns true when the VM started successfully, even if the incus-agent is not running or incusd can’t communicate with it.

Should I create a bug for this?

stgraber · March 5, 2025, 7:16pm

Probably best to hold off until after this entire logic is replaced with the new wait_for syntax. @maveonair was saying he’s hoping to get the remaining bits sorted this weekend.