Howdy!
tl;dr: I think the first argument on this line should be the address of the node, not its name.
It seems the cluster resources cache at /var/cache/incus/resources/
is no longer being updated on my Incus nodes, which appears to be causing failures when starting VMs that have migration.stateful: "true"
set. When starting such a VM I get the following message:
$ incus start homecluster:ipa1
Error: Failed to get resources for incus2: Get "https://incus2/1.0/resources": Unable to connect to: incus2:443 ([dial tcp 192.168.10.8:443: connect: connection refused])
Try `incus info --show-log homecluster:ipa1` for more info
With incus monitor --pretty I see the following messages while starting the VM:
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Starting device device=config instance=ipa1 instanceType=virtual-machine project=default type=disk
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Connecting to a remote Incus over HTTPS url="https://incus2"
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Sending request to Incus etag= method=GET url="https://incus2/1.0/resources"
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Instance operation lock finished action=start err="Failed to get resources for incus2: Get \"https://incus2/1.0/resources\": Unable to connect to: incus2:443 ([dial tcp 192.168.10.8:443: connect: connection refused])" instance=ipa1 project=default reusable=false
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Stopping device device=agent instance=ipa1 instanceType=virtual-machine project=default type=disk
This is odd, since the API call is using the node name instead of its IP address and isn’t specifying a port number. I see there was some recent rework in this area as part of this PR: Move cluster resource caching to point of consumption by janetkimmm · Pull Request #2072 · lxc/incus · GitHub. It looks like the first argument of cluster.Connect
was switched from the node’s address to its name in the process, which is probably the issue. I don’t see this PR included in the tagged 6.12 release; however, it seems it was cherry-picked into the stable release for the deb package here: incus/.github/workflows/builds.yml at 317a13643fde465d464f5b4e72cd6c9ed9df8989 · zabbly/incus · GitHub
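To make the suspected mix-up concrete, here is a rough shell sketch of the two URL forms. The member name and IP are taken from the log output above; the 8443 port is only the Incus default HTTPS listen port and is an assumption on my part, not something confirmed from this cluster's config:

```shell
# Hypothetical values matching the log above.
member_name="incus2"
member_address="192.168.10.8:8443"   # cluster API address; 8443 is the assumed default port

# URL built from the member name: no port is included, so HTTPS
# falls back to 443, where nothing is listening -> "connection refused".
url_from_name="https://${member_name}"

# URL built from the member address: targets the real cluster port.
url_from_address="https://${member_address}"

echo "$url_from_name"
echo "$url_from_address"
```

Which matches the log: the request went to https://incus2, got resolved to 192.168.10.8, and then failed on port 443.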
At the moment I’ve been able to work around the issue either by running touch /var/cache/incus/resources/*.yaml
on each node in the cluster, or by setting migration.stateful: "false"
on the instance in question.
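In case it helps anyone else, the touch workaround can be applied to every member in one pass; a rough sketch, assuming passwordless root SSH and hypothetical node names:

```shell
# Hypothetical member names; substitute your own cluster's nodes.
# Bumps the mtime on each cached resources file so the stale-cache
# path (and the failing remote fetch) is avoided.
for node in incus1 incus2 incus3; do
    ssh "root@${node}" 'touch /var/cache/incus/resources/*.yaml'
done
```

Note this only refreshes the timestamps, not the cached data itself, so it's a stopgap until the cluster.Connect argument is fixed upstream.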