Howdy!
tl;dr: I think the first argument on this line should be the address of the node, not its name.
It seems the cluster resources cache at /var/cache/incus/resources/
is no longer being updated on my Incus nodes, which appears to be causing failures when starting VMs that have migration.stateful: "true"
set. When starting such a VM I get the following message:
$ incus start homecluster:ipa1
Error: Failed to get resources for incus2: Get "https://incus2/1.0/resources": Unable to connect to: incus2:443 ([dial tcp 192.168.10.8:443: connect: connection refused])
Try `incus info --show-log homecluster:ipa1` for more info
With incus monitor --pretty I see the following messages while starting the VM:
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Starting device device=config instance=ipa1 instanceType=virtual-machine project=default type=disk
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Connecting to a remote Incus over HTTPS url="https://incus2"
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Sending request to Incus etag= method=GET url="https://incus2/1.0/resources"
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Instance operation lock finished action=start err="Failed to get resources for incus2: Get \"https://incus2/1.0/resources\": Unable to connect to: incus2:443 ([dial tcp 192.168.10.8:443: connect: connection refused])" instance=ipa1 project=default reusable=false
DEBUG [2025-05-16T23:12:26-05:00] [incus1] Stopping device device=agent instance=ipa1 instanceType=virtual-machine project=default type=disk
This is odd, since the API call is using the node name instead of its IP address and isn’t specifying a port number. I see there was some recent rework in this area as part of this PR: Move cluster resource caching to point of consumption by janetkimmm · Pull Request #2072 · lxc/incus · GitHub. It looks like the first argument of cluster.Connect
was switched from the node’s address to its name in the process, which is probably the issue. I don’t see this PR included in the tagged 6.12 release; however, it seems it was cherry-picked into the stable release for the deb package here: incus/.github/workflows/builds.yml at 317a13643fde465d464f5b4e72cd6c9ed9df8989 · zabbly/incus · GitHub
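To make the suspected mix-up concrete, here is a rough shell sketch of the two URL forms. The member name and IP are taken from the log output above; the 8443 port is only the Incus default HTTPS listen port and is an assumption on my part, not something confirmed from this cluster's config:

```shell
# Hypothetical values matching the log above.
member_name="incus2"
member_address="192.168.10.8:8443"   # cluster API address; 8443 is the assumed default port

# URL built from the member name: no port is included, so HTTPS
# falls back to 443, where nothing is listening -> "connection refused".
url_from_name="https://${member_name}"

# URL built from the member address: targets the real cluster port.
url_from_address="https://${member_address}"

echo "$url_from_name"
echo "$url_from_address"
```

Which matches the log: the request went to https://incus2, got resolved to 192.168.10.8, and then failed on port 443.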
At the moment I’ve been able to work around the issue either by running touch /var/cache/incus/resources/*.yaml
on each node in the cluster, or by setting migration.stateful: "false"
on the instance in question.
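In case it helps anyone else, the touch workaround can be applied to every member in one pass; a rough sketch, assuming passwordless root SSH and hypothetical node names:

```shell
# Hypothetical member names; substitute your own cluster's nodes.
# Bumps the mtime on each cached resources file so the stale-cache
# path (and the failing remote fetch) is avoided.
for node in incus1 incus2 incus3; do
    ssh "root@${node}" 'touch /var/cache/incus/resources/*.yaml'
done
```

Note this only refreshes the timestamps, not the cached data itself, so it's a stopgap until the cluster.Connect argument is fixed upstream.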