"Failed transferring image" when using remote image in a cluster

Hi LXD users,

I’m running a remote LXD server as my image server. Let’s call it ‘img-server’.

I use img-server from a cluster of 20 LXD machines.

When I try to launch an image from the remote server, it sometimes gives me this error:

$ lxc launch img-server:ubuntu-2204-empty --vm apt-test -s remote
Creating apt-test
Error: Failed instance creation: Failed transferring image "f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf" from "<lxd-9>:8443": open /var/snap/lxd/common/lxd/images/f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf: no such file or directory

It seems to be trying to copy the image from one of the cluster members, lxd-9, which does not have the image locally.

In fact, this image is available on 3 of my 20 LXD cluster members, but not on this one:

lxd-8
lxd-19 (current database leader)
lxd-15

Also, lxd-9 does not serve any special function within the cluster. How can I fix this?

My current workaround is to manually copy the image file to the designated cluster member. Clearly that’s not a good option.

Relevant info:
LXD version: 5.11
lxc image info img-server:ubuntu-2204-empty: this is an image we built ourselves

Fingerprint: f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf
Size: 354.32MB
Architecture: x86_64
Type: virtual-machine
Public: no
Timestamps:
    Created: 2023/02/20 06:25 UTC
    Uploaded: 2023/02/20 06:59 UTC
    Expires: 2022/12/27 07:48 UTC
    Last used: never
Properties:
   ...
Aliases:
    - ubuntu-2204-empty
Cached: no
Auto update: disabled
Profiles:
    - default

Please can you show the output of:

sudo lxd sql global 'SELECT nodes.address FROM nodes LEFT JOIN images_nodes ON images_nodes.node_id = nodes.id LEFT JOIN images ON images_nodes.image_id = images.id WHERE images.fingerprint = "f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf"'

It shows

+-----------------+
|     address     |
+-----------------+
| lxd-04:8443 |
| lxd-09:8443 |
| lxd-18:8443 |
+-----------------+

If you delete this row using lxd sql global 'DELETE FROM images_nodes WHERE id = x', that should fix it, where x is the ID of the image reference for the lxd-09 node.

You can get the ID to use by running a modified query:

sudo lxd sql global 'SELECT nodes.address,  images_nodes.id FROM nodes LEFT JOIN images_nodes ON images_nodes.node_id = nodes.id LEFT JOIN images ON images_nodes.image_id = images.id WHERE images.fingerprint = "f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf"'
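Once you have the ID, the delete step could be wrapped up roughly like this. This is only a sketch: build_delete_sql is a hypothetical helper name (not part of LXD), and 42 is a placeholder for whatever images_nodes.id the query above returns for lxd-09. The helper just builds the statement and rejects anything that is not a plain number, so a typo cannot turn into a broader DELETE.

```shell
# Hypothetical helper (not an LXD command): prints the DELETE statement
# for one images_nodes row, refusing anything that is not a plain integer.
build_delete_sql() {
  row_id="$1"
  case "$row_id" in
    ''|*[!0-9]*)
      echo "row ID must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf 'DELETE FROM images_nodes WHERE id = %s' "$row_id"
}

# Example usage; 42 is a placeholder, substitute the ID reported for lxd-09:
# sudo lxd sql global "$(build_delete_sql 42)"
```

After running it, re-running the SELECT above should no longer list lxd-09 for that fingerprint.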