"Failed transferring image" when using remote image in a cluster

Hi LXD users,

I’m running a remote LXD server as my image server. Let’s call it ‘img-server’.

I use the img-server from a cluster of 20 LXD machines.

When I try to launch an image from the remote server, I sometimes get this error:

$ lxc launch img-server:ubuntu-2204-empty --vm apt-test -s remote
Creating apt-test
Error: Failed instance creation: Failed transferring image "f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf" from "<lxd-9>:8443": open /var/snap/lxd/common/lxd/images/f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf: no such file or directory

It seems to be trying to copy the image from one of the cluster members, lxd-9, which does not have the image locally.

In fact, this image is available on 3 of my 20 LXD cluster members, but not on this one:

lxd-8
lxd-19 (current database leader)
lxd-15
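
One way to confirm which members really hold the image file is to check the path from the error message on each member. The following is a minimal sketch, not an official LXD tool: has_image is a hypothetical helper name, and the default directory assumes the snap install path shown in the error above.

```shell
#!/bin/sh
# Sketch: report whether an image file exists in this member's local image store.
# The default path is taken from the error message (snap install); override
# IMAGES_DIR if your installation stores images elsewhere.
IMAGES_DIR="${IMAGES_DIR:-/var/snap/lxd/common/lxd/images}"

# has_image FINGERPRINT: prints "present" or "missing" (hypothetical helper).
has_image() {
    if [ -f "${IMAGES_DIR}/$1" ]; then
        echo "present"
    else
        echo "missing"
    fi
}
```

Running has_image with the full fingerprint on each member (possibly as root, depending on directory permissions) shows where the file is actually on disk, which can then be compared with what the cluster database reports.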

Also, lxd-9 does not serve any special function within the cluster. How can I fix this?

My current workaround is to manually copy the image file to the designated cluster member. Clearly it’s not a good option.

Relevant info:
LXD version: 5.11
lxc image info img-server:ubuntu-2204-empty: this is an image we built ourselves

Fingerprint: f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf
Size: 354.32MB
Architecture: x86_64
Type: virtual-machine
Public: no
Timestamps:
    Created: 2023/02/20 06:25 UTC
    Uploaded: 2023/02/20 06:59 UTC
    Expires: 2022/12/27 07:48 UTC
    Last used: never
Properties:
   ...
Aliases:
    - ubuntu-2204-empty
Cached: no
Auto update: disabled
Profiles:
    - default

Please can you show the output of:

sudo lxd sql global 'SELECT nodes.address FROM nodes LEFT JOIN images_nodes ON images_nodes.node_id = nodes.id LEFT JOIN images ON images_nodes.image_id = images.id WHERE images.fingerprint = "f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf"'

It shows

+-------------+
|   address   |
+-------------+
| lxd-04:8443 |
| lxd-09:8443 |
| lxd-18:8443 |
+-------------+

Deleting the row for lxd-09 should fix it: lxd sql global 'DELETE FROM images_nodes WHERE id = x', where x is the ID of the image reference for the lxd-09 node.

You can get the ID to use by running a modified query:

sudo lxd sql global 'SELECT nodes.address,  images_nodes.id FROM nodes LEFT JOIN images_nodes ON images_nodes.node_id = nodes.id LEFT JOIN images ON images_nodes.image_id = images.id WHERE images.fingerprint = "f59c49298bf34eaaf3222da5868facfbdf0bf2e39174e528b8de936dc1275dcf"'
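
Since the DELETE runs directly against the cluster database, it may be worth building the statement with a small guard so that only a plain numeric ID can end up in it. This is a sketch under that assumption: delete_stmt is a hypothetical helper, and the ID in the usage line is a placeholder rather than a value from this thread.

```shell
#!/bin/sh
# delete_stmt ROW_ID: prints the DELETE statement for one images_nodes row.
# Rejects anything that is not made up of digits only, so a typo or a pasted
# fragment cannot widen the WHERE clause.
delete_stmt() {
    case "$1" in
        ''|*[!0-9]*)
            echo "row id must contain digits only: '$1'" >&2
            return 1
            ;;
    esac
    printf 'DELETE FROM images_nodes WHERE id = %s\n' "$1"
}
```

Usage, with 42 standing in for the ID returned by the modified query: sudo lxd sql global "$(delete_stmt 42)"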

Thanks! That seems to work.

So what could have caused the problem? My usual workflow is:

  1. lxc publish an image from an instance
  2. lxc copy the image to the remote image server
  3. lxc init / launch the image directly from the remote server

This seems pretty normal, but sometimes steps 2 and 3 fail with errors.
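
For reference, the three workflow steps can be sketched as concrete commands. This is an assumption about how the steps map to the lxc CLI, since only the launch command appears in this thread: print_workflow is a hypothetical dry-run helper that prints the commands instead of running them, and the instance name in the usage line is made up.

```shell
#!/bin/sh
# print_workflow INSTANCE ALIAS REMOTE NEW_NAME
# Dry run: prints the commands for the publish / copy / launch workflow.
# Note: lxc publish normally needs the source instance to be stopped.
print_workflow() {
    instance="$1"; img_alias="$2"; remote="$3"; new_name="$4"
    # 1. Publish a local image from an instance
    printf 'lxc publish %s --alias %s\n' "$instance" "$img_alias"
    # 2. Copy the image, with its aliases, to the remote image server
    printf 'lxc image copy %s %s: --copy-aliases\n' "$img_alias" "$remote"
    # 3. Launch a VM directly from the remote server
    printf 'lxc launch %s:%s %s --vm\n' "$remote" "$img_alias" "$new_name"
}
```

For example, print_workflow build-vm ubuntu-2204-empty img-server apt-test prints the three commands in order, which makes it easier to see which step fails when they are run one at a time.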
