Cannot launch any container. It looks like the image download fails

I tried to run an Ansible role I wrote a few years ago to set up an LXD host on a new dedicated server running AlmaLinux 9. Unfortunately, it no longer works.

I installed LXD through snap. I’m able to initialize it, so it sort of works, but I cannot create any containers.

When I run lxc launch images:almalinux/9/cloud c1 --debug (or with any other image; I also tried Ubuntu), it gets stuck on:

DEBUG  [2023-02-06T12:03:33+01:00] Sending request to LXD                        etag= method=GET url="http://unix.socket/1.0/operations/7a504103-7d41-4988-8a77-1e8ad5b93e4f"
DEBUG  [2023-02-06T12:03:33+01:00] Got response struct from LXD                 
DEBUG  [2023-02-06T12:03:33+01:00] 
        {
                "id": "7a504103-7d41-4988-8a77-1e8ad5b93e4f",
                "class": "task",
                "description": "Creating instance",
                "created_at": "2023-02-06T12:03:33.001889822+01:00",
                "updated_at": "2023-02-06T12:03:33.001889822+01:00",
                "status": "Running",
                "status_code": 103,
                "resources": {
                        "containers": [
                                "/1.0/containers/c1"
                        ],
                        "instances": [
                                "/1.0/instances/c1"
                        ]
                },
                "metadata": null,
                "may_cancel": false,
                "err": "",
                "location": "none"
        }

The container doesn’t get created, and it looks like LXD has frozen. What can I do to debug this? I don’t remember having such issues two years ago, the last time I was playing with LXD. Could it be something related to AlmaLinux?

My Ansible role uses the lxd_container module, which returns a peculiar error:

fatal: [hypervisor.***]: FAILED! => {"actions": [], "changed": false, "msg": "Failed to connect to LXD server \"https://images.linuxcontainers.org\": Failed to fetch https://us.lxd.images.canonical.com/1.0: 404 Not Found"}

If I change the protocol in its configuration from lxd to simplestreams, it also just hangs.

Does anyone have an idea what could be responsible for that?

Which version of LXD is this?

Could you please follow these steps to enable pprof and get the running goroutine info?

Thanks

LXD/LXC version is 5.10. Thanks, I’ll take a look.


This is what I got.

Does reloading lxd fix it for now?

sudo systemctl reload snap.lxd.daemon

I restarted it a few times and finally got something working:

$ lxc launch images:almalinux/9/cloud c1
Creating c1
Retrieving image: rootfs: 4% (316.62kB/s)

And then, suddenly:

Error: Failed instance creation: read tcp 188.68.[redacted]:45646->91.189.91.124:443: read: connection reset by peer

It seems to be an issue with either the network or the image repository. Either way, it would be nice to get better feedback from lxc/LXD when such interruptions happen (previously, I waited more than 15 minutes without any feedback while it was stuck processing this). Well, I guess I’ll just try again a bit later :). Thanks!

Yeah, we added some shorter timeouts a while back, but that didn’t go well on slower networks.

So we increased them again. I suspect we could go with a timeout based on transfer rate rather than absolute time, but I don’t recall Go’s HTTP client providing this out of the box.

I’m seeing similar issues here. I have a VM running in Azure (US-East) that is trying to launch a new Debian 11 container: “Retrieving image: rootfs: 46% (245.21kB/s)”

Seems like the repository is very slow at the moment.

This is likely because all image traffic is currently being directed at the US mirror, see

Thanks Tom!