"No LXD socket found" in vm created from custom image

Hi,

We’ve created some images of our own from the public image (images:ubuntu/jammy/cloud). The only modification is that we installed several apt packages, and then we ran “lxc publish” to create a local image.
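
For reference, the workflow is roughly the following (the instance and alias names here are just placeholders):

lxc launch images:ubuntu/jammy/cloud base-vm --vm
lxc exec base-vm -- apt-get update
lxc exec base-vm -- apt-get install -y <our packages>
lxc stop base-vm
lxc publish base-vm --alias jammy-custom
lxc launch jammy-custom testvm --vm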

However, when we try to launch a new VM from this image, /var/log/cloud-init.log says:

2022-11-11 03:19:19,463 - __init__.py[DEBUG]: Looking for data source in: ['LXD', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM']
2022-11-11 03:19:19,466 - __init__.py[DEBUG]: Searching for local data source in: ['DataSourceLXD']
2022-11-11 03:19:19,466 - handlers.py[DEBUG]: start: init-local/search-LXD: searching for local data from DataSourceLXD
2022-11-11 03:19:19,466 - __init__.py[DEBUG]: Seeing if we can get any data from <class 'cloudinit.sources.DataSourceLXD.DataSourceLXD'>
2022-11-11 03:19:19,466 - __init__.py[DEBUG]: Update datasource metadata and network config due to events: boot-new-instance
2022-11-11 03:19:19,466 - DataSourceLXD.py[DEBUG]: Not an LXD datasource: No LXD socket found.
2022-11-11 03:19:19,466 - __init__.py[DEBUG]: Datasource DataSourceLXD not updated for events: boot-new-instance
2022-11-11 03:19:19,466 - handlers.py[DEBUG]: finish: init-local/search-LXD: SUCCESS: no local data found from DataSourceLXD
2022-11-11 03:19:19,466 - main.py[DEBUG]: No local datasource found

It seems /dev/lxd/sock does not exist at that point, so all of our cloud-init config is lost. This appears to happen randomly.
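
As a sanity check inside an affected VM (assuming cloud-init's standard CLI is available), we confirm what the datasource saw and, once the socket is present, force a re-run of the local stage:

test -S /dev/lxd/sock && echo "socket present"
cloud-init status --long     # shows which datasource was used and any errors
cloud-init clean --logs      # clear state left over from the failed boot
cloud-init init --local      # re-run the local stage (DataSourceLXD)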

Additionally, when I “lxc exec” into the VM afterwards, the socket exists and is working:

root@testvm:~# curl -v --unix-socket /dev/lxd/sock http://x/1.0/config
*   Trying /dev/lxd/sock:0...
* Connected to x (/dev/lxd/sock) port 80 (#0)
> GET /1.0/config HTTP/1.1
> Host: x
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Fri, 11 Nov 2022 03:35:14 GMT
< Content-Length: 65
<
["/1.0/config/user.user-data","/1.0/config/user.network-config"]
* Connection #0 to host x left intact
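
We can also read the user-data back over the socket, which confirms the config is still there on the LXD side (the /1.0/config/user.user-data path is taken from the listing above):

curl -s --unix-socket /dev/lxd/sock http://x/1.0/config/user.user-data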

Could you point us to some directions we could look into?

This is most likely a race condition. We have some fixes for LXD agent races coming in LXD 5.8 (next week) which may help resolve this.
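
Once 5.8 is out, a snap-based install can pick it up with something like the following (assuming the default latest/stable channel):

snap list lxd                               # check the currently installed version
snap refresh lxd --channel=latest/stable    # refresh once 5.8 is published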

Thanks very much. In the meantime we’ll keep this thread updated with any relevant information.

See