LXD API does not reply to container creation request but creates the container

Hey,

I have a deploying issue on my platform, that sometimes, a deploy does not go through successful.
I am debugging it since a bit and it seems like I found the cause of it.

Basically, I first talk to LXD to create me a new network bridge, I do put each container on its own bridge, so each container can fully utilize its /64 if available.

This Bridge creation request, always goes through as successful, it never seem to fail.

“{“type”:“sync”,“status”:“Success”,“status_code”:200,“operation”:”",“error_code”:0,“error”:"",“metadata”:null}"

Works fine, everytime.
Now when it does come to the container, it sometimes, does not reply but gets the request.
But it seems to getting it and creating it, according to lxd.log

t=2020-12-28T14:58:45+0100 lvl=info msg=“Creating container” ephemeral=false name=lxc1d7279c8 project=default
t=2020-12-28T14:58:45+0100 lvl=info msg=“Created container” ephemeral=false name=lxc1d7279c8 project=default

However, the API response is none, LXD won’t reply.
I get HTTP Code 0, with nothing.

Despite in the background LXD is creating the container.
My Code does retry 8 times, all 8 times, I get the same HTTP Code 0 with no response.

What can be the cause of, that LXD gets the request, creates the container but does not reply?
It seems only to happen to container creation requests, everything else is fine.

If I repeat that again after the failed attempt, it works flawless.

Thanks.

I did increase the curl timeout + connectimeout lets see.
I report back when I know more.

But the creation request is async right? So it sould not wait that long actually.

When you send the request to create the instance it should return something like this with a 202 status code.

{
    "id": "db6cfdde-4a8e-4406-9f24-28d42b98bd05",
    "class": "task",
    "description": "Creating container",
    "created_at": "2020-12-28T16:41:03.18141142Z",
    "updated_at": "2020-12-28T16:41:03.18141142Z",
    "status": "Running",
    "status_code": 103,
    "resources": {
        "containers": [
            "\/1.0\/containers\/alpine2"
        ],
        "instances": [
            "\/1.0\/instances\/alpine2"
        ]
    },
    "metadata": null,
    "may_cancel": false,
    "err": "",
    "location": "none"
}

With that you can take the id which is a UUID and send that to: /operations/{$uuid}/wait to wait until the operation has completed and tell you wether it was successful or not.

Are you sure that it is LXD not giving you a response and not how you are handling the response? If you take the same data that you are sending then send via curl or use the command lxc query do you get the same problem?

I get no response in the first place, so no UUID.

Despite LXD is deploying in the background.
If I would get an UUID, my code would wait until the deployment is successful, which only takes a few seconds since the images are local available.

The Documentation says its ASYNC, means I should get a response from LXD with the UUID, and the process continues running in the background while I can use the UUID to track its progress.
But thats exactly the point that does not happen.

The curl works, otherwise It would not work when it failed and I tried again afterwards.
I suspect that either LXD takes to long to reply despite its an async task or LXD does not reply at all.
Maybe its a bug, I have no idea.

But yeah I get the point, I will update my debug report and attach the CURL to my debug email.

Send the exact same data using curl or lxc query and see if you get a response.

Extending the timeout does not seem to fix the problem at all, LXD just dosen’t reply.
I will post again when I tested the curl directly.

I found something interesting, I would call it a bug with lxc monitor.
If the deploy goes successful it looks like that:

location: none
metadata:
  context: {}
  level: dbug
  message: Responding to instance create
timestamp: "2020-12-30T23:56:08.40572247+01:00"
type: logging


location: none
metadata:
  context:
    name: 7af8acf9b1980662036dcc3b283a11d6af2f7f20e1f9c0f7854b6194218c0d0f
  level: dbug
  message: Found cert
timestamp: "2020-12-30T23:56:08.405659953+01:00"
type: logging


location: none
metadata:
  context: {}
  level: dbug
  message: Connecting to a remote simplestreams server
timestamp: "2020-12-30T23:56:08.405917645+01:00"
type: logging


location: none
metadata:
  context:
    ip: xx.xx.x.x.x:53808
    method: POST
    protocol: tls
    url: /1.0/instances
    username: 7af8acf9b1980662036dcc3b283a11d6af2f7f20e1f9c0f7854b6194218c0d0f
  level: dbug
  message: Handling
timestamp: "2020-12-30T23:56:08.405712031+01:00"
type: logging


location: none
metadata:
  context: {}
  level: dbug
  message: 'New task Operation: 7a07ee7d-11f9-45eb-b254-f16faeb86b0e'
timestamp: "2020-12-30T23:56:08.543208977+01:00"
type: logging

If it goes wrong:

location: none
metadata:
  context: {}
  level: dbug
  message: Responding to instance create
timestamp: "2020-12-30T23:44:08.299658966+01:00"
type: logging


location: none
metadata:
  context:
    name: 7af8acf9b1980662036dcc3b283a11d6af2f7f20e1f9c0f7854b6194218c0d0f
  level: dbug
  message: Found cert
timestamp: "2020-12-30T23:44:08.299594897+01:00"
type: logging


location: none
metadata:
  context:
    ip: x.x.x.x.x:53712
    method: POST
    protocol: tls
    url: /1.0/instances
    username: 7af8acf9b1980662036dcc3b283a11d6af2f7f20e1f9c0f7854b6194218c0d0f
  level: dbug
  message: Handling
timestamp: "2020-12-30T23:44:08.299648197+01:00"
type: logging


location: none
metadata:
  context: {}
  level: dbug
  message: Connecting to a remote simplestreams server
timestamp: "2020-12-30T23:44:08.300467169+01:00"
type: logging


location: none
metadata:
  context: {}
  level: dbug
  message: Responding to instance create
timestamp: "2020-12-30T23:44:29.965461087+01:00"
type: logging


location: none
metadata:
  context:
    ip: x.x.x.x.x:53712
    method: POST
    protocol: tls
    url: /1.0/instances
    username: 7af8acf9b1980662036dcc3b283a11d6af2f7f20e1f9c0f7854b6194218c0d0f
  level: dbug
  message: Handling
timestamp: "2020-12-30T23:44:29.965447662+01:00"
type: logging

Yes my application is trying to send the same request again, since it got no response.
But the interesting part is " message: 'New task Operation" does not show up in lxc monitor.

Any idea why? Could explain the bug.

That would be consistent with that POST /1.0/instances having immediately failed with an error, in such cases no background operations have been created yet so the New task Operation line would be missing.

But if it fails with an error, why does LXD not reply with one?

Have you checked the key err ?

Sometimes it responds with an error err string with no HTTP status code or explicit failure outside the err key being set with some string

I get a timeout, means no http code, no response just plain nothing.
I would be happy if I would get an error message.

LXD does not have any timeout logic in its request handling code, so what’s timing out?

My curl request to the LXD API times out without any response.
I have it currently set to 12s, I tried it setting it to 45s which makes no sense on ASYNC but I tried it anyway, same result.

Show your POST request (I.E the params) what library are you using?

This is suggesting that your POST may not be valid JSON or doesn’t have the JSON content-type set, causing LXD to think you’re pushing a file (as would be the case for a backup being restored) and so just sitting there waiting for you to send more.

@turtle0x1

Basically PHP + curl request with.

curl_setopt($request, CURLOPT_URL,$url);
curl_setopt($request, CURLOPT_RETURNTRANSFER, true);
curl_setopt($request, CURLOPT_SSLKEY, $this->key);
curl_setopt($request, CURLOPT_SSLCERT, $this->cert);
curl_setopt($request, CURLOPT_CONNECTTIMEOUT ,12);
curl_setopt($request, CURLOPT_TIMEOUT, 12);
curl_setopt($request, CURLOPT_POST, true);
curl_setopt($request, CURLOPT_POSTFIELDS, $payload);

I tried it with as @stgraber asked

curl_setopt($request, CURLOPT_HTTPHEADER, array(‘Content-Type:application/json’));

Does not seem to make any difference.

My JSON request looks like this:

However, I cannot explain these errors with invalid json, since it does randomly work, it fails, I try again it works. The JSON looks valid to me.

Network creation always works, just the container fails.

DBUG[12-31|00:46:02] Handling                                 ip=x.x.x.x:55146 method=POST protocol=tls url=/1.0/networks username=xxx
DBUG[12-31|00:46:02] Found cert                               name=xxx
DBUG[12-31|00:46:02] Create                                   clientType=normal config="map[ipv4.address:x.0.x.x/28 ipv4.nat:true ipv6.address:2602:x:x:x:x::1/70]" driver=bridge network=vmbr10
DBUG[12-31|00:46:02] Start                                    driver=bridge network=vmbr10
DBUG[12-31|00:46:02] Setting up network                       network=vmbr10 driver=bridge
DBUG[12-31|00:46:02] Scheduler: network: vmbr10 has been added: updating network priorities 
DBUG[12-31|00:46:02] Stable MAC generated                     network=vmbr10 seed=1372102e9b09d9f5dcfccca936a65497184c69ab8c8cb04c3387ffb88b1aeefb.0.67 driver=bridge hwAddr=00:16:3e:70:39:d0
WARN[12-31|00:46:02] IPv6 networks with a prefix larger than 64 aren't properly supported by dnsmasq driver=bridge network=vmbr10
DBUG[12-31|00:46:08] Handling                                 url=/1.0/instances username=xxx ip=x.x.x.x:55150 method=POST protocol=tls
DBUG[12-31|00:46:08] Found cert                               name=xxx
DBUG[12-31|00:46:08] Responding to instance create 
DBUG[12-31|00:46:08] Connecting to a remote simplestreams server 
DBUG[12-31|00:46:28] Responding to instance create 
DBUG[12-31|00:46:28] Found cert                               name=xxx
DBUG[12-31|00:46:28] Handling                                 ip=x.x.x.x:55150 method=POST protocol=tls url=/1.0/instances username=xxx
DBUG[12-31|00:46:47] Responding to instance create 
DBUG[12-31|00:46:47] Found cert                               name=xxx
DBUG[12-31|00:46:47] Handling                                 username=xxx ip=x.x.x.x:55150 method=POST protocol=tls url=/1.0/instances
DBUG[12-31|00:46:51] New task Operation: 00ccdd08-f336-495b-9d0d-a5eef01d67fd 
DBUG[12-31|00:46:51] Started task operation: 00ccdd08-f336-495b-9d0d-a5eef01d67fd 
DBUG[12-31|00:46:51] Connecting to a remote simplestreams server 
DBUG[12-31|00:46:51] Image "xxx" already exists in the DB 
INFO[12-31|00:46:51] Creating container                       ephemeral=false name=lxcae88eaba project=default
INFO[12-31|00:46:51] Created container                        name=lxcae88eaba project=default ephemeral=false
DBUG[12-31|00:46:51] CreateInstanceFromImage started          driver=lvm instance=lxcae88eaba pool=primary project=default
DBUG[12-31|00:46:51] EnsureImage started                      driver=lvm fingerprint=xxx pool=primary
DBUG[12-31|00:46:51] EnsureImage finished                     driver=lvm fingerprint=xxx pool=primary
DBUG[12-31|00:46:51] Logical volume snapshot created          src_dev=/dev/primary/images_xxx thin=true vg_name=primary driver=lvm lv_name=containers_lxcae88eaba pool=primary
DBUG[12-31|00:46:51] Activated logical volume                 dev=/dev/primary/containers_lxcae88eaba driver=lvm pool=primary
DBUG[12-31|00:46:51] Mounted logical volume                   driver=lvm options=discard path=/var/snap/lxd/common/lxd/storage-pools/primary/containers/lxcae88eaba pool=primary dev=/dev/primary/containers_lxcae88eaba
DBUG[12-31|00:46:51] Unmounted logical volume                 path=/var/snap/lxd/common/lxd/storage-pools/primary/containers/lxcae88eaba pool=primary driver=lvm keepBlockDev=false
DBUG[12-31|00:46:51] Deactivated logical volume               dev=/dev/primary/containers_lxcae88eaba driver=lvm pool=primary
DBUG[12-31|00:46:51] Activated logical volume                 dev=/dev/primary/containers_lxcae88eaba driver=lvm pool=primary
DBUG[12-31|00:46:52] Logical volume filesystem shrunk         dev=/dev/primary/containers_lxcae88eaba driver=lvm pool=primary size=2560000000b
DBUG[12-31|00:46:53] Logical volume resized                   driver=lvm pool=primary size=2560000000b dev=/dev/primary/containers_lxcae88eaba
DBUG[12-31|00:46:53] Deactivated logical volume               dev=/dev/primary/containers_lxcae88eaba driver=lvm pool=primary
DBUG[12-31|00:46:53] CreateInstanceFromImage finished         driver=lvm instance=lxcae88eaba pool=primary project=default
DBUG[12-31|00:46:53] SetInstanceQuota started                 driver=lvm instance=lxcae88eaba pool=primary project=default
DBUG[12-31|00:46:53] UpdateInstanceBackupFile started         pool=primary project=default driver=lvm instance=lxcae88eaba
DBUG[12-31|00:46:53] SetInstanceQuota finished                driver=lvm instance=lxcae88eaba pool=primary project=default
DBUG[12-31|00:46:53] Activated logical volume                 dev=/dev/primary/containers_lxcae88eaba driver=lvm pool=primary
DBUG[12-31|00:46:53] Mounted logical volume                   dev=/dev/primary/containers_lxcae88eaba driver=lvm options=discard path=/var/snap/lxd/common/lxd/storage-pools/primary/containers/lxcae88eaba pool=primary
DBUG[12-31|00:46:53] Unmounted logical volume                 path=/var/snap/lxd/common/lxd/storage-pools/primary/containers/lxcae88eaba pool=primary driver=lvm keepBlockDev=false
DBUG[12-31|00:46:53] Success for task operation: 00ccdd08-f336-495b-9d0d-a5eef01d67fd 
DBUG[12-31|00:46:53] Deactivated logical volume               pool=primary dev=/dev/primary/containers_lxcae88eaba driver=lvm
DBUG[12-31|00:46:53] UpdateInstanceBackupFile finished        instance=lxcae88eaba pool=primary project=default driver=lvm
DBUG[12-31|00:47:04] Responding to instance create 
DBUG[12-31|00:47:04] Handling                                 ip=x.x.x.x:55150 method=POST protocol=tls url=/1.0/instances username=xxx
DBUG[12-31|00:47:04] Found cert                               name=xxx
DBUG[12-31|00:47:25] Responding to instance create 
DBUG[12-31|00:47:25] Found cert                               name=xxx
DBUG[12-31|00:47:25] Handling                                 ip=x.x.x.x:55150 method=POST protocol=tls url=/1.0/instances username=xxx
DBUG[12-31|00:47:47] Found cert                               name=xxx
DBUG[12-31|00:47:47] Responding to instance create 
DBUG[12-31|00:47:47] Handling                                 ip=x.x.x.x:55150 method=POST protocol=tls url=/1.0/instances username=xxx