LXD cluster - launch command creates container on different node - is it possible to disable this by default?

roman · July 8, 2020, 5:00pm

I have an LXD cluster. When I create a new container with lxc launch, it is automatically placed on what looks like a random node. I think by default LXD tries to load balance, using round robin or some other method.

I have two problems.

First, my infrastructure is small, and I want to control manually which host each container starts on; for example, my API server should not share a node with the redis or static-web containers. I can handle this with the --target parameter.
Second, most of the time I will be creating containers with Ansible, and the Ansible plugin does not support --target (at least, to the best of my knowledge).
That means that in a cluster with 2 nodes, the nodes take turns hosting each new container, so my Ansible run will fail every second time.

github.com/ansible/ansible

lxd_container: Contrainer creation task failure on remote LXD cluster node

opened 12:14AM - 07 Mar 19 UTC

closed 05:59PM - 17 Aug 20 UTC

ghost

cloud module support:community bug traceback affects_2.7 collection collection:community.general needs_collection_redirect bot_closed

##### SUMMARY
On a LXD cluster configuration, an ansible task using the Ansible module lxd_container for container creation will end up in failure if the container is created on a remote LXD cluster node. The container is however created, but stopped. As a result, in the case of a 2 nodes cluster configuration, and knowing that the LXD cluster create the new containers on the node with the least amount of containers, the task will fail each 2 runs.

##### ISSUE TYPE
- Bug Report

##### COMPONENT NAME
lxd_container

##### ANSIBLE VERSION
```
$ ansible --version
ansible 2.7.8
  config file = /home/toto/.ansible.cfg
  configured module search path = [u'/home/toto/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Oct 30 2018, 23:45:53) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]
```

##### CONFIGURATION
```
$ ansible-config dump --only-changed
ANSIBLE_PIPELINING(/home/toto/.ansible.cfg) = True
ANSIBLE_SSH_ARGS(/home/toto/.ansible.cfg) = -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
DEFAULT_BECOME(/home/toto/.ansible.cfg) = True
DEFAULT_BECOME_METHOD(/home/toto/.ansible.cfg) = sudo
DEFAULT_BECOME_USER(/home/toto/.ansible.cfg) = root
DEFAULT_HOST_LIST(/home/toto/.ansible.cfg) = [u'/etc/ansible/hosts_xlab', u'/home/toto/ansible/hosts']
DEFAULT_LOG_PATH(/home/toto/.ansible.cfg) = /var/log/ansible.log
DEFAULT_PRIVATE_KEY_FILE(/home/toto/.ansible.cfg) = /home/toto/.ssh/id_rsa_ansible
DEFAULT_REMOTE_USER(/home/toto/.ansible.cfg) = ansible
DEFAULT_ROLES_PATH(/home/toto/.ansible.cfg) = [u'/home/toto/ansible/roles', u'/home/toto/roles']
DEFAULT_VAULT_PASSWORD_FILE(env: ANSIBLE_VAULT_PASSWORD_FILE) = /home/toto/.vault_pass.txt
RETRY_FILES_ENABLED(/home/toto/.ansible.cfg) = False
```

##### OS / ENVIRONMENT
```
$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
```
```
$ lxd version
3.10
$ lxc version
Client version: 3.10
Server version: 3.10
```

##### STEPS TO REPRODUCE
Enough to have a 2 nodes LXD cluster configuration:
```
$ lxc cluster list
+------------+---------------------------+----------+--------+-------------------+
|    NAME    |            URL            | DATABASE | STATE  |      MESSAGE      |
+------------+---------------------------+----------+--------+-------------------+
| space01    | https://10.33.254.21:8443 | YES      | ONLINE | fully operational |
+------------+---------------------------+----------+--------+-------------------+
| space01-fr | https://10.33.254.22:8443 | YES      | ONLINE | fully operational |
+------------+---------------------------+----------+--------+-------------------+
```
A simple task to create containers:
```
---
- name: Create new container TEST
  connection: local
  lxd_container:
    name: "{{ container }}"
    state: started
    source:
      type: image
      alias: centos7
    profiles: ["default"]
    wait_for_ipv4_addresses: true
```
Run the task twice, both from the same cluster node, space01-fr in my case. mycontainer will then be created on space01-fr, and mycontainer2 on space01:
```
ansible-playbook ~/ansible/playbooks/create_container.yml --extra-vars "container=mycontainer" --limit debian9 -vvvv
ansible-playbook ~/ansible/playbooks/create_container.yml --extra-vars "container=mycontainer2" --limit debian9 -vvvv
```

##### EXPECTED RESULTS
Both playbook runs should successfully end up by creating first the running "mycontainer" container on space01-fr and another running one, mycontainer2 on space01

##### ACTUAL RESULTS
First run correctly work, task successfully ends up:
```
TASK [container : Create new container TEST] ************************************************************
task path: /home/toto/roles/container/tasks/main.yml:2
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/lxd/lxd_container.py
<10.0.3.210> ESTABLISH LOCAL CONNECTION FOR USER: toto
<10.0.3.210> EXEC /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-kowkhphfkqcrepmoevartsayavowlvxb; /usr/bin/python'"'"' && sleep 0'
ok: [debian9] => {
  "actions": [],
  "addresses": { "eth0": [ "10.0.3.141" ] },
  "changed": false,
  "invocation": {
    "module_args": {
      "architecture": null, "cert_file": "/root/.config/lxc/client.crt", "config": null, "description": null,
      "devices": null, "ephemeral": null, "force_stop": false, "key_file": "/root/.config/lxc/client.key",
      "name": "mycontainer", "profiles": [ "default" ], "source": { "alias": "centos7", "type": "image" },
      "state": "started", "timeout": 30, "trust_password": null, "url": "unix:/var/lib/lxd/unix.socket",
      "wait_for_ipv4_addresses": true
    }
  },
  "log_verbosity": 4,
  "logs": [
    {
      "request": { "json": null, "method": "GET", "timeout": null, "url": "/1.0/containers/mycontainer" },
      "response": {
        "json": {
          "error": "",
          "error_code": 0,
          "metadata": {
            "architecture": "x86_64",
            "config": {
              "volatile.base_image": "e95126069b4318f855b5785d1152920911001a5ecd7b20ca965c17076d6b7cc2",
              "volatile.eth0.hwaddr": "00:16:3e:72:e2:83",
              "volatile.idmap.base": "0",
              "volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":65536},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":65536}]",
              "volatile.last_state.idmap": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":65536},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":65536}]",
              "volatile.last_state.power": "RUNNING"
            },
            "created_at": "2019-03-07T00:08:33.191688456+01:00",
            "description": "",
            "devices": {},
            "ephemeral": false,
            "expanded_config": {
              "volatile.base_image": "e95126069b4318f855b5785d1152920911001a5ecd7b20ca965c17076d6b7cc2",
              "volatile.eth0.hwaddr": "00:16:3e:72:e2:83",
              "volatile.idmap.base": "0",
              "volatile.idmap.next": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":65536},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":65536}]",
              "volatile.last_state.idmap": "[{\"Isuid\":true,\"Isgid\":false,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":65536},{\"Isuid\":false,\"Isgid\":true,\"Hostid\":1000000,\"Nsid\":0,\"Maprange\":65536}]",
              "volatile.last_state.power": "RUNNING"
            },
            "expanded_devices": {
              "eth0": { "name": "eth0", "nictype": "bridged", "parent": "lxcbr0", "type": "nic" },
              "root": { "path": "/", "pool": "local", "type": "disk" }
            },
            "last_used_at": "2019-03-07T00:08:46.07437376+01:00",
            "location": "space01-fr",
            "name": "mycontainer",
            "profiles": [ "default" ],
            "stateful": false,
            "status": "Running",
            "status_code": 103
          },
          "operation": "",
          "status": "Success",
          "status_code": 200,
          "type": "sync"
        }
      },
      "type": "sent request"
    },
    {
      "request": { "json": null, "method": "GET", "timeout": null, "url": "/1.0/containers/mycontainer/state" },
      "response": {
        "json": {
          "error": "",
          "error_code": 0,
          "metadata": {
            "cpu": { "usage": 1626940399 },
            "disk": {},
            "memory": { "swap_usage": 0, "swap_usage_peak": 0, "usage": 15630336, "usage_peak": 16789504 },
            "network": {
              "eth0": {
                "addresses": [
                  { "address": "10.0.3.141", "family": "inet", "netmask": "24", "scope": "global" },
                  { "address": "fe80::216:3eff:fe72:e283", "family": "inet6", "netmask": "64", "scope": "link" }
                ],
                "counters": { "bytes_received": 3102, "bytes_sent": 2736, "packets_received": 16, "packets_sent": 20 },
                "host_name": "vethURUBCA", "hwaddr": "00:16:3e:72:e2:83", "mtu": 1500, "state": "up", "type": "broadcast"
              },
              "lo": {
                "addresses": [
                  { "address": "127.0.0.1", "family": "inet", "netmask": "8", "scope": "local" },
                  { "address": "::1", "family": "inet6", "netmask": "128", "scope": "local" }
                ],
                "counters": { "bytes_received": 0, "bytes_sent": 0, "packets_received": 0, "packets_sent": 0 },
                "host_name": "", "hwaddr": "", "mtu": 65536, "state": "up", "type": "loopback"
              }
            },
            "pid": 6860,
            "processes": 12,
            "status": "Running",
            "status_code": 103
          },
          "operation": "",
          "status": "Success",
          "status_code": 200,
          "type": "sync"
        }
      },
      "type": "sent request"
    }
  ],
  "old_state": "started"
}
META: ran handlers
META: ran handlers

PLAY RECAP **********************************************************************************************
debian9 : ok=2 changed=0 unreachable=0 failed=0
```
Container up and running:
```
$ lxc list
+--------------+---------+-------------------+------+------------+-----------+------------+
|     NAME     |  STATE  |       IPV4        | IPV6 |    TYPE    | SNAPSHOTS |  LOCATION  |
+--------------+---------+-------------------+------+------------+-----------+------------+
| mycontainer  | RUNNING | 10.0.3.141 (eth0) |      | PERSISTENT |           | space01-fr |
+--------------+---------+-------------------+------+------------+-----------+------------+
```
But the second playbook run unexpectedly fails, the container is created but stopped:
```
task path: /home/toto/roles/container/tasks/main.yml:2
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/lxd/lxd_container.py
<10.0.3.210> ESTABLISH LOCAL CONNECTION FOR USER: toto
<10.0.3.210> EXEC /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-lomqbbiwaftywxrwshfxyzqixwhznesf; /usr/bin/python'"'"' && sleep 0'
The full traceback is:
WARNING: The below traceback may *not* be related to the actual failure.
  File "/tmp/ansible_lxd_container_payload_sUBNvM/__main__.py", line 509, in run
    action()
  File "/tmp/ansible_lxd_container_payload_sUBNvM/__main__.py", line 394, in _started
    self._create_container()
  File "/tmp/ansible_lxd_container_payload_sUBNvM/__main__.py", line 339, in _create_container
    self.client.do('POST', '/1.0/containers', config)
  File "/tmp/ansible_lxd_container_payload_sUBNvM/ansible_lxd_container_payload.zip/ansible/module_utils/lxd.py", line 97, in do
    self._raise_err_from_json(resp_json)
  File "/tmp/ansible_lxd_container_payload_sUBNvM/ansible_lxd_container_payload.zip/ansible/module_utils/lxd.py", line 132, in _raise_err_from_json
    raise LXDClientException(self._get_err_from_resp_json(resp_json), **err_params)

fatal: [debian9]: FAILED! => {
  "actions": [],
  "changed": false,
  "invocation": {
    "module_args": {
      "architecture": null, "cert_file": "/root/.config/lxc/client.crt", "config": null, "description": null,
      "devices": null, "ephemeral": null, "force_stop": false, "key_file": "/root/.config/lxc/client.key",
      "name": "mycontainer2", "profiles": [ "default" ], "source": { "alias": "centos7", "type": "image" },
      "state": "started", "timeout": 30, "trust_password": null, "url": "unix:/var/lib/lxd/unix.socket",
      "wait_for_ipv4_addresses": true
    }
  },
  "logs": [
    {
      "request": { "json": null, "method": "GET", "timeout": null, "url": "/1.0/containers/mycontainer2" },
      "response": { "json": { "error": "not found", "error_code": 404, "type": "error" } },
      "type": "sent request"
    },
    {
      "request": {
        "json": { "name": "mycontainer2", "profiles": [ "default" ], "source": { "alias": "centos7", "type": "image" } },
        "method": "POST", "timeout": null, "url": "/1.0/containers"
      },
      "response": {
        "json": {
          "error": "",
          "error_code": 0,
          "metadata": {
            "class": "task",
            "created_at": "2019-03-07T00:09:53.791817652+01:00",
            "description": "Creating container",
            "err": "",
            "id": "44397294-5375-4f5c-88fb-07aac4efaec0",
            "may_cancel": false,
            "metadata": null,
            "resources": { "containers": [ "/1.0/containers/mycontainer2" ] },
            "status": "Running",
            "status_code": 103,
            "updated_at": "2019-03-07T00:09:53.791817652+01:00"
          },
          "operation": "/1.0/operations/44397294-5375-4f5c-88fb-07aac4efaec0?project=default",
          "status": "Operation created",
          "status_code": 100,
          "type": "async"
        }
      },
      "type": "sent request"
    },
    {
      "request": { "json": null, "method": "GET", "timeout": null, "url": "/1.0/operations/44397294-5375-4f5c-88fb-07aac4efaec0?project=default/wait" },
      "response": {
        "json": {
          "error": "",
          "error_code": 0,
          "metadata": {
            "class": "task",
            "created_at": "2019-03-07T00:09:53.791817652+01:00",
            "description": "Creating container",
            "err": "",
            "id": "44397294-5375-4f5c-88fb-07aac4efaec0",
            "may_cancel": false,
            "metadata": null,
            "resources": { "containers": [ "/1.0/containers/mycontainer2" ] },
            "status": "Running",
            "status_code": 103,
            "updated_at": "2019-03-07T00:09:53.791817652+01:00"
          },
          "operation": "",
          "status": "Success",
          "status_code": 200,
          "type": "sync"
        }
      },
      "type": "sent request"
    }
  ],
  "msg": ""
}

PLAY RECAP **********************************************************************************************
debian9 : ok=1 changed=0 unreachable=0 failed=1
```
```
$ lxc list
+--------------+---------+-------------------+------+------------+-----------+------------+
|     NAME     |  STATE  |       IPV4        | IPV6 |    TYPE    | SNAPSHOTS |  LOCATION  |
+--------------+---------+-------------------+------+------------+-----------+------------+
| mycontainer  | RUNNING | 10.0.3.141 (eth0) |      | PERSISTENT |           | space01-fr |
+--------------+---------+-------------------+------+------------+-----------+------------+
| mycontainer2 | STOPPED |                   |      | PERSISTENT |           | space01    |
+--------------+---------+-------------------+------+------------+-----------+------------+
```

Is it possible to configure the LXD cluster to disable this behaviour?

Thanks for your time everyone.

roman · July 8, 2020, 5:05pm

Additional issue https://github.com/ansible/ansible/issues/40479

freeekanayaka · July 8, 2020, 5:39pm

I don’t think there’s much that can be done on the LXD side. If you want explicit placement, --target is the option to use (or ?target= if you are using the REST API). Without explicit placement, I don’t see how LXD could guess the right placement for every workflow it might be used for.
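For REST clients, the ?target= form could be built along these lines. This is only a sketch: the helper name instances_url and the host names are made up for illustration, and it uses nothing beyond the Python standard library.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def instances_url(base_url, target=None):
    """Build the instance-creation URL for an LXD API endpoint.

    base_url: scheme and host of any cluster member (hypothetical example:
              "https://host-01:8443").
    target:   name of the cluster member that should host the instance.
              When omitted, no ?target= is sent and LXD picks the member.
    """
    parts = urlsplit(base_url)
    query = urlencode({"target": target}) if target else ""
    return urlunsplit((parts.scheme, parts.netloc, "/1.0/instances", query, ""))


if __name__ == "__main__":
    print(instances_url("https://host-01:8443", target="host-01"))
```

With the example values above this prints https://host-01:8443/1.0/instances?target=host-01; leaving target out gives the plain /1.0/instances URL and keeps the default placement.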

roman · July 8, 2020, 5:48pm

Instead of guessing, just make it possible to disable that behaviour, so that LXD always creates the container on the node that received the request.

freeekanayaka · July 8, 2020, 6:57pm

Okay that might be a reasonable knob to have I guess. @stgraber thoughts? That’d be basically a way to change the default placement algorithm.

roman · July 8, 2020, 7:22pm

Thanks freeekanayaka, that feature would be nice to have, though I am not saying it must or should be present.

After a long conversation with someone on IRC, I think it would be more reasonable for the LXD module in Ansible to support the --target parameter. LXD has it and Ansible is missing it.

turtle0x1 · July 8, 2020, 7:41pm

If you guys do decide to do this, would you consider adding “reaching out” to URLs for “roll your own”?

roman · July 13, 2020, 8:43pm

This is related to the subject of this thread. I am trying to figure out how the LXD API behaves when creating instances with a target.

Here is my JSON file (names edited), data.json:

{
 "name": "test-4",
 "source": {"type": "image",
          "alias": "my-template"}
}

curl command:
curl -XPOST -k --cert x.crt --key x.key https://host-01:8443/1.0/instances?target=host-01 --data @data.json | jq
The instance was created on the first host; here is the JSON result:

{
  "type": "async",
  "status": "Operation created",
  "status_code": 100,
  "operation": "/1.0/operations/84b22a5d-1c6e-48be-af95-d8e70ac31d8a",
  "error_code": 0,
  "error": "",
  "metadata": {
    "id": "84b22a5d-1c6e-48be-af95-d8e70ac31d8a",
    "class": "task",
    "description": "Creating container", 
    "created_at": "2020-07-13T21:19:10.866729854+01:00",
    "updated_at": "2020-07-13T21:19:10.866729854+01:00", 
    "status": "Running",
    "status_code": 103,
    "resources": {
      "containers": [
        "/1.0/containers/test-4"
      ],
      "instances": [
        "/1.0/instances/test-4"
      ]
    },
    "metadata": null,
    "may_cancel": false,
    "err": "",
    "location": "host-01"
  }
}

lxc ls:

$ lxc ls
+-----------+---------+------------------------+------+-----------+-----------+----------+
|   NAME    |  STATE  |          IPV4          | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+-----------+---------+------------------------+------+-----------+-----------+----------+
| test-4    | STOPPED |                        |      | CONTAINER | 0         | host-01  |
+-----------+---------+------------------------+------+-----------+-----------+----------+

Now I will repeat the command and specify a different target. Normally, if I do this with lxc, it will not allow me, regardless of target. With curl it is different, and I do not understand what is happening. If the LXD cluster tried to move the container to another host, that did not happen. And if Ansible does the same through the LXD API and achieves nothing, it will confuse the user. So, what am I doing wrong or misunderstanding?

First, repeating the same command:
curl -XPOST -k --cert x.crt --key x.key https://host-01:8443/1.0/instances?target=host-01 --data @data.json | jq

{
  "type": "async", 
  "status": "Operation created",
  "status_code": 100,
  "operation": "/1.0/operations/76638686-bd4f-45b3-93ba-33e8c1737df5",
  "error_code": 0,
  "error": "",
  "metadata": {
    "id": "76638686-bd4f-45b3-93ba-33e8c1737df5",
    "class": "task",
    "description": "Creating container",
    "created_at": "2020-07-13T21:19:19.911966704+01:00",
    "updated_at": "2020-07-13T21:19:19.911966704+01:00",
    "status": "Running",
    "status_code": 103,
    "resources": {
      "containers": [
        "/1.0/containers/test-4"
      ],
      "instances": [
        "/1.0/instances/test-4"
      ]
    },
    "metadata": null,
    "may_cancel": false,
    "err": "",
    "location": "host-01"
  }
}

Repeating the same command, but with a different target:
curl -XPOST -k --cert x.crt --key x.key https://host-01:8443/1.0/instances?target=host-02 --data @data.json | jq

{
  "type": "async",
  "status": "Operation created",
  "status_code": 100,
  "operation": "/1.0/operations/07b7b322-3b84-4086-85bd-1b823f1d9742?project=default",
  "error_code": 0,
  "error": "",
  "metadata": {
    "id": "07b7b322-3b84-4086-85bd-1b823f1d9742",
    "class": "task",
    "description": "Creating container",
    "created_at": "2020-07-13T21:19:31.264413533+01:00",
    "updated_at": "2020-07-13T21:19:31.264413533+01:00",
    "status": "Running",
    "status_code": 103,
    "resources": {
      "containers": [
        "/1.0/containers/test-4"
      ],
      "instances": [
        "/1.0/instances/test-4"
      ]
    },
    "metadata": null,
    "may_cancel": false,
    "err": "",
    "location": "host-02"
  }
}

However, lxc ls disagrees with this:
$ lxc ls

+-----------+---------+------------------------+------+-----------+-----------+----------+
|   NAME    |  STATE  |          IPV4          | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+-----------+---------+------------------------+------+-----------+-----------+----------+
| test-4    | STOPPED |                        |      | CONTAINER | 0         | host-01  |
+-----------+---------+------------------------+------+-----------+-----------+----------+

What I am trying to test is the case where an Ansible user runs the same playbook again, so the same API call is executed. But the API does not behave the same way the lxc command does. Should the Ansible developer perform additional checks before calling the API, to see whether the instance already exists, and if it does, decide whether to do an UPDATE instead of a POST?

Let’s say an Ansible user describes this scenario:
Instance A is to be deployed on nodeA.
The Ansible developer would then implement these rules:

check if a container with this name already exists
if not, create the container - send the POST API call
if the container already exists on the requested node - do nothing
if the container already exists, but not on the requested node - move it to the requested node. Should the POST above have worked for this? If not, then what?
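The rules above can be sketched as a small decision function. This is only an illustration and not what the Ansible module does: the function name and the returned labels are made up, and it assumes the caller has already looked the instance up and read the "location" field from the response.

```python
def plan_action(existing_location, requested_node):
    """Decide what to do for one instance, following the rules above.

    existing_location: the "location" value from looking the instance up
                       by name, or None if the lookup returned 404.
    requested_node:    the cluster member the playbook asks for, or None
                       if the user did not ask for a specific member.
    Returns one of the (hypothetical) labels "create", "noop" or "move".
    """
    if existing_location is None:
        # Nothing with this name exists yet: send the create POST,
        # with ?target=<requested_node> when a member was requested.
        return "create"
    if requested_node is None or existing_location == requested_node:
        # Already present where it is wanted (or no placement requested).
        return "noop"
    # Present on a different member: this needs a separate move step;
    # repeating the create POST only fails with "already exists".
    return "move"


if __name__ == "__main__":
    for current, wanted in [(None, "nodeA"), ("nodeA", "nodeA"), ("nodeB", "nodeA")]:
        print(current, wanted, plan_action(current, wanted))
```

Keeping the decision separate from the HTTP calls makes it possible to check the idempotency rules without a cluster.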

Just FYI, my storage type is directory.

Thanks

freeekanayaka · July 14, 2020, 8:33am

The reason is that POST /1.0/instances is asynchronous. You receive back the ID of an operation that tracks the progress of the container creation, and to know whether the creation succeeded or failed you need to wait for that operation to complete. You can do that with something like:

op=$(curl -XPOST -k --cert x.crt --key x.key "https://host-01:8443/1.0/instances?target=host-01" --data @data.json | jq -r .operation)
# The operation path may end in a query string such as ?project=default; strip it so /wait lands on the path (fine for the default project).
curl -k --cert x.crt --key x.key "https://host-01:8443${op%%\?*}/wait"

And that should give you back something like this:

{
  "type": "sync",
  "status": "Success",
  "status_code": 200,
  "operation": "",
  "error_code": 0,
  "error": "",
  "metadata": {
    "id": "98a96cc2-d48f-4d17-bec9-7b637f721ff1",
    "class": "task",
    "description": "Creating container",
    "created_at": "2020-07-14T08:29:46.211091569Z",
    "updated_at": "2020-07-14T08:29:46.211091569Z",
    "status": "Failure",
    "status_code": 400,
    "resources": {
      "containers": [
        "/1.0/containers/test-4"
      ],
      "instances": [
        "/1.0/instances/test-4"
      ]
    },
    "metadata": null,
    "may_cancel": false,
    "err": "Create instance: Add instance info to the database: This instance already exists",
    "location": "host-01"
  }
}

Note the err field in the above payload. That’s basically what the lxc CLI does as well.
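For a client written in Python, the same two steps might look roughly like the sketch below. It only covers the parts that need no network access: building the wait path and inspecting the payload. The names wait_path, check_operation and OperationFailed are invented for this example. Treating a metadata status_code of 400 or above as a failure is an assumption based on the payload shown above. Note that the outer status_code is 200 even though the operation failed, so the check has to look inside metadata.

```python
class OperationFailed(Exception):
    """Raised when a waited-for operation reports an error (hypothetical name)."""


def wait_path(operation):
    """Return the path for waiting on an operation.

    The "operation" value can carry a query string, for example
    "/1.0/operations/<id>?project=default", so "/wait" is inserted
    before the "?" instead of being appended to the whole string.
    """
    path, sep, query = operation.partition("?")
    return path + "/wait" + sep + query


def check_operation(payload):
    """Inspect the JSON returned by the wait call and raise on failure.

    Assumption: a non-empty "err" or a metadata "status_code" of 400 or
    above means the operation failed, as in the payload shown above.
    """
    meta = payload.get("metadata") or {}
    if meta.get("err") or meta.get("status_code", 200) >= 400:
        raise OperationFailed(meta.get("err") or meta.get("status", "unknown failure"))
    return meta


if __name__ == "__main__":
    print(wait_path("/1.0/operations/07b7b322-3b84-4086-85bd-1b823f1d9742?project=default"))
    failed = {
        "status_code": 200,
        "metadata": {
            "status": "Failure",
            "status_code": 400,
            "err": "Create instance: Add instance info to the database: This instance already exists",
        },
    }
    try:
        check_operation(failed)
    except OperationFailed as exc:
        print("operation failed:", exc)
```

The query-string handling matters for requests that get forwarded to another cluster member, where the returned operation path ends in ?project=default, as in the host-02 response earlier in the thread.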

roman · July 30, 2020, 8:27am

Just an update: there is a pull request on GitHub that will add --target support to the Ansible lxd_container module, which is where I needed it.
The Ansible community will evaluate the code changes, but I am afraid that will not include someone actually testing them on their own cluster.
I tested it myself and it works for my case, but I did not perform any rigorous tests, so I guess we will find out in future if anyone has problems.