I've been running into what seems like a pretty odd issue with a cephadm-deployed Ceph cluster on Rocky 9 and a snap-installed LXD. Everything functions fine on its own, but when I run the Ansible playbook below, the first VM is typically created successfully while the second one fails with an RBD error when the lxc move is issued. The second VM is only partially created: it gets stuck mounting its config and never creates the block devices. I'm on the latest stable LXD snap, and Ceph is Quincy deployed with cephadm on a three-server cluster with 15 OSDs total. It looks like LXD struggles to create the second VM and leaves it in an undeletable state until you clean up the stuck namespace mounts by hand (rough commands for that are after the snap info below).

LXD snap info:
tracking: latest/stable
refresh-date: today at 17:28 UTC
installed: 5.13-8e2d7eb (24846) 174MB -
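To get the wedged VM back into a deletable state, I have to track down whatever still holds the RBD mount and unmount it manually. Roughly this, where the PID, mount path, and device are placeholders I fill in from the output of the first two commands:

# see which RBD images are still mapped on the host
rbd showmapped

# find which process still has the stuck mount in its mount namespace
grep -l rbd /proc/*/mountinfo

# lazily unmount it inside that namespace, then unmap the device
nsenter -t <pid> -m umount -l <stuck_mount_path>
rbd unmap /dev/rbd<N>

Only after that does lxc delete on the half-created VM go through.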
Ansible playbook main.yaml:
---
- name: Deploy Virtual Machines to LXD cluster
  hosts: localhost
  connection: local
  tasks:
    - name: command | Create drbd disks in LXD
      ansible.builtin.command: "lxc storage volume create remote {{ item }}-drbd size=400GB --type block"
      register: drbd_volumes
      failed_when: "drbd_volumes.rc != 0 and 'Volume by that name already exists' not in drbd_volumes.stderr"
      changed_when: "'Volume by that name already exists' not in drbd_volumes.stderr"
      loop: "{{ groups['management'] }}"

    - name: community.general.lxd_container | Create node LXD VMs
      community.general.lxd_container:
        name: "{{ item }}"
        type: virtual-machine
        state: started
        profiles: ["default"]
        source:
          protocol: simplestreams
          type: image
          mode: pull
          server: https://images.linuxcontainers.org
          alias: rockylinux/8/cloud
        config:
          limits.cpu: "8"
          limits.memory: "8GB"
          cloud-init.user-data: |
            #cloud-config
            hostname: {{ item }}
            fqdn: {{ item }}.test
            users:
              - name: root
                ssh_authorized_keys: <snipped>
                lock_passwd: false
            disable_root: false
            ssh_pwauth: true
            swap:
              filename: /swapfile
              size: 4G
              maxsize: 4G
          cloud-init.network-config: |
            version: 2
            ethernets:
              enp5s0:
                addresses:
                  - {{ hostvars[item].ip_address_public }}
                gateway4: <public_gateway_ip>
                nameservers:
                  addresses:
                    - <dns_server_ips>
                  search: [test]
              enp6s0:
                addresses:
                  - {{ hostvars[item].ip_address_internal }}
                routes:
                  - to: <internal_network>
                    via: <internal_gateway>
        devices:
          eth0:
            name: eth0
            network: vlan905
            type: nic
          eth1:
            name: eth1
            network: vlan1120
            type: nic
          drbd:
            source: "{{ item }}-drbd"
            pool: remote
            type: disk
          root:
            path: /
            pool: remote
            size: 200GB
            type: disk
      loop: "{{ groups['management'] }}"
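In case it's useful, these are the commands I use to cross-check the LXD pool against Ceph; remote is my Ceph-backed LXD pool, and <osd_pool> stands for whatever backing OSD pool lxd init created, so substitute yours:

lxc storage volume list remote

# on a Ceph node: list the RBD images in the backing pool, then check watchers on one
rbd ls <osd_pool>
rbd status <osd_pool>/<image_name_from_ls>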
Inventory file:
all:
  children:
    staging:
      children:
        management:
          hosts:
            testmn1:
              management_primary: true
              ip_address_public: <public_ip>/25
              ip_address_internal: <internal_ip>/24
            testmn2:
              ip_address_public: <public_ip>/25
              ip_address_internal: <internal_ip>/24
The LXD and Ceph nodes are bare-bones servers on a fresh OS; hardly anything has been installed outside of Ansible. I disabled SELinux and firewalld, deployed Ceph Quincy using cephadm, then installed snapd and LXD. After running lxd init and testing a simple VM, I ran the Ansible above and the second VM was dead in the water.
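For completeness, the "everything is functioning fine" above is based on the usual Ceph health checks, which come back clean:

ceph -s
ceph health detail
ceph osd tree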