I've been running into what seems like a pretty odd issue with a cephadm-deployed Ceph cluster on Rocky 9 and a snap-installed LXD. Everything functions fine on its own, but when I run the Ansible playbook below, the first VM is typically created successfully while the second one fails with an RBD error when the lxc move is issued. The second VM is only partially created: it gets stuck mounting its config and never creates the block devices. I'm on the latest stable LXD snap, and Ceph is Quincy deployed with cephadm on a three-server cluster with 15 OSDs total. It looks like LXD struggles to create the second VM and leaves it in an undeletable state until you clean up the stuck namespace mounts by hand (rough commands for that are after the snap info below).

LXD snap info:
tracking: latest/stable
refresh-date: today at 17:28 UTC
installed: 5.13-8e2d7eb (24846) 174MB -
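To get the wedged VM back into a deletable state, I have to track down whatever still holds the RBD mount and unmount it manually. Roughly this, where the PID, mount path, and device are placeholders I fill in from the output of the first two commands:

# see which RBD images are still mapped on the host
rbd showmapped

# find which process still has the stuck mount in its mount namespace
grep -l rbd /proc/*/mountinfo

# lazily unmount it inside that namespace, then unmap the device
nsenter -t <pid> -m umount -l <stuck_mount_path>
rbd unmap /dev/rbd<N>

Only after that does lxc delete on the half-created VM go through.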
Ansible playbook main.yaml:
---
- name: Deploy Virtual Machines to LXD cluster
  hosts: localhost
  connection: local
  tasks:
    - name: command | Create drbd disks in LXD
      ansible.builtin.command: "lxc storage volume create remote {{ item }}-drbd size=400GB --type block"
      register: drbd_volumes
      failed_when: "drbd_volumes.rc != 0 and 'Volume by that name already exists' not in drbd_volumes.stderr"
      changed_when: "'Volume by that name already exists' not in drbd_volumes.stderr"
      loop: "{{ groups['management'] }}"

    - name: community.general.lxd_container | Create node LXD VMs
      community.general.lxd_container:
        name: "{{ item }}"
        type: virtual-machine
        state: started
        profiles: ["default"]
        source:
          protocol: simplestreams
          type: image
          mode: pull
          server: https://images.linuxcontainers.org
          alias: rockylinux/8/cloud
        config:
          limits.cpu: "8"
          limits.memory: "8GB"
          cloud-init.user-data: |
            #cloud-config
            hostname: {{ item }}
            fqdn: {{ item }}.test
            users:
              - name: root
                ssh_authorized_keys: <snipped>
                lock_passwd: false
            disable_root: false
            ssh_pwauth: true
            swap:
              filename: /swapfile
              size: 4G
              maxsize: 4G
          cloud-init.network-config: |
            version: 2
            ethernets:
              enp5s0:
                addresses:
                  - {{ hostvars[item].ip_address_public }}
                gateway4: <public_gateway_ip>
                nameservers:
                  addresses:
                    - <dns_server_ips>
                  search: [test]
              enp6s0:
                addresses:
                  - {{ hostvars[item].ip_address_internal }}
                routes:
                  - to: <internal_network>
                    via: <internal_gateway>
        devices:
          eth0:
            name: eth0
            network: vlan905
            type: nic
          eth1:
            name: eth1
            network: vlan1120
            type: nic
          drbd:
            source: "{{ item }}-drbd"
            pool: remote
            type: disk
          root:
            path: /
            pool: remote
            size: 200GB
            type: disk
      loop: "{{ groups['management'] }}"
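In case it's useful, these are the commands I use to cross-check the LXD pool against Ceph; remote is my Ceph-backed LXD pool, and <osd_pool> stands for whatever backing OSD pool lxd init created, so substitute yours:

lxc storage volume list remote

# on a Ceph node: list the RBD images in the backing pool, then check watchers on one
rbd ls <osd_pool>
rbd status <osd_pool>/<image_name_from_ls>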
Inventory file:
all:
  children:
    staging:
      children:
        management:
          hosts:
            testmn1:
              management_primary: true
              ip_address_public: <public_ip>/25
              ip_address_internal: <internal_ip>/24
            testmn2:
              ip_address_public: <public_ip>/25
              ip_address_internal: <internal_ip>/24
The LXD and Ceph nodes are bare-bones servers on a fresh OS; hardly anything has been installed outside of Ansible. I disabled SELinux and firewalld, deployed Ceph Quincy using cephadm, then installed snapd and LXD. After running lxd init and testing a simple VM, I ran the Ansible above and the second VM was dead in the water.
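For completeness, the "everything is functioning fine" above is based on the usual Ceph health checks, which come back clean:

ceph -s
ceph health detail
ceph osd tree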