First of all: LXD is great. Thanks for your hard work!
I’m running an LXD cluster with 15 nodes. All nodes are KVM VMs running Ubuntu 22.04, with the latest LXD installed via snap.
There are ~950 relatively small containers on the cluster. All of them have the same software installed (Apache, PHP, Asterisk), and all of them are running but idle.
lxc list takes about 50 seconds to return.
In the lxc list output, containers on about half of the nodes are shown in ERROR state, although they are running and working correctly.
Load on the database leader is around 2.
top shows lxd consuming > 100 % CPU most of the time. Load on the other nodes is around 0.5.
I noticed high outgoing network traffic on the database leader.
iftop shows almost 2 GB of data sent to other nodes in the cluster within 1 minute.
After running systemctl reload snap.lxd.daemon on the database leader, its load and network traffic go down, but another node becomes the leader and shows the same symptoms (load going up, CPU usage of the lxd process > 100 %, high outgoing network traffic).
Is there anything I can do to improve my setup, or is this expected behavior?