I’ve had trouble keeping a cluster up and running for more than one day.
My proof-of-concept setup is composed of three nodes connected to the same switch, running lxd 3.18 (and now 3.19). openvswitch takes over their NICs in order to give every container transparent network access (the containers receive their DHCP configuration from a separate server), and juju is used to deploy applications.
The problem is that the cluster is unstable: it will not survive a reboot, and sometimes it will not even survive its own existence.
I’ve had every kind of issue, ranging from core_address being randomly deleted from the local.db database to trouble even starting lxd.
More often than not the problem was unix socket related, but some odd errors showed up too (some of them saying they’re not errors).
What have I been doing wrong?
Have any of you run into these kinds of issues and overcome them (since lxd is supposed to be production ready)?