Defining an LXD cluster as a Juju cloud corrupts the cluster

Hi all,
I'm trying to define an LXD cluster as a Juju cloud, following this post.

After defining the LXD cluster as a Juju cloud, the first node refuses to start LXD.
Here are the steps I followed:

I have a working LXD cluster that I can start and stop without problems:

sysop@kvmnode1:~$ lxc cluster list
+----------+-----------------------------+----------+--------+-------------------+
|   NAME   |             URL             | DATABASE | STATE  |      MESSAGE      |
+----------+-----------------------------+----------+--------+-------------------+
| kvmnode1 | https://192.168.201.11:8443 | YES      | ONLINE | fully operational |
+----------+-----------------------------+----------+--------+-------------------+
| kvmnode2 | https://192.168.201.12:8443 | YES      | ONLINE | fully operational |
+----------+-----------------------------+----------+--------+-------------------+
| kvmnode3 | https://192.168.201.13:8443 | YES      | ONLINE | fully operational |
+----------+-----------------------------+----------+--------+-------------------+

All three nodes are KVM virtual machines running Kubuntu 18.04, with LXD 3.13 installed from the snap.

Then I installed Juju 2.6.2 from the snap on the first node (kvmnode1), and the cluster continued to work.
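For reference, both came from the snap store, installed roughly like this (channels omitted, so adjust to match the versions above):

    sudo snap install lxd
    sudo snap install juju --classic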

After this I tried to define the LXD cluster as a Juju cloud.
I wrote a file juju-lxd.yaml containing:

clouds:
  lxd-cloud:
    type: lxd
    auth-types: [interactive, certificate]
    endpoint: https://192.168.201.11:8443

And a file juju-credentials.yaml containing:

credentials:
  lxd-cloud:
    admin:
      auth-type: interactive
      trust-password: ZuleicaDobson

Then I issued the commands to add the cloud definition and credentials to Juju.
Here is the transcript:

sysop@kvmnode1:~/SVILUPPO/for_juju$ kate juju-lxd.yaml
sysop@kvmnode1:~/SVILUPPO/for_juju$ juju add-cloud lxd-cloud ./juju-lxd.yaml
Since Juju 2 is being run for the first time, downloading latest cloud information.
Fetching latest public cloud list...
Your list of public clouds is up to date, see `juju clouds`.
There are no controllers running.
Adding cloud to local cache so you can use it to bootstrap a controller.
sysop@kvmnode1:~/SVILUPPO/for_juju$ juju clouds
There are no controllers running.
You can bootstrap a new controller using one of these clouds:
Cloud           Regions  Default          Type        Description
aws                  15  us-east-1        ec2         Amazon Web Services
aws-china             2  cn-north-1       ec2         Amazon China                                                                                                             
aws-gov               1  us-gov-west-1    ec2         Amazon (USA Government)                                                                                                  
azure                27  centralus        azure       Microsoft Azure                                                                                                          
azure-china           2  chinaeast        azure       Microsoft Azure China                                                                                                    
cloudsigma           12  dub              cloudsigma  CloudSigma Cloud                                                                                                         
google               18  us-east1         gce         Google Cloud Platform                                                                                                    
joyent                6  us-east-1        joyent      Joyent Cloud
oracle                4  us-phoenix-1     oci         Oracle Cloud Infrastructure
oracle-classic        5  uscom-central-1  oracle      Oracle Cloud Infrastructure Classic
rackspace             6  dfw              rackspace   Rackspace Cloud
localhost             1  localhost        lxd         LXD Container Hypervisor
lxd-cloud             0                   lxd         LXD Container Hypervisor
sysop@kvmnode1:~/SVILUPPO/for_juju$ kate juju-credentials.yaml
sysop@kvmnode1:~/SVILUPPO/for_juju$ juju add-credential lxd-cloud -f ./juju-credentials.yaml
Generating client cert/key in "/home/sysop/.local/share/juju/lxd"
Uploaded certificate to LXD server.
Credentials "admin" added for cloud "lxd-cloud".
sysop@kvmnode1:~/SVILUPPO/for_juju$ 

And everything seems to work correctly. But after a cluster reboot the first node never starts LXD again.
I end up with this:

From kvmnode3:
+----------+-----------------------------+----------+---------+------------------------------------+
|   NAME   |             URL             | DATABASE |  STATE  |              MESSAGE               |
+----------+-----------------------------+----------+---------+------------------------------------+
| kvmnode1 | https://192.168.201.11:8443 | YES      | OFFLINE | no heartbeat since 30m56.10063924s |
+----------+-----------------------------+----------+---------+------------------------------------+
| kvmnode2 | https://192.168.201.12:8443 | YES      | ONLINE  | fully operational                  |
+----------+-----------------------------+----------+---------+------------------------------------+
| kvmnode3 | https://192.168.201.13:8443 | YES      | ONLINE  | fully operational                  |
+----------+-----------------------------+----------+---------+------------------------------------+

And, from kvmnode1, I see that the LXD status is:

    sysop@kvmnode1:/var/zdata$ sudo systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2019-05-22 12:13:34 CEST; 12min ago
Process: 8257 ExecStart=/usr/bin/snap run lxd.daemon (code=exited, status=1/FAILURE)
Main PID: 8257 (code=exited, status=1/FAILURE)

mag 22 12:13:34 kvmnode1 systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
mag 22 12:13:34 kvmnode1 systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 57.
mag 22 12:13:34 kvmnode1 systemd[1]: Stopped Service for snap application lxd.daemon.
mag 22 12:13:34 kvmnode1 systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
mag 22 12:13:34 kvmnode1 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
mag 22 12:13:34 kvmnode1 systemd[1]: Failed to start Service for snap application lxd.daemon.
sysop@kvmnode1:/var/zdata$
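To get more detail than the unit status shows, the daemon output can be inspected with something like:

    sudo snap logs lxd -n 100
    sudo journalctl -u snap.lxd.daemon --no-pager -n 100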

What am I doing wrong?

UPDATE

I tried using the predefined localhost cloud; node 1 got corrupted again, but with a different error (address already in use).

Transcript:

sysop@kvmnode1:/var/zdata$ juju bootstrap
Clouds
aws
aws-china
aws-gov
azure
azure-china
cloudsigma
google
joyent
localhost
oracle
oracle-classic
rackspace
Select a cloud [localhost]: 
Enter a name for the Controller [localhost-localhost]: ctrl-localhost
Creating Juju controller "ctrl-localhost" on localhost/localhost
Looking for packaged Juju agent version 2.6.2 for amd64                                                                                                                        
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md                                          
Launching controller instance(s) on localhost/localhost...                                                                                                                     
 - juju-837276-0 (arch=amd64)                                                                                                                                                  
Installing Juju agent on bootstrap instance                                                                                                                                    
Fetching Juju GUI 2.14.0                                                                                                                                                       
Waiting for address
Attempting to connect to 240.11.0.149:22
Connected to 240.11.0.149
Running machine configuration script...
Bootstrap agent now started
Contacting Juju controller at 240.11.0.149 to verify accessibility...
Bootstrap complete, controller "ctrl-localhost" now is available
Controller machines are in the "controller" model
Initial model "default" added
sysop@kvmnode1:/var/zdata$ 

And the error on kvmnode1:

sysop@kvmnode1:/var/zdata$ sudo systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static; vendor preset: enabled)
Active: active (running) since Wed 2019-05-22 14:50:36 CEST; 13s ago
Main PID: 22202 (daemon.start)
Tasks: 2 (limit: 4915)
CGroup: /system.slice/snap.lxd.daemon.service
├─22202 /bin/sh /snap/lxd/10756/commands/daemon.start
└─22271 logrotate -f /snap/lxd/10756/etc/logrotate.conf -s /etc/logrotate.status

mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Setting up mntns symlink (mnt:[4026532413])
mag 22 14:50:36 kvmnode1 systemd[1]: Stopped Service for snap application lxd.daemon.
mag 22 14:50:36 kvmnode1 systemd[1]: Started Service for snap application lxd.daemon.
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Setting up kmod wrapper
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Preparing /boot
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Preparing a clean copy of /run
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Preparing a clean copy of /etc
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Setting up ceph configuration
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Setting up LVM configuration
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Rotating logs
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: ==> Setting up ZFS (0.7)
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: ==> Escaping the systemd cgroups
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: ==> Escaping the systemd process resource limits
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: ==> Disabling shiftfs on this kernel (auto)
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: => Re-using existing LXCFS
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: => Starting LXD
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: t=2019-05-22T14:50:50+0200 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: t=2019-05-22T14:50:50+0200 lvl=eror msg="Failed to start the daemon: Listen to cluster address: listen tcp 192.168.201.11:8443: bin
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: Error: Listen to cluster address: listen tcp 192.168.201.11:8443: bind: address already in use
sysop@kvmnode1:/var/zdata$
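A quick way to see which process is already holding 192.168.201.11:8443 would be something like:

    sudo ss -tlnp | grep 8443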

Reply to myself

Well, I found that you cannot install Juju on any LXD cluster node.

I needed to install Juju on a different machine.

To set up an LXD cluster as a Juju cloud I had to:

  1. create a three-node cluster on three different machines:

+----------+-----------------------------+----------+--------+-------------------+
|   NAME   |             URL             | DATABASE | STATE  |      MESSAGE      |
+----------+-----------------------------+----------+--------+-------------------+
| kvmnode1 | https://192.168.201.11:8443 | YES      | ONLINE | fully operational |
+----------+-----------------------------+----------+--------+-------------------+
| kvmnode2 | https://192.168.201.12:8443 | YES      | ONLINE | fully operational |
+----------+-----------------------------+----------+--------+-------------------+
| kvmnode3 | https://192.168.201.13:8443 | YES      | ONLINE | fully operational |
+----------+-----------------------------+----------+--------+-------------------+

  2. create a different machine (kvmnode0 with IP 192.168.201.9)

  3. on kvmnode0, add a static route to the cluster fan network, using one cluster node as the gateway (netplan config below; see the quick check right after it)

     network:
         version: 2
         renderer: networkd
         ethernets:
             ens3:
                 dhcp4: no
                 dhcp6: no
                 addresses: [192.168.202.9/24]
                 gateway4: 192.168.202.1
                 nameservers:
                     addresses: [192.168.202.1, 8.8.8.8]
             ens4:
                 dhcp4: no
                 dhcp6: no
                 addresses: [192.168.200.9/24]
             ens5:
                 dhcp4: no
                 dhcp6: no
                 addresses: [192.168.201.9/24]
                 routes:
                     - to: 240.0.0.0/8
                       via: 192.168.201.11
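After applying the config, the route towards the fan network can be checked with something like:

    sudo netplan apply
    ip route | grep '^240'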
    

Then I was able to install Juju on kvmnode0 and define an "lxd-remote" cloud for Juju:

sysop@kvmnode0:~$ juju clouds                                                                                                                                                                                        
Clouds on controller "lxd-remote-default":                                                                                                                                                                           
                                                                                                                                                                                                                     
Cloud       Regions  Default  Type  Description
lxd-remote        1  default  lxd   

sysop@kvmnode0:~$ 
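For reference, the commands behind that boil down to roughly the following, assuming the same two YAML files as above with the cloud renamed to lxd-remote (the lxd-remote-default controller name in the output is what Juju generates by default):

    juju add-cloud lxd-remote ./juju-lxd.yaml
    juju add-credential lxd-remote -f ./juju-credentials.yaml
    juju bootstrap lxd-remote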

After this I was able to start and stop the cluster machines without problems.

BTW, to start the cluster correctly I need to start one node at a time, waiting for it to come up completely before starting the next one. A rough sketch of that staged startup is below.
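This is a minimal sketch, assuming the three nodes are libvirt guests named after the cluster members and reachable over SSH once booted (the guest names, the sysop user and the ONLINE grep are assumptions, adapt as needed):

    #!/bin/sh
    # Start each cluster node in turn and wait until it reports ONLINE
    # in "lxc cluster list" before moving on to the next one.
    for node in kvmnode1 kvmnode2 kvmnode3; do
        virsh start "$node"
        # Poll the cluster view from the node itself until it is fully up;
        # SSH failures while the guest is still booting just make us retry.
        until ssh "sysop@$node" lxc cluster list 2>/dev/null | grep -q "$node .*ONLINE"; do
            sleep 10
        done
    done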