Hi all,
I’m trying to define an lxd cluster as a juju cloud, following this post.
After defining the lxd cluster as a juju cloud, the first node refuses to start lxd.
Here are the steps I followed.
I have a working lxd cluster that I can start and stop without problems:
sysop@kvmnode1:~$ lxc cluster list
+----------+-----------------------------+----------+--------+-------------------+
| NAME | URL | DATABASE | STATE | MESSAGE |
+----------+-----------------------------+----------+--------+-------------------+
| kvmnode1 | https://192.168.201.11:8443 | YES | ONLINE | fully operational |
+----------+-----------------------------+----------+--------+-------------------+
| kvmnode2 | https://192.168.201.12:8443 | YES | ONLINE | fully operational |
+----------+-----------------------------+----------+--------+-------------------+
| kvmnode3 | https://192.168.201.13:8443 | YES | ONLINE | fully operational |
+----------+-----------------------------+----------+--------+-------------------+
All three nodes are KVM virtual machines running Kubuntu 18.04, with lxd 3.13 installed from the snap.
I then installed juju 2.6.2 from the snap on the first node (kvmnode1), and the cluster continued to work.
After this I tried to define the lxd cluster as a juju cloud.
I created a file juju-lxd.yaml containing:
clouds:
  lxd-cloud:
    type: lxd
    auth-types: [interactive, certificate]
    endpoint: https://192.168.201.11:8443
And a file juju-credentials.yaml containing
credentials:
  lxd-cloud:
    admin:
      auth-type: interactive
      trust-password: ZuleicaDobson
Then I ran the commands to add the cloud definition and the credentials to juju.
Here is the transcript:
sysop@kvmnode1:~/SVILUPPO/for_juju$ kate juju-lxd.yaml
sysop@kvmnode1:~/SVILUPPO/for_juju$ juju add-cloud lxd-cloud ./juju-lxd.yaml
Since Juju 2 is being run for the first time, downloading latest cloud information.
Fetching latest public cloud list...
Your list of public clouds is up to date, see `juju clouds`.
There are no controllers running.
Adding cloud to local cache so you can use it to bootstrap a controller.
sysop@kvmnode1:~/SVILUPPO/for_juju$ juju clouds
There are no controllers running.
You can bootstrap a new controller using one of these clouds:
Cloud           Regions  Default          Type        Description
aws                  15  us-east-1        ec2         Amazon Web Services
aws-china             2  cn-north-1       ec2         Amazon China
aws-gov               1  us-gov-west-1    ec2         Amazon (USA Government)
azure                27  centralus        azure       Microsoft Azure
azure-china           2  chinaeast        azure       Microsoft Azure China
cloudsigma           12  dub              cloudsigma  CloudSigma Cloud
google               18  us-east1         gce         Google Cloud Platform
joyent                6  us-east-1        joyent      Joyent Cloud
oracle                4  us-phoenix-1     oci         Oracle Cloud Infrastructure
oracle-classic        5  uscom-central-1  oracle      Oracle Cloud Infrastructure Classic
rackspace             6  dfw              rackspace   Rackspace Cloud
localhost             1  localhost        lxd         LXD Container Hypervisor
lxd-cloud             0                   lxd         LXD Container Hypervisor
sysop@kvmnode1:~/SVILUPPO/for_juju$ kate juju-credentials.yaml
sysop@kvmnode1:~/SVILUPPO/for_juju$ juju add-credential lxd-cloud -f ./juju-credentials.yaml
Generating client cert/key in "/home/sysop/.local/share/juju/lxd"
Uploaded certificate to LXD server.
Credentials "admin" added for cloud "lxd-cloud".
sysop@kvmnode1:~/SVILUPPO/for_juju$
Everything seemed to work correctly, but after rebooting the cluster the first node never starts lxd again.
I end up with this.
From kvmnode3:
+----------+-----------------------------+----------+---------+------------------------------------+
| NAME | URL | DATABASE | STATE | MESSAGE |
+----------+-----------------------------+----------+---------+------------------------------------+
| kvmnode1 | https://192.168.201.11:8443 | YES | OFFLINE | no heartbeat since 30m56.10063924s |
+----------+-----------------------------+----------+---------+------------------------------------+
| kvmnode2 | https://192.168.201.12:8443 | YES | ONLINE | fully operational |
+----------+-----------------------------+----------+---------+------------------------------------+
| kvmnode3 | https://192.168.201.13:8443 | YES | ONLINE | fully operational |
+----------+-----------------------------+----------+---------+------------------------------------+
And on kvmnode1 the lxd status is:
sysop@kvmnode1:/var/zdata$ sudo systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2019-05-22 12:13:34 CEST; 12min ago
Process: 8257 ExecStart=/usr/bin/snap run lxd.daemon (code=exited, status=1/FAILURE)
Main PID: 8257 (code=exited, status=1/FAILURE)
mag 22 12:13:34 kvmnode1 systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
mag 22 12:13:34 kvmnode1 systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 57.
mag 22 12:13:34 kvmnode1 systemd[1]: Stopped Service for snap application lxd.daemon.
mag 22 12:13:34 kvmnode1 systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
mag 22 12:13:34 kvmnode1 systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
mag 22 12:13:34 kvmnode1 systemd[1]: Failed to start Service for snap application lxd.daemon.
sysop@kvmnode1:/var/zdata$
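The systemd status above only reports a generic exit code, not why LXD exited. A minimal sketch for pulling the actual error lines out of the unit's journal follows. The helper name `lxd_errors` is made up for illustration, and the pattern assumes that LXD tags its errors with `lvl=eror` or prints them with an `Error:` prefix.

```shell
# Hypothetical helper: keep only LXD error lines from log text on stdin.
# Assumes errors are tagged "lvl=eror" or prefixed with "Error:".
lxd_errors() {
    grep -E 'lvl=eror|Error:'
}

# Usage on the affected node (root is needed to read the system journal):
#   sudo journalctl -u snap.lxd.daemon -n 200 --no-pager | lxd_errors
```

If nothing matches, the unfiltered `journalctl` output for the same unit is the next thing to read.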
What am I doing wrong?
UPDATE
I also tried the predefined localhost cloud. Node 1 broke again, but with a different error (address already in use).
Transcript:
sysop@kvmnode1:/var/zdata$ juju bootstrap
Clouds
aws
aws-china
aws-gov
azure
azure-china
cloudsigma
google
joyent
localhost
oracle
oracle-classic
rackspace
Select a cloud [localhost]:
Enter a name for the Controller [localhost-localhost]: ctrl-localhost
Creating Juju controller "ctrl-localhost" on localhost/localhost
Looking for packaged Juju agent version 2.6.2 for amd64
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
Launching controller instance(s) on localhost/localhost...
 - juju-837276-0 (arch=amd64)
Installing Juju agent on bootstrap instance
Fetching Juju GUI 2.14.0
Waiting for address
Attempting to connect to 240.11.0.149:22
Connected to 240.11.0.149
Running machine configuration script...
Bootstrap agent now started
Contacting Juju controller at 240.11.0.149 to verify accessibility...
Bootstrap complete, controller "ctrl-localhost" now is available
Controller machines are in the "controller" model
Initial model "default" added
sysop@kvmnode1:/var/zdata$
And the error on kvmnode1:
sysop@kvmnode1:/var/zdata$ sudo systemctl status snap.lxd.daemon
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static; vendor preset: enabled)
Active: active (running) since Wed 2019-05-22 14:50:36 CEST; 13s ago
Main PID: 22202 (daemon.start)
Tasks: 2 (limit: 4915)
CGroup: /system.slice/snap.lxd.daemon.service
├─22202 /bin/sh /snap/lxd/10756/commands/daemon.start
└─22271 logrotate -f /snap/lxd/10756/etc/logrotate.conf -s /etc/logrotate.status
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Setting up mntns symlink (mnt:[4026532413])
mag 22 14:50:36 kvmnode1 systemd[1]: Stopped Service for snap application lxd.daemon.
mag 22 14:50:36 kvmnode1 systemd[1]: Started Service for snap application lxd.daemon.
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Setting up kmod wrapper
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Preparing /boot
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Preparing a clean copy of /run
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Preparing a clean copy of /etc
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Setting up ceph configuration
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Setting up LVM configuration
mag 22 14:50:39 kvmnode1 lxd.daemon[22202]: ==> Rotating logs
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: ==> Setting up ZFS (0.7)
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: ==> Escaping the systemd cgroups
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: ==> Escaping the systemd process resource limits
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: ==> Disabling shiftfs on this kernel (auto)
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: => Re-using existing LXCFS
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: => Starting LXD
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: t=2019-05-22T14:50:50+0200 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: t=2019-05-22T14:50:50+0200 lvl=eror msg="Failed to start the daemon: Listen to cluster address: listen tcp 192.168.201.11:8443: bin
mag 22 14:50:50 kvmnode1 lxd.daemon[22202]: Error: Listen to cluster address: listen tcp 192.168.201.11:8443: bind: address already in use
sysop@kvmnode1:/var/zdata$
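Since the error says the cluster address is already bound, one way to see which process is holding it might be the following sketch. The helper name `bind_port` is hypothetical, and it assumes the error line keeps the `listen tcp ADDR:PORT:` shape shown above.

```shell
# Hypothetical helper: extract the TCP port from an LXD
# "listen tcp ADDR:PORT: ..." error line read on stdin.
bind_port() {
    sed -n 's/.*listen tcp [0-9.]*:\([0-9]*\):.*/\1/p'
}

# Usage on the affected node:
#   port=$(echo 'Error: Listen to cluster address: listen tcp 192.168.201.11:8443: bind: address already in use' | bind_port)
#   sudo ss -ltnp "sport = :$port"   # -l listening, -t TCP, -n numeric, -p owning process
```

The `ss` output names the process that owns the listening socket, which should show whether it is another lxd instance or something else.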