LXD has stopped working - Failed connecting to global database: failed to create dqlite connection: no available dqlite leader server found

Information

  • Distribution is Ubuntu 18.04 LTS
  • LXD 3.0.3 (default version)
  • Three-node cluster: Machine1: 192.168.89.164, Machine2: 192.168.89.162, Machine3: 192.168.89.163

Description

Hello,
This morning I started my nodes and tried to run “lxc list”, but it did not respond.
No command starting with “lxc” works.
The only command I ran yesterday was “lxc config set core.https_address :8443” on all 3 nodes, so that I could manage my cluster remotely with “lxc remote …”.
When I run “sudo systemctl start lxd.service”, it hangs.
I do not see how to resolve this error. The error messages are below:

FBorate@Machine2:~$ sudo systemctl start lxd
Job for lxd.service failed because the control process exited with error code.
See "systemctl status lxd.service" and "journalctl -xe" for details.

FBorate@Machine2:~$ systemctl status lxd.service

● lxd.service - LXD - main daemon

Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)

Active: activating (start-post) since Tue 2020-05-12 09:36:18 CEST; 24s ago

    Docs: man:lxd(1)

Process: 2100 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)

Main PID: 2119 (lxd); Control PID: 2120 (lxd)

    Tasks: 18

CGroup: /system.slice/lxd.service

        ├─2119 /usr/lib/lxd/lxd --group lxd --logfile=/var/log/lxd/lxd.log

        └─2120 /usr/lib/lxd/lxd waitready --timeout=600

mai 12 09:36:18 Machine2 systemd[1]: Starting LXD - main daemon...

mai 12 09:36:18 Machine2 lxd[2119]: t=2020-05-12T09:36:18+0200 lvl=warn msg="CGroup memory swap accounting is disabled, swap limit

FBorate@Machine2:~$ journalctl -xe

mai 12 09:26:37 Machine2 systemd-timesyncd[558]: Network configuration changed, trying to establish connection.

mai 12 09:26:37 Machine2 systemd-timesyncd[558]: Synchronized to time server 91.189.89.199:123 (ntp.ubuntu.com).

mai 12 09:27:46 Machine2 lxd[2019]: t=2020-05-12T09:27:46+0200 lvl=warn msg="Failed connecting to global database (attempt 6): failed to create dqlite connection: no available dqlite leader server found"

...

mai 12 09:29:31 Machine2 sudo[1995]: pam_unix(sudo:session): session closed for user root

mai 12 09:29:42 Machine2 lxd[2019]: t=2020-05-12T09:29:42+0200 lvl=warn msg="Failed connecting to global database (attempt 15): failed to create dqlite connection: no available dqlite leader server found"

...

mai 12 09:36:03 Machine2 sudo[2083]: darochafa : TTY=pts/0 ; PWD=/home/darochafa ; USER=root ; COMMAND=/bin/systemctl start lxd

mai 12 09:36:03 Machine2 sudo[2083]: pam_unix(sudo:session): session opened for user root by darochafa(uid=0)

mai 12 09:36:06 Machine2 lxd[2019]: t=2020-05-12T09:36:06+0200 lvl=warn msg="Failed connecting to global database (attempt 45): failed to create dqlite connection: no available dqlite leader server found"

mai 12 09:36:17 Machine2 lxd[2020]: Error: LXD still not running after 600s timeout (<nil>)

mai 12 09:36:17 Machine2 systemd[1]: lxd.service: Control process exited, code=exited status=1

mai 12 09:36:17 Machine2 systemd[1]: lxd.service: Failed with result 'exit-code'.

mai 12 09:36:17 Machine2 systemd[1]: Failed to start LXD - main daemon.

-- Subject: Unit lxd.service has failed

-- Defined-By: systemd

-- Support: http://www.ubuntu.com/support

--

-- Unit lxd.service has failed, with the result RESULT.

mai 12 09:36:17 Machine2 sudo[2083]: pam_unix(sudo:session): session closed for user root

mai 12 09:36:18 Machine2 systemd[1]: lxd.service: Service hold-off time over, scheduling restart.

mai 12 09:36:18 Machine2 systemd[1]: lxd.service: Scheduled restart job, restart counter is at 1.

-- Subject: Automatic restarting of a unit has been scheduled

-- Defined-By: systemd

-- Support: http://www.ubuntu.com/support

--

-- Automatic restarting of the unit lxd.service has been scheduled,

-- because of its configuration with the Restart= setting.

mai 12 09:36:18 Machine2 systemd[1]: Stopped LXD - main daemon.

-- Subject: Unit lxd.service has finished shutting down

-- Defined-By: systemd

-- Support: http://www.ubuntu.com/support

--

-- Unit lxd.service has finished shutting down.

mai 12 09:36:18 Machine2 systemd[1]: Starting LXD - main daemon...

-- Subject: Unit lxd.service has begun start-up

-- Defined-By: systemd

-- Support: http://www.ubuntu.com/support

--

-- Unit lxd.service has begun starting up.

mai 12 09:36:18 Machine2 audit[2114]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=2114 comm="apparmor_parser"

mai 12 09:36:18 Machine2 kernel: audit: type=1400 audit(1589268978.232:42): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/bin/lxc-start" pid=2114 comm="apparmor_par

mai 12 09:36:18 Machine2 audit[2118]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default" pid=2118 comm="apparmor_parser"

mai 12 09:36:18 Machine2 audit[2118]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-cgns" pid=2118 comm="apparmor_parser"

mai 12 09:36:18 Machine2 audit[2118]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-mounting" pid=2118 comm="apparmor_parser"

mai 12 09:36:18 Machine2 audit[2118]: AVC apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-nesting" pid=2118 comm="apparmor_parser"

mai 12 09:36:18 Machine2 kernel: audit: type=1400 audit(1589268978.240:43): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default" pid=2118 comm="apparmor_

mai 12 09:36:18 Machine2 kernel: audit: type=1400 audit(1589268978.240:44): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-cgns" pid=2118 comm="appa

mai 12 09:36:18 Machine2 kernel: audit: type=1400 audit(1589268978.240:45): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-mounting" pid=2118 c

mai 12 09:36:18 Machine2 kernel: audit: type=1400 audit(1589268978.240:46): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="lxc-container-default-with-nesting" pid=2118 co

mai 12 09:36:18 Machine2 lxd[2119]: t=2020-05-12T09:36:18+0200 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."

FBorate@Machine2:~$ cat /var/log/lxd/lxd.log

t=2020-05-12T07:46:00+0000 lvl=info msg="LXD 3.0.3 is starting in normal mode" path=/var/lib/lxd

t=2020-05-12T07:46:00+0000 lvl=info msg="Kernel uid/gid map:"

t=2020-05-12T07:46:00+0000 lvl=info msg=" - u 0 0 4294967295"

t=2020-05-12T07:46:00+0000 lvl=info msg=" - g 0 0 4294967295"

t=2020-05-12T07:46:00+0000 lvl=info msg="Configured LXD uid/gid map:"

t=2020-05-12T07:46:01+0000 lvl=info msg=" - u 0 100000 1000000000"

t=2020-05-12T07:46:01+0000 lvl=info msg=" - g 0 100000 1000000000"

t=2020-05-12T07:46:01+0000 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."

t=2020-05-12T07:46:01+0000 lvl=info msg="Kernel features:"

t=2020-05-12T07:46:01+0000 lvl=info msg=" - netnsid-based network retrieval: no"

t=2020-05-12T07:46:01+0000 lvl=info msg=" - unprivileged file capabilities: yes"

t=2020-05-12T07:46:01+0000 lvl=info msg="Initializing local database"

t=2020-05-12T07:46:01+0000 lvl=info msg="Starting /dev/lxd handler:"

t=2020-05-12T07:46:01+0000 lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock

t=2020-05-12T07:46:01+0000 lvl=info msg="REST API daemon:"

t=2020-05-12T07:46:01+0000 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/lib/lxd/unix.socket

t=2020-05-12T07:46:01+0000 lvl=info msg=" - binding TCP socket" socket=[::]:8443

t=2020-05-12T07:46:01+0000 lvl=info msg="Initializing global database"

t=2020-05-12T07:47:00+0000 lvl=warn msg="Failed connecting to global database (attempt 6): failed to create dqlite connection: no available dqlite leader server found"

t=2020-05-12T07:47:13+0000 lvl=warn msg="Failed connecting to global database (attempt 7): failed to create dqlite connection: no available dqlite leader server found"

... (attempt 45)

I’d recommend upgrading to LXD 4.0 using the snap. This is very likely a bug that has been fixed since 3.0.

What does:

sqlite3 /var/lib/lxd/database/local.db "SELECT * FROM raft_nodes"

return? (you might have to apt-get install sqlite3).

Thank you for your reply !

Command result:

FBorate@Machine2:~$ sudo sqlite3 /var/lib/lxd/database/local.db "SELECT * FROM raft_nodes"
2|192.168.89.162:8443
3|192.168.89.163:8443
4|192.168.89.164:8443

It seems that the lxc config set core.https_address :8443 command you ran has changed the node's bind address from its IPv4 address to the IPv6 wildcard address:

t=2020-05-12T07:46:01+0000 lvl=info msg=" - binding TCP socket" socket=[::]:8443

You might be able to recover by running:

lxc config set cluster.https_address 192.168.89.162:8443

on the node that originally had core.https_address set to 192.168.89.162:8443, and the equivalent command on the other two nodes with their own addresses.

If that does not work, you’ll have to revert core.https_address to its original IPv4 value.
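
To illustrate the suspected mismatch, here is a minimal sketch in plain bash. It only compares strings: the raft_nodes addresses are the ones printed earlier in this thread, and the function name matches_raft_node is hypothetical. That LXD 3.0.3 needs the configured address to correspond to a raft_nodes entry is an assumption based on the diagnosis above, not something confirmed here.

```shell
#!/usr/bin/env bash
# Addresses returned by "SELECT * FROM raft_nodes" earlier in this thread.
raft_nodes=("192.168.89.162:8443" "192.168.89.163:8443" "192.168.89.164:8443")

# Hypothetical helper: succeeds when the given core.https_address value
# is exactly one of the raft_nodes entries.
matches_raft_node() {
    local configured="$1" node
    for node in "${raft_nodes[@]}"; do
        if [ "$node" = "$configured" ]; then
            return 0
        fi
    done
    return 1
}

# Compare the wildcard value set yesterday with the original per-node value.
for value in ":8443" "192.168.89.162:8443"; do
    if matches_raft_node "$value"; then
        echo "$value: matches a raft_nodes entry"
    else
        echo "$value: no matching raft_nodes entry"
    fi
done
```

The wildcard value ":8443" has no host part, so it cannot equal any of the three stored addresses, whereas the original per-node value does.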

If I run any command starting with “lxc”, I get this error:

FBorate@Machine2:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:65:26:f3 brd ff:ff:ff:ff:ff:ff
inet 192.168.89.162/24 brd 192.168.89.255 scope global dynamic ens33
valid_lft 1622sec preferred_lft 1622sec
inet6 fe80::20c:29ff:fe65:26f3/64 scope link
valid_lft forever preferred_lft forever
FBorate@Machine2:~$ lxc config set cluster.https_address 192.168.89.162:8443
Error: Get http://unix.socket/1.0: EOF

How can I revert core.https_address to its original IPv4 value?

Try shutting down the LXD daemon (systemctl stop lxd), then run this:

sqlite3 /var/lib/lxd/database/local.db "UPDATE config SET value='192.168.89.162:8443' WHERE key='core.https_adress'

and restart lxd.

(You might need to install the SQLite CLI with apt-get install sqlite3.)

I did that, but when I start the LXD service, it still doesn’t respond.
(I added the missing closing " at the end of the command.)

FBorate@machine1:~$ sudo systemctl stop lxd
Warning: Stopping lxd.service, but it can still be activated by:
lxd.socket
FBorate@machine1:~$ sudo systemctl stop lxd.socket
FBorate@machine1:~$ sudo systemctl stop lxd

FBorate@machine1:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:63:04:55 brd ff:ff:ff:ff:ff:ff
inet 192.168.89.164/24 brd 192.168.89.255 scope global dynamic ens33
valid_lft 1299sec preferred_lft 1299sec
inet6 fe80::20c:29ff:fe63:455/64 scope link
valid_lft forever preferred_lft forever

FBorate@machine1:~$ sudo sqlite3 /var/lib/lxd/database/local.db "UPDATE config SET value='192.168.89.164:8443
FBorate@machine1:~$ sudo systemctl start lxd
^C
FBorate@machine1:~$ sudo systemctl start lxd.socket
FBorate@machine1:~$ sudo systemctl start lxd
^C


FBorate@Machine2:~$ sudo systemctl stop lxd.socket
FBorate@Machine2:~$ sudo systemctl stop lxd

FBorate@Machine2:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:65:26:f3 brd ff:ff:ff:ff:ff:ff
inet 192.168.89.162/24 brd 192.168.89.255 scope global dynamic ens33
valid_lft 1146sec preferred_lft 1146sec
inet6 fe80::20c:29ff:fe65:26f3/64 scope link
valid_lft forever preferred_lft forever

FBorate@Machine2:~$ sudo sqlite3 /var/lib/lxd/database/local.db "UPDATE config SET value='192.168.89.162:8443' WHERE key='core.https_adress'"
FBorate@Machine2:~$ sudo systemctl start lxd.socket
FBorate@Machine2:~$ sudo systemctl start lxd
^C


On Machine3 I ran the same commands with the IP 192.168.89.163.

Can you paste /var/log/lxd/lxd.log?

FBorate@machine1:~$ sudo cat /var/log/lxd/lxd.log

t=2020-05-13T19:16:55+0200 lvl=warn msg="Failed connecting to global database (attempt 30): failed to create dqlite connection: no available dqlite leader server found"
t=2020-05-13T19:17:59+0200 lvl=warn msg="Failed connecting to global database (attempt 35): failed to create dqlite connection: no available dqlite leader server found"
t=2020-05-13T19:19:03+0200 lvl=warn msg="Failed connecting to global database (attempt 40): failed to create dqlite connection: no available dqlite leader server found"
t=2020-05-13T19:20:08+0200 lvl=warn msg="Failed connecting to global database (attempt 45): failed to create dqlite connection: no available dqlite leader server found"
t=2020-05-13T19:20:21+0200 lvl=info msg="LXD 3.0.3 is starting in normal mode" path=/var/lib/lxd
t=2020-05-13T19:20:21+0200 lvl=info msg="Kernel uid/gid map:"
t=2020-05-13T19:20:21+0200 lvl=info msg=" - u 0 0 4294967295"
t=2020-05-13T19:20:21+0200 lvl=info msg=" - g 0 0 4294967295"
t=2020-05-13T19:20:21+0200 lvl=info msg="Configured LXD uid/gid map:"
t=2020-05-13T19:20:21+0200 lvl=info msg=" - u 0 100000 1000000000"
t=2020-05-13T19:20:21+0200 lvl=info msg=" - g 0 100000 1000000000"
t=2020-05-13T19:20:21+0200 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
t=2020-05-13T19:20:21+0200 lvl=info msg="Kernel features:"
t=2020-05-13T19:20:21+0200 lvl=info msg=" - netnsid-based network retrieval: no"
t=2020-05-13T19:20:21+0200 lvl=info msg=" - unprivileged file capabilities: yes"
t=2020-05-13T19:20:21+0200 lvl=info msg="Initializing local database"
t=2020-05-13T19:20:21+0200 lvl=info msg="Starting /dev/lxd handler:"
t=2020-05-13T19:20:21+0200 lvl=info msg=" - binding devlxd socket" socket=/var/lib/lxd/devlxd/sock
t=2020-05-13T19:20:21+0200 lvl=info msg="REST API daemon:"
t=2020-05-13T19:20:21+0200 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/lib/lxd/unix.socket
t=2020-05-13T19:20:21+0200 lvl=info msg=" - binding TCP socket" socket=[::]:8443
t=2020-05-13T19:20:21+0200 lvl=info msg="Initializing global database"
t=2020-05-13T19:21:48+0200 lvl=warn msg="Failed connecting to global database (attempt 6): failed to create dqlite connection: no available dqlite leader server found"
t=2020-05-13T19:22:01+0200 lvl=warn msg="Failed connecting to global database (attempt 7): failed to create dqlite connection: no available dqlite leader server found"
t=2020-05-13T19:22:14+0200 lvl=warn msg="Failed connecting to global database (attempt 8): failed to create dqlite connection: no available dqlite leader server found"

It seems it’s still binding to the new IPv6 wildcard address ([::]:8443). Could you please paste the output of:

sqlite3 /var/lib/lxd/database/local.db "SELECT * FROM raft_nodes"

I have the same result as before:

FBorate@machine1:~$ sudo sqlite3 /var/lib/lxd/database/local.db "SELECT * FROM raft_nodes"
2|192.168.89.162:8443
3|192.168.89.163:8443
4|192.168.89.164:8443

Sorry, my bad. I actually meant the output of:

sqlite3 /var/lib/lxd/database/local.db "SELECT * FROM config"

thanks.

No problem.
The value is still :8443.

FBorate@machine1:~$ sudo sqlite3 /var/lib/lxd/database/local.db "SELECT * FROM config"
2|core.https_address|:8443

Ok, there was a typo in my initial UPDATE statement (‘adress’ vs ‘address’). Please try:

sqlite3 /var/lib/lxd/database/local.db "UPDATE config SET value='192.168.89.162:8443' WHERE key='core.https_address'"
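
To show why the misspelled statement failed without any error message, here is a small sketch that runs both UPDATE variants against a scratch in-memory database and prints SQLite's changes() count after each. The real local.db is not touched. The three-column table layout (id, key, value) is an assumption inferred from the "2|core.https_address|:8443" output above, and rows_changed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Scratch schema and data; an assumption modelled on the SELECT output in this thread.
setup="CREATE TABLE config (id INTEGER PRIMARY KEY, key TEXT, value TEXT);
INSERT INTO config (id, key, value) VALUES (2, 'core.https_address', ':8443');"

# Hypothetical helper: runs the UPDATE with the given key name against a fresh
# in-memory database and prints how many rows the UPDATE modified.
rows_changed() {
    local key="$1"
    sqlite3 :memory: "$setup
UPDATE config SET value='192.168.89.162:8443' WHERE key='$key';
SELECT changes();"
}

if command -v sqlite3 >/dev/null 2>&1; then
    echo "misspelled key: $(rows_changed 'core.https_adress') row(s) changed"
    echo "correct key:    $(rows_changed 'core.https_address') row(s) changed"
else
    echo "sqlite3 CLI not installed; skipping demonstration"
fi
```

An UPDATE whose WHERE clause matches nothing is still a valid statement, so sqlite3 exits successfully and prints nothing. Appending "SELECT changes();" to a manual edit like this is one way to confirm that a row was actually modified.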

Sorry, I didn’t see the error.
Perfect, now it works again!
Thank you for your help!
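
For anyone landing here with the same symptom, the steps used in this thread can be collected into a dry-run sketch. It only prints the commands for one node and runs nothing. The function name recovery_commands is hypothetical, and the address argument must be the node's own original IPv4 address and port. The sequence is what worked for this LXD 3.0.3 cluster and may not apply to other versions.

```shell
#!/usr/bin/env bash
# Dry run: prints the recovery commands from this thread for a single node.
# Nothing is executed; review the output before running any of it by hand.
recovery_commands() {
    local addr="$1"
    local db="/var/lib/lxd/database/local.db"
    printf '%s\n' \
        "sudo systemctl stop lxd.socket" \
        "sudo systemctl stop lxd" \
        "sudo sqlite3 $db \"UPDATE config SET value='$addr' WHERE key='core.https_address'\"" \
        "sudo sqlite3 $db \"SELECT * FROM config\"" \
        "sudo systemctl start lxd.socket" \
        "sudo systemctl start lxd"
}

# Example for Machine2; use each node's own address on the other nodes.
recovery_commands "192.168.89.162:8443"
```

The SELECT step is there to verify that the stored value really changed before the daemon is started again.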