No Solution Yet - Problem after upgrade of Ubuntu 18.04 (emergency)


(Free Ekanayaka) #41

It’s not a version mismatch. The logs would have looked different.


#42

lxd --version

What does it show on all nodes?


(Tony Anytime) #43

Is there a way to delete the database on Joe and copy a new one from Moe, for example?
Or perhaps manually remove Joe from the cluster?


(Free Ekanayaka) #44

I don’t know how else to say it: joe is not the problem. All nodes are affected.


(Tony Anytime) #45

lxd --version
3.0.3


#46

Does it say that on all servers?


(Tony Anytime) #47

yep


(Tony Anytime) #48

What is the latest?


#49

3.10 on snap. 3.0.3 shows on my apt.


#50

I would circle back to the cluster IP. What is that WAN IP pointing to?


(Tony Anytime) #51

Yeah, so the version numbers may not be 100% accurate, or it is stuck somewhere in between. Cluster IP? This was all working fine for months until the upgrade. All nodes are on the same IP range.


(Tony Anytime) #52

This is interesting
lxd
WARN[02-08|11:55:51] CGroup memory swap accounting is disabled, swap limits will be ignored.
WARN[02-08|11:55:56] Raft: Heartbeat timeout from "" reached, starting election
WARN[02-08|11:56:00] Raft: Election timeout reached, restarting election
WARN[02-08|11:56:03] Raft: Election timeout reached, restarting election
WARN[02-08|11:56:08] Raft: Election timeout reached, restarting election
WARN[02-08|11:56:12] Raft: Election timeout reached, restarting election
WARN[02-08|11:56:17] Raft: Election timeout reached, restarting election
WARN[02-08|11:56:17] Raft: AppendEntries to {Voter 4 64.71.77.80:8443} rejected, sending older logs (next: 5066720)
WARN[02-08|11:56:18] Raft: Failed to contact 2 in 1.505950989s
WARN[02-08|11:56:20] Raft: Failed to contact 2 in 2.825055425s
WARN[02-08|11:56:21] Raft: Failed to contact 2 in 4.094632009s


#53

Which server is that?


(Tony Anytime) #54

Joe, the 4th server that had the upgrade.
INFO[02-08|12:00:07] Kernel features:
INFO[02-08|12:00:07] - netnsid-based network retrieval: no
INFO[02-08|12:00:07] - unprivileged file capabilities: yes
INFO[02-08|12:00:07] Initializing local database
DBUG[02-08|12:00:07] Initializing database gateway
DBUG[02-08|12:00:07] Connecting to a local LXD over a Unix socket
DBUG[02-08|12:00:07] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
DBUG[02-08|12:00:07] Detected stale unix socket, deleting
DBUG[02-08|12:00:07] Detected stale unix socket, deleting
INFO[02-08|12:00:07] Starting /dev/lxd handler:
INFO[02-08|12:00:07] - binding devlxd socket socket=/var/lib/lxd/devlxd/sock
INFO[02-08|12:00:07] REST API daemon:
INFO[02-08|12:00:07] - binding Unix socket socket=/var/lib/lxd/unix.socket
INFO[02-08|12:00:07] - binding TCP socket socket=[::]:8443
INFO[02-08|12:00:07] Initializing global database
DBUG[02-08|12:00:07] Dqlite: server connection failed err=failed to establish network connection: Head https://64.71.77.29:8443/internal/database: dial tcp 64.71.77.29:8443: connect: connection refused address=64.71.77.29:8443 attempt=0
DBUG[02-08|12:00:07] Dqlite: server connection failed err=failed to establish network connection: Head https://64.71.77.32:8443/internal/database: dial tcp 64.71.77.32:8443: connect: connection refused address=64.71.77.32:8443 attempt=0
DBUG[02-08|12:00:07] Dqlite: server connection failed err=failed to establish network connection: 503 Service Unavailable address=64.71.77.80:8443 attempt=0
DBUG[02-08|12:00:07] Dqlite: connection failed err=no available dqlite leader server found attempt=0
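
The three peers in this output fail in two different ways, which is easy to miss in the raw text. A minimal Python sketch that groups the failures by address, assuming the log lines keep the `err=... address=... attempt=...` layout shown above (the `classify` and `summarize` helpers are hypothetical names for illustration, not part of LXD):

```python
import re
from collections import defaultdict

# Sample lines copied from the debug output above.
SAMPLE_LOG = """\
DBUG[02-08|12:00:07] Dqlite: server connection failed err=failed to establish network connection: Head https://64.71.77.29:8443/internal/database: dial tcp 64.71.77.29:8443: connect: connection refused address=64.71.77.29:8443 attempt=0
DBUG[02-08|12:00:07] Dqlite: server connection failed err=failed to establish network connection: Head https://64.71.77.32:8443/internal/database: dial tcp 64.71.77.32:8443: connect: connection refused address=64.71.77.32:8443 attempt=0
DBUG[02-08|12:00:07] Dqlite: server connection failed err=failed to establish network connection: 503 Service Unavailable address=64.71.77.80:8443 attempt=0
DBUG[02-08|12:00:07] Dqlite: connection failed err=no available dqlite leader server found attempt=0
"""

# Assumed line layout: "... server connection failed err=<text> address=<host:port> attempt=<n>"
LINE_RE = re.compile(
    r"Dqlite: server connection failed err=(?P<err>.*) "
    r"address=(?P<addr>\S+) attempt=(?P<attempt>\d+)"
)


def classify(err: str) -> str:
    """Reduce a raw error string to a short label."""
    if "connection refused" in err:
        # The TCP connection itself was rejected.
        return "connection refused"
    if "503 Service Unavailable" in err:
        # An HTTP server answered but reported it could not serve the request.
        return "503 Service Unavailable"
    return "other"


def summarize(log_text: str) -> dict[str, list[str]]:
    """Map each peer address to the list of failure labels seen for it."""
    failures = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            failures[match["addr"]].append(classify(match["err"]))
    return dict(failures)


for addr, labels in summarize(SAMPLE_LOG).items():
    print(addr, labels)
```

On the sample, 64.71.77.29 and 64.71.77.32 come out as "connection refused", while 64.71.77.80 comes out as "503 Service Unavailable". So two peers did not accept the connection at all, and one accepted it but could not serve the database request.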


(Tony Anytime) #55

BTW, the firewall does allow port 8443.
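
One way to double-check this from each node is a plain TCP connect to port 8443 on the other members. A rough Python sketch, assuming the addresses from the log output earlier in the thread (`port_is_open` and `check_nodes` are hypothetical helper names for illustration):

```python
import socket

# Addresses taken from the log output earlier in the thread.
NODES = ["64.71.77.29", "64.71.77.32", "64.71.77.80"]


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_nodes(nodes: list[str], port: int = 8443) -> dict[str, bool]:
    """Check the same port on every node and return a host -> reachable map."""
    return {host: port_is_open(host, port) for host in nodes}
```

Calling `check_nodes(NODES)` on each server would show which peers accept connections on 8443. A successful connect only proves the port is reachable; it does not show that LXD behind it is healthy, which is the situation the 503 response points at.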


#56

Looks to me like joe is operating correctly for the most part.


(Tony Anytime) #57

Not familiar with this log


(Tony Anytime) #58

It's just that none of them are talking to each other.


#59

Joe is reporting 503 Service Unavailable. It might be worth investigating further down that path; I have never seen that one before.


(Tony Anytime) #60

I'm going out for a bit; it is prime internet time here. So far, my plan is to back up and reboot early tomorrow morning and pray, pray a lot. I will attack this later with a fresh perspective. Thank you all for the help. Let me know if you think of anything.