No Solution Yet - Problem after upgrade of Ubuntu 18.04 (emergency)

(Stéphane Graber) #21

Hmm, no, and a failure to backup should either have been fatal or at least result in a log entry, so that’s pretty confusing.

(Stéphane Graber) #22

Pretty sure, yes. This isn’t a snapped installation, so we’re talking 3.0.x here, and all the logs above show a startup failure caused by a storage config problem.

(Stéphane Graber) #23

Hmm, if this was 3.0.0 it may predate the backup feature; I can’t remember when we added the unconditional .bak of the clustered db.


Odd timing I suppose.

(Tony Anytime) #25

I upgraded all servers, but then only rebooted one, joe. And this might have caused the problem.
Remember, Ubuntu 18.04 just did a major upgrade, and perhaps not everything took:
lxc version
Client version: 3.0.3
Server version: unreachable
Not a snap installation, I believe


Once stgraber gets this sorted for ya, I would set up a snap cluster to migrate to. Pretty happy with it so far; I ran into some issues with my Ceph storage pool on the 18.04 apt version when I was getting that set up.

(Tony Anytime) #27

I tried the snap in an earlier version and lost my zpool twice. I wish there was a way to uncluster this cluster and then recluster it again into a new cluster. I could do that with my 4 servers. But I think right now it is an all-or-nothing deal.


I would avoid ZFS like the plague; way too easy to lose data IMO, especially if you are using loop devices.

(Tony Anytime) #29

Yeah, this is my fear: if I lose LXD, I can lose my data. What would you recommend instead?


Anything really. I set up a Ceph pool and it is working well with ~90 containers hitting it.


My current setup is 4 lxd servers and 4 ceph servers with SSD storage.

(Free Ekanayaka) #32

Dumping the database seems to indicate that the storage config is actually there:

sqlite> select * from storage_pools;
sqlite> select * from storage_pools_config;

so I’m not totally sure what’s going on.
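For anyone following along, a rough way to run those same queries yourself is to point the sqlite3 CLI at a copy of the database file. The path below is a guess for a deb-based 3.0.x install (not the snap), and on a cluster the global store is dqlite-managed, so only ever inspect a copy, never the live file:

```shell
# Path is an assumption for a non-snap LXD 3.0.x install -- verify it first.
SRC_DB="${SRC_DB:-/var/lib/lxd/database/local.db}"

if [ -f "$SRC_DB" ]; then
  DB_COPY="$(mktemp)"
  cp "$SRC_DB" "$DB_COPY"    # always query a copy, never the live database

  # The same queries Free ran above:
  sqlite3 "$DB_COPY" "select * from storage_pools;"
  sqlite3 "$DB_COPY" "select * from storage_pools_config;"
else
  echo "database not found at $SRC_DB"
fi
```

If both queries come back empty on your copy, the pool definitions really are gone; if they return rows, the config is intact and the startup failure lies elsewhere.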


Does that exist? Previous ZFS adventures led me to believe it does not.

If it does you might be able to remount it manually.

(Tony Anytime) #34

Yes, they exist on all servers. But the containers are not live on Joe because LXD is not running.


I would make a backup of that file everywhere before doing anything else, if you have the disk space.
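A minimal backup sketch, assuming the database lives under the usual deb-package path (check your install before trusting the path, and run it on every node):

```shell
# Assumed location for a non-snap LXD 3.0.x install -- verify before use.
LXD_DB_DIR="${LXD_DB_DIR:-/var/lib/lxd/database}"
BACKUP_DIR="${LXD_DB_DIR}.bak-$(date +%Y%m%d)"

if [ -d "$LXD_DB_DIR" ]; then
  cp -a "$LXD_DB_DIR" "$BACKUP_DIR"   # -a preserves ownership and timestamps
  echo "backed up to $BACKUP_DIR"
else
  echo "no database directory at $LXD_DB_DIR"
fi
```

Copying the whole directory (rather than a single file) matters here, because the clustered store is spread across several files.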

I noticed you had a WAN IP setup as the cluster IP. Can you telnet to that, and get a response?

Hopefully it is a static IP.
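If telnet isn’t handy, the same reachability check can be sketched with bash’s /dev/tcp redirection (8443 is LXD’s default API/cluster port; the IP below is a documentation placeholder, substitute your cluster address):

```shell
HOST="${HOST:-203.0.113.10}"   # placeholder -- substitute your cluster IP
PORT="${PORT:-8443}"           # LXD's default HTTPS/cluster port

# /dev/tcp is a bash feature, not POSIX sh; timeout avoids a long hang.
if timeout 5 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
  echo "$HOST:$PORT reachable"
else
  echo "$HOST:$PORT unreachable"
fi
```

If the port is unreachable, the cluster members can’t talk to each other either, which alone would keep every node down.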

(Tony Anytime) #36

Doing it again, just in case.
Here is the thing: I don’t care about this cluster member as much, except that all the others are not working either. Simply turning this one off does nothing. Is it holding the other members down?

(Free Ekanayaka) #37

Tony, the problem is not one node, it is all nodes.

(Tony Anytime) #38

So is it a corrupt database, or servers stuck in between upgrades? The zpool?
What do you think so far?

(Free Ekanayaka) #39

The database does not seem corrupted, at first sight; that’s why I’m scratching my head.


The data is probably OK, as the file still exists; I would say a version mismatch is most likely. My nodes didn’t come back until they were all upgraded.