No Solution Yet - Problem after upgrade of Ubuntu 18.04 (emergency)


(Stéphane Graber) #21

Hmm, no, and a failure to back up should either have been fatal or at least have resulted in a log entry, so that’s pretty confusing.


(Stéphane Graber) #22

Pretty sure, yes. This isn’t a snapped installation, so we’re talking 3.0.x here, and all the logs above show a startup failure caused by a storage config problem.


(Stéphane Graber) #23

Hmm, if this was 3.0.0 it may predate the backup feature? I can’t remember when we added the unconditional .bak of the clustered database.
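
If a backup was made it should sit next to the database files, so something like this would show it (a sketch, assuming the standard non-snap path for 3.0.x):

sudo ls -l /var/lib/lxd/database/
# look for any *.bak files or directories next to the existing database files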


#24

Odd timing I suppose.


(Tony Anytime) #25

I upgraded all servers, but then only rebooted one, joe, and this might have caused the problem.
Remember, Ubuntu 18.04 just did a major upgrade and perhaps not everything took.
lxc version
Client version: 3.0.3
Server version: unreachable
Not a snap installation, I believe
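
To check whether the daemon and the packages are in a consistent state on a node, something like this might help (a rough sketch, assuming the 3.0.x deb packages rather than the snap):

# installed package versions
dpkg -l lxd lxd-client | tail -n 2
# is the daemon running, and what were the last startup errors?
sudo systemctl status lxd --no-pager
sudo journalctl -u lxd -n 50 --no-pager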


#26

Once stgraber gets this sorted for ya, I would set up a snap cluster to migrate to. I’m pretty happy with it so far; I ran into some issues with my Ceph storage pool on the 18.04 apt version when I was getting that set up.
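
For the eventual move, the snap ships a migration tool; roughly this on each node, but only once the cluster is healthy again (a sketch, assuming the deb-to-snap path):

sudo snap install lxd
sudo lxd.migrate    # moves the data from the deb install into the snap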


(Tony Anytime) #27

I tried the snap in an earlier version and lost my zpool twice. I wish there were a way to uncluster this cluster and then recluster it into a new one. I could do that with my 4 servers. But I think right now it is an all-or-nothing deal.


#28

I would avoid ZFS like the plague; it’s way too easy to lose data, IMO, especially if you are using loop devices.


(Tony Anytime) #29

Yeah, this is my fear: if I lose LXD, I can lose my data. What would you recommend instead?


#30

Anything, really. I set up a Ceph pool and it is working well with ~90 containers hitting it.


#31

My current setup is 4 LXD servers and 4 Ceph servers with SSD storage.
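
For reference, creating such a pool in LXD is roughly this (a sketch; the pool names are made up, it assumes an existing Ceph OSD pool, and on a cluster the create may need to be staged per node with --target first):

lxc storage create remote ceph ceph.osd.pool_name=lxd-osd
lxc storage list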


(Free Ekanayaka) #32

Dumping the database seems to indicate that the source config is actually there:

sqlite> select * from storage_pools;
1|local|zfs||1
sqlite> select * from storage_pools_config;
2|1|1|size|100GB
3|1|1|source|/var/lib/lxd/disks/local.img
4|1|1|zfs.pool_name|local
5|1|2|size|100GB
6|1|2|source|/var/lib/lxd/disks/local.img
7|1|2|zfs.pool_name|local
8|1|3|size|100GB
9|1|3|source|/var/lib/lxd/disks/local.img
10|1|3|zfs.pool_name|local
11|1|4|size|100GB
12|1|4|source|/var/lib/lxd/disks/local.img
13|1|4|zfs.pool_name|local

so I’m not totally sure what’s going on.
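
For anyone following along, those rows read as (id | pool_id | node_id | key | value), so a per-node view of the same config would be something like this (a sketch; column names assumed from the dump above):

sqlite> select node_id, key, value from storage_pools_config where storage_pool_id = 1 order by node_id;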


#33
/var/lib/lxd/disks/local.img

Does that exist? Previous ZFS adventures lead me to believe it does not.

If it does, you might be able to remount it manually.
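
If the image is there, the pool can usually be re-imported from the loop file by hand; a sketch, with LXD stopped and assuming the pool name local from the dump above:

ls -lh /var/lib/lxd/disks/local.img
sudo zpool import -d /var/lib/lxd/disks local
sudo zpool status local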


(Tony Anytime) #34

Yes, they exist on all servers. But the containers are not live on joe because LXD is not running.


#35

I would make a backup of that file everywhere before doing anything else, if you have the disk space.

I noticed you had a WAN IP set up as the cluster IP. Can you telnet to it and get a response?

Hopefully it is a static IP.
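
Concretely, something along these lines (a sketch; the address is a placeholder and 8443 is just LXD’s default API/cluster port):

# copy the loop file somewhere safe, ideally with LXD stopped and the pool not imported
sudo cp -a /var/lib/lxd/disks/local.img /var/lib/lxd/disks/local.img.bak
# then check the cluster address answers on the API port
nc -vz 203.0.113.10 8443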


(Tony Anytime) #36

Doing it again, just in case.
Here is the thing: I don’t care about this cluster member as much, except all the others are not working either. Simply turning this one off does nothing. Is it holding the other members down?


(Free Ekanayaka) #37

Tony, the problem is not one node, it is all nodes.


(Tony Anytime) #38

So is it a corrupt database, or servers stuck in between upgrades? The zpool?
What do you think so far?


(Free Ekanayaka) #39

The database does not seem corrupted at first sight; that’s why I’m scratching my head.


#40

The data is probably OK since the file still exists; I would say it is probably a version mismatch. My nodes didn’t come back until they were all upgraded.
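
A quick way to confirm that is to compare the package version on every node; a rough sketch, assuming SSH access and the deb install (hostnames other than joe are placeholders):

for h in joe node2 node3 node4; do
    echo "== $h =="
    ssh "$h" "dpkg -s lxd | grep '^Version:'"
done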