Can I upgrade my broken apt LXD to a snap LXD? Maybe this will fix whatever is broken. Is it somehow possible to use a previous ZFS pool with a new lxd init?

I noticed this upgrade procedure; will it work on my broken cluster?

  1. If you already use LXD from the deb package, then:
    1.1 Install the LXD snap package.
    1.2 Run lxd.migrate to migrate the existing installation to the snap version of LXD (see the command sketch after this list).
    1.3 When prompted, you can have lxd.migrate remove the old LXD deb packages.
  2. If you have not used LXD from the deb package at all, then:
    2.1 Uninstall the deb package of LXD.
    2.2 Install the snap package.
    2.3 You may have to log out and log in again (or run hash -r) to refresh the list of executables. Do that if you get something like command not found.
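For reference, a minimal command sketch of the two paths above, assuming Ubuntu with snapd available (lxd.migrate ships with the LXD snap):

    # Path 1: existing deb installation, migrate its data into the snap
    sudo snap install lxd
    sudo lxd.migrate            # interactive; copies the data and can remove the deb packages

    # Path 2: deb package installed but never actually used
    sudo apt remove --purge lxd lxd-client
    sudo snap install lxd
    hash -r                     # refresh the shell's cached command paths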

Upgrading from deb to snap with clustering was only very recently allowed and hasn’t been widely used yet, and that’s with a properly working LXD cluster as the source. I would not recommend doing so with a broken cluster.

Your best bet is to wait for @freeekanayaka to be back to work on Monday and to get things back online from there. I know how tempting it can be to try just about every solution you can find on the internet when running into a problem, but this tends to make things much much worse to the point where we can’t really help you anymore.

I am leaving 3 servers untouched, and playing with my 4th broken server, which is expendable.
Migrate didn’t work, surprise surprise, but I did get the snap LXD working. Got to have a plan B ready in case plan A (fixing the original cluster) doesn’t work.

I am looking forward to working with @freeekanayaka to see how we can get these puppies talking to each other, or how far we can go to fix this or move it.

Hello @Tony_Anytime, I think I might have a solution. It’s similar to the one I already suggested, but with a small difference that should make things work this time.

Please follow exactly these steps (a consolidated command sketch follows after the notes below):

  1. Make sure LXD is not running on any of the nodes, and that systemd won’t try to start it during this procedure.
  2. Make a backup of the /var/lib/lxd/database directory on all nodes (e.g. cp -a /var/lib/lxd/database /var/lib/lxd/database.bak-20190211).
  3. Take the moe.database.tar.gz file attached to the mail that you sent me on Friday, with the subject LXD Databases. If you don’t have that email anymore, please let me know and I’ll send that file to you.
  4. Extract that tarball somewhere in a temporary space: tar xfz moe.database.tar.gz; this will create a database/ directory in your temporary space.
  5. Remove the logs.db file from the above directory: rm database/global/logs.db
  6. Remove the /var/lib/lxd/database/global directory from all nodes: e.g. rm -r /var/lib/lxd/database/global.
  7. Copy the database/global directory from your temporary space to each of the nodes, e.g. scp -r database/global <node>:/var/lib/lxd/database/global
  8. Start LXD on all nodes.

It’s essential that you use moe.database.tar.gz, and not any other tarball from another node.

It’s also essential that you replace only the database/global subdirectory and not the whole database/ directory.

This should bring your cluster back.
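Here is a consolidated sketch of the procedure above, assuming the deb/3.0.x systemd units (lxd.service and lxd.socket) and using <node> as a placeholder for each cluster member; adjust unit names and paths to your setup:

    # 1. On every node: stop LXD and keep socket activation from restarting it
    sudo systemctl stop lxd lxd.socket

    # 2. On every node: back up the current database directory
    sudo cp -a /var/lib/lxd/database /var/lib/lxd/database.bak-20190211

    # 3-5. In a temporary directory: extract moe.database.tar.gz and drop logs.db
    tar xfz moe.database.tar.gz
    rm database/global/logs.db

    # 6. On every node: remove only the global subdirectory
    sudo rm -r /var/lib/lxd/database/global

    # 7. Copy the extracted global directory to each node
    scp -r database/global <node>:/var/lib/lxd/database/global

    # 8. On every node: start LXD again
    sudo systemctl start lxd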

I will try this at the end of the day today. Thank you very much for your efforts.

When I do all this, I guess I just need to restart LXD. No need to reboot the servers or the lxc containers?

Correct, no need to reboot the servers or restart the lxc containers.

Seems to be working. Thanks again. You saved a week of work and worse.
Except for the 4th server, JOE (disposable), on which I had also installed the snap LXD.
It says:
service lxd start --debug
Failed to start lxd.service: Unit lxd.service is masked.
It is no big deal for me to do lxd init on this server to get it back into the cluster.
Or maybe remove the snap LXD.
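For what it’s worth, a masked lxd.service usually just means the deb’s systemd unit was disabled when the snap took over; the snap runs its own unit instead. A quick sketch for checking and starting the right daemon, assuming the snap is what should run on that server:

    systemctl status lxd.service           # deb unit; "masked" is expected once the snap is installed
    snap services lxd                      # the snap's daemon is snap.lxd.daemon
    sudo systemctl start snap.lxd.daemon   # or: sudo snap start lxd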

But the question is, how did I get here to begin with, and what is going to happen when I reboot the servers?
Am I going to have the issue again, and then…
How do I upgrade the cluster to the snap version?
Or should I just stay at 3.0.3?

Well, it almost works. Yes, the cluster is up.
But, and this is a big one, all the data and info in the containers are incomplete, almost like the disk is scrambled. Funny thing, they do mount and I can see files via /lxd/storage-pools/local/containers and even copy them, but LXC is confused.
If I do lxc exec container bash and do ls, I get some weird partial directory listing.

By the way, it is very, very slow to copy the files; I am backing up the most important files now.

The problem is much worse; it seems to be happening on all of the servers.
If I copy a container over to another server via lxc copy, it copies over and all the files are there, but Ubuntu itself in the container is scrambled up, like the FAT in the container is messed up. It is like ls is messed up, but I can navigate via mc just fine.

@@@ FIXED - Looks like ZFS ran out of space even though it seemed to have plenty.
I was able to move some containers out and it is operating normally, with one big issue.
It shows only 3.7 GB of 25 GB available when it should be something like 25 GB of 100 GB ZFS.
For now I am going to move stuff out to another server and then see what I can do.

root@LARRY:/home/ic2000# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
local  99.5G  92.7G  6.78G         -   78%  93%  1.00x  ONLINE
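That CAP column (93%) lines up with the symptoms; ZFS performance and behaviour tend to degrade badly as a pool gets close to full, and per-container size limits (if set) can make a container see far less space than the pool total. A short sketch for digging into where the space went, assuming the pool is named local as above (the local/containers dataset name is illustrative; LXD normally keeps containers under <pool>/containers):

    zpool list                                          # pool-level usage (ALLOC / FREE / CAP)
    sudo zfs list -o name,used,avail,refer -r local     # per-dataset usage inside the pool
    sudo zfs get quota,refquota local/containers        # quotas that limit what containers see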

I still wonder what I should do about migrating this LXD to a snap LXD.
Your suggestions on this are welcome.

New information above