LXC snapshot errors after migrate

OK, try doing this:

sudo lxd sql global 'delete from storage_volumes_snapshots where id in (43706, 43707)'

Yay! Rows affected: 2

|       NAME        |       TAKEN AT       | EXPIRES AT | STATEFUL |
| pre-trim-database | 2021/11/29 14:37 EST |            | NO       |
$ lxc storage show default | grep "pitemp/snapshots"
- /1.0/instances/pitemp/snapshots/pre-trim-database
- /1.0/instances/pitemp/snapshots/pre-trim-database
$ lxc storage volume list default | grep "TYPE\|pitemp"
|         TYPE         |                               NAME                               |  DESCRIPTION  | CONTENT-TYPE | USED BY |
| container            | pitemp                                                           |               | filesystem   | 1       |
| container (snapshot) | pitemp/pre-trim-database                                         |               | filesystem   | 1       |
| container (snapshot) | pitemp/pre-trim-database                                         | Auto repaired | filesystem   | 1       |
$ lxd sql global 'select storage_volumes.name, storage_volumes_snapshots.* from storage_volumes_snapshots join storage_volumes on storage_volumes_snapshots.storage_volume_id = storage_volumes.id' | grep "pitemp"
| pitemp           | 43705 | 21134             | pre-trim-database                             | Auto repaired | 0001-01-01T00:00:00Z |

Success!

I know this is a lot to ask, but is there a particular sequence I should follow to clean up the snapshots on all of the other containers I have? Or should I just do lxc delete [container]/[snapshot] and then use the other commands to find the anomalies?

If you can use the lxc delete command, that would be preferable, as it’ll clean up both the database records and the on-disk snapshot.
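For several snapshots on one container, a small loop around lxc delete is probably enough. This is only a sketch: the function name delete_snapshots and the LXC_CMD override (handy for a dry run with echo) are made up for illustration, and you supply the snapshot names yourself.

```shell
#!/bin/sh
# Hypothetical helper: delete the named snapshots of one container using
# `lxc delete <container>/<snapshot>`, stopping at the first failure.
# LXC_CMD is an illustrative override; set LXC_CMD=echo for a dry run.
delete_snapshots() {
    container=$1
    shift
    for snap in "$@"; do
        "${LXC_CMD:-lxc}" delete "$container/$snap" || return 1
    done
}

# Example (dry run, prints the commands instead of running them):
#   LXC_CMD=echo
#   delete_snapshots pitemp pre-trim-database
```

Only fall back to editing the database by hand for snapshots that lxc delete refuses to remove.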

There appears to be an issue in the LXD 4.0 LTS series: it does not correctly convert the LXD 3.0 LTS storage volume snapshot records (which used to be stored in the storage_volumes table) into records in the storage_volumes_snapshots table, where they live today.

Then a later patch in LXD 5.0 LTS detects the missing storage_volumes_snapshots records and creates them again. But the orphaned snapshot entries in storage_volumes remain and should be deleted.
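One rough way to spot affected containers is to look for snapshot names that show up more than once in the lxc storage volume list output, as pitemp/pre-trim-database did earlier in this thread. The sketch below assumes the table layout shown above (TYPE in the first column, NAME in the second); find_duplicate_snapshots is a made-up name, and a duplicate is only a hint that something needs checking, not proof of an orphaned record.

```shell
#!/bin/sh
# Hypothetical helper: reads `lxc storage volume list <pool>` table output on
# stdin and prints every snapshot volume name that is listed more than once.
find_duplicate_snapshots() {
    awk -F'|' '
        # Only consider rows whose TYPE column marks a snapshot.
        $2 ~ /\(snapshot\)/ {
            name = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the column padding
            count[name]++
        }
        END {
            for (name in count)
                if (count[name] > 1)
                    print name
        }
    ' | sort
}

# Example (assumes the pool is called "default", as in this thread):
#   lxc storage volume list default | find_duplicate_snapshots
```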

Furthermore, I don’t know why, in your case, the instance snapshot records differed from those in storage_volumes_snapshots.

Well, I’m going to begin cleaning up all of the snapshots in the migrated system now. I expect I have enough information and understanding of how to do this, although it’s possible I’ll be back.

Thomas, thank you very much.

Thanks, glad we got it working.

I’ve recreated the issue with the storage_volumes/storage_volumes_snapshots discrepancy btw and logged it here:

As for why some of the instance snapshot records were missing (which is what caused the "Instance snapshot record count doesn't match instance snapshot volume record count" error), I don’t know how it got like that.

All I can say is that I only used lxc snapshot [container] [snapshot] and lxc delete [container]/[snapshot] for manipulating snapshots. I do have a nightly script that uses these commands, so perhaps I failed to adjust it properly after the migration from 3 to 4 to 5. If I learn more, I’ll post it here.

@tomp, I seem to have a lingering problem with one container. I first tried to copy a container (called redirected-sites) from one LXD host to another (after first deleting the container from the receiving host) and got this:

$ lxc delete redirected-sites
$ lxc copy remotehost:redirected-sites/now redirected-sites
Error: Failed instance creation: Error transferring instance data: Failed creating instance on target: Volume already exists on storage but not in database

This worked for the other containers in the past, so I checked using the same commands I used when we were figuring out the previous problem, and I got this:

$ mycontainer=redirected-sites
$ lxc info $mycontainer
Error: Instance not found
$ lxd sql global 'select * from storage_volumes' | grep "$mycontainer"

nothing

$ lxd sql global 'select storage_volumes.name, storage_volumes_snapshots.* from storage_volumes_snapshots join storage_volumes on storage_volumes_snapshots.storage_volume_id = storage_volumes.id' | grep "$mycontainer"

nothing

So, yes, there are no records of this instance in the databases, but there is a directory for that instance in /var/snap/lxd/common/lxd/storage-pools/default/containers/.

Can I simply delete that directory for that container?

Thanks.

Yes that should be fine.
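If you want to check for other leftover directories first, something like the following could compare what is on disk with what LXD knows about. It is a sketch under assumptions: find_orphans and the temporary file paths are made up, and directory names on disk may not match instance names exactly in every setup (for example with non-default projects), so treat the output as candidates to inspect rather than a list that is safe to delete.

```shell
#!/bin/sh
# Hypothetical helper: print names that appear in the on-disk listing but not
# in the list of instances LXD reports. Both arguments are files containing
# one name per line.
find_orphans() {
    on_disk=$1   # e.g. output of `ls` on the pool's containers directory
    known=$2     # e.g. output of `lxc list --format csv -c n`
    # -x: whole-line match, -F: fixed strings, -v: keep non-matching lines
    grep -vxFf "$known" "$on_disk"
}

# Example:
#   ls /var/snap/lxd/common/lxd/storage-pools/default/containers/ > /tmp/on_disk
#   lxc list --format csv -c n > /tmp/known
#   find_orphans /tmp/on_disk /tmp/known
```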

Hmmm. How can I delete it?

# rm -rf redirected-sites/
rm: cannot remove 'redirected-sites/': Operation not permitted

I changed owner to root:root and chmod to 700. Won’t let go.

Oh, it’s BTRFS.

Try:

sudo btrfs subvolume delete <path>

Wow! I have a lot to learn. Thanks, Thomas.
