Lxc snapshot errors after migrate

tomp · January 17, 2023, 3:31pm

OK try doing this:

sudo lxd sql global 'delete from storage_volumes_snapshots where id in (43706, 43707)'

johnmaher · January 17, 2023, 3:38pm

Yay! Rows affected: 2

|       NAME        |       TAKEN AT       | EXPIRES AT | STATEFUL |
| pre-trim-database | 2021/11/29 14:37 EST |            | NO       |

$ lxc storage show default | grep "pitemp/snapshots"
- /1.0/instances/pitemp/snapshots/pre-trim-database
- /1.0/instances/pitemp/snapshots/pre-trim-database

$ lxc storage volume list default | grep "TYPE\|pitemp"
|         TYPE         |                               NAME                               |  DESCRIPTION  | CONTENT-TYPE | USED BY |
| container            | pitemp                                                           |               | filesystem   | 1       |
| container (snapshot) | pitemp/pre-trim-database                                         |               | filesystem   | 1       |
| container (snapshot) | pitemp/pre-trim-database                                         | Auto repaired | filesystem   | 1       |

$ lxd sql global 'select storage_volumes.name, storage_volumes_snapshots.* from storage_volumes_snapshots join storage_volumes on storage_volumes_snapshots.storage_volume_id = storage_volumes.id' | grep "pitemp"
| pitemp           | 43705 | 21134             | pre-trim-database                             | Auto repaired | 0001-01-01T00:00:00Z |

Success!

I know this is a lot to ask, but is there a particular sequence I should follow to clean up the snapshots on all of the other containers I have? Or should I just do lxc delete [container]/[snapshot] and then use the other commands to find the anomalies?

tomp · January 17, 2023, 3:42pm

If you can use the lxc delete command, that would be preferable, as it’ll clean up both the database records and the on-disk snapshot.
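To spot which containers still have anomalies, one option is to filter the volume list for snapshot names that show up more than once, like the two pitemp/pre-trim-database rows earlier in this thread. This is only a rough sketch: find_duplicate_snapshots is a made-up helper name, and it assumes the table layout printed by lxc storage volume list (TYPE in the first column, NAME in the second).

```shell
# Sketch only: find_duplicate_snapshots is a hypothetical helper, not an LXD command.
# It reads the table printed by `lxc storage volume list <pool>` on stdin and
# prints every snapshot volume name that appears more than once.
find_duplicate_snapshots() {
    awk -F'|' '
        # Keep only rows whose TYPE column mentions "(snapshot)".
        $2 ~ /\(snapshot\)/ {
            name = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the column padding
            count[name]++
        }
        END {
            for (name in count)
                if (count[name] > 1) print name
        }
    ' | sort
}

# Usage (needs a running LXD; "default" is the pool name used in this thread):
#   lxc storage volume list default | find_duplicate_snapshots
```

Anything it prints is a candidate for lxc delete [container]/[snapshot] first, falling back to the SQL approach only if that fails.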

There appears to be an issue in the LXD 4.0 LTS series: it’s not correctly converting the LXD 3.0 LTS storage volume snapshot records (which used to be stored in the storage_volumes table) into the storage_volumes_snapshots table, where they live today.

Then a later patch in LXD 5.0 LTS detects the missing storage_volumes_snapshots records and creates them again. But the orphaned snapshot entries in storage_volumes remain and should be deleted.
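A rough way to list those leftover entries for review is to filter the storage_volumes table for names that contain a slash, since the old-style snapshot records are named like c1/snap0. This is a sketch under assumptions: list_legacy_snapshot_rows is a made-up helper name, it expects id and name to be the first two columns of the lxd sql output, and it treats every name with a "/" as an old snapshot record, which should be checked by eye before deleting anything.

```shell
# Sketch only: list_legacy_snapshot_rows is a hypothetical helper name.
# It reads the table printed by
#   lxd sql global 'select * from storage_volumes'
# on stdin and prints "id name" for rows whose name looks like
# "instance/snapshot". Assumption: id and name are the first two columns.
list_legacy_snapshot_rows() {
    awk -F'|' '
        NF > 3 {
            id = $2
            name = $3
            gsub(/[ \t]/, "", id)                 # strip padding around the id
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim padding around the name
            # Skip the header row (non-numeric id) and names without a slash.
            if (id ~ /^[0-9]+$/ && index(name, "/") > 0) print id, name
        }
    '
}

# Usage (needs a running LXD):
#   sudo lxd sql global 'select * from storage_volumes' | list_legacy_snapshot_rows
```

It only reports rows and changes nothing, so it is safe to run repeatedly while cleaning up.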

As for your case, I don’t know why the instance snapshot records differed from those in storage_volumes_snapshots.

johnmaher · January 17, 2023, 3:45pm

Well, I’m going to begin cleaning up all of the snapshots in the migrated system now. I expect I have enough information and understanding of how to do this, although it’s possible I’ll be back.

Thomas, thank you very much.

tomp · January 17, 2023, 3:50pm

Thanks, glad we got it working.

I’ve recreated the issue with the storage_volumes/storage_volumes_snapshots discrepancy btw and logged it here:

github.com/lxc/lxd

lxd.migrate on LXD 4.0 LTS from LXD 3.0 LTS doesn't correctly convert volume snapshot DB records

opened 03:49PM - 17 Jan 23 UTC

tomponline

Bug

Create Bionic VM with LXD 3.0 LTS and create some instance snapshots:

```
lxc …launch images:ubuntu/bionic vtest --vm
lxc shell vtest
apt install lxd
lxd init --auto
lxc init images:alpine/3.17 c1
lxc snapshot c1
lxc snapshot c1
lxd sql global 'select * from storage_volumes'
+----+----------+-----------------+---------+------+-------------+
| id | name     | storage_pool_id | node_id | type | description |
+----+----------+-----------------+---------+------+-------------+
| 1  | c1       | 1               | 1       | 0    |             |
| 2  | c1/snap0 | 1               | 1       | 0    |             |
| 3  | c1/snap1 | 1               | 1       | 0    |             |
+----+----------+-----------------+---------+------+-------------+
```

Upgrade to LXD 4.0 LTS:

```
apt install snapd -y; snap install lxd --channel=4.0/stable
lxd.migrate
=> Connecting to source server
=> Connecting to destination server
=> Running sanity checks

=== Source server
LXD version: 3.0.3
LXD PID: 2553
Resources:
  Containers: 1
  Images: 1
  Networks: 1
  Storage pools: 1

=== Destination server
LXD version: 4.0.9
LXD PID: 2893
Resources:
  Containers: 0
  Images: 0
  Networks: 0
  Storage pools: 0

The migration process will shut down all your containers then move your data to the destination LXD.
Once the data is moved, the destination LXD will start and apply any needed updates.
And finally your containers will be brought back to their previous state, completing the migration.

Are you ready to proceed (yes/no) [default=no]? yes
=> Shutting down the source LXD
=> Stopping the source LXD units
=> Stopping the destination LXD unit
=> Unmounting source LXD paths
=> Unmounting destination LXD paths
=> Wiping destination LXD clean
=> Backing up the database
=> Moving the data
=> Updating the storage backends
=> Starting the destination LXD
=> Waiting for LXD to come online

=== Destination server
LXD version: 4.0.9
LXD PID: 3258
Resources:
  Containers: 1
  Images: 1
  Networks: 1
  Storage pools: 1

The migration is now complete and your containers should be back online.
Do you want to uninstall the old LXD (yes/no) [default=yes]? yes

All done. You may need to close your current shell and open a new one to have the "lxc" command work.
To migrate your existing client configuration, move ~/.config/lxc to ~/snap/lxd/common/config
```

Check if the snapshots in `storage_volumes` have been moved to `storage_volumes_snapshots`:

```
lxc ls
+------+---------+------+------+-----------+-----------+
| NAME | STATE   | IPV4 | IPV6 | TYPE      | SNAPSHOTS |
+------+---------+------+------+-----------+-----------+
| c1   | STOPPED |      |      | CONTAINER | 2         |
+------+---------+------+------+-----------+-----------+
lxd sql global 'select * from storage_volumes'
+----+----------+-----------------+---------+------+-------------+------------+
| id | name     | storage_pool_id | node_id | type | description | project_id |
+----+----------+-----------------+---------+------+-------------+------------+
| 1  | c1       | 1               | 1       | 0    |             | 1          |
| 2  | c1/snap0 | 1               | 1       | 0    |             | 1          |
| 3  | c1/snap1 | 1               | 1       | 0    |             | 1          |
+----+----------+-----------------+---------+------+-------------+------------+
lxd sql global 'select * from storage_volumes_snapshots'
+----+-------------------+------+-------------+-------------+
| id | storage_volume_id | name | description | expiry_date |
+----+-------------------+------+-------------+-------------+
+----+-------------------+------+-------------+-------------+
```

Nope. Furthermore if you then upgrade to LXD 5.0 LTS, it notices this discrepancy and adds the missing entries (as best it can) to `storage_volumes_snapshots`, but the old records in `storage_volumes` still remain.

As for why some of the instance snapshot records were missing (which is what caused the “Instance snapshot record count doesn't match instance snapshot volume record count” error), I don’t know how it got into that state.

johnmaher · January 17, 2023, 3:57pm

All I can say is that I only used lxc snapshot [container] [snapshot] and lxc delete [container]/[snapshot] for manipulating snapshots. I do have a nightly script that uses these commands, so perhaps I failed to adjust it properly after the migration from 3 to 4 to 5. If I learn more, I’ll post it here.

johnmaher · January 19, 2023, 1:16pm

@tomp, I seem to have a lingering problem with one container. I first tried to copy a container (called redirected-sites) from one LXD host to another (after first deleting the container from the receiving host) and got this:

$ lxc delete redirected-sites
$ lxc copy remotehost:redirected-sites/now redirected-sites
Error: Failed instance creation: Error transferring instance data: Failed creating instance on target: Volume already exists on storage but not in database

This worked for the other containers in the past, so I checked using the same commands I used when we were figuring out the previous problem, and I got this:

$ mycontainer=redirected-sites

$ lxc info $mycontainer
Error: Instance not found

$ lxd sql global 'select * from storage_volumes' | grep "$mycontainer"

nothing

$ lxd sql global 'select storage_volumes.name, storage_volumes_snapshots.* from storage_volumes_snapshots join storage_volumes on storage_volumes_snapshots.storage_volume_id = storage_volumes.id' | grep "$mycontainer"

nothing

So, yes, there are no records of this instance in the databases, but there is a directory for that instance in /var/snap/lxd/common/lxd/storage-pools/default/containers/.

Can I simply delete that directory for that container?

Thanks.

tomp · January 19, 2023, 1:41pm

Yes that should be fine.
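If you want to check for other leftovers of the same kind before removing anything, a rough sketch is to compare the directory names on disk against the instance names LXD knows about. Assumptions here: orphaned_dirs is a made-up helper name, all instances live in the default project (instances in other projects get a project prefix on disk), and the containers directory is readable by the user running it (it may need root).

```shell
# Sketch only: orphaned_dirs is a hypothetical helper name.
#   $1     directory holding the per-container volumes
#   stdin  known instance names, one per line
# Prints the entries in $1 that have no matching instance name.
orphaned_dirs() {
    dir=$1
    known=$(mktemp)
    sort -u > "$known"                            # sorted list of known names
    # comm -23 keeps the lines that appear only in the first input.
    ls -1 "$dir" | sort -u | comm -23 - "$known"
    rm -f "$known"
}

# Usage (needs a running LXD; path as seen in this thread):
#   lxc list --format csv -c n | \
#       orphaned_dirs /var/snap/lxd/common/lxd/storage-pools/default/containers
```

It only lists candidates; each one should still be checked against lxc info before it is deleted.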

johnmaher · January 19, 2023, 1:56pm

Hmmm. How can I delete it?

# rm -rf redirected-sites/
rm: cannot remove 'redirected-sites/': Operation not permitted

I changed owner to root:root and chmod to 700. Won’t let go.

tomp · January 19, 2023, 1:59pm

Oh, it’s BTRFS.

Try:

sudo btrfs subvolume delete <path>
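If you’re not sure whether a leftover directory is a subvolume or just a plain directory, here is a small check you could run first. It’s a sketch, not an official tool: is_btrfs_subvolume is a made-up helper name, it assumes GNU stat, and it relies on the root directory of a Btrfs subvolume having inode number 256.

```shell
# Sketch only: is_btrfs_subvolume is a hypothetical helper name.
# Succeeds (exit status 0) when the given path looks like the root of a
# Btrfs subvolume. Assumes GNU stat.
is_btrfs_subvolume() {
    path=$1
    [ -d "$path" ] || return 1
    # With -f, %T prints the name of the filesystem the path lives on.
    [ "$(stat -f -c %T "$path" 2>/dev/null)" = "btrfs" ] || return 1
    # The root directory of a Btrfs subvolume has inode number 256.
    [ "$(stat -c %i "$path" 2>/dev/null)" = "256" ]
}

# Usage:
#   if is_btrfs_subvolume redirected-sites; then
#       sudo btrfs subvolume delete redirected-sites
#   fi
```

For an ordinary directory the check fails, and a normal rm -rf is the right tool instead.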

johnmaher · January 19, 2023, 2:01pm

Wow! I have a lot to learn. Thanks, Thomas.