Yeah, OK, then maybe a --repair flag would be the thing?
If this happens in a production environment, there is maybe too much sweat involved to manually repair the DB entries if you have never done this before.
At least the error message should give more details on where to look and what action to take to repair this.
BTW, I have no clue how to repair this. I just reverted to 5.1, so if I update to 5.2 again I will have the problem again. I also have no clue why the instance and volume counts drift apart.
How old are the problem containers? It may be that a fault crept in a while back. This is all conjecture currently, as I’m not at my PC.
This extra consistency check, new in LXD 5.2, runs when generating the backup.yaml file at instance start time, and it is what triggers the error.
But the actual record mismatch has likely existed for a while. I’ll double-check the record cleanup logic on snapshot failure that you described above.
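To see the two counts the check compares, you can query LXD’s global database directly with `lxd sql global`. This is only a sketch: the table names (instances, instances_snapshots, storage_volumes, storage_volumes_snapshots) are assumptions based on recent LXD schemas and may differ between versions, and ‘cont’ is a placeholder instance name — check your actual schema with `lxd sql global .schema` first.

```shell
# Number of instance snapshot records for the instance named 'cont':
lxd sql global "SELECT COUNT(*) FROM instances_snapshots
  WHERE instance_id = (SELECT id FROM instances WHERE name = 'cont')"

# Number of corresponding snapshot volume records:
lxd sql global "SELECT COUNT(*) FROM storage_volumes_snapshots
  WHERE storage_volume_id IN (SELECT id FROM storage_volumes WHERE name = 'cont')"
```

If the two numbers differ for an instance, that instance would trip the 5.2 consistency check.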
Doing lxc delete instance/snapshot
for the snapshots with missing volume DB records should fix it and bring things back in line, if losing those snapshots is acceptable.
Don’t just delete the problem DB records, otherwise you’ll leave the actual snapshots orphaned on disk.
Alternatively, we would have to craft a custom INSERT statement to restore each missing volume DB record.
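To work out which snapshots to lxc delete, something like the following may help. Again this is only a sketch under assumed table names, with ‘cont’ as a placeholder instance name; verify it against your actual schema before relying on it.

```shell
# List snapshot names that have an instance snapshot record
# but no matching storage volume snapshot record:
lxd sql global "
  SELECT s.name FROM instances_snapshots AS s
  JOIN instances AS i ON i.id = s.instance_id
  WHERE i.name = 'cont' AND s.name NOT IN (
    SELECT vs.name FROM storage_volumes_snapshots AS vs
    JOIN storage_volumes AS v ON v.id = vs.storage_volume_id
    WHERE v.name = 'cont')"
```

Each name it returns could then be removed with lxc delete cont/<name>, accepting the loss of those snapshots.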
(coming from Can't start containers - Error: Instance snapshot record count doesn't match instance snapshot volume record count)
Going by creation_date, in our case all recent (>=2021-09-09) containers start up fine, but the old ones (<=2021-08-04) all have issues.
Where the old containers are supposed to have 8 backups, one container I just checked has 23, going all the way back to 2021-11. Other containers go back to 2021-08, etc.
lxc delete-ing the snapshots does not work; it fails with the exact same error:
# lxc delete cont/autosnapshot-20220225-100052
Error: Instance snapshot record count doesn't match instance snapshot volume record count
The actual original error shows a strange path for your instance:
/var/snap/lxd/common/shmounts/storage-pools/default/containers/container-name
The shmounts part is strange and looks out of place.
Can you show ‘lxc storage show default’ please?
Hrm, I’ll probably have to put an LXD startup DB patch in to create DB record entries for the missing snapshot volume records, or something in that backup generator, as I really don’t want to be dealing with an inconsistent database or backup file (it kind of defeats the purpose of it otherwise).
It suggests that at some point the snapshot operation was not creating storage volume DB records in certain scenarios.
I’m not following; what do you mean here?
# btrfs subvolume list /var/snap/lxd/common/lxd | grep /container/
shows many more snapshots than lxc info does.
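One way to quantify that difference (illustrative only: ‘cont’ is a placeholder instance name, jq is assumed to be installed, and the grep pattern depends on your storage layout):

```shell
# Snapshots LXD knows about for 'cont', via the API:
lxc query /1.0/instances/cont/snapshots | jq length

# Snapshot subvolumes actually present on disk for the same instance:
btrfs subvolume list /var/snap/lxd/common/lxd | grep -c '/cont/'
```

If the on-disk count is higher, those extra subvolumes are the orphaned snapshots.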
So this issue isn’t related to any orphaned on-disk snapshots; it’s only concerned with the difference between the instances_snapshots and storage_volumes tables.
Indeed, got it working even on 4.2, thanks a lot everyone!
‘lxc storage show default’ is quite long. FWIW, for the ones I did check that have this issue the counts were off, and they are fairly old containers I’ve had for a while that are set to daily snapshots with 30-day retention.
Yes, let’s see it please.
Much obliged thank you
This worked, but it shut down all the containers and started them back up in the process. It also took like 10 minutes or more.
So definitely not something you want to do if you can’t have any downtime.
Sadly, I’m not sure it would be of any value now that I’ve downgraded to 5.1. Let me know if it’s still useful.
I just want to see what the source property is, as it shouldn’t be trying to use /var/snap/lxd/common/shmounts for your storage pool mount; that could indicate a problem with the snap package.
osgeo7 is the name of the zpool
zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
osgeo7 14.5T 2.68T 11.8T - - 27% 18% 1.00x ONLINE -
lxc storage show default
config:
  source: osgeo7
  volatile.initial_source: osgeo7
  zfs.pool_name: osgeo7
description: ""
name: default
driver: zfs
used_by: