lxc snapshot and lxc start fail with "Error: Instance snapshot record count doesn't match instance snapshot volume record count"

I was running

lxc storage show default --debug

which, for the nginx container, shows the MISSING entries below, matching the snapshots I saw flagged as missing in the db query (the %!F(MISSING) looks like the URL-escaped "/", i.e. %2F, getting mangled by the debug output):

            "/1.0/instances/nginx%!F(MISSING)after-adding-rewrite-web-config",
            "/1.0/instances/nginx%!F(MISSING)after-changing-ports",
            "/1.0/instances/nginx%!F(MISSING)after-ssl-bottle",
            "/1.0/instances/nginx%!F(MISSING)after-wiki-id-fdo-move",
            "/1.0/instances/nginx%!F(MISSING)before-proxypass-changes",
            "/1.0/instances/nginx%!F(MISSING)nginx-20181222",
            "/1.0/instances/nginx%!F(MISSING)nxginx-20181225",
            "/1.0/instances/nginx%!F(MISSING)snap0",
            "/1.0/instances/nginx%!F(MISSING)snap14",
            "/1.0/instances/nginx%!F(MISSING)snap15",
            "/1.0/instances/nginx%!F(MISSING)snap17",
            "/1.0/instances/nginx%!F(MISSING)snap2",
            "/1.0/instances/nginx%!F(MISSING)snap3",
            "/1.0/instances/nginx%!F(MISSING)snap5",

I did find another container with a MISSING beside it and was able to restore one of those missing snapshots and start it up.

I also get the same error with all of my containers that have snapshots, after a snap auto-upgrade to LXD 5.2.

Some more info in case it helps track down what's going on:

edit:
Linux Mint 20
kernel 5.4.0-113-generic
zfs-0.8.3-1ubuntu12.14
zfs-kmod-0.8.3-1ubuntu12.13

$ lxc list
+-----------------------+---------+------+------+-----------+-----------+
|         NAME          |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+-----------------------+---------+------+------+-----------+-----------+
| streamline-ld         | STOPPED |      |      | CONTAINER | 3         |
+-----------------------+---------+------+------+-----------+-----------+
| streamline-schools-db | STOPPED |      |      | CONTAINER | 4         |
+-----------------------+---------+------+------+-----------+-----------+
| Sphere-clients        | STOPPED |      |      | CONTAINER | 0         |
+-----------------------+---------+------+------+-----------+-----------+
| SphereCRM             | STOPPED |      |      | CONTAINER | 3         |
+-----------------------+---------+------+------+-----------+-----------+
$ lxc start streamline-ld 
Error: Instance snapshot record count doesn't match instance snapshot volume record count
Try `lxc info --show-log streamline-ld` for more info
$ lxc info --show-log streamline-ld
Name: streamline-ld
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2019/10/01 13:29 BST
Last Used: 2020/07/25 15:34 BST

Snapshots:
+-------------------------------+----------------------+------------+----------+
|             NAME              |       TAKEN AT       | EXPIRES AT | STATEFUL |
+-------------------------------+----------------------+------------+----------+
| 2019.10.01-working            | 2019/10/01 15:36 BST |            | NO       |
+-------------------------------+----------------------+------------+----------+
| 2020.07.25-before-apt-upgrade | 2020/07/25 15:36 BST |            | NO       |
+-------------------------------+----------------------+------------+----------+
| before-php-upgrade            | 2019/10/01 14:36 BST |            | NO       |
+-------------------------------+----------------------+------------+----------+

Log:

$ lxc storage list
+---------+--------+-----------+-------------+---------+---------+
|  NAME   | DRIVER |  SOURCE   | DESCRIPTION | USED BY |  STATE  |
+---------+--------+-----------+-------------+---------+---------+
| default | zfs    | rpool/lxd |             | 12      | CREATED |
+---------+--------+-----------+-------------+---------+---------+
$ lxc storage show default
config:
  source: rpool/lxd
  volatile.initial_source: rpool/lxd
  zfs.pool_name: rpool/lxd
description: ""
name: default
driver: zfs
used_by:
- /1.0/instances/streamline-ld
- /1.0/instances/streamline-ld%252F2019.10.01-working
- /1.0/instances/streamline-ld%252Fbefore-php-upgrade
- /1.0/instances/streamline-schools-db
- /1.0/instances/streamline-schools-db%252F2019.10.01-before-apt-upgrade
- /1.0/instances/streamline-schools-db%252F2019.10.01-uptodate
- /1.0/instances/streamline-schools-db%252Fpost_install
- /1.0/instances/Sphere-clients
- /1.0/instances/SphereCRM
- /1.0/instances/SphereCRM%252Finstalled_mariadb
- /1.0/instances/SphereCRM%252Fmapped-bungle
- /1.0/profiles/default
status: Created
locations:
- none
$ lxd sql global "SELECT vs.* FROM instances AS v INNER JOIN instances_snapshots AS vs ON v.id = vs.instance_id WHERE v.name = 'streamline-ld'"
+----+-------------+-------------------------------+--------------------------------+----------+-------------+----------------------+
| id | instance_id |             name              |         creation_date          | stateful | description |     expiry_date      |
+----+-------------+-------------------------------+--------------------------------+----------+-------------+----------------------+
| 7  | 11          | 2019.10.01-working            | 2019-10-01T14:36:05.445284215Z | 0        |             | 0001-01-01T00:00:00Z |
| 9  | 11          | 2020.07.25-before-apt-upgrade | 2020-07-25T14:36:15.707635639Z | 0        |             | 0001-01-01T00:00:00Z |
| 6  | 11          | before-php-upgrade            | 2019-10-01T13:36:29.683017681Z | 0        |             | 0001-01-01T00:00:00Z |
+----+-------------+-------------------------------+--------------------------------+----------+-------------+----------------------+
$ lxd sql global "SELECT vs.* FROM storage_volumes AS v INNER JOIN storage_volumes_snapshots AS vs ON v.id = vs.storage_volume_id WHERE v.name = 'streamline-ld'"
+----+-------------------+-------------------------------+-------------+---------------------------+
| id | storage_volume_id |             name              | description |        expiry_date        |
+----+-------------------+-------------------------------+-------------+---------------------------+
| 20 | 16                | 2020.07.25-before-apt-upgrade |             | 0000-12-31T23:58:45-00:01 |
+----+-------------------+-------------------------------+-------------+---------------------------+
$ lxc storage volume ls default
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
|         TYPE         |                        NAME                         | DESCRIPTION | CONTENT-TYPE | USED BY |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container            | streamline-ld                                       |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | streamline-ld/2019.10.01-working                    |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | streamline-ld/2020.07.25-before-apt-upgrade         |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | streamline-ld/before-php-upgrade                    |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container            | streamline-schools-db                               |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | streamline-schools-db/2019.10.01-before-apt-upgrade |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | streamline-schools-db/2019.10.01-uptodate           |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | streamline-schools-db/2020.07.25-before-apt-upgrade |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | streamline-schools-db/post_install                  |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container            | Sphere-clients                                      |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container            | SphereCRM                                           |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | SphereCRM/2020.07.25-before-apt-upgrade             |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | SphereCRM/installed_mariadb                         |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
| container (snapshot) | SphereCRM/mapped-bungle                             |             | filesystem   | 1       |
+----------------------+-----------------------------------------------------+-------------+--------------+---------+
$ zfs list -r -t snapshot rpool/lxd/containers/streamline-ld 
NAME                                                                        USED  AVAIL     REFER  MOUNTPOINT
rpool/lxd/containers/streamline-ld@snapshot-before-php-upgrade             57.7M      -      799M  -
rpool/lxd/containers/streamline-ld@snapshot-2019.10.01-working             86.5M      -      813M  -
rpool/lxd/containers/streamline-ld@snapshot-2020.07.25-before-apt-upgrade  63.3M      -      785M  -
rpool/lxd/containers/streamline-ld@monthly-2021.07.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2021.08.06                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2021.09.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2021.10.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2021.11.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2021.12.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2022.01.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2022.02.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2022.03.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2022.04.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@weekly-2022.04.25                          0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2022.05.01                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@weekly-2022.05.02                          0B      -      794M  -
rpool/lxd/containers/streamline-ld@weekly-2022.05.09                          0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.16                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@weekly-2022.05.16                          0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.17                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.18                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.19                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.20                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.21                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.22                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.23                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@weekly-2022.05.23                          0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.24                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.25                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.26                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.28                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@weekly-2022.05.30                          0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.05.31                           0B      -      794M  -
rpool/lxd/containers/streamline-ld@monthly-2022.06.03                         0B      -      794M  -
rpool/lxd/containers/streamline-ld@daily-2022.06.03                           0B      -      794M  -
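In case it is useful to anyone else hitting this, the two queries above can be combined into a rough per-instance comparison of the record counts that the error is complaining about. This is only a sketch: it matches volumes to instances by name alone, so it can miscount if a custom volume or an instance in another project shares the same name.

lxd sql global "SELECT i.name,
  (SELECT count(*) FROM instances_snapshots AS s WHERE s.instance_id = i.id) AS instance_snapshot_records,
  (SELECT count(*) FROM storage_volumes_snapshots AS vs JOIN storage_volumes AS v ON vs.storage_volume_id = v.id WHERE v.name = i.name) AS volume_snapshot_records
  FROM instances AS i"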

I just ran into the problem again on another lxd host I have. On this one the snapshots are relatively new.

I confirmed that after doing

lxc rm container-name/bad-snapshot

I was then able to start up the container. So I guess I was mistaken before about having tried that.

FWIW, the problem snapshot was also from 2019 (the first snapshot I had created for this container), and all the good ones were newer snapshots from 2022, all with expiry dates.

Yes, it seems there was a problem with snapshots created in 2019 missing their storage volume records (at least in some as-yet-unknown scenario). If you only have one problem snapshot it can be removed using lxc delete, but if you have more than one, removing the first one will be blocked when LXD tries to update the backup yaml.

I'm going to be looking into this on Monday.

Okay, maybe that is what I ran into with the other server, since the problem containers there had more than one bad snapshot.

I ran into this today as well. I deleted old snapshots from 2019 and all the containers started back up.

Perhaps in the short term the error can be downgraded to a warning that's logged, with a note that it will become an error in a future release?

I'm working on a patch that will create the missing snapshot volume DB records. The underlying problem is that other functionality is also already broken by these missing records; people just haven't come across it yet.

This patch will restore the records and includes tests to check its behavior:

If you don't need the problem snapshot(s), then running lxc delete <instance>/<snapshot> will resolve the issue.
Note: if you have multiple problem snapshots, each lxc delete will also print the same error, but it will still delete that snapshot, so if you keep going, once all of the problem snapshots have been removed the instance can be started.
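For example, using the streamline-ld queries earlier in this thread, the two snapshots that have no storage volume record (assuming those are indeed the ones you can live without) could be cleared like this:

# each delete may still print the count-mismatch error, but the snapshot should be removed anyway
lxc delete streamline-ld/2019.10.01-working
lxc delete streamline-ld/before-php-upgrade
lxc start streamline-ld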

Have you tried running lxc delete for each of the problem snapshots? Although you will get the error, looking at the code it should still delete the snapshot, and once you clear all the problem ones out the instance should start again.

I see this issue is marked as fixed. I still seem to be having it even after the upgrade to 5.3.

Anything I need to do to fix?

e.g. doing a snapshot of one problem container still results in

Error: Instance snapshot record count doesn't match instance snapshot volume record count

I checked version and

lxd --version

shows:

5.3

Most of the containers are fine, but I seem to have some phantom snapshots. Example case:

lxc snapshot collabora

Gives me error:

Error: Create instance snapshot: Error inserting volume "collabora/snap3" for project "default" in pool "default" of type "containers" into database "Insert volume snapshot: UNIQUE constraint failed: storage_volumes_snapshots.storage_volume_id, storage_volumes_snapshots.name"

If I do

lxc info collabora

I get this, with no snap3 listed:

+----------------------------+----------------------+------------+----------+
|            NAME            |       TAKEN AT       | EXPIRES AT | STATEFUL |
+----------------------------+----------------------+------------+----------+
| after-upgrade-ubuntu-20.04 | 2022/04/30 18:02 UTC |            | NO       |
+----------------------------+----------------------+------------+----------+
| snap1                      | 2022/01/29 00:17 UTC |            | NO       |
+----------------------------+----------------------+------------+----------+
| snap2                      | 2022/04/30 17:20 UTC |            | NO       |
+----------------------------+----------------------+------------+----------+

If I do:

zfs list -r -t snapshot osgeo7/containers/collabora 

again the mysterious snap3 does not show:

NAME                                                              USED  AVAIL     REFER  MOUNTPOINT
osgeo7/containers/collabora@snapshot-snap1                        327M      -     1.19G  -
osgeo7/containers/collabora@snapshot-snap2                        451M      -     1.26G  -
osgeo7/containers/collabora@snapshot-after-upgrade-ubuntu-20.04   117M      -     1.44G  -

It's only when I do this that I see the phantom snapshot:

lxc storage show default | grep collabora
- /1.0/instances/collabora
- /1.0/instances/collabora%252Fafter-2019-10-20-system-updates
- /1.0/instances/collabora%252Fafter-nextcloud-config
- /1.0/instances/collabora%252Fbefore-2019-11-14-updates
- /1.0/instances/collabora%252Fsnap0
- /1.0/instances/collabora%252Fsnap1
- /1.0/instances/collabora%252Fsnap2
- /1.0/instances/collabora%252Fsnap3
- /1.0/instances/collabora%252Fsnap4

What's also odd is that after-upgrade-ubuntu-20.04 is not in the above list, and yet I see it in lxc info collabora.

The only thing I can think of that's special about this container is that I think I created it as a copy of another container a very long time ago. So I'm wondering if maybe something went wrong in that copy process.

I have another container with a similar issue, which I think had also undergone a copy a long time ago. Other containers seem fine.

OK, so I can see the issue, and it's not quite the same as what was fixed in LXD 5.3.

An instance snapshot is made up of 3 things:

  1. An instance snapshot DB record (these are shown in the output of lxc info <instance>).
  2. A storage volume snapshot DB record (these are shown in the output of lxc storage volume ls <pool>).
  3. The actual snapshot volume on disk.
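For reference, each of these can be inspected directly with commands already used in this thread (substitute your own instance, pool and ZFS dataset names):

lxc info <instance>                                            # 1. instance snapshot DB records
lxc storage volume ls <pool>                                   # 2. storage volume snapshot DB records
zfs list -r -t snapshot <zfs-dataset>/containers/<instance>    # 3. snapshot volumes on disk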

In this case the state of 3. (zfs list) is irrelevant as the error is only complaining about inconsistencies between 1 and 2.

In the scenario that was fixed in LXD 5.3, if there is an instance snapshot DB record (1.) but no associated storage volume snapshot DB record (2.), then LXD will now auto-create the missing storage volume snapshot DB record (as in most cases these records don't contain any custom config and so can be re-created effectively).
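As a rough illustration only (based on the columns visible in the storage_volumes_snapshots query output earlier in this thread; the exact defaults may differ between schema versions, and the upgrade creates these records for you, so there should be no need to run this by hand), the recreated record for one of the streamline-ld snapshots would look something like:

lxd sql global "INSERT INTO storage_volumes_snapshots (storage_volume_id, name, description, expiry_date) VALUES (16, '2019.10.01-working', '', '0001-01-01T00:00:00Z')"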

However in your case you have 3 instance snapshot DB records:

after-upgrade-ubuntu-20.04
snap1
snap2

But you have additional storage volume snapshot DB records:

after-2019-10-20-system-updates
after-nextcloud-config
before-2019-11-14-updates
snap0
snap3
snap4

It is this inconsistency that LXD is now validating before allowing a fresh config backup to be made.

Now, we can't automatically create an associated instance snapshot DB record for these excess storage volume snapshot DB records, nor would we want to delete them (or any associated snapshots on disk) for fear of causing data loss.

So in this case it will need manual intervention to delete these records.

You can use this command to get a list of all storage volume snapshot records:

lxd sql global 'select storage_volumes.name, storage_volumes_snapshots.* from storage_volumes_snapshots join storage_volumes on storage_volumes_snapshots.storage_volume_id = storage_volumes.id'

Then you can run:

lxd sql global 'delete from storage_volumes_snapshots where id = <ID of snapshot to delete>'

Thanks, that did the trick. One of the containers had around 20 of these, so it was hard to scroll through the whole list and figure out what should be deleted.

I revised it to use a query like my earlier one:

# back up the database in case I screw up
sudo cp /var/snap/lxd/common/lxd/database/global/db.bin lxd-global-220701
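(An SQL text dump of the global database can also be taken with lxd sql itself, assuming the .dump helper is available in your LXD version:)

lxd sql global .dump > lxd-global-220701.sql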

# find volume snapshot records with no corresponding instance snapshot record
lxd sql global "SELECT v.id, v.name FROM (SELECT vs.* FROM storage_volumes AS v INNER JOIN storage_volumes_snapshots AS vs ON v.id = vs.storage_volume_id WHERE v.name = 'collabora') AS v LEFT JOIN 
(SELECT vs.* FROM instances AS v INNER JOIN instances_snapshots AS vs ON v.id = vs.instance_id WHERE v.name = 'collabora') AS i ON i.name = v.name WHERE  i.name IS NULL;"

This gave the output:

 +-------+-------+
 |  id   | name  |
 +-------+-------+
 | 9334  | snap0 |
 | 21218 | snap3 |
 | 21866 | snap4 |
 +-------+-------+

If you have a lot of them and have confirmed they are the right ones, then use group_concat to build the list of IDs:

# concatenate
lxd sql global "SELECT group_concat(v.id) FROM (SELECT vs.* FROM storage_volumes AS v INNER JOIN storage_volumes_snapshots AS vs ON v.id = vs.storage_volume_id WHERE v.name = 'collabora') AS v LEFT JOIN 
(SELECT vs.* FROM instances AS v INNER JOIN instances_snapshots AS vs ON v.id = vs.instance_id WHERE v.name = 'collabora') AS i ON i.name = v.name WHERE  i.name IS NULL;"

The above outputs:

9334, 21218, 21866

# delete the volume snapshot records listed above
lxd sql global "DELETE FROM storage_volumes_snapshots WHERE id IN(9334,21218,21866)"

Once I did that, then

lxc snapshot collabora

worked.
