lxc copy --refresh failed, leaving database in an inconsistent state

Hi,
After the recent 5.1 fix for lxc copy --refresh with the btrfs filesystem, my daily remote copies of containers have gone well, except for the past few days with one single container.

I don't know what went wrong, but now, for this single container, when I run lxc copy aklive1 remoteserver: --stateless --refresh, I get this error message:
Error: Failed instance creation: Error transferring instance data: Error inserting volume "aklive1/auto-20220509-220501" for project "default" in pool "device1To" of type "containers" into database "Insert volume snapshot: UNIQUE constraint failed: storage_volumes_snapshots.storage_volume_id, storage_volumes_snapshots.name"
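
If I read the error correctly, the refresh is effectively trying to re-insert a snapshot row that already exists for this volume. A hedged reconstruction of the failing insert, using the ids and values from the queries below (this is a sketch of what the constraint rejects, not the actual LXD code path):

# Sketch only: an insert like this trips the UNIQUE(storage_volume_id, name)
# constraint named in the error, because a row for this snapshot already exists.
lxd sql global "INSERT INTO storage_volumes_snapshots (storage_volume_id, name, description, expiry_date) VALUES (1152, 'auto-20220509-220501', '', '0001-01-01T00:00:00Z');"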

Indeed, the 9th of May snapshot is not on the filesystem. ls containers-snapshots/aklive1/:
auto-20220416-220502/ auto-20220423-220502/ auto-20220430-220503/ auto-20220505-220502/ auto-20220506-220502/ auto-20220507-220502/ auto-20220508-220502/

But in the LXD database, the entry is there:

lxd sql global "SELECT * FROM storage_volumes WHERE name='aklive1';"
+------+---------+-----------------+---------+------+-------------+------------+--------------+
|  id  |  name   | storage_pool_id | node_id | type | description | project_id | content_type |
+------+---------+-----------------+---------+------+-------------+------------+--------------+
| 1152 | aklive1 | 3               | 1       | 0    |             | 1          | 0            |
+------+---------+-----------------+---------+------+-------------+------------+--------------+

lxd sql global "SELECT * FROM storage_volumes_snapshots WHERE storage_volume_id=1152;"
+------+-------------------+----------------------+-------------+----------------------+
|  id  | storage_volume_id |         name         | description |     expiry_date      |
+------+-------------------+----------------------+-------------+----------------------+
| 1154 | 1152              | auto-20220416-220502 |             | 0001-01-01T00:00:00Z |
| 1155 | 1152              | auto-20220423-220502 |             | 0001-01-01T00:00:00Z |
| 1159 | 1152              | auto-20220430-220503 |             | 0001-01-01T00:00:00Z |
| 1190 | 1152              | auto-20220505-220502 |             | 0001-01-01T00:00:00Z |
| 1196 | 1152              | auto-20220506-220502 |             | 0001-01-01T00:00:00Z |
| 1206 | 1152              | auto-20220507-220502 |             | 0001-01-01T00:00:00Z |
| 1210 | 1152              | auto-20220508-220502 |             | 0001-01-01T00:00:00Z |
| 1213 | 1152              | auto-20220509-220501 |             | 0001-01-01T00:00:00Z |
+------+-------------------+----------------------+-------------+----------------------+
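
To cross-check, this small sketch lists the snapshot rows in the database that have no matching directory on disk. The POOL path is an assumption based on a snap install; adjust it to your setup:

# Print DB snapshot rows for volume 1152 with no backing directory on disk.
# POOL is an assumed path (snap install); the only hit should be the 9 May row.
POOL=/var/snap/lxd/common/lxd/storage-pools/device1To
for name in $(lxd sql global "SELECT name FROM storage_volumes_snapshots WHERE storage_volume_id=1152;" | grep -o 'auto-[0-9-]*'); do
  [ -d "$POOL/containers-snapshots/aklive1/$name" ] || echo "orphan DB row: $name"
done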

  1. How is this possible? Another container (aktest1) has exactly the same settings, same storage location, etc., and everything is fine for it.
  2. What should I do? Simply remove the entry from the storage_volumes_snapshots table, like this?
    lxd sql global "DELETE FROM storage_volumes_snapshots WHERE id=1213;"

Thanks for the help!

Please can you advise, @monstermunchkin?

It may be that this is being affected by a fix you added after the 5.1 release.

For info, I solved the issue by manually cleaning the SQL database. That still doesn't explain why or how the issue happened, though.
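
For the record, the cleanup was essentially the DELETE proposed in question 2 above, preceded by a confirming SELECT (a sketch; id 1213 is the stale row identified earlier):

# Confirm the row is the one with no snapshot on disk, then remove it.
lxd sql global "SELECT id, name FROM storage_volumes_snapshots WHERE id=1213;"
lxd sql global "DELETE FROM storage_volumes_snapshots WHERE id=1213;"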
