LXD 5.4 cluster member doesn't start containers after reboot

Hi! I need some help.
After rebooting the host of a production LXD cluster member, I have a lot of issues:

Aug 01 09:19:54 host lxd.daemon[368675]: time="2022-08-01T09:19:54Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_device_config\" table: sql: >
Aug 01 09:19:54 host lxd.daemon[368675]: time="2022-08-01T09:19:54Z" level=warning msg="Failed auto start instance attempt" attempt=2 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:20:09 host lxd.daemon[368675]: time="2022-08-01T09:20:09Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_config\" table: sql: Rows ar>
Aug 01 09:20:09 host lxd.daemon[368675]: time="2022-08-01T09:20:09Z" level=warning msg="Failed auto start instance attempt" attempt=3 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:20:10 host lxd.daemon[368675]: time="2022-08-01T09:20:10Z" level=error msg="Failed to auto start instance" err="Failed to get snapshots: Failed to fetch from \"instance_snapshot_config\" tabl>
Aug 01 09:20:20 host lxd.daemon[368675]: time="2022-08-01T09:20:20Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_device_config\" table: sql: >
Aug 01 09:20:20 host lxd.daemon[368675]: time="2022-08-01T09:20:20Z" level=warning msg="Failed auto start instance attempt" attempt=1 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:20:35 host lxd.daemon[368675]: time="2022-08-01T09:20:35Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instances_profiles\" table: sql: transaction h>
Aug 01 09:20:35 host lxd.daemon[368675]: time="2022-08-01T09:20:35Z" level=warning msg="Failed auto start instance attempt" attempt=2 err="Failed to get snapshots: Failed to fetch from \"instances_prof>
Aug 01 09:20:50 host lxd.daemon[368675]: time="2022-08-01T09:20:50Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_device_config\" table: sql: >
Aug 01 09:20:50 host lxd.daemon[368675]: time="2022-08-01T09:20:50Z" level=warning msg="Failed auto start instance attempt" attempt=3 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:20:50 host lxd.daemon[368675]: time="2022-08-01T09:20:50Z" level=error msg="Failed to auto start instance" err="Failed to get snapshots: Failed to fetch from \"instance_snapshot_device_config>
Aug 01 09:21:00 host lxd.daemon[368675]: time="2022-08-01T09:21:00Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instances_profiles\" table: sql: transaction h>
Aug 01 09:21:00 host lxd.daemon[368675]: time="2022-08-01T09:21:00Z" level=warning msg="Failed auto start instance attempt" attempt=1 err="Failed to get snapshots: Failed to fetch from \"instances_prof>
Aug 01 09:21:15 host lxd.daemon[368675]: time="2022-08-01T09:21:15Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_config\" table: sql: Rows ar>
Aug 01 09:21:15 host lxd.daemon[368675]: time="2022-08-01T09:21:15Z" level=warning msg="Failed auto start instance attempt" attempt=2 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:21:31 host lxd.daemon[368675]: time="2022-08-01T09:21:31Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_device_config\" table: sql: >
Aug 01 09:21:31 host lxd.daemon[368675]: time="2022-08-01T09:21:31Z" level=warning msg="Failed auto start instance attempt" attempt=3 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:21:31 host lxd.daemon[368675]: time="2022-08-01T09:21:31Z" level=error msg="Failed to auto start instance" err="Failed to get snapshots: Failed to fetch from \"instance_snapshot_device_config>
Aug 01 09:21:41 host lxd.daemon[368675]: time="2022-08-01T09:21:41Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_config\" table: sql: Rows ar>
Aug 01 09:21:41 host lxd.daemon[368675]: time="2022-08-01T09:21:41Z" level=warning msg="Failed auto start instance attempt" attempt=1 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:21:56 host lxd.daemon[368675]: time="2022-08-01T09:21:56Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instances_profiles\" table: sql: transaction h>
Aug 01 09:21:56 host lxd.daemon[368675]: time="2022-08-01T09:21:56Z" level=warning msg="Failed auto start instance attempt" attempt=2 err="Failed to get snapshots: Failed to fetch from \"instances_prof>
Aug 01 09:22:12 host lxd.daemon[368675]: time="2022-08-01T09:22:12Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_config\" table: sql: Rows ar>
Aug 01 09:22:12 host lxd.daemon[368675]: time="2022-08-01T09:22:12Z" level=warning msg="Failed auto start instance attempt" attempt=3 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:22:12 host lxd.daemon[368675]: time="2022-08-01T09:22:12Z" level=error msg="Failed to auto start instance" err="Failed to get snapshots: Failed to fetch from \"instance_snapshot_config\" tabl>
Aug 01 09:22:22 host lxd.daemon[368675]: time="2022-08-01T09:22:22Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_config\" table: sql: Rows ar>
Aug 01 09:22:22 host lxd.daemon[368675]: time="2022-08-01T09:22:22Z" level=warning msg="Failed auto start instance attempt" attempt=1 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:22:37 host lxd.daemon[368675]: time="2022-08-01T09:22:37Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_device_config\" table: sql: >
Aug 01 09:22:37 host lxd.daemon[368675]: time="2022-08-01T09:22:37Z" level=warning msg="Failed auto start instance attempt" attempt=2 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:22:52 host lxd.daemon[368675]: time="2022-08-01T09:22:52Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_config\" table: sql: Rows ar>
Aug 01 09:22:52 host lxd.daemon[368675]: time="2022-08-01T09:22:52Z" level=warning msg="Failed auto start instance attempt" attempt=3 err="Failed to get snapshots: Failed to fetch from \"instance_snaps>
Aug 01 09:22:52 host lxd.daemon[368675]: time="2022-08-01T09:22:52Z" level=error msg="Failed to auto start instance" err="Failed to get snapshots: Failed to fetch from \"instance_snapshot_config\" tabl>
Aug 01 09:22:53 host lxd.daemon[368513]: => LXD is ready

Can you provide the untruncated logs please? I can’t see the full error.

But from what I can see it sounds similar to:

Do your containers have lots of snapshots?
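
One way to check is to count snapshot rows per instance in the global database. This is only a sketch: the helper name `snapshot_count_query` is made up for illustration, and the `instance_id` column and the join against `instances` are my assumption about the schema, so adjust if your version differs.

```shell
# Hypothetical helper: prints a SQL query that counts snapshots per instance.
# Assumes instances_snapshots.instance_id references instances.id.
snapshot_count_query() {
  printf '%s' "SELECT instances.name, COUNT(instances_snapshots.id) AS snapshots FROM instances LEFT JOIN instances_snapshots ON instances_snapshots.instance_id = instances.id GROUP BY instances.id ORDER BY snapshots DESC"
}

# On a cluster member, the query could then be run with:
#   lxd sql global "$(snapshot_count_query)"
```

If the database is already timing out, this query may fail in the same way as the other transactions.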

Mostly 3 weeks of snapshots, one every 6 hours, so 84 snapshots.

Some containers are running again after I deleted their last 2 snapshots, but for others it isn’t even possible to delete snapshots.
It’s a job that takes patience…

lxc delete container/snapshot-20220722_00-17-32
Error: Failed to begin transaction: context deadline exceeded

What do you see in the logs when you try and start a container?
sudo tail -f /var/snap/lxd/common/lxd/logs/lxd.log

You will probably need to wait for the fix or manually reduce the number of snapshots you have; see Database error: "sql: transaction has already been committed or rolled back" - #53 by tomp

What kind of fix will it be?
Is there going to be a limit on snapshots?
How many snapshots are recommended?

I’m going on vacation and will be offline for the next few weeks, so I won’t be able to follow up on the solution until I’m back.

For now, here is a little bit of the log:

time="2022-08-01T14:05:32Z" level=warning msg="Failed to rollback transaction after error (Unable to prepare statement with error: sql: transaction has already been committed or rolled back): sql: transaction has already been committed or rolled back"
time="2022-08-01T14:05:42Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_devices\" table: sql: Rows are closed): sql: transaction has already been committed or rolled back"
time="2022-08-01T14:05:42Z" level=warning msg="Transaction timed out. Retrying once" err="Failed to begin transaction: context deadline exceeded" member=2
time="2022-08-01T14:05:52Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instances_profiles\" table: sql: transaction has already been committed or rolled back): sql: transaction has already been committed or rolled back"
time="2022-08-01T14:05:52Z" level=warning msg="Transaction timed out. Retrying once" err="Failed to begin transaction: context deadline exceeded" member=2
time="2022-08-01T14:06:03Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instance_snapshot_devices\" table: sql: Rows are closed): sql: transaction has already been committed or rolled back"
time="2022-08-01T14:06:03Z" level=warning msg="Transaction timed out. Retrying once" err="Failed to begin transaction: context deadline exceeded" member=2
time="2022-08-01T14:06:13Z" level=warning msg="Transaction timed out. Retrying once" err="Failed to begin transaction: context deadline exceeded" member=2
time="2022-08-01T14:06:13Z" level=warning msg="Transaction timed out. Retrying once" err="Failed to begin transaction: context deadline exceeded" member=2
time="2022-08-01T14:06:13Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instances_profiles\" table: sql: transaction has already been committed or rolled back): sql: transaction has already been committed or rolled back"
time="2022-08-01T14:06:23Z" level=warning msg="Transaction timed out. Retrying once" err="Failed to begin transaction: context deadline exceeded" member=2
time="2022-08-01T14:06:23Z" level=warning msg="Failed to rollback transaction after error (Failed to fetch from \"instances_profiles\" table: sql: transaction has already been committed or rolled back): sql: transaction has already been committed or rolled back"
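
To get a feel for which tables the failed fetches hit most often, lines like the ones above can be tallied with standard tools. A minimal sketch, assuming the log format shown in this thread; `count_failed_tables` is a made-up helper name, not part of LXD.

```shell
# Hypothetical helper: tally "Failed to fetch from" warnings per table.
# Reads LXD log lines on stdin and prints "<count> <table>", highest count first.
count_failed_tables() {
  # The log escapes quotes as \" so the pattern matches a literal backslash + quote.
  sed -n 's/.*Failed to fetch from \\"\([a-z_]*\)\\" table.*/\1/p' \
    | sort | uniq -c | awk '{ print $1, $2 }' | sort -k1,1nr -k2,2
}

# Usage on a snap install (log path as given earlier in this thread):
#   sudo cat /var/snap/lxd/common/lxd/logs/lxd.log | count_failed_tables
```

Lines without a "Failed to fetch from" message, such as the "Transaction timed out" warnings, are simply ignored.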

Yeah, that’s the same issue.

See Database error: "sql: transaction has already been committed or rolled back" - #52 by tomp for discussion around the fixes. If you can wait then there will be a software fix to make the DB queries more efficient so they don’t time out when instances have lots of snapshots. If not, and you can afford to lose some snapshots, then another approach is to delete some of them manually as per the steps on that thread.

The fix has already been merged and @stgraber may do a cherry-pick into the latest/stable snap channel soon.

Hi! The post Database error: "sql: transaction has already been committed or rolled back" - #52 by tomp provides a solution: search for and delete snapshots from the database. But how can we delete the snapshots of a specific container?

I’ve updated the queries to aid deleting snapshots of specific instances:

Hi! It didn’t work:
lxd sql global 'delete from instance_snapshots where id = 5017'
Error: Failed to exec query: no such table: instance_snapshots

It’s instances_snapshots; I’ve corrected it now.
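
For anyone following along, here is a rough sketch of looking up the snapshot rows of one instance before deleting by id. It is not a copy of the queries in the linked post, which remain the reference. The helper name `snapshots_of_query` is made up, and the `instance_id` column and the join against `instances` are my assumption about the schema.

```shell
# Hypothetical helper: prints a SQL query listing the snapshot rows of one instance.
# $1 is the instance name; single quotes in it are doubled for SQL.
# Assumes instances_snapshots.instance_id references instances.id.
snapshots_of_query() {
  name=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "SELECT instances_snapshots.id, instances_snapshots.name FROM instances_snapshots JOIN instances ON instances.id = instances_snapshots.instance_id WHERE instances.name = '%s' ORDER BY instances_snapshots.id" "$name"
}

# List the rows first, then delete by id as in the command above,
# using the corrected table name (5017 is the id from this thread):
#   lxd sql global "$(snapshots_of_query mycontainer)"
#   lxd sql global 'DELETE FROM instances_snapshots WHERE id = 5017'
```

Instance names are only unique within a project, so with several projects it is worth checking the listed rows carefully before deleting anything.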

Thanks!