STATE=ERROR, but container appears to be running

`lxc list --fast` shows the containers as running:

[22:31:48] john@moneta:~$ lxc list --fast
+---------+---------+--------------+----------------------+----------+-----------+
|  NAME   |  STATE  | ARCHITECTURE |      CREATED AT      | PROFILES |   TYPE    |
+---------+---------+--------------+----------------------+----------+-----------+
| flash   | RUNNING | x86_64       | 2021/02/12 02:32 UTC | base     | CONTAINER |
|         |         |              |                      | data     |           |
+---------+---------+--------------+----------------------+----------+-----------+
| forward | RUNNING | x86_64       | 2021/02/12 02:35 UTC | base     | CONTAINER |
|         |         |              |                      | data     |           |
+---------+---------+--------------+----------------------+----------+-----------+

But `lxc list -c nsD` (or almost any set of columns not included in `--fast`) shows them in the ERROR state:
[22:25:18] john@moneta:~$ lxc list -c nsD
+---------+---------+------------+
|  NAME   |  STATE  | DISK USAGE |
+---------+---------+------------+
| flash   | ERROR   |            |
+---------+---------+------------+
| forward | ERROR   |            |
+---------+---------+------------+
| gold    | RUNNING | 5.01GiB    |
+---------+---------+------------+

The `lxc list -c nsD` command takes a long time to return, but I can still run `lxc shell flash` and access the container.

Any clue as to what is messed up?

Hi,
You can view the instance's log with the following command:
`lxc info <container_name> --show-log`
You can also watch for errors with `lxc monitor --type=logging --pretty` while restarting the container that is in the ERROR state, and observe the log output.
Regards.

If you're running LXD 5.3, you're likely affected by this database error: "sql: transaction has already been committed or rolled back"

A fix for the instance list speed regression has been merged, and it should be deployed to the latest/stable channel soon.

Yes, that error is all over the logs. Thank you, I will wait for the update/fix.

I have updated to 5.4, stopped all the containers, and rebooted the server, but `lxc list` is still very slow. What logs or other information should I provide?

Are you running a cluster?

If so, can you identify the leader using `lxc cluster list`, then run `lxc ls` on that machine and see whether it is still slow?

Please also provide the output of both commands.
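When reporting that `lxc list` is still slow, concrete timings for the fast and full listings (and, on a cluster, for `lxc ls` on the leader) make the report easier to act on. A minimal Python sketch, assuming only that `lxc` is on the PATH; `time_command` and `median_seconds` are hypothetical helper names, not part of LXD:

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=3):
    """Run `cmd` (a list such as ["lxc", "ls"]) `runs` times.

    Hypothetical helper: returns the wall-clock duration of each run in
    seconds. Output is discarded; a non-zero exit raises CalledProcessError.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def median_seconds(cmd, runs=3):
    """Median wall-clock time of `cmd` over `runs` runs (hypothetical helper)."""
    return statistics.median(time_command(cmd, runs))
```

For example, comparing `median_seconds(["lxc", "list", "--fast"])` with `median_seconds(["lxc", "list", "-c", "nsD"])` shows how much of the delay comes from the extra columns. For a single run, the shell's `time lxc ls` gives the same information.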

Do you still see errors in the logs?

My server is not part of a cluster.
My logs are filled with errors:
`WARNING[2022-07-28T08:00:52-05:00] Transaction timed out. Retrying once err="Failed to begin transaction: context deadline exceeded" member=1`

`DEBUG [2022-07-28T08:01:12-05:00] Database error err="Failed to fetch from "instance_snapshot_config" table: sql: Rows are closed"`

`DEBUG [2022-07-28T08:01:12-05:00] Database error err="sql: transaction has already been committed or rolled back"`
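When the logs are "filled with" these messages, counting each distinct error is more useful than pasting the whole log. The sketch below assumes the log lines look exactly like the ones quoted above (`LEVEL[timestamp] message key=value ...`); the regexes and the `parse_line`/`summarize` names are illustrative assumptions, not an LXD tool, and this is not a full logfmt parser.

```python
import re
from collections import Counter

# "LEVEL[timestamp] rest", with optional space before "[" (as in "DEBUG [").
HEADER = re.compile(r"^(?P<level>[A-Z]+)\s*\[(?P<ts>[^\]]+)\]\s*(?P<rest>.*)$")
# The message is everything before the first " key=" pair.
MESSAGE = re.compile(r"^(?P<msg>.*?)(?:\s+\w+=.*)?$")
# Greedy up to the last double quote, so nested quotes stay inside the value.
# Limitation: a later quoted key=value pair would be swallowed into err.
ERR = re.compile(r'\berr="(?P<err>.*)"')


def parse_line(line):
    """Return (level, timestamp, message, err) or None for non-log lines."""
    m = HEADER.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    msg = MESSAGE.match(rest).group("msg")
    e = ERR.search(rest)
    return m.group("level"), m.group("ts"), msg, e.group("err") if e else None


def summarize(lines):
    """Count how often each distinct err value appears."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[3]:
            counts[parsed[3]] += 1
    return counts
```

Feeding it lines captured from `lxc monitor --type=logging --pretty` and printing `summarize(lines).most_common()` would show which database error dominates.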

How many instances do you have?

I’ve just tried locally with 512 instances and lxc list returns in a couple of seconds.
Are the instances running or stopped? Does that make a difference?

Thank you for your help.
The server has 21 running instances and 6 stopped instances (AMD 5900X with 128 GB RAM, very low load).
What may be significant is that each instance has between 90 and 100 snapshots.

Thanks, yes, I suspect that is significant. I'll take another look at the DB queries for inefficiencies related to snapshots, as the instance list (without snapshots) now seems fine.
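With 27 instances (21 running + 6 stopped) at 90 to 100 snapshots each, the server holds roughly 2,430 to 2,700 snapshots. If per-snapshot data were fetched with one query per snapshot (an "N+1" pattern), list time would scale with that total rather than with the instance count. The sketch below illustrates the difference using an in-memory SQLite database with a hypothetical, simplified schema (`instances`, `snapshots`, `snapshot_config` are invented names, not LXD's real tables). It is not LXD's code, and whether LXD actually does this is what the linked issue investigates.

```python
import sqlite3


def build_db(n_instances, snaps_per_instance):
    """In-memory DB with a hypothetical, simplified schema (not LXD's)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE snapshots (id INTEGER PRIMARY KEY, instance_id INTEGER, name TEXT);
        CREATE TABLE snapshot_config (snapshot_id INTEGER, key TEXT, value TEXT);
    """)
    for i in range(n_instances):
        db.execute("INSERT INTO instances (id, name) VALUES (?, ?)", (i, f"c{i}"))
        for s in range(snaps_per_instance):
            sid = i * snaps_per_instance + s
            db.execute("INSERT INTO snapshots (id, instance_id, name) VALUES (?, ?, ?)",
                       (sid, i, f"snap{s}"))
            db.execute("INSERT INTO snapshot_config VALUES (?, ?, ?)", (sid, "expiry", ""))
    return db


def config_per_snapshot(db):
    """N+1 pattern: one query for the snapshot IDs, then one per snapshot."""
    queries = 1
    result = {}
    for (sid,) in db.execute("SELECT id FROM snapshots").fetchall():
        rows = db.execute("SELECT key, value FROM snapshot_config WHERE snapshot_id = ?",
                          (sid,)).fetchall()
        queries += 1
        result[sid] = dict(rows)
    return result, queries


def config_batched(db):
    """One JOIN returning every snapshot's config in a single query."""
    result = {}
    rows = db.execute("""
        SELECT s.id, c.key, c.value
        FROM snapshots s LEFT JOIN snapshot_config c ON c.snapshot_id = s.id
    """).fetchall()
    for sid, key, value in rows:
        cfg = result.setdefault(sid, {})  # snapshots with no config map to {}
        if key is not None:
            cfg[key] = value
    return result, 1
```

Both functions return the same data. For 27 instances with 95 snapshots each, the first issues 2,566 queries and the second issues one. Each query carries per-query overhead, and in a real daemon that overhead may include transaction handling.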

I’ve opened https://github.com/lxc/lxd/issues/10707 to investigate this with some ideas as to what the cause is.