Errors every other day: LXD times out

I get these types of errors every other day. I have tried reloading and restarting the LXD daemon, and also rebooting the server, but the errors keep coming back. I get kicked out when I use lxc exec into the container, and then it becomes really slow. lxc list takes forever: minutes, if it shows results at all.

Ubuntu 22.04.1.
LXD snap 5.0 lts channel.

Apr 11 15:51:46 Server1 lxd.daemon[142623]: time=“2023-04-11T15:51:46+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 15:51:46 Server1 lxd.daemon[142623]: time=“2023-04-11T15:51:46+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 15:51:47 Server1 lxd.daemon[142623]: time=“2023-04-11T15:51:47+02:00” level=warning msg=“Failed getting exec control websocket reader, killing command” PID=2700801 err=“read unix /var/snap/lxd/common/lxd/unix.socket->@: read: connection reset by peer” instance=webserver interactive=true project=default
Apr 11 15:51:48 Server1 lxd.daemon[142623]: time=“2023-04-11T15:51:48+02:00” level=warning msg=“Detected poll(POLLNVAL) event: exiting”
Apr 11 16:02:35 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:35+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:02:35 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:35+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:02:41 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:41+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:02:43 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:40+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:02:50 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:50+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:02:53 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:53+02:00” level=warning msg=“Failed to rollback transaction after error (context deadline exceeded): sql: transaction has already been committed or rolled back”
Apr 11 16:02:53 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:53+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:02:54 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:54+02:00” level=error msg=“Failed getting expired instance snapshots” err=“context deadline exceeded”
Apr 11 16:08:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:14+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:08:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:14+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:08:17 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:16+02:00” level=warning msg=“Failed getting exec control websocket reader, killing command” PID=3919346 err=“read unix /var/snap/lxd/common/lxd/unix.socket->@: read: connection reset by peer” instance=webserver interactive=true project=default
Apr 11 16:08:17 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:13+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:08:21 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:21+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:08:25 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:25+02:00” level=warning msg=“Failed to rollback transaction after error (context deadline exceeded): sql: transaction has already been committed or rolled back”
Apr 11 16:08:25 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:25+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“context deadline exceeded”
Apr 11 16:13:04 Server1 lxd.daemon[142623]: time=“2023-04-11T16:12:59+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:13:05 Server1 lxd.daemon[142623]: time=“2023-04-11T16:12:58+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:13:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:13:14+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:18:29 Server1 lxd.daemon[142623]: time=“2023-04-11T16:18:29+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:18:29 Server1 lxd.daemon[142623]: time=“2023-04-11T16:18:29+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:18:30 Server1 lxd.daemon[142623]: time=“2023-04-11T16:18:30+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:18:30 Server1 lxd.daemon[142623]: time=“2023-04-11T16:18:30+02:00” level=warning msg=“Failed to rollback transaction after error (Failed getting volumes for auto custom volume snapshot task: driver: bad connection): sql: transaction has already been committed or rolled back”
Apr 11 16:22:34 Server1 lxd.daemon[142623]: time=“2023-04-11T16:22:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:22:35 Server1 lxd.daemon[142623]: time=“2023-04-11T16:22:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:22:36 Server1 lxd.daemon[142623]: time=“2023-04-11T16:22:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:23:17 Server1 lxd.daemon[142623]: time=“2023-04-11T16:23:16+02:00” level=error msg=“Failed getting expired instance snapshots” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:23:17 Server1 lxd.daemon[142623]: time=“2023-04-11T16:23:17+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:24:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:11+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:24:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:14+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:24:15 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:10+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:24:15 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:15+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:24:40 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:40+02:00” level=warning msg=“Dqlite: attempt 1: server 1: write handshake: write unix @->@0000d: i/o timeout”
Apr 11 16:24:41 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:41+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:24:42 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:42+02:00” level=error msg=“Failed getting expired instance snapshots” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:24:43 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:43+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
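
For anyone following along, a log excerpt like the one above is easier to read once the repeated messages are tallied. The following is only a minimal sketch: the function name `tally_messages` and the regular expression are my own assumptions about the line layout, not part of LXD. It accepts both straight and curly quotes because forum software tends to convert one into the other.

```python
import re
from collections import Counter

# Assumed layout: ... level=<level> msg="<message>" ...
# Both straight and curly quotes are accepted around the message.
MSG_RE = re.compile(r'level=(\w+) msg=["“](.*?)["”]')


def tally_messages(lines):
    """Count (level, message) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = MSG_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Three lines copied from the excerpt above.
sample = [
    'Apr 11 15:51:46 Server1 lxd.daemon[142623]: time=“2023-04-11T15:51:46+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1',
    'Apr 11 16:02:35 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:35+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1',
    'Apr 11 16:02:50 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:50+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”',
]

for (level, msg), n in tally_messages(sample).most_common():
    print(f"{n:4d}  {level:7s}  {msg}")
```

On a real system the same function could be fed the output of journalctl -u snap.lxd.daemon instead of the hard-coded sample.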

Is there something that triggers this? A load spike on your workloads perhaps?

Not really. The load varies a lot, and I can't find any logic to it. The last time it happened, the load was low.

I've had the same setup and load for at least a year. This started a few weeks ago.

@tomp Now I can't do exec at all; it just hangs. But the container itself seems to be working: I can connect to the webserver. So is it something in the communication?

Did a reload and got this. It doesn't say much:

Apr 13 20:03:15 server1 systemd[1]: Reloading Service for snap application lxd.daemon…
Apr 13 20:03:16 server1 systemd[1]: Reloaded Service for snap application lxd.daemon.
Apr 13 20:11:03 server1 systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 13 20:11:03 server1 systemd[1]: snap.lxd.daemon.service: Failed with result ‘exit-code’.
Apr 13 20:11:03 server1 systemd[1]: snap.lxd.daemon.service: Consumed 1.350s CPU time.
Apr 13 20:11:03 server1 systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 1.
Apr 13 20:11:03 server1 systemd[1]: Stopped Service for snap application lxd.daemon.
Apr 13 20:11:03 server1 systemd[1]: snap.lxd.daemon.service: Consumed 1.350s CPU time.
Apr 13 20:11:03 server1 systemd[1]: Started Service for snap application lxd.daemon.
Apr 13 20:11:03 server1 lxd.daemon[3862678]: => Preparing the system (24322)
Apr 13 20:11:04 server1 lxd.daemon[3862678]: ==> Loading snap configuration
Apr 13 20:11:04 server1 lxd.daemon[3862678]: ==> Setting up mntns symlink (mnt:[4026533649])
Apr 13 20:11:04 server1 lxd.daemon[3862678]: ==> Setting up kmod wrapper

Tried to exec just now and got kicked out:

Error: websocket: close 1006 (abnormal closure): unexpected EOF

Some more logs. I couldn't access anything. It started working again after about 30 minutes. The containers seem to keep working during the timeouts.

Apr 13 20:13:57 server1 lxd.daemon[3862926]: time=“2023-04-13T20:13:57+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 13 20:13:57 server1 lxd.daemon[3862926]: time=“2023-04-13T20:13:56+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 13 20:13:58 server1 lxd.daemon[3862926]: time=“2023-04-13T20:13:56+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 13 20:13:59 server1 lxd.daemon[3862926]: time=“2023-04-13T20:13:59+02:00” level=warning msg=“Failed to rollback transaction after error (context deadline exceeded): sql: transaction has already been committed or rolled back”
Apr 13 20:13:59 server1 lxd.daemon[3862926]: time=“2023-04-13T20:13:59+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“context deadline exceeded” member=1
Apr 13 20:14:12 server1 lxd.daemon[3862926]: time=“2023-04-13T20:14:12+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“Failed to begin transaction: context deadline exceeded”
Apr 13 20:14:12 server1 lxd.daemon[3862926]: time=“2023-04-13T20:14:12+02:00” level=warning msg=“Failed to rollback transaction after error (context deadline exceeded): sql: transaction has already been committed or rolled back”
Apr 13 20:14:12 server1 lxd.daemon[3862926]: time=“2023-04-13T20:14:12+02:00” level=error msg=“Failed getting expired instance snapshots” err=“context deadline exceeded”
Apr 13 20:20:30 server1 lxd.daemon[3862926]: time=“2023-04-13T20:20:29+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 13 20:20:31 server1 lxd.daemon[3862926]: time=“2023-04-13T20:20:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1

After some more testing I've realized that the problem is in the communication between the container and the daemon when I do lxc exec. If I SSH into the container, everything is fine.
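
One way to put numbers on "lxc exec hangs but SSH is fine" might be to time the LXD client calls with a hard timeout and log the result whenever the problem shows up. This is a rough sketch, not an official diagnostic: `time_command` and `probe_lxd` are hypothetical helper names, and the instance name `webserver` is only a guess based on the `instance=` field in the logs.

```python
import subprocess
import time


def time_command(cmd, timeout=30.0):
    """Run cmd and return (seconds_elapsed, status).

    status is 'ok', 'failed' (non-zero exit), 'timeout', or 'not found'.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        status = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    except FileNotFoundError:
        status = "not found"
    return time.monotonic() - start, status


def probe_lxd(instance="webserver"):
    """Time the two client calls that hang in this thread.

    The default instance name is an assumption; replace it with a real one.
    """
    for cmd in (["lxc", "list"], ["lxc", "exec", instance, "--", "true"]):
        elapsed, status = time_command(cmd)
        print(f"{' '.join(cmd)}: {status} after {elapsed:.1f}s")
```

Calling `probe_lxd()` from cron every few minutes and keeping the output next to the daemon log could show whether the slow calls line up with the "Transaction timed out" warnings.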

Some more:
I have 2 VMs, one Debian and one Windows 11. Both crashed with the error
qemu-system-x86_64: Issue while setting TUNSETSTEERINGEBPF: Invalid argument with fd: 51(50 on win11), prog_fd: -1

lxc list won't work afterwards, and lxc exec won't work either.

I rebooted the server and now it works again. The LXD daemon log and the LXD log are empty.

The Windows VM crashed. Some errors:

Error: websocket: close 1006 (abnormal closure): unexpected EOF

May 03 17:53:21 server1 lxd.daemon[10576]: time=“2023-05-03T17:53:20+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 17:53:24 server1 lxd.daemon[10576]: time=“2023-05-03T17:53:24+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 17:58:35 server1 lxd.daemon[10576]: time=“2023-05-03T17:58:35+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 17:58:36 server1 lxd.daemon[10576]: time=“2023-05-03T17:58:36+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 17:58:36 server1 lxd.daemon[10576]: time=“2023-05-03T17:58:36+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 17:58:39 server1 lxd.daemon[10576]: time=“2023-05-03T17:58:39+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 18:05:32 server1 lxd.daemon[10576]: time=“2023-05-03T18:05:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 18:05:32 server1 lxd.daemon[10576]: time=“2023-05-03T18:05:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 18:05:32 server1 lxd.daemon[10576]: time=“2023-05-03T18:05:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 18:05:33 server1 lxd.daemon[10576]: time=“2023-05-03T18:05:33+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 18:05:45 server1 lxd.daemon[10576]: time=“2023-05-03T18:05:45+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“Failed to begin transaction: context deadline exceeded”
May 03 18:05:45 server1 lxd.daemon[10576]: time=“2023-05-03T18:05:45+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
May 03 18:05:45 server1 lxd.daemon[10576]: time=“2023-05-03T18:05:45+02:00” level=error msg=“Failed getting expired instance snapshots” err=“Failed to begin transaction: context deadline exceeded”
May 03 19:26:06 server1 lxd.daemon[10576]: time=“2023-05-03T19:26:06+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 19:26:08 server1 lxd.daemon[10576]: time=“2023-05-03T19:26:07+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 19:26:09 server1 lxd.daemon[10576]: time=“2023-05-03T19:26:06+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 19:26:09 server1 lxd.daemon[10576]: time=“2023-05-03T19:26:09+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:33:02 server1 lxd.daemon[10576]: time=“2023-05-03T20:33:02+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:33:02 server1 lxd.daemon[10576]: time=“2023-05-03T20:33:02+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:33:03 server1 lxd.daemon[10576]: time=“2023-05-03T20:33:03+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:33:04 server1 lxd.daemon[10576]: time=“2023-05-03T20:33:04+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:38:42 server1 lxd.daemon[10576]: time=“2023-05-03T20:38:36+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:38:42 server1 lxd.daemon[10576]: time=“2023-05-03T20:38:36+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:38:42 server1 lxd.daemon[10576]: time=“2023-05-03T20:38:37+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:39:26 server1 lxd.daemon[10576]: time=“2023-05-03T20:39:05+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:39:26 server1 lxd.daemon[10576]: time=“2023-05-03T20:39:05+02:00” level=error msg=“Failed getting expired instance snapshots” err=“Failed to begin transaction: context deadline exceeded”
May 03 20:39:47 server1 lxd.daemon[10576]: time=“2023-05-03T20:39:05+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
May 03 20:41:49 server1 lxd.daemon[10576]: time=“2023-05-03T20:41:49+02:00” level=warning msg=“Dqlite: attempt 1: server 1: write handshake: write unix @->@0000d: i/o timeout”
May 03 20:41:50 server1 lxd.daemon[10576]: time=“2023-05-03T20:41:50+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:41:51 server1 lxd.daemon[10576]: time=“2023-05-03T20:41:51+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:41:56 server1 lxd.daemon[10576]: time=“2023-05-03T20:41:56+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err="Failed to begin transaction: failed to create dqlite connection: no available dqlite leader server fou>
May 03 20:41:56 server1 lxd.daemon[10576]: time=“2023-05-03T20:41:56+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“Failed to begin transaction: context deadline exceeded”
May 03 20:47:42 server1 lxd.daemon[10576]: time=“2023-05-03T20:47:37+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:47:42 server1 lxd.daemon[10576]: time=“2023-05-03T20:47:40+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:47:43 server1 lxd.daemon[10576]: time=“2023-05-03T20:46:49+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:50:57 server1 lxd.daemon[10576]: time=“2023-05-03T20:50:55+02:00” level=error msg=“Failed getting expired instance snapshots” err=“Failed to begin transaction: context deadline exceeded”
May 03 20:51:00 server1 lxd.daemon[10576]: time=“2023-05-03T20:51:00+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
May 03 20:51:11 server1 lxd.daemon[10576]: time=“2023-05-03T20:51:11+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:55:46 server1 lxd.daemon[10576]: time=“2023-05-03T20:55:46+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:55:46 server1 lxd.daemon[10576]: time=“2023-05-03T20:55:46+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:55:47 server1 lxd.daemon[10576]: time=“2023-05-03T20:55:46+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 20:55:54 server1 lxd.daemon[10576]: time=“2023-05-03T20:55:54+02:00” level=warning msg=“Failed to rollback transaction after error (driver: bad connection): sql: transaction has already been committed or rolled back”
May 03 21:45:33 server1 lxd.daemon[10576]: time=“2023-05-03T21:45:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 21:45:34 server1 lxd.daemon[10576]: time=“2023-05-03T21:45:32+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 21:45:35 server1 lxd.daemon[10576]: time=“2023-05-03T21:45:35+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 21:45:36 server1 lxd.daemon[10576]: time=“2023-05-03T21:45:36+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 21:47:40 server1 lxd.daemon[10576]: time=“2023-05-03T21:47:37+02:00” level=warning msg=“Dqlite: attempt 1: server 1: write handshake: write unix @->@0000d: i/o timeout”
May 03 21:47:41 server1 lxd.daemon[10576]: time=“2023-05-03T21:47:33+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“Failed to begin transaction: context deadline exceeded”
May 03 21:47:44 server1 lxd.daemon[10576]: time=“2023-05-03T21:47:44+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
May 03 21:47:44 server1 lxd.daemon[10576]: time=“2023-05-03T21:47:44+02:00” level=error msg=“Failed getting expired instance snapshots” err=“Failed to begin transaction: context deadline exceeded”
May 03 21:47:56 server1 lxd.daemon[10576]: time=“2023-05-03T21:47:56+02:00” level=warning msg="Failed to rollback transaction after error (Failed loading instance profiles: Failed to fetch from “profile_devices” table: Failed to fetch from "profile_dev>
May 03 21:52:35 server1 lxd.daemon[10576]: time=“2023-05-03T21:52:35+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 21:52:36 server1 lxd.daemon[10576]: time=“2023-05-03T21:52:35+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 21:52:36 server1 lxd.daemon[10576]: time=“2023-05-03T21:52:36+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 21:52:37 server1 lxd.daemon[10576]: time=“2023-05-03T21:52:37+02:00” level=warning msg="Failed to rollback transaction after error (Failed loading projects: Failed to fetch from “projects” table: Failed to fetch from “projects” table: sql: Rows>
May 03 21:55:00 server1 lxd.daemon[10576]: time=“2023-05-03T21:55:00+02:00” level=warning msg="Failed to rollback transaction after error (Failed loading projects: Failed to fetch from “projects” table: Failed to fetch from “projects” table: sql: tran>
May 03 21:55:00 server1 lxd.daemon[10576]: time=“2023-05-03T21:55:00+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err="Failed loading projects: Failed to fetch from “projects” table: Failed to fetch from “projects” ta>
May 03 21:57:05 server1 lxd.daemon[10576]: time=“2023-05-03T21:57:05+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
May 03 21:57:07 server1 lxd.daemon[10576]: time=“2023-05-03T21:57:07+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1

@tomp Any thoughts? This time it took 2 days after the reboot.

I ran sudo systemctl reload snap.lxd.daemon but I still can't do lxc list etc.
I can't restart LXD every other day. The containers seem to be working, but the Windows VM has been completely unresponsive since yesterday.

Please can you check the load and memory usage when it occurs, as this sort of thing normally happens when your system is overloaded in some way. And as the VM is hanging and LXD is having trouble accessing the database in a timely manner, it's all pointing to an overloaded system.
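
Since the problem only appears every couple of days, it may be easier to record load and memory continuously than to catch it by hand. A minimal sketch, assuming a Linux host where /proc/loadavg and /proc/meminfo are readable; the helper names `parse_meminfo`, `available_percent` and `read_sample` are made up for this example.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of integer values.

    Most fields are reported in KiB; the unit suffix is dropped.
    """
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def available_percent(meminfo):
    """Share of total memory the kernel reports as available, in percent."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


def read_sample():
    """Return (load1, load5, load15, percent_memory_available) for this host."""
    with open("/proc/loadavg") as f:
        load1, load5, load15 = (float(x) for x in f.read().split()[:3])
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    return load1, load5, load15, available_percent(mem)
```

Printing `read_sample()` with a timestamp once a minute would give something to compare against the timestamps of the timeout warnings.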

@tomp
Memory was 18% free at the time (24 GB). CPU was at 40% usage.
I still can't do lxc list:
lxc list
Error: Get “http://unix.socket/1.0/instances?filter=&recursion=2”: net/http: timeout awaiting response header

The VM is still unresponsive. lxc stop vm-name -f won't work. Can I force it?

What does top show and what does ps aux | grep qemu show?

Also, can you try this using the latest/stable channel? (Keep in mind you won't be able to downgrade back to the LTS channel, so if that's a problem don't do it.)

lxd 14624 5.6 5.7 8901696 7635548 ? Sl Apr27 547:44 /snap/lxd/24322/bin/qemu-system-x86_64 -S -vm1 vm1 -uuid e3932e4b-9226-463d-a85c-64005a85e83f -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/vm1/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/vm1/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/vm1/qemu.pid -D /var/snap/lxd/common/lxd/logs/vm1/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
lxd 1083047 66.3 12.7 18322956 16784548 ? Sl Apr28 5666:27 /snap/lxd/24322/bin/qemu-system-x86_64 -S -vm1 win11 -uuid 10344210-25c5-4c81-ac7e-ea62a7b8b918 -daemonize -cpu host,hv_passthrough -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/win11/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/win11/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/win11/qemu.pid -D /var/snap/lxd/common/lxd/logs/win11/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
user1 1299439 0.0 0.0 6608 2228 pts/10 S+ 16:37 0:00 grep --color=auto qemu

1083047 lxd 20 0 17.5g 16.0g 15.9g S 100.0 12.7 5671:21 qemu-system-x86
14624 lxd 20 0 8901696 7.3g 7.3g S 5.6 5.8 547:55.96 qemu-system-x86

Those are the VMs, right?

So it looks like those 2 qemu VM processes have about 16.0G + 7.3G = 23.3G of resident memory (the RES column in top; 15.9G of the larger one is counted as shared).
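
The same total can be cross-checked from the RSS column of the ps aux output instead of reading top by eye. The sketch below assumes the standard ps aux column order (USER, PID, %CPU, %MEM, VSZ, RSS, ...) with RSS in KiB; `qemu_rss_kib` is a hypothetical helper, and the sample lines are the ones posted above with the long command lines shortened.

```python
import os

# ps aux lines from this thread, command arguments shortened for readability.
PS_LINES = [
    "lxd 14624 5.6 5.7 8901696 7635548 ? Sl Apr27 547:44 /snap/lxd/24322/bin/qemu-system-x86_64 -S",
    "lxd 1083047 66.3 12.7 18322956 16784548 ? Sl Apr28 5666:27 /snap/lxd/24322/bin/qemu-system-x86_64 -S",
    "user1 1299439 0.0 0.0 6608 2228 pts/10 S+ 16:37 0:00 grep --color=auto qemu",
]


def qemu_rss_kib(lines):
    """Sum the RSS column (KiB) of every process whose binary is qemu-system-*."""
    total = 0
    for line in lines:
        fields = line.split(None, 10)  # the 11th field is the full command line
        if len(fields) < 11:
            continue
        binary = os.path.basename(fields[10].split()[0])
        if binary.startswith("qemu-system"):
            total += int(fields[5])
    return total


def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / (1024 * 1024)


total = qemu_rss_kib(PS_LINES)
print(f"qemu resident memory: {total} KiB = {kib_to_gib(total):.1f} GiB")
```

The grep process itself is skipped because only the binary name is matched, not the whole command line.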

How much memory does the server have?

You can kill those qemu processes to force the VMs to stop.

The server has 128 GB of RAM (registered ECC DDR4). But should LXD really crash if the memory is full? OK, so killing the PID may work? I can SSH into the containers and they work, but lxc exec, list, and info won't work.

It's likely qemu has crashed and is blocking lxc list. So we should focus on why qemu is crashing/freezing/blocking; it may be an I/O fault. Anything in dmesg or journalctl?
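
When going through dmesg or journalctl output for this, a simple filter can help surface the lines worth reading first. This is only a sketch under assumptions: the patterns below are substrings that commonly appear in Linux kernel messages about I/O errors, hung tasks and the OOM killer, they are not an exhaustive or authoritative list, and `suspicious_lines` is a made-up helper name.

```python
import re

# Assumed substrings of interest; extend or trim to taste.
PATTERNS = re.compile(
    r"I/O error|blocked for more than|Out of memory|oom-kill|hung task",
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Return the log lines that match any of the assumed patterns."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]
```

For example, the output of journalctl -k or dmesg could be piped into a small script that calls `suspicious_lines(sys.stdin)` and prints the result.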