I get these kinds of errors every other day. I have tried reloading and restarting the LXD daemon, and also rebooting the server, but the errors keep coming back. I get kicked out when I use lxc exec into the container, and then everything becomes really slow: lxc list takes minutes, if it shows any results at all.
Ubuntu 22.04.1.
LXD snap from the 5.0 LTS channel.
Apr 11 15:51:46 Server1 lxd.daemon[142623]: time=“2023-04-11T15:51:46+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 15:51:46 Server1 lxd.daemon[142623]: time=“2023-04-11T15:51:46+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 15:51:47 Server1 lxd.daemon[142623]: time=“2023-04-11T15:51:47+02:00” level=warning msg=“Failed getting exec control websocket reader, killing command” PID=2700801 err=“read unix /var/snap/lxd/common/lxd/unix.socket->@: read: connection reset by peer” instance=webserverinteractive=true project=default
Apr 11 15:51:48 Server1 lxd.daemon[142623]: time=“2023-04-11T15:51:48+02:00” level=warning msg=“Detected poll(POLLNVAL) event: exiting”
Apr 11 16:02:35 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:35+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:02:35 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:35+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:02:41 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:41+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:02:43 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:40+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:02:50 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:50+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:02:53 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:53+02:00” level=warning msg=“Failed to rollback transaction after error (context deadline exceeded): sql: transaction has already been committed or rolled back”
Apr 11 16:02:53 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:53+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:02:54 Server1 lxd.daemon[142623]: time=“2023-04-11T16:02:54+02:00” level=error msg=“Failed getting expired instance snapshots” err=“context deadline exceeded”
Apr 11 16:08:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:14+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:08:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:14+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:08:17 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:16+02:00” level=warning msg=“Failed getting exec control websocket reader, killing command” PID=3919346 err=“read unix /var/snap/lxd/common/lxd/unix.socket->@: read: connection reset by peer” instance=webserverinteractive=true project=default
Apr 11 16:08:17 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:13+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:08:21 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:21+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:08:25 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:25+02:00” level=warning msg=“Failed to rollback transaction after error (context deadline exceeded): sql: transaction has already been committed or rolled back”
Apr 11 16:08:25 Server1 lxd.daemon[142623]: time=“2023-04-11T16:08:25+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“context deadline exceeded”
Apr 11 16:13:04 Server1 lxd.daemon[142623]: time=“2023-04-11T16:12:59+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:13:05 Server1 lxd.daemon[142623]: time=“2023-04-11T16:12:58+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:13:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:13:14+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:18:29 Server1 lxd.daemon[142623]: time=“2023-04-11T16:18:29+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:18:29 Server1 lxd.daemon[142623]: time=“2023-04-11T16:18:29+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:18:30 Server1 lxd.daemon[142623]: time=“2023-04-11T16:18:30+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:18:30 Server1 lxd.daemon[142623]: time=“2023-04-11T16:18:30+02:00” level=warning msg=“Failed to rollback transaction after error (Failed getting volumes for auto custom volume snapshot task: driver: bad connection): sql: transaction has already been committed or rolled back”
Apr 11 16:22:34 Server1 lxd.daemon[142623]: time=“2023-04-11T16:22:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:22:35 Server1 lxd.daemon[142623]: time=“2023-04-11T16:22:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:22:36 Server1 lxd.daemon[142623]: time=“2023-04-11T16:22:31+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:23:17 Server1 lxd.daemon[142623]: time=“2023-04-11T16:23:16+02:00” level=error msg=“Failed getting expired instance snapshots” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:23:17 Server1 lxd.daemon[142623]: time=“2023-04-11T16:23:17+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:24:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:11+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:24:14 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:14+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:24:15 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:10+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:24:15 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:15+02:00” level=warning msg=“Transaction timed out. Retrying once” err=“Failed to begin transaction: context deadline exceeded” member=1
Apr 11 16:24:40 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:40+02:00” level=warning msg=“Dqlite: attempt 1: server 1: write handshake: write unix @->@0000d: i/o timeout”
Apr 11 16:24:41 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:41+02:00” level=error msg=“Unable to retrieve the list of expired custom volume snapshots” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:24:42 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:42+02:00” level=error msg=“Failed getting expired instance snapshots” err=“Failed to begin transaction: context deadline exceeded”
Apr 11 16:24:43 Server1 lxd.daemon[142623]: time=“2023-04-11T16:24:43+02:00” level=error msg=“Failed to schedule local auto custom volume snapshot,” err=“Failed to begin transaction: context deadline exceeded”
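To see which messages dominate a log excerpt like this one, a short Python sketch can count the level/msg pairs. It assumes the key=value layout shown above (level=... msg="..."). Real journald output uses straight double quotes; the curly-quote alternatives in the pattern only cover text pasted through a forum. The function name summarize is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the level and msg fields of an LXD log line. journald prints straight
# quotes; the \u201c/\u201d alternatives handle curly quotes from copied text.
LINE_RE = re.compile(r'level=(\w+) msg=["\u201c](.*?)["\u201d]')


def summarize(lines):
    """Count how often each (level, msg) pair appears in lxd.daemon log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines without level/msg fields are ignored
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Print the most frequent messages first.
    for (level, msg), n in summarize(sys.stdin).most_common():
        print(f"{n:5d}  {level:<7} {msg}")
```

For example: journalctl -u snap.lxd.daemon --since today | python3 summarize_lxd_log.py (the script file name is arbitrary). In the excerpt above, the "Transaction timed out" warnings would clearly outnumber everything else.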
After some more testing, I've realized that the problem is the communication between the container and the daemon when I run lxc exec. If I SSH into the container, everything is fine.
Some more details:
I have two VMs, one Debian and one Windows 11. Both crashed with the error:
qemu-system-x86_64: Issue while setting TUNSETSTEERINGEBPF: Invalid argument with fd: 51(50 on win11), prog_fd: -1
After that, neither lxc list nor lxc exec works.
I rebooted the server and now it works again. The LXD daemon log and the LXD log are empty.
I ran sudo systemctl reload snap.lxd.daemon, but I still can't run lxc list etc.
I can't restart LXD every other day. The containers seem to be working, but the Windows VM has been completely unresponsive since yesterday.
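Since lxc list sometimes takes minutes or never returns, it can help to record exactly when it starts hanging, so that load and memory can be checked at that moment. A minimal Python sketch, assuming lxc is on PATH; probe is a hypothetical helper name, and the 30 second timeout and 60 second default interval are arbitrary choices.

```python
import subprocess
import sys
import time


def probe(cmd=("lxc", "list", "--format", "csv"), timeout=30.0):
    """Run cmd once and return (elapsed_seconds, ok).

    ok is False if the command exits non-zero or does not finish within
    timeout seconds (subprocess.run kills the child in that case).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False
    return time.monotonic() - start, ok


if __name__ == "__main__":
    # Optional first argument: seconds between probes (default 60).
    interval = float(sys.argv[1]) if len(sys.argv) > 1 else 60.0
    while True:
        elapsed, ok = probe()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        status = "ok" if ok else "FAILED or TIMED OUT"
        print(f"{stamp} lxc list {status} after {elapsed:.1f}s", flush=True)
        time.sleep(interval)
```

Left running in a tmux or screen session, the first "TIMED OUT" line gives a timestamp to line up against the daemon log and the system load.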
Please can you check the load and memory usage when it occurs? This sort of thing normally happens when your system is overloaded in some way, and since the VM is hanging and LXD is having trouble accessing its database in a timely manner, it all points to an overloaded system.
What do top and ps aux | grep qemu show?
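As a rough, scriptable stand-in for top and ps aux | grep qemu, the Python sketch below reads the load averages and memory figures from /proc and lists processes whose command line mentions qemu. It assumes a Linux host with the standard /proc/loadavg and /proc/meminfo files; the function names are hypothetical, and the qemu match is a plain substring check, so it may also catch unrelated processes.

```python
import os
from pathlib import Path


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (most values are in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])
    return info


def qemu_processes(proc_root="/proc"):
    """Return sorted (pid, cmdline) pairs whose command line contains 'qemu'."""
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable
        cmd = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if "qemu" in cmd:
            found.append((int(entry.name), cmd))
    return sorted(found)


if __name__ == "__main__":
    load = parse_loadavg(Path("/proc/loadavg").read_text())
    mem = parse_meminfo(Path("/proc/meminfo").read_text())
    used_pct = 100 * (1 - mem["MemAvailable"] / mem["MemTotal"])
    print(f"load average: {load[0]:.2f} {load[1]:.2f} {load[2]:.2f} "
          f"on {os.cpu_count()} CPUs")
    print(f"memory used: {used_pct:.1f}% of {mem['MemTotal'] // 1024} MiB")
    for pid, cmd in qemu_processes():
        print(pid, cmd[:120])
```

A load average well above the CPU count, or MemAvailable close to zero, at the time lxc list hangs would support the overload theory; a missing or stuck qemu process would point at the VMs instead.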
Also, can you try this using the latest/stable channel? Keep in mind that you won't be able to downgrade back to the LTS channel, so if that's a problem, don't do it.
The server has 128 GB of registered ECC DDR4 RAM. But should LXD really crash if the memory is full? OK, so killing the PID might work? I can SSH into the containers and they work, but lxc exec, lxc list and lxc info won't work.
It's likely that QEMU has crashed and is blocking lxc list, so we should focus on why QEMU is crashing, freezing or blocking; it may be an I/O fault. Is there anything in dmesg or journalctl?
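To narrow down the dmesg and journalctl output, a small Python filter can pick out kernel lines that commonly accompany storage stalls or memory pressure. The pattern set is a hypothetical starting point, not an exhaustive list: "I/O error" (block-layer errors), "blocked for more than N seconds" (hung-task warnings) and "Out of memory" / "oom-kill" (OOM killer activity). Extend it for the storage driver in use.

```python
import re
import sys

# Hypothetical starting set of kernel-log patterns; not exhaustive.
PATTERNS = {
    "io_error": re.compile(r"I/O error"),
    "hung_task": re.compile(r"blocked for more than \d+ seconds"),
    "oom": re.compile(r"Out of memory|oom-kill"),
}


def scan(lines):
    """Group the lines that match each pattern under the pattern's name."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    for name, matched in scan(sys.stdin).items():
        print(f"{name}: {len(matched)} line(s)")
        for line in matched[-5:]:  # show only the most recent few
            print("   ", line)
```

For example: sudo dmesg -T | python3 scan_kernel_log.py, or, if the journal is persistent, journalctl -k -b -1 | python3 scan_kernel_log.py to check the kernel messages from before the last reboot.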