Database error: "sql: transaction has already been committed or rolled back"

Also, please can you confirm the host OS and kernel version?

Are you still seeing the first container start and then others not on a fresh install? And are you still seeing a correlation between kernel versions and the issue?

Are you able to run a test without LUKS in the mix to rule that out as a contributor?

On an LXD install that is quite different from our main setup, I am not seeing this issue. It is used for CI and has the following key differences:

  • /var/snap/lxd on tmpfs (i.e. “ramdisk”)
  • LXD re-inits every boot (but uptimes can be quite long still)
  • No LUKS
  • There were most likely no containers when LXD switched from 5.2 to 5.3
  • At most there is normally only one container
  • Nested, i.e. this LXD runs in an LXD container

I will have to set up separate test servers for these procedures, but can’t do that today. I will track this thread to see if it is still needed when I get the time. Then I can also test with/without LUKS and other aspects to bisect.

In the meantime, I pinned our hosts to 5.2 so the matter is not pressing on our part.

@tomp Carrying on conversation from other thread:

Host OS version is all 20.04, kernel versions:

> for server in $servers; do echo $server; ssh $server uname -a;  echo; done
es-hel-phys-2
Linux es-hel-phys-2 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

app-hel-phys-3
Linux app-hel-phys-3 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

app-hel-phys-8
Linux app-hel-phys-8 5.4.0-88-generic #99-Ubuntu SMP Thu Sep 23 17:29:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

es-hel-phys-3
Linux es-hel-phys-3 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

app-hel-phys-2
Linux app-hel-phys-2 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

app-hel-phys-5
Linux app-hel-phys-5 5.4.0-88-generic #99-Ubuntu SMP Thu Sep 23 17:29:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

es-hel-phys-1
Linux es-hel-phys-1 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

app-hel-phys-4
Linux app-hel-phys-4 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

app-hel-phys-7
Linux app-hel-phys-7 5.4.0-88-generic #99-Ubuntu SMP Thu Sep 23 17:29:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-5-phys
Linux hetzner-inference-5-phys 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-8-phys
Linux hetzner-inference-8-phys 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-9-phys
Linux hetzner-inference-9-phys 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-staging-phys
Linux hetzner-inference-staging-phys 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-2-phys
Linux hetzner-inference-2-phys 5.4.0-96-generic #109-Ubuntu SMP Wed Jan 12 16:49:16 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-3-phys
Linux hetzner-inference-3-phys 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-4-phys
Linux hetzner-inference-4-phys 5.4.0-96-generic #109-Ubuntu SMP Wed Jan 12 16:49:16 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-7-phys
Linux hetzner-inference-7-phys 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-6-phys
Linux hetzner-inference-6-phys 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

monitoring
Linux monitoring 5.15.0-27-generic #28-Ubuntu SMP Thu Apr 14 04:55:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-hel-phys-6
Linux app-hel-phys-6 5.4.0-88-generic #99-Ubuntu SMP Thu Sep 23 17:29:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

hetzner-inference-1-phys
Linux hetzner-inference-1-phys 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

app-ovh-phys-1
Linux app-ovh-phys-1 5.4.0-117-generic #132-Ubuntu SMP Thu Jun 2 00:39:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-ovh-phys-2
Linux app-ovh-phys-2 5.4.0-117-generic #132-Ubuntu SMP Thu Jun 2 00:39:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-ovh-phys-3
Linux app-ovh-phys-3 5.4.0-117-generic #132-Ubuntu SMP Thu Jun 2 00:39:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-ovh-phys-4
Linux app-ovh-phys-4 5.4.0-117-generic #132-Ubuntu SMP Thu Jun 2 00:39:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-ovh-phys-5
Linux app-ovh-phys-5 5.4.0-117-generic #132-Ubuntu SMP Thu Jun 2 00:39:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-ovh-phys-6
Linux app-ovh-phys-6 5.4.0-117-generic #132-Ubuntu SMP Thu Jun 2 00:39:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-staging-phys-1
Linux app-staging-phys-1 5.13.0-41-generic #46~20.04.1-Ubuntu SMP Wed Apr 20 13:16:21 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-ovh-phys-7
Linux app-ovh-phys-7 5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-ovh-phys-8
Linux app-ovh-phys-8 5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

es-ovh-phys-1
Linux es-ovh-phys-1 5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

es-ovh-phys-2
Linux es-ovh-phys-2 5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

es-ovh-phys-3
Linux es-ovh-phys-3 5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

app-hel-phys-1
Linux app-hel-phys-1 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Do you want log files from all the hosts, or just the database hosts?

Thanks, we could really do with the debug logs from the affected systems.

@masnax and I have tried reproducing this error, to no avail.

I just tried a freshly installed Ubuntu 20.04 system (kernel 5.4.0-121-generic) with a ZFS pool on top of an LVM logical volume (for added layering). I installed LXD from the 5.2/stable channel, configured it to use the logical volume as the block device of the ZFS pool, and then launched 5 ubuntu/focal containers.

I tried refreshing several times between LXD 5.2 and LXD 5.3 to try and reproduce it, but no luck.

With them still running, I then refreshed to latest/stable which installed 5.3-924be6a and ran lxc ls and the container list with IPs and status returned quickly as normal.
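For anyone who wants to repeat this, here is a rough sketch of the refresh test described above. It is not the exact script that was used: the storage setup (ZFS on an LVM logical volume) and lxd init are left out, and the images:ubuntu/focal alias, the container names and the run/repro_refresh helpers are assumptions made for illustration.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# repro_refresh COUNT: install LXD 5.2, launch COUNT containers,
# refresh to latest/stable while they are running, then list them.
repro_refresh() {
  count="${1:-5}"
  run sudo snap install lxd --channel=5.2/stable
  i=1
  while [ "$i" -le "$count" ]; do
    # Image alias and container names are assumptions, not from the thread.
    run lxc launch images:ubuntu/focal "c$i"
    i=$((i + 1))
  done
  run sudo snap refresh lxd --channel=latest/stable
  run lxc ls
}
```

Running DRY_RUN=1 repro_refresh 5 only prints the commands, which is a safe way to review them before trying this on a test machine.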

Can I also ask whether anyone affected by this is using disk devices on their containers to pass paths from the host into the container?

In our case the errors were observed on hosts with containers using disk devices and on hosts without any such container.

It’s a strange one. There’s clearly something you all have in common, but I can’t figure out what it is yet.
I’ve just done a LXD 5.2 to LXD 5.3 cluster upgrade too and that went fine.

I am, yes

Thanks for this.

It looks like app-hel-phys-4 is leader, can you get the output of lxc cluster ls from that member to confirm?

Also, can you double-check that all members are online and running the same LXD version using snap info lxd? I ask because I saw an instance of this:

time="2022-06-30T14:01:37Z" level=warning msg="Could not notify all nodes of database upgrade" err="failed to notify peer app-hel-phys-2:8443: failed to notify node about completed upgrade: Patch \"https://app-hel-phys-2:8443/internal/database\": Unable to connect to: app-hel-phys-2:8443 ([dial tcp 10.145.8.216:8443: connect: connection refused])"
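One possible way to compare versions across all members in one go, assuming SSH access to each host as in the uname loop earlier in the thread. This sketch uses snap list lxd instead of snap info lxd because it prints a single row per snap, which is easier to compare; the function names are hypothetical.

```shell
# lxd_versions HOST...: print "host version revision" for each host.
lxd_versions() {
  for server in "$@"; do
    # `snap list lxd` prints a header row, then one row for the lxd snap.
    ssh "$server" snap list lxd | awk -v h="$server" 'NR == 2 { print h, $2, $3 }'
  done
}

# versions_consistent: read "host version revision" lines on stdin and
# succeed only if every host reports the same version and revision.
versions_consistent() {
  awk '{ seen[$2 " " $3] = 1 }
       END { n = 0; for (k in seen) n++; exit (n <= 1 ? 0 : 1) }'
}
```

For example, lxd_versions $servers | versions_consistent || echo "version mismatch" would flag a member that has not refreshed yet.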

Looks like there could be some DNS issues too:

time="2022-07-01T13:30:51Z" level=warning msg="Failed adding member event listener client" err="lookup es-ovh-phys-3 on 213.186.33.99:53: no such host" local="10.145.96.163:8443" remote="es-ovh-phys-3:8443"
Output
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
|              NAME              |                     URL                     |      ROLES       | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATE  |      MESSAGE      |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-hel-phys-1                 | https://app-hel-phys-1:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-hel-phys-2                 | https://app-hel-phys-2:8443                 | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-hel-phys-3                 | https://app-hel-phys-3:8443                 | database-leader  | x86_64       | default        |             | ONLINE | Fully operational |
|                                |                                             | database         |              |                |             |        |                   |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-hel-phys-4                 | https://app-hel-phys-4:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-hel-phys-5                 | https://app-hel-phys-5:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-hel-phys-6                 | https://app-hel-phys-6:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-hel-phys-7                 | https://app-hel-phys-7:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-hel-phys-8                 | https://app-hel-phys-8:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-ovh-phys-1                 | https://app-ovh-phys-1:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-ovh-phys-2                 | https://app-ovh-phys-2:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-ovh-phys-3                 | https://app-ovh-phys-3:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-ovh-phys-4                 | https://app-ovh-phys-4:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-ovh-phys-5                 | https://app-ovh-phys-5:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-ovh-phys-6                 | https://app-ovh-phys-6:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-ovh-phys-7                 | https://app-ovh-phys-7:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| app-ovh-phys-8                 | https://app-ovh-phys-8:8443                 |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| es-hel-phys-1                  | https://es-hel-phys-1:8443                  |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| es-hel-phys-2                  | https://es-hel-phys-2:8443                  | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| es-hel-phys-3                  | https://es-hel-phys-3:8443                  | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| es-ovh-phys-2                  | https://es-ovh-phys-2:8443                  |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| es-ovh-phys-3                  | https://es-ovh-phys-3:8443                  |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-1-phys       | https://hetzner-inference-1-phys:8443       |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-2-phys       | https://hetzner-inference-2-phys:8443       |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-3-phys       | https://hetzner-inference-3-phys:8443       |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-4-phys       | https://hetzner-inference-4-phys:8443       | database         | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-5-phys       | https://hetzner-inference-5-phys:8443       |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-6-phys       | https://hetzner-inference-6-phys:8443       | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-7-phys       | https://hetzner-inference-7-phys:8443       |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-8-phys       | https://hetzner-inference-8-phys:8443       |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-9-phys       | https://hetzner-inference-9-phys:8443       | database         | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| hetzner-inference-staging-phys | https://hetzner-inference-staging-phys:8443 | database-standby | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+
| monitoring                     | https://monitoring:8443                     |                  | x86_64       | default        |             | ONLINE | Fully operational |
+--------------------------------+---------------------------------------------+------------------+--------------+----------------+-------------+--------+-------------------+

Yep they are

That’s odd. Just tested from another host, and it works fine

> nc -v app-hel-phys-2 8443
Connection to app-hel-phys-2 8443 port [tcp/*] succeeded!

OK, so in your case the cluster appears to be up and running, but I am surprised it has ever worked well because it’s spread over 3 different geographical locations (from what I can tell, Helsinki, Germany and Spain).

During the upgrade the leader role is likely to have moved to a different member, which may mean it’s now in a different country, giving very different query performance from the perspective of other members that used to be close to the leader.

Do you happen to know where the leader was before the upgrade?

What is the latency between each of the sites?

Here’s a discussion around cluster performance over WAN setups:

In LXD clusters all queries (reads and writes) go to the leader from all other members, and then writes have to be replicated to the voter and standby members. So if they are spread over a wide geographical area this can really slow things down.
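Because every query goes to the leader, the round-trip time from each member to the leader is the number to look at. A minimal sketch for measuring it, assuming SSH access to each member and a ping that prints the usual min/avg/max summary line; avg_rtt and leader_latency are hypothetical helper names.

```shell
# avg_rtt: extract the average round-trip time in ms from a ping summary
# line such as "rtt min/avg/max/mdev = 20.1/25.4/35.7/1.2 ms" on stdin.
avg_rtt() {
  awk -F'/' '/^(rtt|round-trip)/ { print $5 }'
}

# leader_latency LEADER HOST...: ping the leader from each host and
# print "host avg_ms".
leader_latency() {
  leader="$1"
  shift
  for server in "$@"; do
    printf '%s %s\n' "$server" "$(ssh "$server" ping -c 5 -q "$leader" | avg_rtt)"
  done
}
```

For example, leader_latency app-hel-phys-3 $servers would list the average latency from every member to that leader candidate.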

Yeah, we’ve had a discussion about that before, in general it’s been reliable but slow since Heartbeat timeouts after upgrade to 4.18 - #38 by tomp

I’m afraid not

Between 20 and 35 ms

Which is the busiest site in terms of changes/activity?

In terms of LXD DB changes? That would have been our staging server (now removed) trying to do an lxd recover when I posted in the thread initially.

I’ve now spun that out into a separate LXD instance, and the rest of the LXD cluster now appears to be running perfectly fine: I can create new containers, etc.

OK, that’s good to hear. If you want to try moving the leader around, running sudo systemctl reload snap.lxd.daemon on the current leader will cause it to step down and a new leader will be promoted, so you can see which gives the best performance for the most members.
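To see which member holds the leader role before and after the reload, something like this sketch could help. It assumes the default table output of lxc cluster ls as posted earlier in the thread, where the leader has database-leader in the ROLES column; current_leader is a hypothetical helper name.

```shell
# current_leader: read `lxc cluster ls` table output on stdin and print
# the name of the member whose row contains the database-leader role.
current_leader() {
  awk -F'|' '/database-leader/ { gsub(/ /, "", $2); print $2 }'
}

# Possible usage, run on the current leader:
#   lxc cluster ls | current_leader        # note the current leader
#   sudo systemctl reload snap.lxd.daemon  # ask it to step down
#   lxc cluster ls | current_leader        # check again once a new leader is promoted
```

Which member gets promoted is not chosen here, so it may take a few attempts to land on a member in the preferred site.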

Yeah already tried that last night, thanks
