SQL logic error

ema · February 17, 2020, 5:00pm

Hi, I get this:

lxc exec <container-name> bash

Error: failed to add Operation e2981f66-3501-4a8e-b86c-e41421f90cd7 to database: SQL logic error

Same problem with other commands:

lxc restart <container-name>

Error: failed to add Operation 563ce5af-a789-4aa2-903c-620d834a7ce1 to database: SQL logic error

A little debug:

lxc monitor

location: none
metadata:
  context: {}
  level: dbug
  message: 'New event listener: 8d7405a3-3369-40ec-97b4-80309f2a76dd'
timestamp: "2020-02-17T17:25:28.886169109+01:00"
type: logging


location: none
metadata:
  context:
    ip: '@'
    method: POST
    url: /1.0/instances/<container-name>/exec
    user: ""
  level: dbug
  message: Handling
timestamp: "2020-02-17T17:25:28.886455586+01:00"
type: logging


location: none
metadata:
  context: {}
  level: dbug
  message: 'forkcheckfile: Path doesn''t exist: No such file or directory'
timestamp: "2020-02-17T17:25:28.894736921+01:00"
type: logging


location: none
metadata:
  context: {}
  level: dbug
  message: 'Database error: protocol.Error{Code:1, Message:"SQL logic error"}'
timestamp: "2020-02-17T17:25:28.895186684+01:00"
type: logging


location: none
metadata:
  context: {}
  level: dbug
  message: 'Disconnected event listener: 8d7405a3-3369-40ec-97b4-80309f2a76dd'
timestamp: "2020-02-17T17:25:28.897155847+01:00"
type: logging


location: none
metadata:
  context: {}
  level: dbug
  message: 'Event listener finished: 8d7405a3-3369-40ec-97b4-80309f2a76dd'
timestamp: "2020-02-17T17:25:28.897006982+01:00"
type: logging

My lxd installation is based on snap.
I can’t find any reference to this kind of error. Any hint?
Thanks
Ema

simos · February 17, 2020, 6:17pm

Hi!

This is not a common error. It could be a disk space issue that has corrupted the database.
Does lxc list work?

You can examine the contents of the database with the following. To view the database schema (the table names and their respective fields), run

lxd sql global .schema

To get the list of the containers (instances), you would run

lxd sql global "SELECT * FROM instances"

ema · February 17, 2020, 6:34pm

Hi! Thanks for your fast reply

There’s space:

df /var/snap/lxd
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/nvme2n1p1  60923672 10691172  50216116  18% /

Listing works:

lxc list
+---------------------------+---------+--------------------+------+-----------+-----------+
|           NAME            |  STATE  |        IPV4        | IPV6 |   TYPE    | SNAPSHOTS |
+---------------------------+---------+--------------------+------+-----------+-----------+
| container1                | RUNNING | 10.10.0.62 (eth0)  |      | CONTAINER | 0         |
+---------------------------+---------+--------------------+------+-----------+-----------+
| container2                | RUNNING | 10.10.0.27 (eth0)  |      | CONTAINER | 0         |
+---------------------------+---------+--------------------+------+-----------+-----------+

Sqlite commands work:

lxd sql global .schema | head
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE schema (
    id         INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    version    INTEGER NOT NULL,
    updated_at DATETIME NOT NULL,
    UNIQUE (version)
);
INSERT INTO schema VALUES(1,13,1543511636);
INSERT INTO schema VALUES(2,14,1550571829);

And:

lxd sql global "SELECT * FROM instances" | head -5
+-----+---------+---------------------------+--------------+------+-----------+-------------------------------------+----------+-------------------------------------+-------------+------------+----------------------+
| id  | node_id |           name            | architecture | type | ephemeral |            creation_date            | stateful |            last_use_date            | description | project_id |     expiry_date      |
+-----+---------+---------------------------+--------------+------+-----------+-------------------------------------+----------+-------------------------------------+-------------+------------+----------------------+
| 175 | 1       | container1                | 2            | 0    | 0         | 2019-12-23T10:09:59.801824047+01:00 | 0        | 2020-02-13T15:46:59.085104898+01:00 |             | 1          | 0001-01-01T00:00:00Z |
| 176 | 1       | container2                | 2            | 0    | 0         | 2019-12-23T10:14:55.028941164+01:00 | 0        | 2020-02-13T15:45:27.526716913+01:00 |             | 1          | 0001-01-01T00:00:00Z |

Are you saying I should check whether the sqlite file is corrupted?

stgraber · February 17, 2020, 6:39pm

So read-only access appears fine, but writes not so much.
/var/snap/lxd/common/lxd/database is writable and didn’t flip to read-only, right?

If it’s properly writable, then I’d suggest restarting LXD; that may fix the issue or, if not, will probably provide a better error. This can be done with systemctl reload snap.lxd.daemon
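The writability check can be done without touching the database itself. This is a minimal sketch, assuming a POSIX shell; check_writable is a hypothetical helper name, not part of LXD, and the path is the one mentioned above.

```shell
#!/bin/sh
# Sketch only: check_writable is a hypothetical helper, not an LXD command.
check_writable() {
    dir=$1
    # test -w fails for a missing directory, for missing permissions, and
    # for a filesystem that has been remounted read-only.
    if [ -w "$dir" ]; then
        echo "writable"
    else
        echo "not writable"
        return 1
    fi
}

# Usage (as root):
#   check_writable /var/snap/lxd/common/lxd/database
#   findmnt -no OPTIONS --target /var/snap/lxd/common/lxd/database   # look for "ro"
```

The check only reads file metadata, so it should be safe to run while LXD is up.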

@freeekanayaka

ema · February 17, 2020, 6:49pm

Hi,

I’ll install the sqlite3 client and try a write operation. Is that what you’re suggesting?

Regarding your suggestion to snap restart lxd: when I do it, it apparently works fine, but after a few minutes the containers start to become unreachable, not all at once but one after the other.
They’re apparently running, except that their ports are not reachable any more.
If it happens while I’m still inside a container, basic commands like w don’t work anymore, saying the /proc folder isn’t available.
I have to restart each container to fix it.

So, now I’m looking for a different approach.

Thanks!

stgraber · February 17, 2020, 7:04pm

Sounds like lxcfs is getting restarted during systemctl reload snap.lxd.daemon according to what you’re saying.

That’s pretty odd as it’s normally kept running at all times.

Can you show journalctl -u snap.lxd.daemon -n 300 and dmesg?
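One way to check whether lxcfs really survives a reload is to compare its process IDs before and after. This is a minimal sketch, assuming the process is named lxcfs and that pgrep is available; lxcfs_pids and pids_changed are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, assuming the process is named "lxcfs".
lxcfs_pids() {
    # All matching PIDs on one line, sorted so two snapshots compare reliably.
    pgrep -x lxcfs | sort -n | tr '\n' ' '
}

pids_changed() {
    # True when the two PID snapshots differ, i.e. lxcfs was restarted
    # (or started or stopped) between them.
    [ "$1" != "$2" ]
}

# Usage (as root):
#   before=$(lxcfs_pids)
#   systemctl reload snap.lxd.daemon
#   after=$(lxcfs_pids)
#   if pids_changed "$before" "$after"; then echo "lxcfs was restarted"; fi
```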

ema · February 17, 2020, 7:39pm

Journalctl:

https://justpaste.it/21wi2

Dmesg:

https://justpaste.it/444tt

stgraber · February 17, 2020, 7:43pm

Nothing looks particularly wrong in there, other than that I’m not seeing any LXD reloads; all of the last stop reasons are full restart or host shutdown.

ema · February 17, 2020, 7:49pm

Should I get the logs right after the restart?
When I restart LXD I have to restart each container.

ema · February 26, 2020, 3:50pm

It seems to occur when I try to write to the db:

lxd sql global "update instances set description ='dummy' where node_id = 1" --debug --verbose
DBUG[02-26|16:48:38] Connecting to a local LXD over a Unix socket 
DBUG[02-26|16:48:38] Sending request to LXD                   method=POST url=http://unix.socket/internal/sql etag=
DBUG[02-26|16:48:38] 
	{
		"database": "global",
		"query": "update instances set description ='dummy' where node_id = 1"
	} 
Error: SQL logic error
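A less intrusive variant of this test is to run a write that changes nothing and to log when it starts failing. This is a sketch under assumptions: that lxd sql exits non-zero when it prints an error, and that a no-op UPDATE hits the same failure as a real one. probe_write and log_probe are hypothetical helper names, and instance id 175 is taken from the listing earlier in the thread.

```shell
#!/bin/sh
# Sketch only: probe_write and log_probe are hypothetical helpers.
# The statement assigns a column to itself, so it is a write statement
# that leaves the data unchanged.
PROBE_SQL="UPDATE instances SET description = description WHERE id = 175"

probe_write() {
    # Succeeds only if LXD accepts the statement (assumes a non-zero
    # exit status on "Error: ...").
    lxd sql global "$PROBE_SQL" >/dev/null 2>&1
}

log_probe() {
    # Print one timestamped line per probe, ending in "ok" or "FAILED".
    if probe_write; then status=ok; else status=FAILED; fi
    echo "$(date '+%Y-%m-%dT%H:%M:%S%z') $status"
}

# Usage: probe every 5 minutes to record when writes start failing.
#   while true; do log_probe >> /tmp/lxd-write-probe.log; sleep 300; done
```

The first FAILED line gives a timestamp to match against the journalctl output.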

Is there any command to restart just the db subsystem (to avoid the containers going down, which is what I experience after restarting LXD)?

Thanks!

C0rn3j · April 17, 2020, 11:24am

Also started running into this issue.

# lxc exec xwiki bash
Error: failed to add Operation fc2bf29d-3440-458a-b0b2-ae1e037a20af to database: SQL logic error

Reloading the daemon fixes it only temporarily, for a few hours, until the issue reoccurs:

# systemctl reload snap.lxd.daemon

Logs and versions: https://haste.rys.pw/raw/ewoxinabuq

ema · April 17, 2020, 11:55am

Hi, same here, even after upgrading to LXD 4.0.0.

GOOD (for a few hours as you said):

# systemctl reload snap.lxd.daemon

NO GOOD (after a few minutes the containers start to become unreachable):

# systemctl restart snap.lxd.daemon

stgraber · April 17, 2020, 1:44pm

@freeekanayaka

freeekanayaka · April 17, 2020, 2:08pm

ema:

it seems it occurs when I try to write into the db:

lxd sql global "update instances set description ='dummy' where node_id = 1" --debug --verbose
DBUG[02-26|16:48:38] Connecting to a local LXD over a Unix socket 
DBUG[02-26|16:48:38] Sending request to LXD                   method=POST url=http://unix.socket/internal/sql etag=
DBUG[02-26|16:48:38] 
	{
		"database": "global",
		"query": "update instances set description ='dummy' where node_id = 1"
	} 
Error: SQL logic error

Does this also happen right after you restart LXD? (In other words, when things still seem to behave normally.)

ema · April 17, 2020, 2:32pm

No, it works fine

freeekanayaka · April 17, 2020, 2:47pm

Are you using clustering or is this a standalone deployment?

lxc cluster list

should tell that (you’ll get an error if you are not clustered).
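For scripting, the same check can key off the exit status. This is a minimal sketch; deployment_mode is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: deployment_mode is a hypothetical helper, not an LXD command.
deployment_mode() {
    # lxc cluster list fails on a non-clustered server. It also fails if
    # the daemon is unreachable, so treat "standalone" as a hint only.
    if lxc cluster list >/dev/null 2>&1; then
        echo "clustered"
    else
        echo "standalone"
    fi
}

# Usage:
#   deployment_mode
```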

ema · April 17, 2020, 3:22pm

No cluster, just one server active at a time while a second one is on standby (rsync’d).