Containers are running but lxd.service is not

Host system: Ubuntu 18.04
lxd version: 3.0.3

The containers are running, but every lxc command fails with:

$ lxc list
Error: Get http://unix.socket/1.0: dial unix /var/lib/lxd/unix.socket: connect: resource temporarily unavailable

lxd.service is not running, and an attempt to start it (via systemctl start lxd.service) produces these syslog messages:

Feb  1 16:08:53 server lxd[9668]: t=2022-02-01T16:08:53+0100 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
Feb  1 16:08:54 server lxd[9668]: panic: log not found
Feb  1 16:08:54 server lxd[9668]: goroutine 1 [running]:
Feb  1 16:08:54 server lxd[9668]: github.com/hashicorp/raft.NewRaft(0xc420210750, 0x112daa0, 0xc420364560, 0x1135e60, 0xc4203643a0, 0x1132ca0, 0xc4203643a0, 0x112df60, 0xc4203644e0, 0x1139220, ...)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/hashicorp/raft/api.go:491 +0x11ba
Feb  1 16:08:54 server lxd[9668]: github.com/lxc/lxd/lxd/cluster.raftInstanceInit(0xc420334be0, 0xc420335b80, 0xc42035e000, 0x4008000000000000, 0x1, 0x70f7aa, 0xc4202d24a0)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/raft.go:191 +0x5c9
Feb  1 16:08:54 server lxd[9668]: github.com/lxc/lxd/lxd/cluster.newRaft(0xc420334be0, 0xc42035e000, 0x4008000000000000, 0x0, 0x0, 0xc420335900)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/raft.go:71 +0x24d
Feb  1 16:08:54 server lxd[9668]: github.com/lxc/lxd/lxd/cluster.(*Gateway).init(0xc420320ea0, 0xc420335900, 0xc4202050e0)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/gateway.go:448 +0x84
Feb  1 16:08:54 server lxd[9668]: github.com/lxc/lxd/lxd/cluster.NewGateway(0xc420334be0, 0xc42035e000, 0xc4202e7718, 0x2, 0x2, 0x0, 0x24, 0xf7efc0)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/cluster/gateway.go:57 +0x1f2
Feb  1 16:08:54 server lxd[9668]: main.(*Daemon).init(0xc4201e5b00, 0xc4202e78d0, 0x40e936)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/daemon.go:502 +0x645
Feb  1 16:08:54 server lxd[9668]: main.(*Daemon).Init(0xc4201e5b00, 0xc4201e5b00, 0xc420204c00)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/daemon.go:390 +0x2f
Feb  1 16:08:54 server lxd[9668]: main.(*cmdDaemon).Run(0xc4203000c0, 0xc420306280, 0xc42030e0f0, 0x0, 0x3, 0x0, 0x0)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main_daemon.go:82 +0x37a
Feb  1 16:08:54 server lxd[9668]: main.(*cmdDaemon).Run-fm(0xc420306280, 0xc42030e0f0, 0x0, 0x3, 0x0, 0x0)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main_daemon.go:42 +0x52
Feb  1 16:08:54 server lxd[9668]: github.com/spf13/cobra.(*Command).execute(0xc420306280, 0xc4200e8050, 0x3, 0x3, 0xc420306280, 0xc4200e8050)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:762 +0x468
Feb  1 16:08:54 server lxd[9668]: github.com/spf13/cobra.(*Command).ExecuteC(0xc420306280, 0x0, 0xc42030c780, 0xc42030c780)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:852 +0x30a
Feb  1 16:08:54 server lxd[9668]: github.com/spf13/cobra.(*Command).Execute(0xc420306280, 0xc4202e7e08, 0x1)
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/spf13/cobra/command.go:800 +0x2b
Feb  1 16:08:54 server lxd[9668]: main.main()
Feb  1 16:08:54 server lxd[9668]: #011/build/lxd-j7VLB_/lxd-3.0.3/obj-x86_64-linux-gnu/src/github.com/lxc/lxd/lxd/main.go:160 +0xe26
Feb  1 16:08:54 server systemd[1]: lxd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

I am sorry, but I don’t know where to find more detail about what happened.
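When a Go trace this long lands in syslog, the one line that matters most is the panic reason. A minimal sketch of a filter that pulls it out of log text on standard input follows; the function name panic_reason is made up for illustration and is not part of LXD or systemd.

```shell
# Hypothetical helper: print only the Go panic reason(s) found in log text
# read from standard input. Stack frames and other lines are dropped.
panic_reason() {
    # -o prints just the matching part of each line, so the syslog
    # timestamp and process prefix are stripped.
    grep -o 'panic: .*'
}
```

It could be fed from the journal, for example with journalctl -u lxd.service --no-pager | panic_reason, or from the syslog file directly.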

In /var/log/lxd/lxd.log I found these repeated messages:

t=2022-02-01T16:08:53+0100 lvl=info msg="LXD 3.0.3 is starting in normal mode" path=/var/lib/lxd
t=2022-02-01T16:08:53+0100 lvl=info msg="Kernel uid/gid map:" 
t=2022-02-01T16:08:53+0100 lvl=info msg=" - u 0 0 4294967295" 
t=2022-02-01T16:08:53+0100 lvl=info msg=" - g 0 0 4294967295" 
t=2022-02-01T16:08:53+0100 lvl=info msg="Configured LXD uid/gid map:" 
t=2022-02-01T16:08:53+0100 lvl=info msg=" - u 0 100000 65536" 
t=2022-02-01T16:08:53+0100 lvl=info msg=" - g 0 100000 65536" 
t=2022-02-01T16:08:53+0100 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." 
t=2022-02-01T16:08:53+0100 lvl=info msg="Kernel features:" 
t=2022-02-01T16:08:53+0100 lvl=info msg=" - netnsid-based network retrieval: no" 
t=2022-02-01T16:08:53+0100 lvl=info msg=" - unprivileged file capabilities: yes" 
t=2022-02-01T16:08:53+0100 lvl=info msg="Initializing local database" 
t=2022-02-01T16:18:54+0100 lvl=info msg="LXD 3.0.3 is starting in normal mode" path=/var/lib/lxd
t=2022-02-01T16:18:54+0100 lvl=info msg="Kernel uid/gid map:" 
t=2022-02-01T16:18:54+0100 lvl=info msg=" - u 0 0 4294967295" 
t=2022-02-01T16:18:54+0100 lvl=info msg=" - g 0 0 4294967295" 
t=2022-02-01T16:18:54+0100 lvl=info msg="Configured LXD uid/gid map:" 
t=2022-02-01T16:18:54+0100 lvl=info msg=" - u 0 100000 65536" 
t=2022-02-01T16:18:54+0100 lvl=info msg=" - g 0 100000 65536" 
t=2022-02-01T16:18:54+0100 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored." 
t=2022-02-01T16:18:54+0100 lvl=info msg="Kernel features:" 
t=2022-02-01T16:18:54+0100 lvl=info msg=" - netnsid-based network retrieval: no" 
t=2022-02-01T16:18:54+0100 lvl=info msg=" - unprivileged file capabilities: yes" 
t=2022-02-01T16:18:54+0100 lvl=info msg="Initializing local database"

The dump of /var/lib/lxd/database/local.db is:

# sqlite3 local.db .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE schema (
    id         INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    version    INTEGER NOT NULL,
    updated_at DATETIME NOT NULL,
    UNIQUE (version)
);
INSERT INTO schema VALUES(1,37,1543232236);
CREATE TABLE config (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    key VARCHAR(255) NOT NULL,
    value TEXT,
    UNIQUE (key)
);
INSERT INTO config VALUES(4,'core.https_address','192.168.23.123:8443');
CREATE TABLE patches (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    name VARCHAR(255) NOT NULL,
    applied_at DATETIME NOT NULL,
    UNIQUE (name)
);
INSERT INTO patches VALUES(1,'invalid_profile_names',1543232236);
INSERT INTO patches VALUES(2,'leftover_profile_config',1543232237);
INSERT INTO patches VALUES(3,'network_permissions',1543232237);
INSERT INTO patches VALUES(4,'storage_api',1543232237);
INSERT INTO patches VALUES(5,'storage_api_v1',1543232237);
INSERT INTO patches VALUES(6,'storage_api_dir_cleanup',1543232237);
INSERT INTO patches VALUES(7,'storage_api_lvm_keys',1543232237);
INSERT INTO patches VALUES(8,'storage_api_keys',1543232237);
INSERT INTO patches VALUES(9,'storage_api_update_storage_configs',1543232237);
INSERT INTO patches VALUES(10,'storage_api_lxd_on_btrfs',1543232237);
INSERT INTO patches VALUES(11,'storage_api_lvm_detect_lv_size',1543232237);
INSERT INTO patches VALUES(12,'storage_api_insert_zfs_driver',1543232237);
INSERT INTO patches VALUES(13,'storage_zfs_noauto',1543232237);
INSERT INTO patches VALUES(14,'storage_zfs_volume_size',1543232237);
INSERT INTO patches VALUES(15,'network_dnsmasq_hosts',1543232237);
INSERT INTO patches VALUES(16,'storage_api_dir_bind_mount',1543232237);
INSERT INTO patches VALUES(17,'fix_uploaded_at',1543232237);
INSERT INTO patches VALUES(18,'storage_api_ceph_size_remove',1543232237);
INSERT INTO patches VALUES(19,'devices_new_naming_scheme',1543232237);
INSERT INTO patches VALUES(20,'storage_api_permissions',1543232237);
INSERT INTO patches VALUES(21,'container_config_regen',1543232237);
INSERT INTO patches VALUES(22,'lvm_node_specific_config_keys',1543232237);
INSERT INTO patches VALUES(23,'candid_rename_config_key',1543232237);
INSERT INTO patches VALUES(24,'shrink_logs_db_file',1545998098);
CREATE TABLE raft_nodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    address TEXT NOT NULL,
    UNIQUE (address)
);
INSERT INTO raft_nodes VALUES(1,'192.168.23.123:8443');
DELETE FROM sqlite_sequence;
INSERT INTO sqlite_sequence VALUES('schema',1);
INSERT INTO sqlite_sequence VALUES('patches',24);
INSERT INTO sqlite_sequence VALUES('config',4);
INSERT INTO sqlite_sequence VALUES('raft_nodes',1);
COMMIT;

Ok, so you’re unfortunately dealing with a corrupted database…
Did your system crash, lose power, or run into a disk-full situation?

LXD 3.0 is a bit old (though still supported for security updates), so I haven’t dealt with that kind of recovery in a while :)

Could you show us the output of ls -lh /var/lib/lxd/database/global/?
I’m hoping it’s structurally similar to current LXD, where I know how to easily recover from something like this ;)

Thank you for the reply. I know it is old :) But I prefer to stay on the distribution’s (Ubuntu 18.04) default version, and I ran into some minor difficulties with the upgrade to 4.0 (during dist-upgrade and migration to snap) on another server where different storage backends were involved.

The server’s uptime is over 100 days and it has not experienced any problems (RAID arrays without errors, no full disks).

The ls output is:

# ls -lh /var/lib/lxd/database/global/
total 2.3G
-rw------- 1 root root 264K Feb  1 15:22 db.bin
-rw------- 1 root root 2.3G Feb  1 20:09 logs.db
drwxr-xr-x 2 root root    6 Feb  1 20:09 snapshots

I noticed that logs.db is quite big, but I did not find any information about this file.
If logs.db is an SQLite file, it is probably corrupted:

# sqlite3 logs.db .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
/**** ERROR: (26) file is not a database *****/
ROLLBACK; -- due to errors
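Error 26 ("file is not a database") can mean either a damaged SQLite file or a file that was never SQLite in the first place. One way to tell the two apart is to look at the first 16 bytes, which in a SQLite 3 file are the string "SQLite format 3" followed by a NUL byte. A minimal sketch of that check follows; the function name is_sqlite3 is made up for illustration.

```shell
# Hypothetical helper: succeed (exit status 0) if the file given as $1
# starts with the SQLite 3 header, fail otherwise.
is_sqlite3() {
    # Hex encoding of the 16 header bytes: "SQLite format 3" plus a NUL.
    want=53514c69746520666f726d6174203300
    # Read the first 16 bytes and turn them into one continuous hex string.
    # Comparing hex avoids putting a NUL byte into a shell variable.
    got=$(head -c 16 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$got" = "$want" ]
}
```

Running is_sqlite3 logs.db && echo sqlite || echo "not sqlite" in the database directory would then say whether the header is present. A missing header only shows that the file is not SQLite; it does not identify what the format actually is.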

Unfortunately it’s not a sqlite3 database; I believe it holds the raft transactions inside a boltdb.

Any chance you can make a tarball of /var/lib/lxd/database/global/ and send that over to stgraber@ubuntu.com?
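A tarball like that also doubles as a backup to keep before anyone touches the database. A minimal sketch follows, assuming the LXD daemon is stopped while the copy is made (as it is here); the function name backup_dir and the output file name in the usage line are made up for illustration.

```shell
# Hypothetical helper: pack directory $1 into the gzip-compressed tarball $2.
backup_dir() {
    src=$1
    out=$2
    # -C switches to the parent directory first, so the archive stores
    # relative paths such as "global/db.bin" instead of absolute ones.
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}
```

For the directory in question this could be called as backup_dir /var/lib/lxd/database/global /root/lxd-global.tar.gz, and tar -tzf on the result lists what was captured.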

I’ll see if I can revert to one of the snapshots or alternatively wipe the whole thing and convince LXD to reload it from the db.bin.

(LXD 4.0 and higher, with the newer dqlite, keeps each segment and snapshot as an individual file, so in such cases a single rm reverts the last write attempt and gets you back to a consistent DB. LXD 3.0 is a bit more effort ;) )

Thank you for the great help with the database cleanup.