Due to snap auto-updates, LXD got restarted and the daemon is no longer able to start

LXD went down after an automatic snapd update, and a storage pool (default) got broken. LXD tries to patch it, fails, and is unable to start. Now I cannot even access the storage commands to remove it.
What should I do?

Jun 05 02:22:58 E3-2276G lxd.daemon[3218160]: ==> snap base has changed, restart system to upgrade LXCFS
Jun 05 02:22:58 E3-2276G lxd.daemon[3218160]: ==> Cleaning up existing LXCFS namespace
Jun 05 02:22:58 E3-2276G lxd.daemon[3218160]: => Starting LXD
Jun 05 02:22:58 E3-2276G lxd.daemon[3218940]: time="2023-06-05T02:22:58-04:00" level=warning msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
Jun 05 02:22:58 E3-2276G lxd.daemon[3218940]: time="2023-06-05T02:22:58-04:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
Jun 05 02:22:59 E3-2276G lxd.daemon[3218940]: time="2023-06-05T02:22:59-04:00" level=error msg="Failed mounting storage pool" err="Failed to run: zpool import -f -d /var/snap/lxd/common/lxd/disks default: exit s>
Jun 05 02:22:59 E3-2276G lxd.daemon[3218940]: time="2023-06-05T02:22:59-04:00" level=error msg="Failed to start the daemon" err="Failed applying patch \"storage_delete_old_snapshot_records\": Unvailable storage >
Jun 05 02:22:59 E3-2276G lxd.daemon[3218940]: Error: Failed applying patch "storage_delete_old_snapshot_records": Unvailable storage pools: [default]
Jun 05 02:22:59 E3-2276G lxd.daemon[3218160]: Killed
Jun 05 02:22:59 E3-2276G lxd.daemon[3218160]: => LXD failed to start
Jun 05 02:22:59 E3-2276G systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 05 02:22:59 E3-2276G systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 05 02:22:59 E3-2276G systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 2.

It looks like you have defined a storage pool that is no longer available.
In the past LXD would have failed to start under that scenario, but a few releases back we added the ability for LXD to start anyway, so that an unavailable storage pool can be removed or repaired manually.

The expectation was that storage pools wouldn't be left in an unavailable state for long periods of time, and not across releases.

However, there is still one scenario where all storage pools must be available for LXD to start: when a storage-related patch needs to be applied, which is the case here.

Please can you run:

sudo sqlite3 /var/snap/lxd/common/lxd/database/global/db.bin  -header 'select * from storage_pools'
sudo sqlite3 /var/snap/lxd/common/lxd/database/global/db.bin  -header 'select * from storage_pools_config'

And if you are happy that the storage pool in question is not needed, we can prepare a DB patch file together to remove those errant records that are preventing startup.

See also https://linuxcontainers.org/lxd/docs/master/database/#running-custom-queries-at-lxd-daemon-startup


Also see Managing the LXD snap for managing snap update schedules/pausing it.

Thanks, I need to disable updates. I thought I had already done that, but somehow snap is still updating.

That storage pool is broken (due to my mistake, a wrong truncate command), and everything on it is gone.

Here are the results:

root@E3-2276G:/var/snap/lxd/common/lxd/storage-pools# sudo sqlite3 /var/snap/lxd/common/lxd/database/global/db.bin  -header 'select * from storage_pools'
id|name|driver|description|state
1|default|zfs||1
2|btrfs|btrfs||1
4|default-btrfs|btrfs||1
5|btrfs-deploy|btrfs||1
root@E3-2276G:/var/snap/lxd/common/lxd/storage-pools# sudo sqlite3 /var/snap/lxd/common/lxd/database/global/db.bin  -header 'select * from storage_pools_config'
id|storage_pool_id|node_id|key|value
2|1|1|size|600GB
3|1|1|source|/var/snap/lxd/common/lxd/disks/default.img
4|1|1|zfs.pool_name|default
10|4|1|size|280GB
11|4|1|source|/var/snap/lxd/common/lxd/disks/default-btrfs.img
13|5|1|size|200GB
14|5|1|source|/var/snap/lxd/common/lxd/disks/btrfs-deploy.img
15|2|1|size|120GB
16|2|1|source|/var/snap/lxd/common/lxd/disks/btrfs.img

OK so please create a file called /var/snap/lxd/common/lxd/database/patch.global.sql:

DELETE FROM storage_pools WHERE name = 'default';

Then reload LXD:

sudo systemctl reload snap.lxd.daemon
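If you want to preview what that DELETE does before touching the real database at /var/snap/lxd/common/lxd/database/global/db.bin, you can rehearse it on a throwaway SQLite file. This is a minimal sketch: the table definitions below are simplified stand-ins based on the query output above, not LXD's actual schema (the real tables have more columns and constraints).

```shell
# Build a scratch database mimicking the two tables from the queries above.
db=$(mktemp)
sqlite3 "$db" <<'SQL'
CREATE TABLE storage_pools (id INTEGER PRIMARY KEY, name TEXT, driver TEXT);
CREATE TABLE storage_pools_config (
    id INTEGER PRIMARY KEY,
    storage_pool_id INTEGER REFERENCES storage_pools (id) ON DELETE CASCADE,
    key TEXT,
    value TEXT
);
INSERT INTO storage_pools VALUES (1, 'default', 'zfs'), (2, 'btrfs', 'btrfs');
INSERT INTO storage_pools_config VALUES
    (2, 1, 'size', '600GB'),
    (15, 2, 'size', '120GB');
SQL

# Same statement as in patch.global.sql; with foreign keys enabled in this
# sketch, the pool's config rows are removed along with the pool itself.
sqlite3 "$db" "PRAGMA foreign_keys = ON; DELETE FROM storage_pools WHERE name = 'default';"

pools_left=$(sqlite3 "$db" 'SELECT name FROM storage_pools')
cfg_left=$(sqlite3 "$db" 'SELECT count(*) FROM storage_pools_config WHERE storage_pool_id = 1')
echo "$pools_left"  # btrfs
echo "$cfg_left"    # 0
```

Copying db.bin aside before the reload is a cheap safeguard in any case.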

Should I delete the patch file?

EDIT:
Thanks, that solved it, and the patch file was deleted automatically.


Yep, if not already.

I think LXD should keep storage pool definitions in a configuration file instead of SQLite. Right now it is quite fragile: if one storage pool fails (which is common), the whole daemon can fail to load, and that breaks a lot of things. And then we cannot fix anything without SQL knowledge.
Also, going in and deleting SQLite records is a bad idea in production; we can't know what will go wrong if we accidentally delete the wrong records.