Removed one storage disk, LXD won't start - how to fix?

I’ve configured LXD with storage on a separate disk, and one of the containers was placed on this disk.

Some time later, the disk started to fail. Because it was a single disk and the container on it was not important, I wanted to remove the disk using the following process:

  • remove the container
  • remove the storage

However, “lxc delete $container” didn’t work, because there were input/output errors on disk.

So I powered off the server, pulled the disk out, and powered the server back on. Unfortunately, LXD won’t start anymore:

# lxc list
Error: Get "http://unix.socket/1.0": EOF

# systemctl status snap.lxd.daemon.service
● snap.lxd.daemon.service - Service for snap application lxd.daemon
     Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static)
     Active: active (running) since Wed 2021-12-01 15:38:17 UTC; 2s ago
TriggeredBy: ● snap.lxd.daemon.unix.socket
   Main PID: 99314
      Tasks: 0 (limit: 9223)
     Memory: 328.0K
        CPU: 119ms
     CGroup: /system.slice/snap.lxd.daemon.service

Dec 01 15:38:18 backup lxd.daemon[99453]: - cpuview_daemon
Dec 01 15:38:18 backup lxd.daemon[99453]: - loadavg_daemon
Dec 01 15:38:18 backup lxd.daemon[99453]: - pidfds
Dec 01 15:38:19 backup lxd.daemon[99314]: => Starting LXD
Dec 01 15:38:19 backup lxd.daemon[99465]: t=2021-12-01T15:38:19+0000 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
Dec 01 15:38:19 backup lxd.daemon[99465]: t=2021-12-01T15:38:19+0000 lvl=eror msg="Failed to start the daemon" err="Failed initializing storage pool \"samsung\": Source path \"/var/lib/snapd/hostfs/lxd/storage/samsung\" isn't btrfs"
Dec 01 15:38:19 backup lxd.daemon[99465]: Error: Failed initializing storage pool "samsung": Source path "/var/lib/snapd/hostfs/lxd/storage/samsung" isn't btrfs
Dec 01 15:38:20 backup lxd.daemon[99314]: => LXD failed to start
Dec 01 15:38:20 backup systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Dec 01 15:38:20 backup systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.

What is the best way to recover from this situation?

The easiest would be:

  • truncate -s 1G blah.img
  • mkfs.btrfs blah.img
  • mount blah.img /lxd/storage/samsung

That should make LXD happy (the pool will exist and will be btrfs, albeit empty). Then go into LXD and delete everything that uses that pool, as well as the pool itself.
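For reference, the three steps above can be put into a small script. This is only a sketch under assumptions: the helper names `run` and `make_placeholder_pool` and the `DRY_RUN` switch are made up for illustration, the image path is just an example location with enough free space, and the mount point is taken from the error message in this thread. By default it only prints the commands, so they can be reviewed before running them as root.

```shell
#!/bin/sh
# Sketch only: recreate an empty btrfs filesystem at the path LXD expects.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# make_placeholder_pool IMAGE MOUNTPOINT (hypothetical helper name)
# Creates a sparse 1G file, formats it as btrfs, and loop-mounts it.
make_placeholder_pool() {
    img="$1"
    mnt="$2"
    run truncate -s 1G "$img" &&
    run mkfs.btrfs "$img" &&
    run mount -o loop -t btrfs "$img" "$mnt"
}

# Example paths; adjust both to the affected system.
make_placeholder_pool /data/tmp/image.img /lxd/storage/samsung
```

Setting `DRY_RUN=0` would execute the commands for real. Once the pool has been deleted in LXD, the placeholder can be unmounted and the image file removed.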

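Nope, there is still something preventing storage removal.

After mounting the placeholder image, LXD starts and lets me remove the container, but it doesn’t allow me to remove the storage pool; the delete fails with an image error at the end of this session: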

# lxc list
Error: Get "http://unix.socket/1.0": EOF

# mount -o loop -t btrfs /data/tmp/image.img /lxd/storage/samsung/

# lxc list
(...works...)

# lxc storage info samsung
info:
  description: ""
  driver: btrfs
  name: samsung
  space used: 3.74MB
  total space: 1.07GB
used by: {}

# lxc storage delete samsung
Error: Failed getting image info for "57198c6ec93dfffe64dc77daef1529f22c3654022c7968b1d24b11c64eb45b39": No such object

With LXD running, can you show the output of:

lxd sql global 'select * from storage_volumes where storage_pool_id = (select id from storage_pools where name = "samsung")'
# lxd sql global 'select * from storage_volumes where storage_pool_id = (select id from storage_pools where name = "samsung")'
+-----+------------------------------------------------------------------+-----------------+---------+------+-------------+------------+--------------+
| id  |                               name                               | storage_pool_id | node_id | type | description | project_id | content_type |
+-----+------------------------------------------------------------------+-----------------+---------+------+-------------+------------+--------------+
| 215 | 57198c6ec93dfffe64dc77daef1529f22c3654022c7968b1d24b11c64eb45b39 | 2               | 1       | 1    |             | 1          | 0            |
+-----+------------------------------------------------------------------+-----------------+---------+------+-------------+------------+--------------+

OK cool, so now run:

lxd sql global 'delete from storage_volumes where id = 215'

And then try deleting the storage pool.
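If more than one leftover record shows up, the same two statements can be generated per pool and per ID. The sketch below only builds the SQL text that was used in this thread; the helper names `pool_volumes_sql` and `delete_volume_sql` are hypothetical, and the numeric check is an added safety assumption, not something LXD requires. Nothing is sent to the database by the script itself.

```shell
#!/bin/sh
# Sketch only: build the SQL used above so it can be reviewed before running.

# pool_volumes_sql POOL: print the SELECT listing volume records for POOL.
pool_volumes_sql() {
    printf 'select * from storage_volumes where storage_pool_id = (select id from storage_pools where name = "%s")\n' "$1"
}

# delete_volume_sql ID: print the DELETE for one record.
# Refuses anything that is not a plain number, to avoid pasting mistakes.
delete_volume_sql() {
    case "$1" in
        ''|*[!0-9]*)
            echo "volume id must be numeric: $1" >&2
            return 1
            ;;
    esac
    printf 'delete from storage_volumes where id = %s\n' "$1"
}

# Print the statements for the pool and record from this thread.
pool_volumes_sql samsung
delete_volume_sql 215

# After reviewing them, they would be run by hand as:
#   lxd sql global "$(pool_volumes_sql samsung)"
#   lxd sql global "$(delete_volume_sql 215)"
#   lxc storage delete samsung
```

Editing the global database directly bypasses LXD's own checks, so it seems sensible to delete only the IDs that the SELECT actually returned for the broken pool.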

Woohoo, it worked!

# lxc storage delete samsung
Storage pool samsung deleted
# 

Thank you!
