ZFS pool faulted, LXD not starting

Hi all, I have two pools: “z64”, the system drive for my containers, and “z600”, the data drive for my containers. z600 is now faulted and seems to be preventing LXD from starting. The data on z600 is not super important, but I wouldn’t mind getting it back if it’s not too involved. Here’s what I’m seeing, using commands I gathered from some other posts. Please let me know if any additional information would help get to the root of the problem.

lxc list

Error: Get “http://unix.socket/1.0”: dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory

systemctl stop lxd.service lxd.socket

Failed to stop lxd.service: Unit lxd.service not loaded.
Failed to stop lxd.socket: Unit lxd.socket not loaded.

lxd --debug --group lxd

DBUG[01-21|12:04:54] Mount started driver=zfs pool=z600
DBUG[01-21|12:04:54] Mount finished driver=zfs pool=z600
EROR[01-21|12:04:54] Failed to start the daemon: Failed initializing storage pool “z600”: Failed to run: zpool import z600: cannot import ‘z600’: pool was previously in use from another system.
Last accessed by (hostid=0) at Wed Jan 20 23:42:11 2021
The pool can be imported, use ‘zpool import -f’ to import the pool.
Error: Failed initializing storage pool “z600”: Failed to run: zpool import z600: cannot import ‘z600’: pool was previously in use from another system.
Last accessed by (hostid=0) at Wed Jan 20 23:42:11 2021
The pool can be imported, use ‘zpool import -f’ to import the pool.

zpool import -f z600

cannot import ‘z600’: I/O error
Recovery is possible, but will result in some data loss.
Returning the pool to its state as of Wed 20 Jan 2021 11:41:40 PM CST
should correct the problem. Approximately 31 seconds of data
must be discarded, irreversibly. Recovery can be attempted
by executing ‘zpool import -F z600’. A scrub of the pool
is strongly recommended after recovery.

zpool import -F z600

cannot import ‘z600’: pool was previously in use from another system.
Last accessed by (hostid=0) at Wed Jan 20 23:42:11 2021
The pool can be imported, use ‘zpool import -f’ to import the pool.

zpool import -d z600

no pools available to import

zpool import -d /dev/sdc1

pool: z600
id: 3524713278963610425
state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the ‘-f’ flag.
see: ZFS Message ID: ZFS-8000-72
config:

    z600        FAULTED  corrupted data
      sdc       ONLINE

lxc storage list

Error: Get “http://unix.socket/1.0”: dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory

which lxd

/snap/bin/lxd

lxd --version

4.0.4

I’d recommend looking at the help for zpool import; there are a lot of options you can try to recover from what looks like a corrupted pool.
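
For anyone landing here later, here is a rough sketch of the options I would look at, ordered from least to most destructive. The flags (-d, -f, -F, -n, -X, -o readonly=on) are from the zpool import manpage. The function name import_attempts and the default /dev/disk/by-id directory are only illustrative. The script prints the commands and does not run them, so each one can be reviewed before being tried by hand.

```shell
#!/bin/sh
# Print an escalating list of `zpool import` attempts, least destructive first.
# Nothing is executed here; copy a line out and run it manually once you understand it.
import_attempts() {
    pool=$1
    devdir=${2:-/dev/disk/by-id}   # assumption: scan stable by-id device names

    # 1. List the pools ZFS can find in that directory, without importing anything.
    echo "zpool import -d $devdir"
    # 2. Dry run: -n (with -F) reports whether rewind recovery would work, changing nothing.
    echo "zpool import -d $devdir -f -F -n $pool"
    # 3. Forced read-only import, useful for copying data off without writing to the pool.
    echo "zpool import -d $devdir -f -o readonly=on $pool"
    # 4. Rewind recovery: discards the last few seconds of transactions.
    echo "zpool import -d $devdir -f -F $pool"
    # 5. Extreme rewind (-X is only valid together with -F): last resort, may lose more data.
    echo "zpool import -d $devdir -f -F -X $pool"
}

import_attempts z600
```

Note that in the output above, -F on its own was refused with the "previously in use from another system" message, and -f on its own hit the I/O error, so the two flags have to be combined.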

Thank you, I’ll check the manpage for zpool to see what options there are to potentially get me going.

Is there a way to manually remove that storage from LXD so I can access my containers if I am unable to import the pool?

Yeah, worst case scenario you’ll need a small DB patch against the global database to delete the pool. This should cascade to delete most DB records that were related to it, though there will likely still be a bunch of things to clean up, both in the DB and in /var/snap/lxd/common/lxd.
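
A minimal sketch of what such a patch could look like for the snap package. It assumes LXD's startup patch mechanism (a patch.global.sql file in the database directory, applied when the daemon next starts) and assumes the pool is a row in the storage_pools table. Both are assumptions to check against your LXD version, and the database directory should be backed up first. The helper name write_pool_patch is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a one-statement SQL patch that deletes a storage pool row.
# Assumption: LXD applies <dbdir>/patch.global.sql the next time the daemon starts.
write_pool_patch() {
    pool=$1
    dbdir=${2:-/var/snap/lxd/common/lxd/database}

    # Reject empty names and anything other than letters, digits, '_' or '-',
    # so the name can be placed inside the SQL string safely.
    case $pool in
        ''|*[!A-Za-z0-9_-]*) echo "invalid pool name: $pool" >&2; return 1 ;;
    esac

    printf "DELETE FROM storage_pools WHERE name='%s';\n" "$pool" > "$dbdir/patch.global.sql"
}

# Example: write the patch into a scratch directory and inspect it before using it.
scratch=$(mktemp -d)
write_pool_patch z600 "$scratch" && cat "$scratch/patch.global.sql"
```

Writing to a scratch directory first keeps the real database directory untouched until the statement has been reviewed. Leftover files under /var/snap/lxd/common/lxd would still need manual cleanup.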

I was able to get it going with the command below.

zpool import -fFX -d /dev/disk/by-id z600

My containers are active again now. I really appreciate your help.
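
Since the earlier zpool message strongly recommends a scrub after recovery, here is a small sketch of the follow-up checks. zpool status and zpool scrub are standard commands. The post_recovery function and the RUN switch are illustrative names, not part of any tool. By default it only prints the commands.

```shell
#!/bin/sh
# Print the follow-up checks after a rewind import, or run them when RUN=1 is set.
post_recovery() {
    pool=$1
    for cmd in "zpool status -v $pool" "zpool scrub $pool" "zpool status $pool"; do
        if [ "${RUN:-0}" = 1 ]; then
            # Unquoted on purpose: splits into the command and its arguments.
            # Assumes the pool name contains no whitespace.
            $cmd
        else
            echo "$cmd"
        fi
    done
}

post_recovery z600
```

The first status call shows any files already known to be damaged, the scrub verifies every block in the pool, and the last status call shows scrub progress and any errors found so far.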