Error: ZFS storage pool “default” could not be imported: cannot import ‘default’: no such pool available

I understand this has been resolved in the past in this thread: [SOLVED] Error: ZFS storage pool "default" could not be imported: cannot import 'default': no such pool available. That solution did work for me, but with some differences:

  1. This happened on Ubuntu 19.10 (on a desktop).
  2. I am using the snap (LXD 3.18), not the .deb package.
  3. I can reproduce this issue, at least on my system. It seems that the default pool created during `lxd init` stops working after the machine is rebooted, whereas things work correctly as long as I create the default pool manually.
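After a reboot, a quick way to see whether the pool was imported at all (before LXD even tries to use it) is to look at the host's pool list: `zpool list -H -o name` prints one pool name per line with no header. A minimal sketch, assuming the host's own ZFS tools are installed; `pool_listed` is a hypothetical helper name, not part of LXD or ZFS:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the pool named in $1 appears in
# `zpool list -H -o name` output supplied on stdin (one name per line).
pool_listed() {
  local pool="$1"
  # -x: whole-line match, -F: fixed string, -q: report via exit status only
  grep -qxF -- "$pool"
}
```

Usage on the host: `sudo zpool list -H -o name | pool_listed default && echo imported || echo "not imported"`. Matching whole lines avoids false positives from pools with similar names such as `default2`.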

Also, I am raising this here because I lost all my containers in this case (thankfully I had backups, which I am still restoring) and because I use LXD heavily in production. I am not sure whether a snap update broke something, whether there is something about Ubuntu 19.10 that LXD doesn't like, or something else entirely, but it would be great if the devs could look into this further. I am happy to provide more details about my system if needed.

Here’s the output of `lxd --debug --group lxd` (run while the LXD service was in a failed state):

```
DBUG[11-03|12:30:51] Connecting to a local LXD over a Unix socket
DBUG[11-03|12:30:51] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
INFO[11-03|12:30:51] LXD 3.18 is starting in normal mode      path=/var/snap/lxd/common/lxd
INFO[11-03|12:30:51] Kernel uid/gid map:
INFO[11-03|12:30:51]  - u 0 0 4294967295
INFO[11-03|12:30:51]  - g 0 0 4294967295
INFO[11-03|12:30:51] Configured LXD uid/gid map:
INFO[11-03|12:30:51]  - u 0 1000000 1000000000
INFO[11-03|12:30:51]  - g 0 1000000 1000000000
WARN[11-03|12:30:51] Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored.
WARN[11-03|12:30:51] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[11-03|12:30:51] Kernel features:
INFO[11-03|12:30:51]  - netnsid-based network retrieval: yes
INFO[11-03|12:30:51]  - uevent injection: yes
INFO[11-03|12:30:51]  - seccomp listener: yes
INFO[11-03|12:30:51]  - unprivileged file capabilities: yes
INFO[11-03|12:30:51]  - shiftfs support: yes
INFO[11-03|12:30:51] Initializing local database
DBUG[11-03|12:30:51] Initializing database gateway
DBUG[11-03|12:30:51] Start database node                      id=1 address=
DBUG[11-03|12:30:51] Connecting to a local LXD over a Unix socket
DBUG[11-03|12:30:51] Sending request to LXD                   method=GET url=http://unix.socket/1.0 etag=
DBUG[11-03|12:30:51] Detected stale unix socket, deleting
INFO[11-03|12:30:51] Starting /dev/lxd handler:
INFO[11-03|12:30:51]  - binding devlxd socket                 socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[11-03|12:30:51] REST API daemon:
INFO[11-03|12:30:51]  - binding Unix socket                   socket=/var/snap/lxd/common/lxd/unix.socket
INFO[11-03|12:30:51] Initializing global database
DBUG[11-03|12:30:51] Dqlite: connected address=1 attempt=0
INFO[11-03|12:30:51] Initializing storage pools
DBUG[11-03|12:30:51] Initializing and checking storage pool "default"
DBUG[11-03|12:30:51] Checking ZFS storage pool "default"
DBUG[11-03|12:30:51] ZFS storage pool "default" does not exist, trying to import it
EROR[11-03|12:30:51] Failed to start the daemon: ZFS storage pool "default" could not be imported:
INFO[11-03|12:30:51] Starting shutdown sequence
INFO[11-03|12:30:51] Stopping REST API handler:
INFO[11-03|12:30:51]  - closing socket                        socket=/var/snap/lxd/common/lxd/unix.socket
INFO[11-03|12:30:51] Stopping /dev/lxd handler:
INFO[11-03|12:30:51]  - closing socket                        socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[11-03|12:30:51] Closing the database
DBUG[11-03|12:30:51] Stop database gateway
INFO[11-03|12:30:51] Unmounting temporary filesystems
INFO[11-03|12:30:51] Done unmounting temporary filesystems
Error: ZFS storage pool "default" could not be imported:
```

When this error occurs, removing the snap also fails (partially) with the following output:

```
$ snap remove lxd
error: cannot perform the following tasks:
- Stop snap "lxd" services ([--root / enable snap.lxd.daemon.unix.socket] failed with exit status 1: Failed to enable unit, unit snap.lxd.daemon.unix.socket does not exist.)
- Remove data for snap "lxd" (12224) (remove /var/snap/lxd/common/ns/mntns: device or resource busy)
```
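The "device or resource busy" error on `/var/snap/lxd/common/ns/mntns` suggests that something is still mounted there; that is my assumption, not something the output above confirms. A minimal sketch for checking it, which reads a mount table in `/proc/mounts` format on stdin and prints mount points at or below a directory; `mounts_under` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every mount point (field 2 of /proc/mounts)
# that is equal to, or nested under, the directory given in $1.
mounts_under() {
  local prefix="${1%/}"   # drop a trailing slash so matching is consistent
  # Exact match, or match on "prefix/" so that e.g. ".../nsx" is not included
  awk -v p="$prefix" '$2 == p || index($2, p "/") == 1 { print $2 }'
}
```

Usage: `mounts_under /var/snap/lxd/common/ns < /proc/mounts`. An empty result would point away from a leftover mount as the cause.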

I’ve seen one such report so far, though on that particular system, attempting to debug this behavior resulted in things working again consistently…

It looks like `zpool import` is failing, but due to a bug in 3.18, we’re not getting the full output from it… On the system I investigated before, just running `zpool list` from outside the snap was enough to get things working again, including across subsequent reboots.
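Since 3.18 swallows the output of the failing import, repeating the same steps by hand on the host (outside the snap) should at least show the full error. A minimal sketch of the check-then-import sequence described in the daemon log ("does not exist, trying to import it"); `import_if_missing` is a hypothetical helper, and the optional search directory is an assumption that only matters for a loop-file-backed pool (for the snap, those images typically live under `/var/snap/lxd/common/lxd/disks`):

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the logged sequence: check whether the pool
# is already imported and, if not, try to import it.
# Usage: import_if_missing POOL [SEARCH_DIR]   (run as root, on the host)
import_if_missing() {
  local pool="$1" dir="${2:-}"
  if zpool list "$pool" >/dev/null 2>&1; then
    echo "pool '$pool' is already imported"
    return 0
  fi
  echo "pool '$pool' not imported, trying to import it" >&2
  # zpool's own stderr is left untouched so the full error is visible
  if [ -n "$dir" ]; then
    # -d makes zpool search the given directory for devices or image files
    zpool import -d "$dir" "$pool"
  else
    zpool import "$pool"
  fi
}
```

Usage from a root shell: `import_if_missing default`, then `snap restart lxd`. The function's exit status is that of `zpool import`, so it can be chained with `&&`.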

Digging into the logs, I saw an entry in boot.log saying something along the lines of a failure to import from zpool.cache (I don’t have the exact log line, though), and that led me to this (open) issue: zpool.cache not updated when adding a pool.
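If the cache file really is missing the pool, one commonly used way to make ZFS (re)write it is to set the pool's `cachefile` property explicitly. A minimal sketch, assuming the host's ZFS tools and the default cache path `/etc/zfs/zpool.cache`; I have not confirmed that this fixes the behaviour described here, and `refresh_cachefile` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask ZFS to record POOL in the cache file read at boot.
# Usage: refresh_cachefile POOL [CACHEFILE]   (run as root, on the host)
refresh_cachefile() {
  local pool="$1" cache="${2:-/etc/zfs/zpool.cache}"
  # Setting the cachefile property makes ZFS write the pool config there
  zpool set cachefile="$cache" "$pool" || return
  # Print the property so the result can be checked
  zpool get cachefile "$pool"
}
```

Usage from a root shell: `refresh_cachefile default`. Unlike the workaround below, this leaves the packaged systemd units untouched.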

What did work for me was commenting out ConditionPathExists=!/etc/zfs/zpool.cache in /lib/systemd/system/zfs-import-scan.service and enabling that service (the solution suggested towards the end of that issue thread). Since then I have done several warm and cold reboots and haven’t had any problems.
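Rather than editing `/lib/systemd/system/zfs-import-scan.service` directly (a package upgrade may overwrite that file), the same change can probably be made with a systemd drop-in. In a drop-in, assigning an empty value to a condition setting resets the conditions instead of adding one; note that systemd documents this as discarding all previously set conditions, not only `ConditionPathExists=`. A minimal sketch; `write_scan_override` is a hypothetical helper, and the directory argument exists only so it can be tried outside `/etc`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in that clears the conditions of
# zfs-import-scan.service instead of editing the packaged unit file.
# Usage: write_scan_override [DROPIN_DIR]   (run as root for the real path)
write_scan_override() {
  local dir="${1:-/etc/systemd/system/zfs-import-scan.service.d}"
  mkdir -p "$dir" || return
  # An empty ConditionPathExists= resets the unit's condition list
  cat > "$dir/override.conf" <<'EOF'
[Unit]
ConditionPathExists=
EOF
}
```

Usage from a root shell: `write_scan_override && systemctl daemon-reload && systemctl enable zfs-import-scan.service`. Removing the drop-in file (and reloading) restores the packaged behaviour.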

I am not an expert in ZFS or LXD, but this feels more like a ZFS issue than an LXD one, and like a workaround rather than a real solution. Do you see why the suggested solution worked?