One LXD won't come up after reboot (again). Update: nah, the problem is freaking LXD changing its config after reboot; ZFS is fine

It seems that whenever I reboot one of the servers, there is always a problem on the way up.

zpool get version
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
zfs get version
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.

It looks like the ZFS install was lost after the reboot.
Any ideas? More info below.

+----------+--------------------------+----------+---------+---------------------------------------+
|   NAME   |           URL            | DATABASE |  STATE  |                MESSAGE                |
+----------+--------------------------+----------+---------+---------------------------------------+
|
lxc version
Client version: 3.0.3
lxd version
3.0.3
MOE systemd[1]: lxd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
May 10 18:02:52 MOE lxd[1993]: Error: LXD still not running after 600s timeout ()
May 10 18:02:52 MOE systemd[1]: lxd.service: Control process exited, code=exited status=1
May 10 18:02:52 MOE systemd[1]: lxd.service: Failed with result 'exit-code'.
May 10 18:02:52 MOE systemd[1]: Failed to start LXD - main daemon.
May 10 18:02:52 MOE systemd[1]: lxd.service: Service hold-off time over, scheduling restart.
May 10 18:02:52 MOE systemd[1]: lxd.service: Scheduled restart job, restart counter is at 1.
May 10 18:02:52 MOE systemd[1]: Stopped LXD - main daemon.
May 10 18:02:52 MOE systemd[1]: Starting LXD - main daemon...
t=2019-05-10T18:02:52-0400 lvl=info msg="LXD 3.0.3 is starting in normal mode" path=/var/lib/lxd
t=2019-05-10T18:02:52-0400 lvl=info msg="Kernel uid/gid map:"
t=2019-05-10T18:02:52-0400 lvl=info msg=" - u 0 0 4294967295"
t=2019-05-10T18:02:52-0400 lvl=info msg=" - g 0 0 4294967295"
t=2019-05-10T18:02:52-0400 lvl=info msg="Configured LXD uid/gid map:"
t=2019-05-10T18:02:52-0400 lvl=info msg=" - u 0 100000 65536"
t=2019-05-10T18:02:52-0400 lvl=info msg=" - g 0 100000 65536"
t=2019-05-10T18:02:52-0400 lvl=warn msg="CGroup memory swap accounting is disabled, swap limits will be ignored."
t=2019-05-10T18:02:52-0400 lvl=info msg="Kernel features:"
t=2019-05-10T18:02:52-0400 lvl=info msg=" - netnsid-based network retrieval: no"
t=2019-05-10T18:02:52-0400 lvl=info msg=" - unprivileged file capabilities: yes"
t=2019-05-10T18:02:52-0400 lvl=info msg="Initializing local database"
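The LXD log stops at "Initializing local database", and the next thing in the journal is the 600s waitready timeout, so the last message LXD wrote is probably the best clue to where startup stalls. Here is a minimal sketch, assuming the `t=... lvl=... msg="..."` lines above are saved to a text file. It pulls out the last LXD entry and skips the plain systemd lines. The script and file names in the usage line are hypothetical.

```python
import re
import sys

# key=value pairs; values are either "double-quoted" (may contain spaces) or bare
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line):
    """Turn one LXD log line (t=... lvl=... msg="...") into a dict."""
    fields = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        fields[key] = value
    return fields


def last_message(lines):
    """Return (level, msg) of the last LXD entry, or None if there is none."""
    last = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("t="):
            continue  # systemd/journal lines are not in LXD's key=value format
        entry = parse_logfmt(line)
        if "msg" in entry:
            last = (entry.get("lvl", "?"), entry["msg"])
    return last


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        print(last_message(fh))
```

Usage: `python3 last_lxd_msg.py lxd-startup.log`. A regex is used instead of `shlex` so that an apostrophe inside a message does not break the parse.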

Looks like it needs a new battery or starter.

Let me know what to try next.

Thanks

Hi!

For some reason, the ZFS kernel module is not being loaded automatically.
For Ubuntu 16.04 or newer, these kernel modules are provided for you.
So, do you run Ubuntu or another distribution?

Ubuntu 18.04.
It basically looks like ZFS is not installed.

Ubuntu 18.04 already has built-in ZFS support.
As long as you run a stock kernel (NOT a mainline kernel build), have not installed any ZFS DKMS packages, and have not blacklisted the ZFS kernel modules, it should work.
Check whether any of the above applies to you, and then we can fix it.
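To check whether the module is actually loaded after a reboot, you can read `/proc/modules`. This shows the same information as `lsmod | grep zfs`. The following is a small sketch. The helper names are made up for this example, and the check only reads state; it does not load anything.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Module names from /proc/modules text (first field of each line)."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()  # name size refcount deps state address [taint]
        if fields:
            names.add(fields[0])
    return names


def zfs_is_loaded(proc_modules_text):
    return "zfs" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():  # Linux only
        if zfs_is_loaded(proc.read_text()):
            print("zfs module is loaded")
        else:
            print("zfs module is NOT loaded; try: sudo /sbin/modprobe zfs")
```

Suppose `modprobe zfs` works by hand, but the module is missing again after every reboot. One generic systemd option is to list `zfs` in a `.conf` file under `/etc/modules-load.d/`, which systemd-modules-load reads at boot. The thread does not confirm that this is needed here.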

I just ran apt upgrade and rebooted.
It's a stock kernel, and as far as I know DKMS is not installed.
I have been very careful not to mess with the ZFS pool. I have a backup of the 100G img file, and the storage pool's local containers directory shows up empty.
I was trying to mount them, but that didn't work.

I did the following:
root@MOE:/var/lib/lxd/storage-pools# zpool get all
root@MOE:/var/lib/lxd/storage-pools# zpool list
no pools available
root@MOE:/var/lib/lxd/storage-pools# zpool status
no pools available
root@MOE:/var/lib/lxd/storage-pools# zpool get version
root@MOE:/var/lib/lxd/storage-pools# /sbin/modprobe zfs
root@MOE:/var/lib/lxd/storage-pools# zpool history
no pools available
root@MOE:/var/lib/lxd/storage-pools#
dkms status

Command ‘dkms’ not found, but can be installed with:

apt install dkms

I'm not sure whether a ZFS problem is causing LXD not to work, or the other way around.
If I reboot and start from scratch, I get:
zpool history
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.

If I could just mount the containers, I could copy them off and reinstall LXD.

This works, so I can see the data is there:

zdb -v
local:
    version: 5000
    name: 'local'
    state: 0
    txg: 4027066
    pool_guid: 18204573411481427592
    errata: 0
    hostname: 'MOE'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 18204573411481427592
        children[0]:
            type: 'file'
            id: 0
            guid: 5585528005014743118
            path: '/var/lib/lxd/disks/local.img'
            metaslab_array: 131
            metaslab_shift: 29
            ashift: 9
            asize: 107369463808
            is_log: 0
            DTL: 409
            create_txg: 4
            com.delphix:vdev_zap_leaf: 129
            com.delphix:vdev_zap_top: 130
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
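The `zdb` output already contains what `zpool import` needs: the pool name, and for a file-backed pool like this one, the path of the image file. The default import search only covers device directories under `/dev`, so a file vdev has to be pointed at with `-d <directory>`. Below is a sketch that builds that command from the `zdb` text. It assumes a single pool and the straight quotes that `zdb` prints, and the function name is hypothetical.

```python
import os
import re

NAME_RE = re.compile(r"^\s*name:\s*'([^']+)'")
PATH_RE = re.compile(r"^\s*path:\s*'([^']+)'")


def import_command(zdb_text):
    """Build a `zpool import -d DIR ... POOL` argv from `zdb` config output."""
    pool = None
    dirs = []
    for line in zdb_text.splitlines():
        m = NAME_RE.match(line)
        if m and pool is None:
            pool = m.group(1)  # first name: line is the pool itself
        m = PATH_RE.match(line)
        if m:
            d = os.path.dirname(m.group(1))  # directory holding the .img file
            if d not in dirs:
                dirs.append(d)
    if pool is None or not dirs:
        raise ValueError("no pool name or vdev path found in zdb output")
    cmd = ["zpool", "import"]
    for d in dirs:
        cmd += ["-d", d]  # -d may be given more than once
    return cmd + [pool]
```

To run it as root, pass the list to `subprocess.run(cmd, check=True)`. For the output above, it builds essentially the import command that ends up working later in the thread.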

I got this working:
zpool import -d /var/lib/lxd/disks/ local

zpool status
pool: local
state: ONLINE
scan: scrub repaired 0B in 0h19m with 0 errors on Sun Apr 14 00:43:48 2019
config:

NAME                            STATE     READ WRITE CKSUM
local                           ONLINE       0     0     0
  /var/lib/lxd/disks/local.img  ONLINE       0     0     0

errors: No known data errors
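You can also confirm the import from a script instead of by eye. The small sketch below reads the `state:` and `errors:` lines of `zpool status`. It assumes only one pool appears in the output, as here, and the helper names are made up.

```python
import re

FIELD_RE = re.compile(r"^\s*(pool|state|errors):\s*(.*)$")


def pool_health(status_text):
    """Pick the pool, state and errors lines out of `zpool status` output."""
    info = {}
    for line in status_text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            info[m.group(1)] = m.group(2).strip()
    return info


def is_healthy(status_text):
    info = pool_health(status_text)
    return info.get("state") == "ONLINE" and info.get("errors") == "No known data errors"
```

The vdev table is ignored on purpose, because its `STATE` column header never starts a line with `state:`.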

The ZFS pool is imported now, but I still can't get to the containers inside; they are still blank.
I tried mounting them individually, so far no go.
First I created a /mnt/fs directory, then:
mount local/containers/AI-GENIE /mnt/fs
mount: /mnt/fs: special device local/containers/AI-GENIE does not exist.

The command to mount the individual containers is:
zfs mount local/containers/WP-HAPPYDOGS2
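Plain `mount` treats `local/containers/AI-GENIE` as a block-device path, which is why it reported "special device ... does not exist". By contrast, `zfs mount` mounts the dataset at its own `mountpoint` property, not at a directory you choose. You can check where a dataset will appear with `zfs get mountpoint <dataset>`. With many containers, the sketch below lists the datasets under `local/containers` and runs `zfs mount` on each one. The helper names are hypothetical, and it has to be run as root.

```python
import subprocess


def child_datasets(zfs_list_output, parent):
    """Datasets strictly below `parent`, from `zfs list -H -o name -r parent` output."""
    prefix = parent.rstrip("/") + "/"
    names = (line.strip() for line in zfs_list_output.splitlines())
    return [name for name in names if name.startswith(prefix)]


def mount_commands(datasets):
    """One `zfs mount <dataset>` argv per dataset."""
    return [["zfs", "mount", ds] for ds in datasets]


def mount_all_containers(parent="local/containers", run=subprocess.run):
    """List the container datasets under `parent` and mount each one (needs root)."""
    listing = run(
        ["zfs", "list", "-H", "-o", "name", "-t", "filesystem", "-r", parent],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6 (Ubuntu 18.04) lacks text=/capture_output=
    ).stdout
    for cmd in mount_commands(child_datasets(listing, parent)):
        # check=False: `zfs mount` may fail on a dataset that is already
        # mounted, which should not stop the remaining ones
        run(cmd, check=False)
```

The `run` parameter is there so the command sequence can be dry-run with a fake runner before pointing it at the real pool.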

So I can access my ZFS data; now I have to rebuild the server again.

lxd is showing many…

/usr/lib/lxd/lxd activateifneeded

and

/usr/lib/lxd/lxd waitready --timeout=600

I'm not sure what is stopping them from working.