Cluster + temp storage + destroy zpool -> broken LXD

I created a temporary ZFS storage pool + volumes in my LXD cluster, then created an ephemeral test container on each host.
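
(For context, the zpool sat on an LV on each host and was then registered with LXD per cluster member before the final cluster-wide create - roughly this, exact keys from memory, image and instance names as placeholders:)

## on each host
# zpool create temp-lxd /dev/mapper/vgTemp-lxd
## then once per cluster member, followed by the cluster-wide create
$ lxc storage create temp-lxd zfs source=temp-lxd --target albans
$ lxc storage create temp-lxd zfs source=temp-lxd --target grantham
$ lxc storage create temp-lxd zfs source=temp-lxd --target uxbridge
$ lxc storage create temp-lxd zfs
## and an ephemeral test container per host
$ lxc launch ubuntu:20.04 temp1 --ephemeral --storage temp-lxd --target albans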

I then did some disk speed testing (LXD pools/volumes and plain ZFS mounts were equally good, with the best mixed read/write throughput), shut down the containers (systemctl poweroff) and tidied up.

While tidying up I destroyed the ZFS pools backing the LXD storage pool and removed the LVs they used. Mistake.
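
(Roughly, on each host - vgTemp/lxd being the LV the pool sat on:)

# zpool destroy temp-lxd
# lvremove vgTemp/lxd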

Then lxc storage list showed the temp pool as in use, so I couldn’t delete it, and I restarted the hosts.

Now LXD is broken:

$ lxc cluster list
Error: Get "http://unix.socket/1.0": EOF
# journalctl -b0 -oshort-precise | grep -i lxd
... normal startup
Jun 08 06:43:50.903517 albans lxd.daemon[26905]: t=2021-06-08T06:43:50+0000 lvl=eror msg="Failed to start the daemon: Failed initializing storage pool \"temp-lxd\": Failed to run: zpool import temp-lxd: cannot import 'temp-lxd': no such pool available"
Jun 08 06:43:51.045566 albans lxd.daemon[26905]: Error: Failed initializing storage pool "temp-lxd": Failed to run: zpool import temp-lxd: cannot import 'temp-lxd': no such pool available
Jun 08 06:43:51.776278 albans lxd.daemon[26777]: => LXD failed to start
Jun 08 06:43:51.777271 albans systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 08 06:43:51.777451 albans systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.

Before I break it further:

  1. Is there a way to simply remove the storage pool from LXD?
    or
  2. Will LXD restart if I recreate the destroyed zpool on all hosts?

Thanks
david

Yes, if you recreate the missing pool in ZFS, that should allow LXD to restart.

You’ll then need to run lxc delete commands to remove the instances that were on that pool from the database.
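
Something along these lines on each cluster member, reusing whatever device the pool was on (paths and names here are placeholders):

## on each host: recreate the zpool, then restart LXD
# zpool create temp-lxd /dev/mapper/vgTemp-lxd
# snap restart lxd
## once the daemon is back, list and remove the orphaned instances
$ lxc list
$ lxc delete <instance>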

LXD requires that all storage pools are available when starting. See

Support for starting LXD with degraded storage or network · Issue #8730 · lxc/lxd · GitHub

Thanks again. I created the zpool, started the cluster and deleted the volumes. I still can’t delete the storage pool (it shows 0 usage and has no volumes), but I’ll try again after the next reboot.
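
(The volumes went with the usual per-volume delete, adding --target <host> for the per-host ones:)

$ lxc storage volume delete temp-lxd <volume> [--target <host>]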

What error do you get when you run lxc storage delete <pool>?

Can you show the output of lxc storage show <pool> please?

# lxc storage list
+----------+--------+--------------------------+---------+---------+
|   NAME   | DRIVER |       DESCRIPTION        | USED BY |  STATE  |
+----------+--------+--------------------------+---------+---------+
| local    | zfs    | LXD system local storage | 7       | CREATED |
+----------+--------+--------------------------+---------+---------+
| temp-lxd | zfs    |                          | 0       | CREATED |
+----------+--------+--------------------------+---------+---------+

# lxc storage volume list temp-lxd
+------+------+-------------+--------------+---------+----------+
| TYPE | NAME | DESCRIPTION | CONTENT-TYPE | USED BY | LOCATION |
+------+------+-------------+--------------+---------+----------+

# lxc storage show temp-lxd
config: {}
description: ""
name: temp-lxd
driver: zfs
used_by: []
status: Created
locations:
- albans
- grantham
- uxbridge

# lxc storage delete temp-lxd
Error: Failed to run: zpool destroy temp-lxd: cannot destroy 'temp-lxd': pool is busy

Nothing in the journal:

Jun 09 06:00:10.587155 albans sudo[80096]:   albans : TTY=pts/1 ; PWD=/home/albans ; USER=root ; COMMAND=/usr/bin/lxc storage list
Jun 09 06:00:10.588022 albans sudo[80096]: pam_unix(sudo:session): session opened for user root by albans(uid=0)
Jun 09 06:00:10.631054 albans systemd[1]: Started snap.lxd.lxc.d31cdeff-a673-486e-aa30-1d0b3c43d43b.scope.
Jun 09 06:00:10.722039 albans sudo[80096]: pam_unix(sudo:session): session closed for user root
Jun 09 06:00:10.722827 albans systemd[1]: snap.lxd.lxc.d31cdeff-a673-486e-aa30-1d0b3c43d43b.scope: Succeeded.
Jun 09 06:00:22.691040 albans sudo[80171]:   albans : TTY=pts/1 ; PWD=/home/albans ; USER=root ; COMMAND=/usr/bin/lxc storage volume list temp-lxd
Jun 09 06:00:22.692139 albans sudo[80171]: pam_unix(sudo:session): session opened for user root by albans(uid=0)
Jun 09 06:00:22.733562 albans systemd[1]: Started snap.lxd.lxc.c417c3d6-5ee6-4e40-b473-db456561fcd0.scope.
Jun 09 06:00:22.823969 albans sudo[80171]: pam_unix(sudo:session): session closed for user root
Jun 09 06:00:22.824659 albans systemd[1]: snap.lxd.lxc.c417c3d6-5ee6-4e40-b473-db456561fcd0.scope: Succeeded.
Jun 09 06:01:14.492243 albans sudo[80270]:   albans : TTY=pts/1 ; PWD=/home/albans ; USER=root ; COMMAND=/usr/bin/lxc storage show temp-lxd
Jun 09 06:01:14.493723 albans sudo[80270]: pam_unix(sudo:session): session opened for user root by albans(uid=0)
Jun 09 06:01:14.534979 albans systemd[1]: Started snap.lxd.lxc.4122005d-d559-4976-af85-74f0e212fa60.scope.
Jun 09 06:01:14.621787 albans systemd[1]: snap.lxd.lxc.4122005d-d559-4976-af85-74f0e212fa60.scope: Succeeded.
Jun 09 06:01:14.623019 albans sudo[80270]: pam_unix(sudo:session): session closed for user root
Jun 09 06:01:48.364555 albans sudo[80343]:   albans : TTY=pts/1 ; PWD=/home/albans ; USER=root ; COMMAND=/usr/bin/lxc storage delete temp-lxd
Jun 09 06:01:48.365656 albans sudo[80343]: pam_unix(sudo:session): session opened for user root by albans(uid=0)
Jun 09 06:01:48.408065 albans systemd[1]: Started snap.lxd.lxc.1cbb6725-581c-4bc7-92f4-80298ffc4014.scope.
Jun 09 06:01:48.623391 albans systemd[1]: snap.lxd.lxc.1cbb6725-581c-4bc7-92f4-80298ffc4014.scope: Succeeded.
Jun 09 06:01:48.623420 albans sudo[80343]: pam_unix(sudo:session): session closed for user root
Jun 09 06:02:28.480344 albans sudo[80456]:   albans : TTY=pts/1 ; PWD=/home/albans ; USER=root ; COMMAND=/bin/journalctl -b0 -oshort-precise -e
Jun 09 06:02:28.481463 albans sudo[80456]: pam_unix(sudo:session): session opened for user root by albans(uid=0)

Found it:

# zfs list temp-lxd
NAME       USED  AVAIL     REFER  MOUNTPOINT
temp-lxd  97.5K  11.1G       24K  /temp-lxd

# mount | grep zfs
lxd-data on /mnt/lxd-data type zfs (rw,xattr,noacl)
temp-lxd on /temp-lxd type zfs (rw,xattr,noacl)
## on all hosts
# umount /temp-lxd

# lxc storage delete temp-lxd
Storage pool temp-lxd deleted

I don’t know where that mount point came from - I created the zfs pools with zpool create temp-lxd /dev/mapper/vgTemp-lxd on each host.
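
(I think I see what happened, though: a plain zpool create sets mountpoint=/<poolname> and mounts it immediately, and that mount is what kept the pool busy. Presumably creating it with -m none would have avoided it:)

# zpool create -m none temp-lxd /dev/mapper/vgTemp-lxd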