LXD 3.8 hangs after system updates and reboot

Ubuntu 16.04 LTS
snap version
LXD 3.8

After apt update && apt upgrade and a reboot, LXD isn't coming up anymore.

lxd --debug --group lxd hangs at:
DBUG[12-21|14:12:22] Connecting to a local LXD over a Unix socket
DBUG[12-21|14:12:22] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
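This looks like just the client waiting on the daemon's unix socket, so the daemon side seems to be where to look. Assuming the snap install, the daemon status and logs should be visible with:

sudo systemctl status snap.lxd.daemon
sudo journalctl -u snap.lxd.daemon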

lxd sql 'SELECT * FROM containers;' results in:
Description:
Execute a SQL query against the LXD local or global database

The local database is specific to the LXD cluster member you target the
command to, and contains member-specific data (such as the member network
address).

The global database is common to all LXD members in the cluster, and contains
cluster-specific data (such as profiles, containers, etc).

If you are running a non-clustered LXD instance, the same applies, as that
instance is effectively a single-member cluster.

If <query> is the special value "-", then the query is read from
standard input.

If <query> is the special value ".dump", then the command returns a SQL text
dump of the given database.

If <query> is the special value ".schema", then the command returns the SQL
text schema of the given database.

This internal command is mostly useful for debugging and disaster
recovery. The LXD team will occasionally provide hotfixes to users as a
set of database queries to fix some data inconsistency.

This command targets the global LXD database and works in both local
and cluster mode.

Usage:
lxd sql <local|global> <query> [flags]

Global Flags:
  -d, --debug     Show all debug messages
  -h, --help      Print help
      --logfile   Path to the log file
      --syslog    Log to syslog
      --trace     Log tracing targets
  -v, --verbose   Show all information messages
      --version   Print version number
Error: Missing required arguments
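(The help text shows up because lxd sql expects the database target and the query as separate arguments, so the call would presumably have to look something like the line below; it will still hang as long as the daemon itself is down, though.)

lxd sql global "SELECT * FROM containers;"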

journalctl -u snap.lxd.daemon shows:

Dez 21 12:11:19 q101010 systemd[1]: Started Service for snap application lxd.daemon.
Dez 21 12:11:19 q101010 lxd.daemon[12347]: => Preparing the system
Dez 21 12:11:19 q101010 lxd.daemon[12347]: ==> Loading snap configuration
Dez 21 12:11:19 q101010 lxd.daemon[12347]: ==> Setting up mntns symlink (mnt:[4026532495])
Dez 21 12:11:20 q101010 lxd.daemon[12347]: ==> Setting up persistent shmounts path
Dez 21 12:11:20 q101010 lxd.daemon[12347]: ====> Making LXD shmounts use the persistent path
Dez 21 12:11:20 q101010 lxd.daemon[12347]: ====> Making LXCFS use the persistent path
Dez 21 12:11:20 q101010 lxd.daemon[12347]: ==> Setting up kmod wrapper
Dez 21 12:11:20 q101010 lxd.daemon[12347]: ==> Preparing /boot
Dez 21 12:11:20 q101010 lxd.daemon[12347]: ==> Preparing a clean copy of /run
Dez 21 12:11:20 q101010 lxd.daemon[12347]: ==> Preparing a clean copy of /etc
Dez 21 12:11:21 q101010 lxd.daemon[12347]: ==> Setting up ceph configuration
Dez 21 12:11:21 q101010 lxd.daemon[12347]: ==> Setting up LVM configuration
Dez 21 12:11:21 q101010 lxd.daemon[12347]: ==> Rotating logs
Dez 21 12:11:22 q101010 lxd.daemon[12347]: ==> Setting up ZFS (0.7)
Dez 21 12:11:22 q101010 lxd.daemon[12347]: ==> Escaping the systemd cgroups
Dez 21 12:11:22 q101010 lxd.daemon[12347]: ==> Escaping the systemd process resource limits
Dez 21 12:11:22 q101010 lxd.daemon[12347]: ==> Increasing the number of inotify user instances
Dez 21 12:11:22 q101010 lxd.daemon[12347]: => Starting LXCFS
Dez 21 12:11:22 q101010 lxd.daemon[12347]: mount namespace: 6
Dez 21 12:11:22 q101010 lxd.daemon[12347]: hierarchies:
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 0: fd: 7: pids
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 1: fd: 8: cpuset
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 2: fd: 9: hugetlb
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 3: fd: 10: rdma
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 4: fd: 11: net_cls,net_prio
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 5: fd: 12: freezer
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 6: fd: 13: perf_event
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 7: fd: 14: cpu,cpuacct
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 8: fd: 15: memory
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 9: fd: 16: blkio
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 10: fd: 17: devices
Dez 21 12:11:22 q101010 lxd.daemon[12347]: 11: fd: 18: name=systemd
Dez 21 12:11:22 q101010 lxd.daemon[12347]: => Starting LXD
Dez 21 12:11:22 q101010 lxd.daemon[12347]: t=2018-12-21T12:11:22+0100 lvl=warn msg="CGroup memory swap ac
Dez 21 12:11:26 q101010 lxd.daemon[12347]: t=2018-12-21T12:11:26+0100 lvl=eror msg="Failed to start the d
Dez 21 12:11:26 q101010 lxd.daemon[12347]: Error: ZFS storage pool "LXD-qbit" could not be imported: cann
Dez 21 12:11:26 q101010 systemd[1]: Started Service for snap application lxd.daemon.
Dez 21 12:11:27 q101010 lxd.daemon[12347]: => LXD failed to start
Dez 21 12:11:27 q101010 systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=137
Dez 21 12:11:28 q101010 lxd.daemon[14083]: => Stop reason is: crashed
Dez 21 12:11:28 q101010 systemd[1]: snap.lxd.daemon.service: Unit entered failed state.
Dez 21 12:11:28 q101010 systemd[1]: snap.lxd.daemon.service: Failed with result ‘exit-code’.
Dez 21 12:11:28 q101010 systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling resta
Dez 21 12:11:28 q101010 systemd[1]: Stopped Service for snap application lxd.daemon.
Dez 21 12:11:28 q101010 systemd[1]: Started Service for snap application lxd.daemon.
Dez 21 12:11:28 q101010 lxd.daemon[14227]: => Preparing the system
Dez 21 12:11:28 q101010 lxd.daemon[14227]: ==> Loading snap configuration
Dez 21 12:11:28 q101010 lxd.daemon[14227]: ==> Setting up mntns symlink (mnt:[4026532495])
Dez 21 12:11:28 q101010 lxd.daemon[14227]: ==> Setting up kmod wrapper
Dez 21 12:11:28 q101010 lxd.daemon[14227]: ==> Preparing /boot
Dez 21 12:11:28 q101010 lxd.daemon[14227]: ==> Preparing a clean copy of /run

Any idea how to solve this?

The earliest error is the one with the ZFS storage pool. For some reason, it is not available to LXD and LXD cannot continue. The rest of the LXD commands would not help unless there is some info as to what happened to the ZFS storage pool.

  1. You can have a look at dmesg to see if there are any relevant ZFS messages.
  2. Run sudo zpool list to see the state of the pool.
  3. Run sudo zfs list to see the state of the containers and container images.

Post the results here.
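If the image files under /var/snap/lxd/common/lxd/disks/ are still present, the pool may simply not have been re-imported after the reboot. In that case, something along these lines (pointing zpool import at the directory that holds the loop files) may bring it back:

sudo zpool import -d /var/snap/lxd/common/lxd/disks LXD-qbit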

Thanks, simos!

I couldn't stop LXD cleanly, so I killed it instead:

killall lxd

and

zpool status

didn't list the needed LXD pool. So next I had a look at the disk images:

ls -l /var/snap/lxd/common/lxd/disks/
-rw------- 1 root root 107374182400 Dez 20 14:11 default.img
-rw------- 1 root root 107374182400 Dez 20 14:00 LXD-qbit.img

The images were still there. To save my work, I copied them to a safe place:

cp /var/snap/lxd/common/lxd/disks/* /opt/lxd-rescue/
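The .img files have a 100 GB apparent size and are most likely sparse, so with GNU cp a flag like --sparse=always keeps the copies from ballooning to their full size:

cp --sparse=always /var/snap/lxd/common/lxd/disks/*.img /opt/lxd-rescue/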

I tried to recreate the missing pools. First the backing image files:

truncate -s 100G default.img && truncate -s 100G LXD-qbit.img

and then created new zpools on them:

zpool create default /var/snap/lxd/common/lxd/disks/default.img && zpool create LXD-qbit /var/snap/lxd/common/lxd/disks/LXD-qbit.img

After zpool status showed the new pools default and LXD-qbit, I exported them so I could copy my saved images back:

zpool export default && zpool export LXD-qbit

and moved the saved images back into place:

mv /opt/lxd-rescue/*.img /var/snap/lxd/common/lxd/disks/

and finally imported all zpools:

zpool import -a
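Assuming the original images imported cleanly, the container datasets should be visible again at this point:

sudo zfs list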

I then tried to start LXD, but it still hung, so I tried to point LXD at the default pool:

lxc config set storage.zfs_pool_name default
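As far as I understand, storage.zfs_pool_name is a legacy key, and in LXD 3.x the ZFS pool name lives in the per-pool storage configuration instead. Once the daemon responds, it can be checked with something like the following, assuming the storage pools are named after the image files:

lxc storage show default
lxc storage show LXD-qbit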

After minutes of nothing happening, I killed that command and restarted the snap:

snap restart lxd
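While it comes back up, the daemon output can be followed with either of these, in case it crashes again:

sudo snap logs -f lxd
sudo journalctl -u snap.lxd.daemon -f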

After that I was able to start LXD and everything worked like before.