The LXD daemon became unresponsive while containers were still running, after a server uptime of 15+ days.
I'm currently unsure what troubleshooting steps to take next. LXD on this server hosts firewall/router/DHCP functions along with a number of other production services, all of which have been down since the server reboot.
The exact failure date/time is unknown because the containers and services continued to function after the daemon/socket failure. I logged in to handle an administration task within a container and found:
root@r910n01:~# lxc list
LXD socket not found; is LXD installed and running?
OS: Ubuntu 16.04
lxc --version: 3.0.0.beta6
root@r910n01:~# snap list --all core
Name  Version    Rev   Developer  Notes
core  16-2.31    4017  canonical  core,disabled
core  16-2.31.1  4110  canonical  core,disabled
core  16-2.31.2  4206  canonical  core
Gist of troubleshooting to date (commands run, in order):
0 systemctl reload snap.lxd.daemon
1 systemctl status snap.lxd.daemon
2 journalctl -u snap.lxd.daemon
3 journalctl -xe
4 lxc list
5 cat /var/snap/lxd/common/lxd/logs/lxd.log
6 snap revert lxd
7 reboot
Logs and output here: https://pastebin.ubuntu.com/p/V5qHzzkwkP/
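For anyone reproducing the "LXD socket not found" symptom, here is a minimal sketch of a check for whether the socket file itself is present. It assumes the snap package's default socket path (/var/snap/lxd/common/lxd/unix.socket); `socket_state` and the `LXD_SOCKET` variable are hypothetical names used only for illustration, not part of LXD.

```shell
#!/bin/sh
# Assumption: default socket location for the snap-packaged LXD.
# Override by exporting LXD_SOCKET if your install differs.
LXD_SOCKET="${LXD_SOCKET:-/var/snap/lxd/common/lxd/unix.socket}"

# socket_state (hypothetical helper) prints:
#   "present"      if the path exists and is a UNIX socket
#   "not-a-socket" if the path exists but is some other file type
#   "missing"      if nothing exists at the path
socket_state() {
    if [ -S "$1" ]; then
        echo "present"
    elif [ -e "$1" ]; then
        echo "not-a-socket"
    else
        echo "missing"
    fi
}

echo "LXD socket ($LXD_SOCKET): $(socket_state "$LXD_SOCKET")"
```

A "present" result only shows that the socket file exists; it does not prove the daemon is answering on it, so the systemctl status and journalctl output are still needed.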
[EDIT: Additional info]
lxc info --debug == https://pastebin.ubuntu.com/p/XSchZr2wt2/
snap info lxd == https://pastebin.ubuntu.com/p/yj2VwrqHfd/
.setup_mode file check == https://pastebin.ubuntu.com/p/x9VSm4tqp2/
zpool check == https://pastebin.ubuntu.com/p/GhkhvC4qKW/