LXD 5.14 doesn't start on Ubuntu 18.04 server (kernel 4.15) - please help

I'm requesting help. The details are shown below:
System: Ubuntu 18.04.6 LTS, kernel: Linux varuna 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
(varuna is the server name)
snap list
Name    Version       Rev    Tracking       Publisher   Notes
core    16-2.58.3     14946  latest/stable  canonical✓  core
core18  20230503      2751   latest/stable  canonical✓  base
core20  20230503      1891   latest/stable  canonical✓  base
core22  20230531      750    latest/stable  canonical✓  base
lxd     5.14-7072c7b  24918  latest/stable  canonical✓  -
snap services lxd
Service          Startup  Current   Notes
lxd.activate     enabled  inactive  -
lxd.daemon       enabled  inactive  socket-activated
lxd.user-daemon  enabled  inactive  socket-activated
Problem
sudo journalctl -u snap.lxd.daemon -n 30
-- Logs begin at Tue 2023-06-06 17:13:53 UTC, end at Mon 2023-06-12 05:52:48 UTC. --
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Loading snap configuration
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Setting up mntns symlink (mnt:[4026534482])
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Setting up kmod wrapper
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Preparing /boot
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Preparing a clean copy of /run
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Preparing /run/bin
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Preparing a clean copy of /etc
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Preparing a clean copy of /usr/share/misc
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Setting up ceph configuration
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Setting up LVM configuration
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Setting up OVN configuration
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Rotating logs
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Escaping the systemd cgroups
Jun 12 05:45:07 varuna lxd.daemon[9059]: ====> Detected cgroup V1
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Escaping the systemd process resource limits
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Disabling shiftfs on this kernel (auto)
Jun 12 05:45:07 varuna lxd.daemon[9059]: => Re-using existing LXCFS
Jun 12 05:45:07 varuna lxd.daemon[9059]: ==> Reloading LXCFS
Jun 12 05:45:07 varuna lxd.daemon[9059]: => Starting LXD
Jun 12 05:45:07 varuna lxd.daemon[9059]: time="2023-06-12T05:45:07Z" level=warning msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
Jun 12 05:45:08 varuna lxd.daemon[9059]: time="2023-06-12T05:45:08Z" level=error msg="Failed to start the daemon" err="Failed to start dqlite server: raft_start(): io: closed segment 0000000000189781-0000000000189827 is past last snapshot snapshot-4-189441-5402297704"
Jun 12 05:45:08 varuna lxd.daemon[9059]: Error: Failed to start dqlite server: raft_start(): io: closed segment 0000000000189781-0000000000189827 is past last snapshot snapshot-4-189441-5402297704
Jun 12 05:45:08 varuna systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 12 05:45:08 varuna systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 12 05:45:09 varuna systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Jun 12 05:45:09 varuna systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 10.
Jun 12 05:45:09 varuna systemd[1]: Stopped Service for snap application lxd.daemon.
Jun 12 05:45:09 varuna systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Jun 12 05:45:09 varuna systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 12 05:45:09 varuna systemd[1]: Failed to start Service for snap application lxd.daemon.

lxc ls
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused

Please help
Sincerely
pgm@nitk.edu.in
Mohanan PG
NITK Surathkal, India

P.S
systemctl status snap.lxd.daemon.service
● snap.lxd.daemon.service - Service for snap application lxd.daemon
Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2023-06-12 05:45:09 UTC; 19min ago
Main PID: 9059 (code=exited, status=1/FAILURE)

Jun 12 05:45:09 varuna systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Jun 12 05:45:09 varuna systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 10.
Jun 12 05:45:09 varuna systemd[1]: Stopped Service for snap application lxd.daemon.
Jun 12 05:45:09 varuna systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Jun 12 05:45:09 varuna systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 12 05:45:09 varuna systemd[1]: Failed to start Service for snap application lxd.daemon.

sudo ls -lh /var/snap/lxd/common/lxd/
total 68K
drwx------ 4 root root 4.0K Oct 19 2020 backups
drwx------ 4 root root 4.0K Sep 5 2022 cache
drwx--x--x 2 root root 4.0K Mar 21 02:40 containers
drwx------ 4 root root 4.0K Jun 9 17:34 database
drwx--x--x 24 root root 4.0K Jun 12 05:45 devices
drwxr-xr-x 2 root root 4.0K Jun 9 16:57 devlxd
drwx------ 2 root root 4.0K Nov 14 2019 disks
drwx------ 2 root root 4.0K Jun 9 17:31 images
drwx------ 68 root root 4.0K Jun 12 05:45 logs
drwx--x--x 2 root root 4.0K Nov 14 2019 networks
drwx------ 4 root root 4.0K Nov 15 2019 security
-rw-r--r-- 1 root root 733 Nov 14 2019 server.crt
-rw------- 1 root root 288 Nov 14 2019 server.key
lrwxrwxrwx 1 root root 39 Jun 12 05:44 shmounts -> /var/snap/lxd/common/shmounts/instances
drwx------ 2 root root 4.0K Mar 13 05:21 snapshots
drwx--x--x 3 root root 4.0K Nov 15 2019 storage-pools
srw-rw---- 1 root lxd 0 Jun 12 05:44 unix.socket
drwx--x--x 2 root root 4.0K Jan 23 2020 virtual-machines
drwx------ 2 root root 4.0K Jan 23 2020 virtual-machines-snapshots

Hi,

This sounds pretty similar to How to recover from Failed LXD snap auto update? - #3 by JonathanK as you’re both using LXD 5.14 on Ubuntu 18.04.

I’ve notified the dqlite team of both your cases, and the steps in that post should help you get back up and running.

Thanks
Tom
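
For anyone hitting the same error, a minimal sketch of the preparatory steps before following that post (snap install paths as shown above; the backup destination name is just an example):

# Stop LXD before touching the dqlite files
sudo snap stop lxd
# Back up the whole database directory first
sudo cp -a /var/snap/lxd/common/lxd/database /var/snap/lxd/common/lxd/database.bak
# List the raft segments and snapshots to compare against the error message
sudo ls -l /var/snap/lxd/common/lxd/database/global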

Were you previously getting any startup errors about "Required tool 'zpool' is missing", btw?
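
(For reference, a quick way to check what ZFS the host actually has; a sketch using standard Ubuntu package names:)

# ZFS kernel module version
modinfo zfs | grep -i '^version:'
# Is the userspace zpool tool present, and which package provides it?
command -v zpool
dpkg -l zfsutils-linux | grep zfsutils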

Can you list the contents of /var/snap/lxd/common/lxd/database/global please, @mohananpg?

Thank you very much for the suggestions. The issue is resolved.

zpool list is showing the default pool correctly.
lxc ls works now.

The contents of /var/snap/lxd/common/lxd/database/global are:
0000000000181250-0000000000181824  0000000000184322-0000000000185029  0000000000188025-0000000000188417  db.bin-wal  snapshot-4-188417-4741894662.meta
0000000000181825-0000000000182273  0000000000185030-0000000000185345  0000000000188418-0000000000189021  metadata1   snapshot-4-189441-5402297704
0000000000182274-0000000000182849  0000000000185346-0000000000186042  0000000000189022-0000000000189441  metadata2   snapshot-4-189441-5402297704.meta
0000000000182850-0000000000182901  0000000000186043-0000000000186369  0000000000189442-0000000000189766  open-1
0000000000182902-0000000000183297  0000000000186370-0000000000186701  0000000000189767-0000000000189810  open-2
0000000000183298-0000000000184004  0000000000186702-0000000000187393  0000000000189811-0000000000189988  open-3
0000000000184005-0000000000184321  0000000000187394-0000000000188024  db.bin  snapshot-4-188417-4741894662

My team followed it up.

The solution from the NITK team (thanks to Sushant Rao):
Newer LXD releases mandate ZFS > 0.8, but bionic's stock 4.15 kernel only has ZFS 0.7.5. So we installed the HWE kernel for bionic (which brings the kernel to 5.4.0 and ZFS to 0.8.5). This was the issue on both of the systems.
One server additionally had dqlite file corruption: there were 4 closed segments past the last snapshot when compared against the snapshots captured. Those corrupted segments were identified and deleted, and LXD then restored its state from the last snapshot nearest to those segments (see the sketch below).
Recommended: upgrade the OS to 20.04/22.04 (after taking a complete backup of the current machines) - 18.04 is now officially out of support.
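
For reference, a rough sketch of those steps as commands, using the segment names from the listing earlier in this thread (the names and the last-snapshot index will differ on other systems, and deleting raft segments can drop the most recent database writes, so only do this on a backed-up database directory):

# Install the 18.04 HWE kernel (brings kernel 5.4 and a newer ZFS), then reboot
sudo apt install --install-recommends linux-generic-hwe-18.04
sudo reboot

# Remove the closed segments whose start index is past the last snapshot
# (snapshot-4-189441-... here, so everything starting at 189442 and above)
sudo snap stop lxd
sudo rm /var/snap/lxd/common/lxd/database/global/0000000000189442-0000000000189766
sudo rm /var/snap/lxd/common/lxd/database/global/0000000000189767-0000000000189810
sudo rm /var/snap/lxd/common/lxd/database/global/0000000000189811-0000000000189988
sudo snap start lxd
lxc ls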

Our sincere gratitude to the community members who helped us. :slight_smile:
