How to recover from Failed LXD snap auto update?

Environment:
Ubuntu: 18.04
LXD Snap: Channel = latest/stable

So I’ve been blissfully unaware of snap auto-updating lxd. Shame on me, but I’m a very part time sysadmin. Also, Ubuntu 18.04 is now out of support. Due to a UPS failure, my server rebooted and when it came back up, lxd refused to run. Up until that time, my containers apparently remained running even as the snap auto update had failed.

Initially, I got: Error: Required tool 'zpool' is missing. And I found this thread

So I tried to change channels to 5.0/stable. But there’s since been a database change that prevents me from downgrading. So after some more fiddling, I’m back to channel = latest/stable.

Jun 05 2023 21:52:19 root@pogo:~# snap info lxd|grep -E 'snap-id:|tracking:|installed:'
snap-id:      J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking:     latest/stable
installed:          5.14-7072c7b             (24918) 178MB -

And I’ve installed the HWE kernel to get to zfs 0.8.3.

Jun 05 2023 21:26:12 root@pogo:~# modinfo zfs | grep version
version:        0.8.3-1ubuntu12.14

But:

Jun 05 2023 21:27:06 root@pogo:~# lxc list
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused

and:

Jun 05 2023 21:34:00 root@pogo:~# systemctl status snap.lxd.daemon.service
● snap.lxd.daemon.service - Service for snap application lxd.daemon
   Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2023-06-05 08:39:14 EDT; 12h ago
  Process: 18172 ExecStart=/usr/bin/snap run lxd.daemon (code=exited, status=1/FAILURE)
 Main PID: 18172 (code=exited, status=1/FAILURE)

Jun 05 08:39:14 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Jun 05 08:39:14 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 7.
Jun 05 08:39:14 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
Jun 05 08:39:14 pogo systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Jun 05 08:39:14 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 05 08:39:14 pogo systemd[1]: Failed to start Service for snap application lxd.daemon.

This is all after the lxd snap was refreshed to latest/stable and the server was shutdown and restarted.

Snap exists on this server only for LXD:

Jun 05 2023 21:53:19 root@pogo:~# snap list
Name         Version       Rev    Tracking       Publisher   Notes
core         16-2.58.3     14946  latest/stable  canonical✓  core
core20       20230503      1891   latest/stable  canonical✓  base
core22       20230503      634    latest/stable  canonical✓  base
hello-world  6.4           29     latest/stable  canonical✓  -
lxd          5.14-7072c7b  24918  latest/stable  canonical✓  -
snapd        2.59.4        19361  latest/stable  canonical✓  snapd

And snap appears to be working:

Jun 05 2023 22:15:02 root@pogo:~# hello-world
Hello World!

I’ve considered upgrading Ubuntu to 20.04. But I figured that I should ask here first. I’d prefer to resolve the issue and have a working system before I do an OS upgrade.

What should I attempt next? What other information will help diagnose the issue?

What does journalctl show for the reason lxd is failing to start?

Starting after my most recent reboot of the server yesterday:

Jun 06 2023 14:20:26 root@pogo:~# journalctl -u snap.lxd.daemon.service --since yesterday
-- Logs begin at Mon 2023-04-03 14:19:10 EDT, end at Tue 2023-06-06 13:30:41 EDT. --
Jun 05 08:38:55 pogo systemd[1]: Started Service for snap application lxd.daemon.
Jun 05 08:38:55 pogo lxd.daemon[16274]: => Preparing the system (24918)
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Loading snap configuration
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Setting up mntns symlink (mnt:[4026532630])
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/storage-pools
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/devices
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Setting up persistent shmounts path
Jun 05 08:38:55 pogo lxd.daemon[16274]: ====> Making LXD shmounts use the persistent path
Jun 05 08:38:55 pogo lxd.daemon[16274]: ====> Making LXCFS use the persistent path
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Setting up kmod wrapper
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Preparing /boot
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Preparing a clean copy of /run
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Preparing /run/bin
Jun 05 08:38:55 pogo lxd.daemon[16274]: ==> Preparing a clean copy of /etc
Jun 05 08:38:56 pogo lxd.daemon[16274]: ==> Preparing a clean copy of /usr/share/misc
Jun 05 08:38:56 pogo lxd.daemon[16274]: ==> Setting up ceph configuration
Jun 05 08:38:56 pogo lxd.daemon[16274]: ==> Setting up LVM configuration
Jun 05 08:38:56 pogo lxd.daemon[16274]: ==> Setting up OVN configuration
Jun 05 08:38:56 pogo lxd.daemon[16274]: ==> Rotating logs
Jun 05 08:38:57 pogo lxd.daemon[16274]: ==> Setting up ZFS (0.8)
Jun 05 08:38:57 pogo lxd.daemon[16274]: ==> Escaping the systemd cgroups
Jun 05 08:38:57 pogo lxd.daemon[16274]: ====> Detected cgroup V1
Jun 05 08:38:57 pogo lxd.daemon[16274]: ==> Escaping the systemd process resource limits
Jun 05 08:38:57 pogo lxd.daemon[16274]: ==> Increasing the number of keys for a nonroot user
Jun 05 08:38:57 pogo lxd.daemon[16274]: ==> Increasing the number of bytes for a nonroot user
Jun 05 08:38:57 pogo lxd.daemon[16274]: ==> Disabling shiftfs on this kernel (auto)
Jun 05 08:38:57 pogo lxd.daemon[16274]: => Starting LXCFS
Jun 05 08:38:57 pogo lxd.daemon[16274]: Running constructor lxcfs_init to reload liblxcfs
Jun 05 08:38:57 pogo lxd.daemon[16274]: mount namespace: 4
Jun 05 08:38:57 pogo lxd.daemon[16274]: hierarchies:
Jun 05 08:38:57 pogo lxd.daemon[16274]:   0: fd:   5:
Jun 05 08:38:57 pogo lxd.daemon[16274]:   1: fd:   6: name=systemd
Jun 05 08:38:57 pogo lxd.daemon[16274]:   2: fd:   7: rdma
Jun 05 08:38:57 pogo lxd.daemon[16274]:   3: fd:   8: memory
Jun 05 08:38:57 pogo lxd.daemon[16274]:   4: fd:   9: pids
Jun 05 08:38:57 pogo lxd.daemon[16274]:   5: fd:  10: freezer
Jun 05 08:38:57 pogo lxd.daemon[16274]:   6: fd:  11: cpu,cpuacct
Jun 05 08:38:57 pogo lxd.daemon[16274]:   7: fd:  12: blkio
Jun 05 08:38:57 pogo lxd.daemon[16274]:   8: fd:  13: devices
Jun 05 08:38:57 pogo lxd.daemon[16274]:   9: fd:  14: net_cls,net_prio
Jun 05 08:38:57 pogo lxd.daemon[16274]:  10: fd:  15: hugetlb
Jun 05 08:38:57 pogo lxd.daemon[16274]:  11: fd:  16: perf_event
Jun 05 08:38:57 pogo lxd.daemon[16274]:  12: fd:  17: cpuset
Jun 05 08:38:57 pogo lxd.daemon[16274]: Kernel supports pidfds
Jun 05 08:38:57 pogo lxd.daemon[16274]: Kernel does not support swap accounting
Jun 05 08:38:57 pogo lxd.daemon[16274]: api_extensions:
Jun 05 08:38:57 pogo lxd.daemon[16274]: - cgroups
Jun 05 08:38:57 pogo lxd.daemon[16274]: - sys_cpu_online
Jun 05 08:38:57 pogo lxd.daemon[16274]: - proc_cpuinfo
Jun 05 08:38:57 pogo lxd.daemon[16274]: - proc_diskstats
Jun 05 08:38:57 pogo lxd.daemon[16274]: - proc_loadavg
Jun 05 08:38:57 pogo lxd.daemon[16274]: - proc_meminfo
Jun 05 08:38:57 pogo lxd.daemon[16274]: - proc_stat
Jun 05 08:38:57 pogo lxd.daemon[16274]: - proc_swaps
Jun 05 08:38:57 pogo lxd.daemon[16274]: - proc_uptime
Jun 05 08:38:57 pogo lxd.daemon[16274]: - proc_slabinfo
Jun 05 08:38:57 pogo lxd.daemon[16274]: - shared_pidns
Jun 05 08:38:57 pogo lxd.daemon[16274]: - cpuview_daemon
Jun 05 08:38:57 pogo lxd.daemon[16274]: - loadavg_daemon
Jun 05 08:38:57 pogo lxd.daemon[16274]: - pidfds
Jun 05 08:39:03 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 05 08:39:03 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 05 08:39:03 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Jun 05 08:39:03 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 1.
Jun 05 08:39:03 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
Jun 05 08:39:03 pogo systemd[1]: Started Service for snap application lxd.daemon.
Jun 05 08:39:03 pogo lxd.daemon[16737]: => Preparing the system (24918)
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Loading snap configuration
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Setting up mntns symlink (mnt:[4026532630])
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Setting up kmod wrapper
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Preparing /boot
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Preparing a clean copy of /run
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Preparing /run/bin
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Preparing a clean copy of /etc
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Preparing a clean copy of /usr/share/misc
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Setting up ceph configuration
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Setting up LVM configuration
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Setting up OVN configuration
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Rotating logs
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Setting up ZFS (0.8)
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Escaping the systemd cgroups
Jun 05 08:39:03 pogo lxd.daemon[16737]: ====> Detected cgroup V1
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Escaping the systemd process resource limits
Jun 05 08:39:03 pogo lxd.daemon[16737]: ==> Disabling shiftfs on this kernel (auto)
Jun 05 08:39:03 pogo lxd.daemon[16737]: => Re-using existing LXCFS
Jun 05 08:39:04 pogo lxd.daemon[16737]: ==> Reloading LXCFS
Jun 05 08:39:04 pogo lxd.daemon[16737]: => Starting LXD
Jun 05 08:39:04 pogo lxd.daemon[16737]: time="2023-06-05T08:39:04-04:00" level=warning msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
Jun 05 08:39:04 pogo lxd.daemon[16737]: time="2023-06-05T08:39:04-04:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
Jun 05 08:39:04 pogo lxd.daemon[16737]: time="2023-06-05T08:39:04-04:00" level=error msg="Failed to start the daemon" err="Failed to start dqlite server: raft_start(): io: closed segment 0000000000023376-0000000000023392 is past last snapshot snapshot-1-22528-9655034584"
Jun 05 08:39:04 pogo lxd.daemon[16737]: Error: Failed to start dqlite server: raft_start(): io: closed segment 0000000000023376-0000000000023392 is past last snapshot snapshot-1-22528-9655034584
Jun 05 08:39:05 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 05 08:39:05 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 05 08:39:05 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Jun 05 08:39:05 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 2.
Jun 05 08:39:05 pogo systemd[1]: Stopped Service for snap application lxd.daemon.

After this, logs similar to the second starting attempt are repeated. I can provide them if necessary.

… and the intermediate step where I tried to downgrade and LXD complained of the schema error (in case this is of any assistance)

May 31 15:25:48 pogo systemd[1]: Stopping Service for snap application lxd.daemon...
May 31 15:25:49 pogo lxd.daemon[20835]: => Stop reason is: host shutdown
May 31 15:25:49 pogo lxd.daemon[20835]: => Stopping LXD (with container shutdown)
May 31 15:25:49 pogo lxd.daemon[20835]: => Stopping LXCFS
May 31 15:25:49 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
May 31 15:25:50 pogo lxd.daemon[20835]: ==> Stopped LXCFS
May 31 15:25:50 pogo lxd.daemon[20835]: => Cleaning up PID files
May 31 15:25:50 pogo lxd.daemon[20835]: => Cleaning up namespaces
May 31 15:25:50 pogo lxd.daemon[20835]: => All done
May 31 15:25:50 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 31 15:25:50 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
May 31 15:25:50 pogo systemd[1]: Started Service for snap application lxd.daemon.
May 31 15:25:50 pogo lxd.daemon[20989]: => Preparing the system (24322)
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Loading snap configuration
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Setting up mntns symlink (mnt:[4026532617])
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Setting up kmod wrapper
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Preparing /boot
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Preparing a clean copy of /run
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Preparing /run/bin
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Preparing a clean copy of /etc
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Preparing a clean copy of /usr/share/misc
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Setting up ceph configuration
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Setting up LVM configuration
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Setting up OVN configuration
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Rotating logs
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Setting up ZFS (0.7)
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Escaping the systemd cgroups
May 31 15:25:50 pogo lxd.daemon[20989]: ====> Detected cgroup V1
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Escaping the systemd process resource limits
May 31 15:25:50 pogo lxd.daemon[20989]: ==> Disabling shiftfs on this kernel (auto)
May 31 15:25:50 pogo lxd.daemon[20989]: => Starting LXCFS
May 31 15:25:50 pogo lxd.daemon[20989]: Running constructor lxcfs_init to reload liblxcfs
May 31 15:25:50 pogo lxd.daemon[20989]: mount namespace: 4
May 31 15:25:50 pogo lxd.daemon[20989]: hierarchies:
May 31 15:25:50 pogo lxd.daemon[20989]:   0: fd:   5:
May 31 15:25:50 pogo lxd.daemon[20989]:   1: fd:   6: name=systemd
May 31 15:25:50 pogo lxd.daemon[20989]:   2: fd:   7: cpu,cpuacct
May 31 15:25:50 pogo lxd.daemon[20989]:   3: fd:   8: cpuset
May 31 15:25:50 pogo lxd.daemon[20989]:   4: fd:   9: memory
May 31 15:25:50 pogo lxd.daemon[20989]:   5: fd:  10: rdma
May 31 15:25:50 pogo lxd.daemon[20989]:   6: fd:  11: net_cls,net_prio
May 31 15:25:50 pogo lxd.daemon[20989]:   7: fd:  12: pids
May 31 15:25:50 pogo lxd.daemon[20989]:   8: fd:  13: freezer
May 31 15:25:50 pogo lxd.daemon[20989]:   9: fd:  14: blkio
May 31 15:25:50 pogo lxd.daemon[20989]:  10: fd:  15: hugetlb
May 31 15:25:50 pogo lxd.daemon[20989]:  11: fd:  16: devices
May 31 15:25:50 pogo lxd.daemon[20989]:  12: fd:  17: perf_event
May 31 15:25:50 pogo lxd.daemon[20989]: Kernel does not support swap accounting
May 31 15:25:50 pogo lxd.daemon[20989]: api_extensions:
May 31 15:25:50 pogo lxd.daemon[20989]: - cgroups
May 31 15:25:50 pogo lxd.daemon[20989]: - sys_cpu_online
May 31 15:25:50 pogo lxd.daemon[20989]: - proc_cpuinfo
May 31 15:25:50 pogo lxd.daemon[20989]: - proc_diskstats
May 31 15:25:50 pogo lxd.daemon[20989]: - proc_loadavg
May 31 15:25:50 pogo lxd.daemon[20989]: - proc_meminfo
May 31 15:25:50 pogo lxd.daemon[20989]: - proc_stat
May 31 15:25:50 pogo lxd.daemon[20989]: - proc_swaps
May 31 15:25:50 pogo lxd.daemon[20989]: - proc_uptime
May 31 15:25:50 pogo lxd.daemon[20989]: - proc_slabinfo
May 31 15:25:50 pogo lxd.daemon[20989]: - shared_pidns
May 31 15:25:50 pogo lxd.daemon[20989]: - cpuview_daemon
May 31 15:25:50 pogo lxd.daemon[20989]: - loadavg_daemon
May 31 15:25:50 pogo lxd.daemon[20989]: - pidfds
May 31 15:25:52 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
May 31 15:25:52 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 31 15:25:53 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
May 31 15:25:53 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 1.
May 31 15:25:53 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
May 31 15:25:53 pogo systemd[1]: Started Service for snap application lxd.daemon.
May 31 15:25:53 pogo lxd.daemon[21526]: => Preparing the system (24322)
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Loading snap configuration
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Setting up mntns symlink (mnt:[4026532617])
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Setting up kmod wrapper
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Preparing /boot
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Preparing a clean copy of /run
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Preparing /run/bin
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Preparing a clean copy of /etc
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Preparing a clean copy of /usr/share/misc
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Setting up ceph configuration
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Setting up LVM configuration
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Setting up OVN configuration
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Rotating logs
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Setting up ZFS (0.7)
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Escaping the systemd cgroups
May 31 15:25:53 pogo lxd.daemon[21526]: ====> Detected cgroup V1
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Escaping the systemd process resource limits
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Disabling shiftfs on this kernel (auto)
May 31 15:25:53 pogo lxd.daemon[21526]: => Re-using existing LXCFS
May 31 15:25:53 pogo lxd.daemon[21526]: ==> Reloading LXCFS
May 31 15:25:53 pogo lxd.daemon[21526]: => Starting LXD
May 31 15:25:53 pogo lxd.daemon[21526]: time="2023-05-31T15:25:53-04:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, sw
May 31 15:25:54 pogo lxd.daemon[21526]: time="2023-05-31T15:25:54-04:00" level=error msg="Failed to start the daemon" err="Error creating database
May 31 15:25:54 pogo lxd.daemon[21526]: Error: Error creating database: schema version '43' is more recent than expected '42'
May 31 15:25:54 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
May 31 15:25:54 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 31 15:25:55 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
May 31 15:25:55 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 2.
May 31 15:25:55 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
May 31 15:25:55 pogo systemd[1]: Started Service for snap application lxd.daemon.
May 31 15:25:55 pogo lxd.daemon[21897]: => Preparing the system (24322)
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Loading snap configuration
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Setting up mntns symlink (mnt:[4026532617])
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Setting up kmod wrapper
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Preparing /boot
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Preparing a clean copy of /run
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Preparing /run/bin
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Preparing a clean copy of /etc
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Preparing a clean copy of /usr/share/misc
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Setting up ceph configuration
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Setting up LVM configuration
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Setting up OVN configuration
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Rotating logs
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Setting up ZFS (0.7)
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Escaping the systemd cgroups
May 31 15:25:55 pogo lxd.daemon[21897]: ====> Detected cgroup V1
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Escaping the systemd process resource limits
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Disabling shiftfs on this kernel (auto)
May 31 15:25:55 pogo lxd.daemon[21897]: => Re-using existing LXCFS
May 31 15:25:55 pogo lxd.daemon[21897]: ==> Reloading LXCFS
May 31 15:25:55 pogo lxd.daemon[21897]: => Starting LXD
May 31 15:25:55 pogo lxd.daemon[21897]: time="2023-05-31T15:25:55-04:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, sw
May 31 15:25:56 pogo lxd.daemon[21897]: time="2023-05-31T15:25:56-04:00" level=error msg="Failed to start the daemon" err="Error creating database
May 31 15:25:56 pogo lxd.daemon[21897]: Error: Error creating database: schema version '43' is more recent than expected '42'
May 31 15:25:56 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
May 31 15:25:56 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 31 15:25:57 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
May 31 15:25:57 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 3.
May 31 15:25:57 pogo systemd[1]: Stopped Service for snap application lxd.daemon.

In case it is relevant to the

I have a cron job that takes a zfs snapshot of one of the LXD containers every 15 minutes. It’s never been an issue in the past…

Thanks @tomp for pointing me to the journalctl logs.

I was able to solve this problem - perhaps I got lucky, but in case it helps someone else, here’s what I did.

So putting together info from this post on the dqlite github page with my situation below (reconstructed, so the date/times are not the actual)

Jun 08 2023 12:34:16 root@pogo:/var/snap/lxd/common/lxd/database/global# ll
drwxr-x--- 4 root root    4096 Jun  7 22:48 ./
drwx------ 4 root root    4096 Jun  7 22:49 ../
-rw------- 1 root root 8377384 Feb 22 00:33 0000000000014337-0000000000015151
-rw------- 1 root root 1033208 Feb 22 23:17 0000000000015152-0000000000015251
-rw------- 1 root root 1169264 Feb 23 22:19 0000000000015252-0000000000015360
-rw------- 1 root root 8381408 Mar  3 17:19 0000000000015361-0000000000016173
-rw------- 1 root root 2182112 Mar  5 18:19 0000000000016174-0000000000016384
-rw------- 1 root root 8377344 Mar 13 14:19 0000000000016385-0000000000017198
-rw------- 1 root root 2149152 Mar 15 14:19 0000000000017199-0000000000017408
-rw------- 1 root root 5938944 Mar 21 01:33 0000000000017409-0000000000017985
-rw------- 1 root root 4587576 Mar 25 05:33 0000000000017986-0000000000018432
-rw------- 1 root root 8385552 Apr  2 01:33 0000000000018433-0000000000019246
-rw------- 1 root root 2149152 Apr  4 01:33 0000000000019247-0000000000019456
-rw------- 1 root root 8381560 Apr 11 21:33 0000000000019457-0000000000020272
-rw------- 1 root root 2149064 Apr 13 20:33 0000000000020273-0000000000020480
-rw------- 1 root root 5290168 Apr 18 19:13 0000000000020481-0000000000020996
-rw------- 1 root root 5228120 Apr 23 14:13 0000000000020997-0000000000021504
-rw------- 1 root root 8381448 May  1 09:14 0000000000021505-0000000000022318
-rw------- 1 root root 2165624 May  3 09:14 0000000000022319-0000000000022528
-rw------- 1 root root 8380760 May 10 23:14 0000000000022529-0000000000023332
-rw------- 1 root root 421616 Jun  7 21:40 0000000000023333-0000000000023374
-rw------- 1 root root   4184 Jun  7 21:40 0000000000023375-0000000000023375
-rw------- 1 root root 226952 Jun  7 21:40 0000000000023376-0000000000023392
-rw------- 1 root root 408608 Jun  7 21:40 0000000000023376-0000000000023407
-rw------- 1 root root  651264 May 31 12:59 db.bin
-rw------- 1 root root      32 Oct  3  2022 metadata1
-rw------- 1 root root 8388608 Jun  8 11:49 open-1
-rw------- 1 root root 8388608 Jun  7 22:48 open-2
-rw------- 1 root root 8388608 Jun  7 22:48 open-3
-rw------- 1 root root   72150 Apr 23 14:13 snapshot-1-21504-8809030973
-rw------- 1 root root      56 Apr 23 14:13 snapshot-1-21504-8809030973.meta
-rw------- 1 root root  107155 May  3 09:14 snapshot-1-22528-9655034584
-rw------- 1 root root      56 May  3 09:14 snapshot-1-22528-9655034584.meta

I started by backing up lxd.

Inferring from the dqlite github link above, I moved 0000000000023333-0000000000023374, 0000000000023375-0000000000023375, 0000000000023376-0000000000023392, and 0000000000023376-0000000000023407 into an archive directory.

Next I attempted to downgrade lxd to the snap lxd 5.0/stable channel. Once again I was thwarted, as my schema, even with the files above removed, remained at 43.

So I changed the snap lxd channel back to latest/stable and lxd happily started up.

I have put a hold on the lxd snap updates to prevent unattended/unintentional updates from breaking this production server.

I’d prefer to be on a stable channel, but I guess I’ll have to wait for 6.0.

1 Like

Thanks for all the details. I’m glad you got this sorted out.

May I ask what kernel version you are running as that may be relevant, and I’ll pass this onto the dqlite team to investigate.

@JonathanK

...
-rw------- 1 root root 226952 Jun  7 21:40 0000000000023376-0000000000023392
-rw------- 1 root root 408608 Jun  7 21:40 0000000000023376-0000000000023407
...

Is this actually what you saw in the data directory, or is this some reconstruction artefact?

@JonathanK the full journalctl log of snap.lxd.daemon.service might also be helpful to reconstruct what happened (especially the part from the 23rd of April onwards).

@tomp. I should have made that more clear in the solution. What ultimately got this to work was upgrading to a version of zfs that LXD snap 5.12 and beyond require to work.

From above - though it wasn’t clear that this is where I ended up, it is where I ended up.

And

@mbordere - Are you referring to the dates?

This is a reconstruction. The file names are the same, but they come from a backup that changed the date. I’m not smart enough yet to know how to preserve dates when I back things up. I used a simple cp to copy the files. I can no longer produce the original state.

@freeekanayaka - see the journalctl log below. It’s not the full log b/c that is huge, swollen by many repeats of lxd trying to load and failing. My test processing skills are insufficient for me to edit out all the repetitions, so this is just the beginning.

I’m sure that what happened what the the snap updated beyond the compatiblility of the system in regards to the zfs version number at least. Perhaps in other kernel related areas also.

Then I tried to downgrade and found that discovered that lxd has a policy of not supporting downgrades when the schema changes.

The rest was recovering things to a working state by installing the hwe bit and figuring out how to make dqlite happy again by removing the files as listed above. To clarify, the original dates on the files that I removed were on or after the first error that I found in the journalctl logs for lxd. That was the criteria that I used for removing them.

-- Logs begin at Mon 2023-04-10 13:46:24 EDT, end at Tue 2023-06-13 21:28:00 EDT. --
Apr 18 19:13:28 pogo systemd[1]: Stopping Service for snap application lxd.daemon...
Apr 18 19:13:31 pogo lxd.daemon[11470]: => Stop reason is: snap refresh
Apr 18 19:13:31 pogo lxd.daemon[11470]: => Stopping LXD
Apr 18 19:13:32 pogo lxd.daemon[11470]: ==> Stopped LXD
Apr 18 19:13:32 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
Apr 18 19:13:44 pogo systemd[1]: Started Service for snap application lxd.daemon.
Apr 18 19:13:44 pogo lxd.daemon[19565]: => Preparing the system (24758)
Apr 18 19:13:44 pogo lxd.daemon[19565]: ==> Loading snap configuration
Apr 18 19:13:44 pogo lxd.daemon[19565]: ==> Setting up mntns symlink (mnt:[4026532833])
Apr 18 19:13:44 pogo lxd.daemon[19565]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/storage-pools
Apr 18 19:13:44 pogo lxd.daemon[19565]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/devices
Apr 18 19:13:44 pogo lxd.daemon[19565]: ==> Setting up persistent shmounts path
Apr 18 19:13:45 pogo lxd.daemon[19565]: ==> Setting up kmod wrapper
Apr 18 19:13:45 pogo lxd.daemon[19565]: ==> Preparing /boot
Apr 18 19:13:45 pogo lxd.daemon[19565]: ==> Preparing a clean copy of /run
Apr 18 19:13:45 pogo lxd.daemon[19565]: ==> Preparing /run/bin
Apr 18 19:13:45 pogo lxd.daemon[19565]: ==> Preparing a clean copy of /etc
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> Preparing a clean copy of /usr/share/misc
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> Setting up ceph configuration
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> Setting up LVM configuration
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> Setting up OVN configuration
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> Rotating logs
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> Escaping the systemd cgroups
Apr 18 19:13:46 pogo lxd.daemon[19565]: ====> Detected cgroup V1
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> Escaping the systemd process resource limits
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> Disabling shiftfs on this kernel (auto)
Apr 18 19:13:46 pogo lxd.daemon[19565]: => Re-using existing LXCFS
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> snap base has changed, restart system to upgrade LXCFS
Apr 18 19:13:46 pogo lxd.daemon[19565]: ==> Cleaning up existing LXCFS namespace
Apr 18 19:13:47 pogo lxd.daemon[19565]: => Starting LXD
Apr 18 19:13:47 pogo lxd.daemon[19565]: time="2023-04-18T19:13:47-04:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, sw
May 11 09:48:44 pogo systemd[1]: Stopping Service for snap application lxd.daemon...
May 11 09:48:45 pogo lxd.daemon[23674]: => Stop reason is: snap refresh
May 11 09:48:45 pogo lxd.daemon[23674]: => Stopping LXD
May 11 09:48:46 pogo lxd.daemon[23674]: ==> Stopped LXD
May 11 09:48:46 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
May 11 09:49:06 pogo systemd[1]: Started Service for snap application lxd.daemon.
May 11 09:49:06 pogo lxd.daemon[7505]: => Preparing the system (24846)
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Loading snap configuration
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Setting up mntns symlink (mnt:[4026532967])
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/storage-pools
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Setting up mount propagation on /var/snap/lxd/common/lxd/devices
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Setting up persistent shmounts path
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Setting up kmod wrapper
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Preparing /boot
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Preparing a clean copy of /run
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Preparing /run/bin
May 11 09:49:06 pogo lxd.daemon[7505]: ==> Preparing a clean copy of /etc
May 11 09:49:07 pogo lxd.daemon[7505]: ==> Preparing a clean copy of /usr/share/misc
May 11 09:49:07 pogo lxd.daemon[7505]: ==> Setting up ceph configuration
May 11 09:49:07 pogo lxd.daemon[7505]: ==> Setting up LVM configuration
May 11 09:49:07 pogo lxd.daemon[7505]: ==> Setting up OVN configuration
May 11 09:49:07 pogo lxd.daemon[7505]: ==> Rotating logs
May 11 09:49:07 pogo lxd.daemon[7505]: ==> Escaping the systemd cgroups
May 11 09:49:07 pogo lxd.daemon[7505]: ====> Detected cgroup V1
May 11 09:49:07 pogo lxd.daemon[7505]: ==> Escaping the systemd process resource limits
May 11 09:49:07 pogo lxd.daemon[7505]: ==> Disabling shiftfs on this kernel (auto)
May 11 09:49:07 pogo lxd.daemon[7505]: => Re-using existing LXCFS
May 11 09:49:07 pogo lxd.daemon[7505]: ==> snap base has changed, restart system to upgrade LXCFS
May 11 09:49:07 pogo lxd.daemon[7505]: ==> Cleaning up existing LXCFS namespace
May 11 09:49:14 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
May 11 09:49:14 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 11 09:49:14 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
May 11 09:49:14 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 1.
May 11 09:49:14 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
May 11 09:49:14 pogo systemd[1]: Started Service for snap application lxd.daemon.
May 11 09:49:14 pogo lxd.daemon[9387]: => Preparing the system (24846)
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Loading snap configuration
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Setting up mntns symlink (mnt:[4026532967])
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Setting up kmod wrapper
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Preparing /boot
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Preparing a clean copy of /run
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Preparing /run/bin
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Preparing a clean copy of /etc
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Preparing a clean copy of /usr/share/misc
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Setting up ceph configuration
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Setting up LVM configuration
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Setting up OVN configuration
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Rotating logs
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Escaping the systemd cgroups
May 11 09:49:15 pogo lxd.daemon[9387]: ====> Detected cgroup V1
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Escaping the systemd process resource limits
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Disabling shiftfs on this kernel (auto)
May 11 09:49:15 pogo lxd.daemon[9387]: => Re-using existing LXCFS
May 11 09:49:15 pogo lxd.daemon[9387]: ==> snap base has changed, restart system to upgrade LXCFS
May 11 09:49:15 pogo lxd.daemon[9387]: ==> Cleaning up existing LXCFS namespace
May 11 09:49:19 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
May 11 09:49:19 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 11 09:49:19 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
May 11 09:49:19 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 2.
May 11 09:49:19 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
May 11 09:49:19 pogo systemd[1]: Started Service for snap application lxd.daemon.
May 11 09:49:19 pogo lxd.daemon[13942]: => Preparing the system (24846)
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Loading snap configuration
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Setting up mntns symlink (mnt:[4026532967])
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Setting up kmod wrapper
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Preparing /boot
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Preparing a clean copy of /run
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Preparing /run/bin
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Preparing a clean copy of /etc
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Preparing a clean copy of /usr/share/misc
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Setting up ceph configuration
May 11 09:49:19 pogo lxd.daemon[13942]: ==> Setting up LVM configuration
May 11 09:49:20 pogo lxd.daemon[13942]: ==> Setting up OVN configuration
May 11 09:49:20 pogo lxd.daemon[13942]: ==> Rotating logs
May 11 09:49:20 pogo lxd.daemon[13942]: ==> Escaping the systemd cgroups
May 11 09:49:20 pogo lxd.daemon[13942]: ====> Detected cgroup V1
May 11 09:49:20 pogo lxd.daemon[13942]: ==> Escaping the systemd process resource limits
May 11 09:49:20 pogo lxd.daemon[13942]: ==> Disabling shiftfs on this kernel (auto)
May 11 09:49:20 pogo lxd.daemon[13942]: => Re-using existing LXCFS
May 11 09:49:20 pogo lxd.daemon[13942]: ==> snap base has changed, restart system to upgrade LXCFS
May 11 09:49:20 pogo lxd.daemon[13942]: ==> Cleaning up existing LXCFS namespace
May 11 09:49:23 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
May 11 09:49:23 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 11 09:49:23 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
May 11 09:49:23 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 3.
May 11 09:49:23 pogo systemd[1]: Stopped Service for snap application lxd.daemon.
May 11 09:49:23 pogo systemd[1]: Started Service for snap application lxd.daemon.
May 11 09:49:23 pogo lxd.daemon[18285]: => Preparing the system (24846)
May 11 09:49:23 pogo lxd.daemon[18285]: ==> Loading snap configuration
May 11 09:49:23 pogo lxd.daemon[18285]: ==> Setting up mntns symlink (mnt:[4026532967])
May 11 09:49:23 pogo lxd.daemon[18285]: ==> Setting up kmod wrapper
May 11 09:49:23 pogo lxd.daemon[18285]: ==> Preparing /boot
May 11 09:49:24 pogo lxd.daemon[18285]: ==> Preparing a clean copy of /run
May 11 09:49:24 pogo lxd.daemon[18285]: ==> Preparing /run/bin
May 11 09:49:24 pogo lxd.daemon[18285]: ==> Preparing a clean copy of /etc
May 11 09:49:24 pogo lxd.daemon[18285]: ==> Preparing a clean copy of /usr/share/misc
May 11 09:49:25 pogo lxd.daemon[18285]: ==> Setting up ceph configuration
May 11 09:49:25 pogo lxd.daemon[18285]: ==> Setting up LVM configuration
May 11 09:49:25 pogo lxd.daemon[18285]: ==> Setting up OVN configuration
May 11 09:49:25 pogo lxd.daemon[18285]: ==> Rotating logs
May 11 09:49:25 pogo lxd.daemon[18285]: ==> Escaping the systemd cgroups
May 11 09:49:25 pogo lxd.daemon[18285]: ====> Detected cgroup V1
May 11 09:49:25 pogo lxd.daemon[18285]: ==> Escaping the systemd process resource limits
May 11 09:49:25 pogo lxd.daemon[18285]: ==> Disabling shiftfs on this kernel (auto)
May 11 09:49:25 pogo lxd.daemon[18285]: => Re-using existing LXCFS
May 11 09:49:25 pogo lxd.daemon[18285]: ==> snap base has changed, restart system to upgrade LXCFS
May 11 09:49:25 pogo lxd.daemon[18285]: ==> Cleaning up existing LXCFS namespace
May 11 09:49:25 pogo lxd.daemon[18285]: => Starting LXD
May 11 09:49:25 pogo lxd.daemon[18285]: time="2023-05-11T09:49:25-04:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, sw
May 11 09:49:28 pogo systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
May 11 09:49:28 pogo systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
May 11 09:49:29 pogo systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
May 11 09:49:29 pogo systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 4.
May 11 09:49:29 pogo systemd[1]: Stopped Service for snap application lxd.daemon.

etc., etc.

If you need the full logs, give me some hints as to how to get them to you and I will do so.