Snap refresh randomly fails and disables lxd

We’ve been having an issue where a snap refresh leaves lxd disabled for a prolonged period of time.
Containers keep running as normal, but the snap’s commands (lxc included) remain unavailable for a while (anywhere from a few minutes to a few hours) until the problem fixes itself.

What is the correct way to recover from this manually (snap enable lxd?), and are there any other logs I can provide to help track down the issue?
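
A minimal sketch of that manual check. It assumes, as the snap changes output later in this thread suggests, that the snap stays disabled while snapd is still working on the refresh. In that state snap enable lxd is likely to be rejected as conflicting with the change in progress, so the helper below (lxd_recovery_hint is a hypothetical name) only inspects snap changes output and prints the command it would suggest instead of running anything:

```shell
#!/bin/sh
# Hypothetical helper: read `snap changes lxd` output on stdin and print a
# suggested next command. Nothing is executed.
lxd_recovery_hint() {
    # Change rows start with a numeric ID followed by the Status column.
    # Do/Doing/Undo/Undoing/Wait mean snapd has not finished the change yet.
    pending=$(awk '$1 ~ /^[0-9]+$/ && $2 ~ /^(Do|Doing|Undo|Undoing|Wait)$/ { print $1; exit }')
    if [ -n "$pending" ]; then
        # A change is still running: follow it. In this thread snapd brought
        # lxd back on its own once the refresh finished.
        echo "snap watch $pending"
    else
        # No change in flight: re-enabling the snap by hand should be safe.
        echo "snap enable lxd"
    fi
}
```

On the affected host it would be used as snap changes lxd | lxd_recovery_hint. Empty input (for example when snap prints "no changes found") also falls through to the enable suggestion.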

OS: Ubuntu 20.04.2 LTS
Kernel: 5.4.0-66-generic
# lxc list
zsh: command not found: lxc
# snap logs lxd                           
2021-06-18T06:06:51Z lxd.daemon[1468466]: 2021/06/18 08:06:51 http: TLS handshake error from 10.80.40.1:51074: EOF
2021-06-18T06:06:51Z lxd.daemon[1468466]: 2021/06/18 08:06:51 http: TLS handshake error from 10.80.40.1:51072: EOF
2021-06-18T06:06:51Z lxd.daemon[1468466]: 2021/06/18 08:06:51 http: TLS handshake error from 10.80.40.1:51076: EOF
2021-06-18T06:07:24Z systemd[1]: Stopping Service for snap application lxd.daemon...
2021-06-18T06:07:28Z lxd.daemon[323967]: => Stop reason is: snap refresh
2021-06-18T06:07:28Z lxd.daemon[323967]: => Stopping LXD
2021-06-18T06:07:31Z lxd.daemon[1468154]: => LXD exited cleanly
2021-06-18T06:07:31Z lxd.daemon[323967]: ==> Stopped LXD
2021-06-18T06:07:31Z systemd[1]: snap.lxd.daemon.service: Succeeded.
2021-06-18T06:07:31Z systemd[1]: Stopped Service for snap application lxd.daemon
# snap refresh lxd
error: cannot refresh "lxd": refreshing disabled snap "lxd" not supported
# snap refresh    
All snaps up to date.
# snap info lxd                           
name:      lxd
summary:   System container and virtual machine manager
publisher: Canonical✓
store-url: https://snapcraft.io/lxd
contact:   https://github.com/lxc/lxd/issues
license:   unset
description: <description>
commands: <commands>
services:
  lxd.activate: oneshot, disabled, inactive
  lxd.daemon:   simple, disabled, inactive
snap-id:  J60k4JY0HppjwOjW8dZdYc8obXKxujRu
tracking: latest/stable
channels:
  latest/stable:    4.15        2021-06-18 (20789) 69MB -
  latest/candidate: 4.15        2021-06-18 (20789) 69MB -
  latest/beta:      ↑                                   
  latest/edge:      git-7d6ebc4 2021-06-18 (20784) 69MB -
  4.15/stable:      4.15        2021-06-18 (20789) 69MB -
  4.15/candidate:   4.15        2021-06-18 (20789) 69MB -
  4.15/beta:        ↑                                   
  4.15/edge:        ↑                                   
  4.14/stable:      4.14        2021-05-18 (20450) 73MB -
  4.14/candidate:   4.14        2021-05-18 (20450) 73MB -
  4.14/beta:        ↑                                   
  4.14/edge:        ↑                                   
  4.13/stable:      4.13        2021-05-04 (20309) 72MB -
  4.13/candidate:   4.13        2021-05-04 (20309) 72MB -
  4.13/beta:        ↑                                   
  4.13/edge:        ↑                                   
  4.0/stable:       4.0.6       2021-05-06 (20326) 70MB -
  4.0/candidate:    4.0.6       2021-05-05 (20326) 70MB -
  4.0/beta:         ↑                                   
  4.0/edge:         git-23fb72b 2021-06-01 (20535) 66MB -
  3.0/stable:       3.0.4       2019-10-10 (11348) 55MB -
  3.0/candidate:    3.0.4       2019-10-10 (11348) 55MB -
  3.0/beta:         ↑                                   
  3.0/edge:         git-81b81b9 2019-10-10 (11362) 55MB -
  2.0/stable:       2.0.12      2020-08-18 (16879) 38MB -
  2.0/candidate:    2.0.12      2021-03-22 (19859) 39MB -
  2.0/beta:         ↑                                   
  2.0/edge:         git-82c7d62 2021-03-22 (19857) 39MB -
installed:          4.15                   (20760) 69MB disabled
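
The installed: line is the most telling part: revision 20760 is flagged disabled while latest/stable already points at 20789. That fits a refresh that has taken the old revision offline but not yet brought the new one up, and the recovery log does show "Preparing the system (20789)". A small sketch for pulling those two fields out in a script, assuming the snap info layout shown here; installed_state is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read `snap info <snap>` output on stdin and print
# "<revision> <state>" from the "installed:" line, e.g. "20760 disabled".
installed_state() {
    awk '/^installed:/ {
        rev = ""; state = "enabled"
        for (i = 2; i <= NF; i++) {
            # The revision is printed in parentheses, e.g. (20760).
            if ($i ~ /^\([0-9]+\)$/) rev = substr($i, 2, length($i) - 2)
            # A trailing "disabled" note marks a disabled snap.
            if ($i == "disabled") state = "disabled"
        }
        print rev, state
    }'
}
```

Usage: snap info lxd | installed_state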

And here is the snap logs lxd output from when the problem fixes itself:

2021-06-18T08:15:56Z systemd[1]: Starting Service for snap application lxd.activate...
2021-06-18T08:15:56Z lxd.activate[411979]: => Starting LXD activation
2021-06-18T08:15:56Z lxd.activate[411979]: ==> Loading snap configuration
2021-06-18T08:15:56Z lxd.activate[411979]: ==> Checking for socket activation support
2021-06-18T08:15:57Z lxd.activate[411979]: ==> Setting LXD socket ownership
2021-06-18T08:15:57Z lxd.activate[411979]: ==> Checking if LXD needs to be activated
2021-06-18T08:16:00Z systemd[1]: Started Service for snap application lxd.daemon.
2021-06-18T08:16:00Z lxd.daemon[412100]: => Preparing the system (20789)
2021-06-18T08:16:02Z lxd.daemon[412100]: ==> Loading snap configuration
2021-06-18T08:16:02Z lxd.daemon[412100]: ==> Setting up mntns symlink (mnt:[4026532922])
2021-06-18T08:16:03Z lxd.daemon[412100]: ==> Setting up kmod wrapper
2021-06-18T08:16:03Z lxd.daemon[412100]: ==> Preparing /boot
2021-06-18T08:16:03Z lxd.daemon[412100]: ==> Preparing a clean copy of /run
2021-06-18T08:16:03Z lxd.daemon[412100]: ==> Preparing /run/bin
2021-06-18T08:16:03Z lxd.daemon[412100]: ==> Preparing a clean copy of /etc
2021-06-18T08:16:06Z lxd.daemon[412100]: ==> Preparing a clean copy of /usr/share/misc
2021-06-18T08:16:06Z lxd.daemon[412100]: ==> Setting up ceph configuration
2021-06-18T08:16:06Z lxd.daemon[412100]: ==> Setting up LVM configuration
2021-06-18T08:16:06Z lxd.daemon[412100]: ==> Rotating logs
2021-06-18T08:16:46Z lxd.daemon[412100]: ==> Setting up ZFS (0.8)
2021-06-18T08:16:46Z lxd.daemon[412100]: ==> Escaping the systemd cgroups
2021-06-18T08:16:46Z lxd.daemon[412100]: ====> Detected cgroup V1
2021-06-18T08:16:46Z lxd.daemon[412100]: ==> Escaping the systemd process resource limits
2021-06-18T08:16:46Z lxd.daemon[412100]: ==> Disabling shiftfs on this kernel (auto)
2021-06-18T08:16:46Z lxd.daemon[1274626]: Closed liblxcfs.so
2021-06-18T08:16:46Z lxd.daemon[1274626]: Running destructor lxcfs_exit
2021-06-18T08:16:46Z lxd.daemon[1274626]: Running constructor lxcfs_init to reload liblxcfs
2021-06-18T08:16:47Z lxd.daemon[1274626]: mount namespace: 5
2021-06-18T08:16:47Z lxd.daemon[1274626]: hierarchies:
2021-06-18T08:16:47Z lxd.daemon[1274626]:   0: fd:   6:
2021-06-18T08:16:47Z lxd.daemon[1274626]:   1: fd:   7: name=systemd
2021-06-18T08:16:47Z lxd.daemon[1274626]:   2: fd:   8: memory
2021-06-18T08:16:47Z lxd.daemon[1274626]:   3: fd:   9: hugetlb
2021-06-18T08:16:47Z lxd.daemon[1274626]:   4: fd:  10: freezer
2021-06-18T08:16:47Z lxd.daemon[1274626]:   5: fd:  11: perf_event
2021-06-18T08:16:47Z lxd.daemon[1274626]:   6: fd:  12: devices
2021-06-18T08:16:47Z lxd.daemon[1274626]:   7: fd:  13: blkio
2021-06-18T08:16:47Z lxd.daemon[1274626]:   8: fd:  14: cpuset
2021-06-18T08:16:47Z lxd.daemon[1274626]:   9: fd:  15: rdma
2021-06-18T08:16:47Z lxd.daemon[1274626]:  10: fd:  16: net_cls,net_prio
2021-06-18T08:16:47Z lxd.daemon[1274626]:  11: fd:  17: cpu,cpuacct
2021-06-18T08:16:47Z lxd.daemon[1274626]:  12: fd:  19: pids
2021-06-18T08:16:47Z lxd.daemon[1274626]: Kernel supports pidfds
2021-06-18T08:16:47Z lxd.daemon[1274626]: Kernel does not support swap accounting
2021-06-18T08:16:47Z lxd.daemon[1274626]: api_extensions:
2021-06-18T08:16:47Z lxd.daemon[1274626]: - cgroups
2021-06-18T08:16:47Z lxd.daemon[1274626]: - sys_cpu_online
2021-06-18T08:16:47Z lxd.daemon[1274626]: - proc_cpuinfo
2021-06-18T08:16:47Z lxd.daemon[1274626]: - proc_diskstats
2021-06-18T08:16:47Z lxd.daemon[1274626]: - proc_loadavg
2021-06-18T08:16:47Z lxd.daemon[1274626]: - proc_meminfo
2021-06-18T08:16:47Z lxd.daemon[1274626]: - proc_stat
2021-06-18T08:16:47Z lxd.daemon[1274626]: - proc_swaps
2021-06-18T08:16:47Z lxd.daemon[1274626]: - proc_uptime
2021-06-18T08:16:47Z lxd.daemon[1274626]: - shared_pidns
2021-06-18T08:16:47Z lxd.daemon[1274626]: - cpuview_daemon
2021-06-18T08:16:47Z lxd.daemon[1274626]: - loadavg_daemon
2021-06-18T08:16:47Z lxd.daemon[1274626]: - pidfds
2021-06-18T08:16:47Z lxd.daemon[1274626]: Reloaded LXCFS
2021-06-18T08:16:47Z lxd.daemon[412100]: => Re-using existing LXCFS
2021-06-18T08:16:47Z lxd.daemon[412100]: => Starting LXD
2021-06-18T08:16:47Z lxd.daemon[413049]: t=2021-06-18T10:16:47+0200 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
2021-06-18T08:16:47Z lxd.daemon[413049]: t=2021-06-18T10:16:47+0200 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
2021-06-18T08:17:07Z systemd[1]: snap.lxd.activate.service: Succeeded.
2021-06-18T08:17:07Z systemd[1]: Finished Service for snap application lxd.activate.
2021-06-18T08:17:11Z lxd.daemon[412100]: => LXD is ready
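
Assuming both excerpts belong to the same refresh, they give the length of this outage: the first one shows "==> Stopped LXD" at 06:07:31Z and this one ends with "=> LXD is ready" at 08:17:11Z. A small sketch of that arithmetic, assuming GNU date (the coreutils version on Ubuntu 20.04 should accept these ISO 8601 timestamps via -d); the helper names are hypothetical:

```shell
#!/bin/sh
# Seconds between two ISO 8601 timestamps, using GNU date's -d parser.
elapsed_seconds() {
    start=$(date -d "$1" +%s)
    end=$(date -d "$2" +%s)
    echo $((end - start))
}

# Format a number of seconds as H:MM:SS.
as_hms() {
    printf '%d:%02d:%02d\n' $(($1 / 3600)) $(($1 % 3600 / 60)) $(($1 % 60))
}

# "Stopped LXD" and "LXD is ready" timestamps from the two log excerpts.
secs=$(elapsed_seconds 2021-06-18T06:07:31Z 2021-06-18T08:17:11Z)
as_hms "$secs"   # prints 2:09:40
```

That is roughly 2 hours 10 minutes for this instance, the upper end of the range described at the top.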

Your best bet is to run snap changes when something like this happens to see what’s going on.
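
Once snap changes shows the refresh as Doing, snap tasks <id> (snap tasks 155 for the change captured below) breaks it down into its individual steps, which should show where the time is going. A sketch that keeps only the unfinished steps, assuming snap tasks prints the same aligned Status/Spawn/Ready/Summary columns as snap changes; unfinished_tasks is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read `snap tasks <change-id>` output on stdin and print
# "<status>: <summary>" for every task snapd has not finished yet.
unfinished_tasks() {
    awk '
        # The header row tells us where the (aligned) Summary column starts.
        $1 == "Status" { col = index($0, "Summary"); next }
        # Queued, running or rolling-back tasks; Done/Undone/Error are final.
        col && $1 ~ /^(Do|Doing|Undo|Undoing|Wait)$/ { print $1 ": " substr($0, col) }
    '
}
```

Usage: snap tasks 155 | unfinished_tasks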

It just happened again, but this time it took “only” about 30 minutes. The server is under some I/O load, so I’m not sure whether that is the cause.
(I moved the timestamps above each snapshot so the output is readable.)

snap changes &>> /root/cronsnap.log && echo $(date --iso-8601=seconds) 

2021-07-09T09:27:01+02:00
no changes found

2021-07-09T09:28:03+02:00
ID   Status  Spawn                Ready  Summary
155  Doing   today at 09:27 CEST  -      Auto-refresh snap "lxd"

...
2021-07-09T09:59:02+02:00
ID   Status  Spawn                Ready  Summary
155  Doing   today at 09:27 CEST  -      Auto-refresh snap "lxd"

2021-07-09T10:00:02+02:00
ID   Status  Spawn                Ready                Summary
155  Done    today at 09:27 CEST  today at 09:59 CEST  Auto-refresh snap "lxd"
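
The cron line above writes the timestamp after each snapshot, which is why the output had to be reordered by hand. A variant that logs the timestamp first and sticks to POSIX sh syntax may be easier to reuse: cron runs jobs with /bin/sh (dash on Ubuntu) unless SHELL is set in the crontab, and dash does not treat &>> the way bash does. This is a sketch; the function name is hypothetical and --abs-time is only there to get exact Spawn/Ready times:

```shell
#!/bin/sh
# Hypothetical helper for a cron job: append a timestamp, then the current
# snapd changes, to a log file (default: /root/cronsnap.log as above).
log_snap_changes() {
    log=${1:-/root/cronsnap.log}
    # One redirection covers the whole group, so stdout and stderr of every
    # command end up in the log, with the timestamp first.
    {
        date --iso-8601=seconds
        # --abs-time prints Spawn/Ready as RFC 3339 times instead of
        # "today at 09:27 CEST".
        snap changes --abs-time
        echo    # blank line between snapshots
    } >> "$log" 2>&1
}
```

Saved together with a final log_snap_changes line as, say, /root/snap-changes-log.sh (a hypothetical path), it could be scheduled once a minute, matching the spacing of the snapshots above, with a crontab entry such as * * * * * /bin/sh /root/snap-changes-log.sh.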