Can't start LXD: running "lxd" failed: cannot find installed snap "lxd" at revision 20638: missing file

kpfa · June 14, 2021, 12:52am

We rebooted our Ubuntu 20.04 host, which runs the latest LXD snap, after installing memory, and now none of the containers are starting.

lxc list produces:
internal error, please report: running "lxd" failed: cannot find installed snap "lxd" at revision 20638: missing file /snap/lxd/20638/meta/snap.yaml

systemctl start snap.lxd.daemon
followed by
lxc list produces:
cannot perform operation: mount --rbind /dev /tmp/snap.rootfs_f8RF31//dev: No such file or directory

After another reboot, lxc list now produces:

cat: /proc/self/attr/current: Permission denied
/snap/lxd/20638/commands/lxc: 6: exec: aa-exec: Permission denied

kpfa · June 14, 2021, 1:16am

~~update! fix!:~~
sudo snap connect lxd:lxd-support core:lxd-support
sudo systemctl stop snap.lxd.daemon
sudo systemctl start snap.lxd.daemon

update #2
23 hours later - lxd is now stopped and in a restart loop

Error: Get "http://unix.socket/1.0": read unix @->/var/snap/lxd/common/lxd/unix.socket: read: connection reset by peer

kpfa · June 15, 2021, 1:49am

dmesg -wH produces
new mount options do not match the existing superblock, will be ignored

stgraber · June 15, 2021, 1:51am

That’s normal, the kernel logs that particular message pretty much every time something mounts a cgroupfs filesystem (in this case, lxcfs).

stgraber · June 15, 2021, 1:51am

journalctl -u snap.lxd.daemon -n 30 may be useful to see what’s up with the daemon.

kpfa · June 15, 2021, 1:53am

/var/snap/lxd/common/lxd/logs/lxd.log:
Failed to start the daemon: Failed to start dqlite server: raft_start(): io: closed segment 0000000000054298-0000000000054800 is past last snapshot snapshot-1-54272-9477069720

journalctl -u snap.lxd.daemon -n 30

-- Logs begin at Mon 2021-06-14 13:54:08 PDT, end at Mon 2021-06-14 18:52:19 PDT. --
Jun 14 18:52:17 srv-nd lxd.daemon[15882]:  11: fd:  17: cpuset
Jun 14 18:52:17 srv-nd lxd.daemon[15882]:  12: fd:  19: memory
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: Kernel supports pidfds
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: Kernel does not support swap accounting
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: api_extensions:
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - cgroups
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - sys_cpu_online
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - proc_cpuinfo
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - proc_diskstats
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - proc_loadavg
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - proc_meminfo
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - proc_stat
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - proc_swaps
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - proc_uptime
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - shared_pidns
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - cpuview_daemon
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - loadavg_daemon
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: - pidfds
Jun 14 18:52:17 srv-nd lxd.daemon[15882]: Reloaded LXCFS
Jun 14 18:52:18 srv-nd lxd.daemon[404961]: => LXD failed to start
Jun 14 18:52:18 srv-nd systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Jun 14 18:52:18 srv-nd systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 14 18:52:18 srv-nd systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 5.
Jun 14 18:52:18 srv-nd systemd[1]: Stopped Service for snap application lxd.daemon.
Jun 14 18:52:18 srv-nd systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Jun 14 18:52:18 srv-nd systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 14 18:52:18 srv-nd systemd[1]: Failed to start Service for snap application lxd.daemon.
Jun 14 18:52:19 srv-nd systemd[1]: snap.lxd.daemon.service: Start request repeated too quickly.
Jun 14 18:52:19 srv-nd systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Jun 14 18:52:19 srv-nd systemd[1]: Failed to start Service for snap application lxd.daemon.

stgraber · June 15, 2021, 1:54am

Did your system have a bad crash or run out of disk space recently?
It sounds like something bad happened to your database…

Can you show ls -lh /var/snap/lxd/common/lxd/database/global/?

kpfa · June 15, 2021, 2:01am

Yes, recently the rootfs / momentarily ran out of space.

ls -lh /var/snap/lxd/common/lxd/database/global/

total 46K
9.0K drwx------ 4 root root 6 Jun 13 18:01 .
9.0K drwx--x--x 20 root root 24 Jun 13 18:01 ..
9.0K drwxr-x--- 2 root root 30 Jun 13 18:01 global
9.0K drwxr-x--- 2 root root 9 Jun 13 18:01 global.bak
5.0K -rw-r--r-- 1 root root 48K Jun 13 18:01 local.db
5.0K -rw-r--r-- 1 root root 40K Jun 13 18:01 local.db.bak

stgraber · June 15, 2021, 2:02am

The listing above appears to be for database, not database/global.

kpfa · June 15, 2021, 2:03am

My bad, here is the correct output. Should I remove 0000000000054298-0000000000054800?

total 43M
-rw------- 1 root root 8.0M Jun 2 09:00 0000000000045743-0000000000046232
-rw------- 1 root root 8.0M Jun 3 01:00 0000000000046233-0000000000046685
-rw------- 1 root root 8.0M Jun 3 18:00 0000000000046686-0000000000047158
-rw------- 1 root root 8.0M Jun 4 10:00 0000000000047159-0000000000047619
-rw------- 1 root root 8.0M Jun 5 01:00 0000000000047620-0000000000048088
-rw------- 1 root root 8.0M Jun 5 17:00 0000000000048089-0000000000048571
-rw------- 1 root root 8.0M Jun 6 07:20 0000000000048572-0000000000049024
-rw------- 1 root root 8.0M Jun 7 00:00 0000000000049025-0000000000049483
-rw------- 1 root root 6.8M Jun 7 13:09 0000000000049484-0000000000049878
-rw------- 1 root root 3.7M Jun 7 21:34 0000000000049879-0000000000050099
-rw------- 1 root root 8.0M Jun 8 12:00 0000000000050100-0000000000050568
-rw------- 1 root root 8.0M Jun 9 06:00 0000000000050569-0000000000051051
-rw------- 1 root root 2.9M Jun 9 11:54 0000000000051052-0000000000051217
-rw------- 1 root root 8.0M Jun 10 02:55 0000000000051218-0000000000051683
-rw------- 1 root root 8.0M Jun 10 18:00 0000000000051684-0000000000052156
-rw------- 1 root root 8.0M Jun 11 11:55 0000000000052157-0000000000052630
-rw------- 1 root root 8.0M Jun 12 03:00 0000000000052631-0000000000053088
-rw------- 1 root root 8.0M Jun 12 18:00 0000000000053089-0000000000053551
-rw------- 1 root root 8.0M Jun 13 11:00 0000000000053552-0000000000054022
-rw------- 1 root root 4.3M Jun 13 17:40 0000000000054023-0000000000054297
-rw------- 1 root root 8.0M Jun 14 06:00 0000000000054298-0000000000054800
-rw------- 1 root root 2.1M Jun 14 10:05 0000000000054799-0000000000054913
-rw------- 1 root root 2.0M Jun 14 10:05 db.bin
-rw------- 1 root root 32K May 11 22:53 db.bin-shm
-rw------- 1 root root 3.9M Jun 14 10:05 db.bin-wal
-rw------- 1 root root 32 Feb 23 23:11 metadata1
-rw------- 1 root root 1.4M Jun 12 07:55 snapshot-1-53248-9355593965
-rw------- 1 root root 56 Jun 12 07:55 snapshot-1-53248-9355593965.meta
-rw------- 1 root root 810K Jun 13 17:39 snapshot-1-54272-9477069720
-rw------- 1 root root 56 Jun 13 17:39 snapshot-1-54272-9477069720.meta

stgraber · June 15, 2021, 2:09am

So this is a bit confusing… the last snapshot is 54272, the last transaction is 54913 and you have segment files going from 54023 through to 54913 so I’m not too sure why it’s not loading properly.
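One way to double-check the segment ranges described above is to compare each file’s first index with the previous file’s last index. The sketch below is a minimal, hypothetical helper (check_segments and strip_zeros are names made up here, not LXD or dqlite tools), and it assumes closed segments are named <first index>-<last index> as in the listing. Run against that listing, it would flag that 0000000000054799-0000000000054913 starts before 0000000000054298-0000000000054800 ends; whether that overlap is what dqlite objects to is not established in this thread.

```shell
# strip_zeros: print a zero-padded decimal number without its leading zeros,
# so that shell arithmetic does not try to read it as octal.
strip_zeros() {
    n="${1#"${1%%[!0]*}"}"
    printf '%s\n' "${n:-0}"
}

# check_segments DIR: walk the closed segment files in DIR in name order and
# report any gap or overlap between consecutive index ranges.
# Hypothetical helper; it only reads file names and changes nothing.
check_segments() (
    dir="$1"
    prev_end=""
    for path in "$dir"/*-*; do
        name="${path##*/}"
        # Keep only names of the form <digits>-<digits>; this skips
        # snapshot-*, db.bin-* and an unmatched glob pattern.
        case "$name" in
            *[!0-9-]*|*-*-*|-*|*-) continue ;;
        esac
        start=$(strip_zeros "${name%%-*}")
        end=$(strip_zeros "${name##*-}")
        if [ -n "$prev_end" ]; then
            if [ "$start" -le "$prev_end" ]; then
                echo "overlap: $name starts at $start, previous segment ended at $prev_end"
            elif [ "$start" -gt $((prev_end + 1)) ]; then
                echo "gap: $name starts at $start, previous segment ended at $prev_end"
            fi
        fi
        prev_end="$end"
    done
)

# Usage (read-only, as root on the affected host):
#   check_segments /var/snap/lxd/common/lxd/database/global
```

Because the file names are zero-padded to a fixed width, the shell’s sorted glob order matches numeric order, so no separate sort step is needed.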

So I guess start by making a full backup copy of the global directory.
Then you can try removing:

0000000000054298-0000000000054800
0000000000054799-0000000000054913

That will make LXD rely entirely on the latest snapshot and try to start back up from it. If that doesn’t work, you’ll have to restore global from your copy, and we can then try removing that latest snapshot (snapshot-1-54272-9477069720 and snapshot-1-54272-9477069720.meta) to see whether the snapshot prior to that plus the segment files can get you back online.
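The backup, removal, and fallback restore steps above could be sketched roughly as follows. This is a hedged illustration, not an official LXD procedure: backup_global, remove_segments, and restore_global are hypothetical helper names, the backup directory name is made up, and stopping the daemon before touching the files is an assumption rather than something stated in the thread.

```shell
# backup_global DB_DIR BACKUP_DIR: make a full copy of the dqlite directory.
# Refuses to overwrite an existing backup.
backup_global() {
    db_dir="$1"
    backup_dir="$2"
    [ -d "$db_dir" ] || { echo "no such directory: $db_dir" >&2; return 1; }
    [ ! -e "$backup_dir" ] || { echo "backup target exists: $backup_dir" >&2; return 1; }
    cp -a "$db_dir" "$backup_dir"
}

# remove_segments DB_DIR NAME...: delete the named segment files from DB_DIR.
remove_segments() {
    db_dir="$1"
    shift
    for seg in "$@"; do
        rm -- "$db_dir/$seg"
    done
}

# restore_global DB_DIR BACKUP_DIR: put the copy back if the removal did not help.
restore_global() {
    db_dir="$1"
    backup_dir="$2"
    [ -n "$db_dir" ] || { echo "empty database path" >&2; return 1; }
    [ -d "$backup_dir" ] || { echo "no backup at: $backup_dir" >&2; return 1; }
    rm -rf -- "$db_dir"
    cp -a "$backup_dir" "$db_dir"
}

# Usage on the affected host (as root; the .manual-backup name is hypothetical):
#   systemctl stop snap.lxd.daemon      # assumption: keep the daemon off the files
#   db=/var/snap/lxd/common/lxd/database/global
#   backup_global "$db" "$db.manual-backup"
#   remove_segments "$db" 0000000000054298-0000000000054800 \
#                         0000000000054799-0000000000054913
#   systemctl start snap.lxd.daemon
#   # If LXD still fails to start: restore_global "$db" "$db.manual-backup"
```

Keeping the file operations in small functions that take the directory as an argument makes it possible to rehearse them on a scratch copy before running them against the real database.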

kpfa · June 15, 2021, 2:14am

Thanks, rm worked.