LXD won't start: Failed applying Patch - snapshot directory not empty

Hi Folks,

LXD 4.3 on Ubuntu 18 using btrfs (and some dir by accident)

Yesterday some of my containers became unresponsive. I was not able to run lxc list and got some sort of socket error. The host was long overdue for updates, so I updated it and rebooted. Similar issue.

Ran:
sudo systemctl stop snap.lxd.daemon
sudo lxd --debug --group lxd

INFO[07-22|12:29:02] LXD 4.3 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[07-22|12:29:02] Kernel uid/gid map:
INFO[07-22|12:29:02] - u 0 0 4294967295
INFO[07-22|12:29:02] - g 0 0 4294967295
INFO[07-22|12:29:02] Configured LXD uid/gid map:
INFO[07-22|12:29:02] - u 0 1000000 1000000000
INFO[07-22|12:29:02] - g 0 1000000 1000000000
INFO[07-22|12:29:02] Kernel features:
INFO[07-22|12:29:02] - netnsid-based network retrieval: no
INFO[07-22|12:29:02] - pidfds: no
INFO[07-22|12:29:02] - uevent injection: no
INFO[07-22|12:29:02] - seccomp listener: no
INFO[07-22|12:29:02] - seccomp listener continue syscalls: no
INFO[07-22|12:29:02] - unprivileged file capabilities: yes
INFO[07-22|12:29:02] - cgroup layout: hybrid
WARN[07-22|12:29:02] - Couldn't find the CGroup memory swap accounting, swap limits will be ignored
INFO[07-22|12:29:02] - shiftfs support: no
INFO[07-22|12:29:02] Initializing local database
DBUG[07-22|12:29:02] Initializing database gateway
DBUG[07-22|12:29:02] Start database node id=1 address= role=voter
INFO[07-22|12:29:02] Starting /dev/lxd handler:
INFO[07-22|12:29:02] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[07-22|12:29:02] REST API daemon:
INFO[07-22|12:29:02] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[07-22|12:29:02] - binding TCP socket socket=[::]:8443
INFO[07-22|12:29:02] Initializing global database
DBUG[07-22|12:29:02] Dqlite: attempt 0: server 1: connected
DBUG[07-22|12:29:02] Firewall detected "nftables" incompatibility: Kernel version does not meet minimum requirement of 5
INFO[07-22|12:29:02] Firewall loaded driver "xtables"
INFO[07-22|12:29:02] Initializing storage pools
DBUG[07-22|12:29:02] Initializing and checking storage pool "default"
DBUG[07-22|12:29:02] Mount started driver=dir pool=default
DBUG[07-22|12:29:02] Mount finished driver=dir pool=default
DBUG[07-22|12:29:02] Initializing and checking storage pool "btrfstor"
DBUG[07-22|12:29:02] Mount started driver=btrfs pool=btrfstor
DBUG[07-22|12:29:02] Mount finished driver=btrfs pool=btrfstor
INFO[07-22|12:29:02] Applying patch "storage_api_rename_container_snapshots_dir_again"
DBUG[07-22|12:29:02] Mount started driver=btrfs pool=btrfstor
DBUG[07-22|12:29:02] Mount finished driver=btrfs pool=btrfstor
EROR[07-22|12:29:02] Failed to start the daemon: Failed applying patch "storage_api_rename_container_snapshots_dir_again": remove /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2: directory not empty
INFO[07-22|12:29:02] Starting shutdown sequence
INFO[07-22|12:29:02] Closing the database
INFO[07-22|12:29:02] Stop database gateway
INFO[07-22|12:29:02] Stopping REST API handler:
INFO[07-22|12:29:02] - closing socket socket=[::]:8443
INFO[07-22|12:29:02] - closing socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[07-22|12:29:02] Stopping /dev/lxd handler:
INFO[07-22|12:29:02] - closing socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[07-22|12:29:02] Unmounting temporary filesystems
INFO[07-22|12:29:02] Done unmounting temporary filesystems
Error: Failed applying patch "storage_api_rename_container_snapshots_dir_again": remove /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2: directory not empty

I have a snap0 directory in the mentioned folder, which I backed up elsewhere then tried to rm -r … which emptied snap0 but left the folder.
Oddly, I can’t delete the folder (as root, or with sudo):

rmdir /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2/snap0
rmdir: failed to remove ‘/var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2/snap0’: Operation not permitted

Any thoughts on how to get past this? (If it makes it easier, the snapshots are not important and can be flushed…)

Thanks!

What’s in containers-snapshots?

The goal of that particular patch was to move the content of snapshots into containers-snapshots. Your error suggests that you have some snapshots in both, causing the issue.

The delete error you’re getting is because of btrfs and its subvolumes: snap0 is a subvolume rather than a plain directory, so rmdir refuses to remove it.

Something like this should work to delete whatever needs deleting to clear the conflict:

nsenter --mount=/run/snapd/ns/lxd.mnt /snap/lxd/current/bin/btrfs subvol delete /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2/snap0
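If several snapshot subvolumes need to go, the same command can be wrapped in a small helper. This is only a rough sketch: the DRY_RUN switch and the delete_subvol function name are made up for illustration and are not part of LXD or btrfs. It defaults to printing the command instead of running it, so the paths can be checked first.

```shell
#!/bin/sh
# Sketch only: wraps the nsenter + btrfs command suggested above.
# DRY_RUN and delete_subvol are hypothetical names, not LXD features.

DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the command (default)
NS=/run/snapd/ns/lxd.mnt           # mount namespace of the LXD snap
BTRFS=/snap/lxd/current/bin/btrfs  # btrfs binary shipped in the snap

delete_subvol() {
    # $1: full path of the snapshot subvolume to delete
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: nsenter --mount=$NS $BTRFS subvolume delete $1"
    else
        nsenter --mount="$NS" "$BTRFS" subvolume delete "$1"
    fi
}

# The path from the error message in this thread:
delete_subvol /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2/snap0
```

Once the printed command looks right, running the script as root with DRY_RUN=0 performs the actual delete. Deleting a subvolume is not reversible, so back up anything that matters first.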

Thanks Stéphane!

Looks like containers-snapshots is a symlink to snapshots:

drwx--x--x 1 root root 134 Apr 24 2019 ./
drwx--x--x 4 root root 4096 Apr 23 2019 ../
drwxr-xr-x 1 root root 156 Dec 31 2017 containerbackups/
drwx--x--x 1 root root 208 May 25 2019 containers/
lrwxrwxrwx 1 root root 11 Apr 24 2019 containers-snapshots -> ./snapshots/
drwx--x--x 1 root root 0 Dec 29 2017 custom/
drwx------ 1 root root 1408 May 25 2019 images/
drwx------ 1 root root 72 May 6 2019 snapshots/

Was able to remove the directory (subvolume) with your command … however there are 5 other container snapshots left in there … same error now throws on another container name.

Should I manually migrate the snapshots to a new directory? … have no issue flushing them completely either …

Thanks!

Hmm, I don’t believe we’ve ever made containers-snapshots a symlink, this is confusing.

Can you rm the containers-snapshots symlink and rename snapshots to containers-snapshots?
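A rough sketch of those two steps, for anyone hitting the same thing. Assumptions: the target name is containers-snapshots (the spelling in the directory listing), the pool directory is reachable from the host shell as it was in this thread, and the DRY_RUN switch plus the run and fix_snapshots_dir names are invented for illustration.

```shell
#!/bin/sh
# Sketch only: replace the containers-snapshots symlink with the real
# snapshots directory. DRY_RUN, run and fix_snapshots_dir are hypothetical.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

fix_snapshots_dir() {
    # $1: storage pool directory, e.g. .../storage-pools/btrfstor
    pool_dir=$1
    link="$pool_dir/containers-snapshots"

    # Refuse to touch anything unless containers-snapshots is a symlink.
    if [ ! -L "$link" ]; then
        echo "$link is not a symlink, nothing to do" >&2
        return 1
    fi

    # Remove the symlink, then give the real directory the expected name.
    run rm "$link" &&
    run mv "$pool_dir/snapshots" "$pool_dir/containers-snapshots"
}

# With the daemon stopped (sudo systemctl stop snap.lxd.daemon), one might run:
#   DRY_RUN=0 fix_snapshots_dir /var/snap/lxd/common/lxd/storage-pools/btrfstor
```

The symlink check is there so the rename never runs on a pool where containers-snapshots is already a real directory.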

Thanks Stéphane! That has got me back up and running! I can say with a fair amount of confidence that I did NOT make that symlink manually … ??
Thanks again for the help!