LXD won't start: Failed applying Patch - snapshot directory not empty

JC-Mac · July 22, 2020, 12:45pm

Hi Folks,

LXD 4.3 on Ubuntu 18, using btrfs (and a dir pool by accident)

Yesterday some of my containers became unresponsive. Was not able to lxc list - got some sort of socket error. Host was long overdue for updates … So I updated host, rebooted. Similar issue.

Ran:
sudo systemctl stop snap.lxd.daemon
sudo lxd --debug --group lxd

INFO[07-22|12:29:02] LXD 4.3 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[07-22|12:29:02] Kernel uid/gid map:
INFO[07-22|12:29:02] - u 0 0 4294967295
INFO[07-22|12:29:02] - g 0 0 4294967295
INFO[07-22|12:29:02] Configured LXD uid/gid map:
INFO[07-22|12:29:02] - u 0 1000000 1000000000
INFO[07-22|12:29:02] - g 0 1000000 1000000000
INFO[07-22|12:29:02] Kernel features:
INFO[07-22|12:29:02] - netnsid-based network retrieval: no
INFO[07-22|12:29:02] - pidfds: no
INFO[07-22|12:29:02] - uevent injection: no
INFO[07-22|12:29:02] - seccomp listener: no
INFO[07-22|12:29:02] - seccomp listener continue syscalls: no
INFO[07-22|12:29:02] - unprivileged file capabilities: yes
INFO[07-22|12:29:02] - cgroup layout: hybrid
WARN[07-22|12:29:02] - Couldn’t find the CGroup memory swap accounting, swap limits will be ignored
INFO[07-22|12:29:02] - shiftfs support: no
INFO[07-22|12:29:02] Initializing local database
DBUG[07-22|12:29:02] Initializing database gateway
DBUG[07-22|12:29:02] Start database node id=1 address= role=voter
INFO[07-22|12:29:02] Starting /dev/lxd handler:
INFO[07-22|12:29:02] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[07-22|12:29:02] REST API daemon:
INFO[07-22|12:29:02] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[07-22|12:29:02] - binding TCP socket socket=[::]:8443
INFO[07-22|12:29:02] Initializing global database
DBUG[07-22|12:29:02] Dqlite: attempt 0: server 1: connected
DBUG[07-22|12:29:02] Firewall detected “nftables” incompatibility: Kernel version does not meet minimum requirement of 5
INFO[07-22|12:29:02] Firewall loaded driver “xtables”
INFO[07-22|12:29:02] Initializing storage pools
DBUG[07-22|12:29:02] Initializing and checking storage pool “default”
DBUG[07-22|12:29:02] Mount started driver=dir pool=default
DBUG[07-22|12:29:02] Mount finished driver=dir pool=default
DBUG[07-22|12:29:02] Initializing and checking storage pool “btrfstor”
DBUG[07-22|12:29:02] Mount started driver=btrfs pool=btrfstor
DBUG[07-22|12:29:02] Mount finished driver=btrfs pool=btrfstor
INFO[07-22|12:29:02] Applying patch “storage_api_rename_container_snapshots_dir_again”
DBUG[07-22|12:29:02] Mount started driver=btrfs pool=btrfstor
DBUG[07-22|12:29:02] Mount finished driver=btrfs pool=btrfstor
EROR[07-22|12:29:02] Failed to start the daemon: Failed applying patch “storage_api_rename_container_snapshots_dir_again”: remove /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2: directory not empty
INFO[07-22|12:29:02] Starting shutdown sequence
INFO[07-22|12:29:02] Closing the database
INFO[07-22|12:29:02] Stop database gateway
INFO[07-22|12:29:02] Stopping REST API handler:
INFO[07-22|12:29:02] - closing socket socket=[::]:8443
INFO[07-22|12:29:02] - closing socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[07-22|12:29:02] Stopping /dev/lxd handler:
INFO[07-22|12:29:02] - closing socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[07-22|12:29:02] Unmounting temporary filesystems
INFO[07-22|12:29:02] Done unmounting temporary filesystems
Error: Failed applying patch “storage_api_rename_container_snapshots_dir_again”: remove /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2: directory not empty

I have a snap0 directory in the mentioned folder, which I backed up elsewhere then tried to rm -r … which emptied snap0 but left the folder.
Oddly, I can’t delete the folder (as root, or with sudo):

rmdir /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2/snap0
rmdir: failed to remove ‘/var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2/snap0’: Operation not permitted

Any thoughts on how to get past this? (If it makes it easier, the snapshots are not important and can be flushed…)

Thanks!

stgraber · July 22, 2020, 6:47pm

What’s in containers-snapshots?

The goal of that particular patch was to move the content of snapshots into containers-snapshots. Your error suggests that you have some snapshots in both, causing the issue.

The delete error you’re getting is because of btrfs and its subvolumes.

Something like this should work to delete whatever needs deleting to clear the conflict:

nsenter --mount=/run/snapd/ns/lxd.mnt /snap/lxd/current/bin/btrfs subvol delete /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2/snap0
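The command above targets a single subvolume. If a container has several snapshots, a small dry-run helper can print one such command per entry so they can be reviewed before anything is deleted. This is only a sketch: the function name `print_subvol_deletes` is hypothetical, and it assumes every directory directly under the given path is a btrfs subvolume, which may not hold on every system.

```shell
# Hypothetical helper: print (do not run) one "btrfs subvol delete" command
# for each directory found directly under the given snapshot directory.
# Assumption: each of those directories is a btrfs subvolume.
print_subvol_deletes() {
    snapdir="$1"
    for sub in "$snapdir"/*; do
        # Skip plain files and the unexpanded glob of an empty directory
        [ -d "$sub" ] || continue
        printf 'nsenter --mount=/run/snapd/ns/lxd.mnt /snap/lxd/current/bin/btrfs subvol delete "%s"\n' "$sub"
    done
}
```

For example, `print_subvol_deletes /var/snap/lxd/common/lxd/storage-pools/btrfstor/snapshots/HAbtfs2` would list the commands for that container's snapshots. Nothing is removed until the printed lines are run by hand as root.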

JC-Mac · July 22, 2020, 9:20pm

Thanks Stéphane!

Looks like containers-snapshots is a symlink to snapshots:

drwx--x--x 1 root root  134 Apr 24 2019 ./
drwx--x--x 4 root root 4096 Apr 23 2019 ../
drwxr-xr-x 1 root root  156 Dec 31 2017 containerbackups/
drwx--x--x 1 root root  208 May 25 2019 containers/
lrwxrwxrwx 1 root root   11 Apr 24 2019 containers-snapshots -> ./snapshots/
drwx--x--x 1 root root    0 Dec 29 2017 custom/
drwx------ 1 root root 1408 May 25 2019 images/
drwx------ 1 root root   72 May  6 2019 snapshots/

Was able to remove the directory (subvolume) with your command … however there are 5 other container snapshots left in there … same error now throws on another container name.

Should I manually migrate the snapshots to a new directory? … have no issue flushing them completely either …

Thanks!

stgraber · July 23, 2020, 1:37am

Hmm, I don’t believe we’ve ever made containers-snapshots a symlink, this is confusing.

Can you rm the containers-snapshots symlink and rename snapshots to containers-snapshots?
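The two steps suggested here can be sketched as a small shell function that refuses to act unless the layout matches the one in the directory listing earlier in the thread. The function name `fix_snapshots_dir` is hypothetical, and the sketch assumes LXD is stopped and that the pool path is reachable from the shell it runs in (with the snap package it may need to be run inside the snap's mount namespace).

```shell
# Hypothetical helper: replace the containers-snapshots symlink with the
# real snapshots directory. Takes the storage pool path as its argument.
fix_snapshots_dir() {
    pool="$1"
    # Only act when containers-snapshots is a symlink and snapshots is a directory
    if [ -L "$pool/containers-snapshots" ] && [ -d "$pool/snapshots" ]; then
        rm "$pool/containers-snapshots"                    # removes the link only
        mv "$pool/snapshots" "$pool/containers-snapshots"  # rename in place
    else
        echo "unexpected layout in $pool, nothing changed" >&2
        return 1
    fi
}
```

In this thread the call would be `fix_snapshots_dir /var/snap/lxd/common/lxd/storage-pools/btrfstor`, run as root, followed by starting LXD again so the patch can be retried.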

JC-Mac · July 23, 2020, 11:32am

Thanks Stéphane!.. that has got me back up and running! I can say with a fair amount of confidence that I did NOT make that symlink manually … ??
Thanks again for the help!