Cannot delete a snapshot after its container has been erased

Hello,

I’m trying to delete the default storage pool so that I can use another disk, separate from /var, for my servers (a cluster of two servers called server and server2). The first problem is that some volumes are still present on the storage pool, although all containers and images have been erased.

% lxc storage delete default
Error: storage pool "default" has volumes attached to it

Indeed, something went wrong while deleting the containers and their snapshots:

% lxc storage volume list default
+----------------------+---------------------+-------------+---------+----------+
|         TYPE         |        NAME         | DESCRIPTION | USED BY | LOCATION |
+----------------------+---------------------+-------------+---------+----------+
| container (snapshot) | ntp-backup/working  |             | 1       | server   |
+----------------------+---------------------+-------------+---------+----------+
| container (snapshot) | template/2019051201 |             | 1       | server2  |
+----------------------+---------------------+-------------+---------+----------+

But I cannot delete these snapshots.

% lxc storage volume delete default ntp-backup/working
Error: No such object
% lxc storage volume delete default template/2019051201
Error: No such object

I’m using lxd 3.13

% lxc --version
3.13

Back-end storage is BTRFS.
Thank you in advance for your help.

It happened to me once, and I used the btrfs tool to delete the phantom snapshot.
You should be able to see your snapshots with btrfs using

sudo nsenter -t $(pgrep daemon.start) -m -- /snap/lxd/current/bin/btrfs subvolume list /var/snap/lxd/common/lxd/storage-pools/default

It works for me at least. I’m not going to run more advanced tests of this kind on my own disk :slight_smile: but given that you want to get rid of it anyway, I think that trying
‘subvolume delete’ should do what you want.

Strange thing

root@server:/var/snap/lxd/common/lxd/storage-pools/default# btrfs subvolume delete containers-snapshots/ntp-backup
ERROR: not a subvolume: containers-snapshots/ntp-backup

Although it is listed as a subvolume

% sudo nsenter -t $(pgrep daemon.start) -m -- /snap/lxd/current/bin/btrfs subvolume list /var/snap/lxd/common/lxd/storage-pools/default
ID 420 gen 2712846 top level 5 path snap/lxd/common/lxd/storage-pools/default
ID 421 gen 2712846 top level 420 path containers
ID 422 gen 2712961 top level 420 path containers-snapshots
ID 423 gen 2712846 top level 420 path images
ID 424 gen 2712846 top level 420 path custom
ID 425 gen 2712846 top level 420 path custom-snapshots
ID 510 gen 2643133 top level 422 path containers-snapshots/ntp-backup/working

And the subvolume is in readonly mode

% sudo btrfs subvolume show /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working
snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working
        Name:                   working
        UUID:                   6e3f2cbf-14fc-2742-8ff5-a24ce137966a
        Parent UUID:            a2159e66-f131-6c46-907f-7958cce550a2
        Received UUID:          -
        Creation time:          2019-05-15 09:37:15 +0200
        Subvolume ID:           510
        Generation:             2643133
        Gen at creation:        2643133
        Parent ID:              422
        Top level ID:           422
        Flags:                  readonly

I guess something terribly wrong happened when some snapshots were deleted.

No, the problem is that I said ‘btrfs subvolume delete’ and assumed that you would take the command I gave you and replace list with delete. Instead, you ran btrfs subvolume delete directly and omitted the nsenter part. The nsenter prefix is essential with the snap, because the storage is mounted only in the namespace of the lxd process, not in the one of your user process. So rerun btrfs subvolume delete with the whole nsenter incantation and it should work better.
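
Since every command in this thread needs the same nsenter prefix, a small shell function can save some retyping. This is only a sketch: lxd_ns, LXD_PID and DRY_RUN are names made up for this example, and it assumes, as in the commands above, that pgrep daemon.start finds the snap's LXD wrapper process.

```shell
# Hypothetical helper: run a command inside the mount namespace of the
# snap's LXD daemon. LXD_PID and DRY_RUN are illustrative variables only,
# they are not LXD or snap settings.
lxd_ns() {
    # Use LXD_PID if it is set; otherwise look the process up as in the thread.
    pid="${LXD_PID:-$(pgrep daemon.start | head -n 1)}"
    if [ -z "$pid" ]; then
        echo "lxd_ns: could not find the LXD daemon process" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "nsenter -t $pid -m -- $*"
        return 0
    fi
    sudo nsenter -t "$pid" -m -- "$@"
}
```

With DRY_RUN=1 the function only prints the command, which is a cheap way to check the path before running something destructive such as subvolume delete. For example, lxd_ns /snap/lxd/current/bin/btrfs subvolume list /var/snap/lxd/common/lxd/storage-pools/default should behave like the list command given earlier.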

Oki, sorry, I didn’t understand. Unfortunately, it is still not working

% sudo nsenter -t $(pgrep daemon.start) -m -- /snap/lxd/current/bin/btrfs subvolume delete /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working
Delete subvolume (no-commit): '/var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working'
ERROR: cannot delete '/var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working': Operation not permitted

I’m not sure what is happening here. I’d have expected that by running sudo nsenter you would inherit the root powers of the lxd process. Maybe try deleting ntp-backup directly? Or even add another sudo before the btrfs command?

Neither works :confused:

The system doesn’t really like adding the sudo ^^

% sudo nsenter -t $(pgrep daemon.start) -m -- sudo /snap/lxd/current/bin/btrfs subvolume delete /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working
[sudo] password for clement:
sudo: unable to stat /etc/sudoers: No such file or directory
sudo: no valid sudoers sources found, quitting
sudo: unable to initialize policy plugin

Removing data directly returns a bunch of error messages

# rm -r /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working/
rm: cannot remove '/var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working/backup.yaml': Read-only file system

So much for sudo. I wasn’t thinking of rm, though; I meant using btrfs subvolume delete on containers-snapshots/ntp-backup.

Still no luck

% sudo nsenter -t $(pgrep daemon.start) -m -- /snap/lxd/current/bin/btrfs subvolume delete /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup
ERROR: not a subvolume: /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup

Got it, I think.
sudo nsenter -t $(pgrep daemon.start) -m -- ls -l /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup
should show the snapshot with a ‘+’ after its permissions, meaning that it has an ACL set. I think that using getfacl and setfacl -b should get you to the light (do not forget to use nsenter).

I’m not sure I understand, but here are the results

% sudo nsenter -t $(pgrep daemon.start) -m -- ls -l /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup
total 0
drwx--x--x 1 root root 78 May 15 00:25 working

I don’t see anything unusual. I don’t know the two other tools

% sudo nsenter -t $(pgrep daemon.start) -m -- getfacl /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup
getfacl: Removing leading '/' from absolute path names
# file: var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup
# owner: root
# group: root
user::rwx
group::--x
other::--x

% sudo nsenter -t $(pgrep daemon.start) -m -- getfacl /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working
getfacl: Removing leading '/' from absolute path names
# file: var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working
# owner: root
# group: root
user::rwx
group::--x
other::--x

The setfacl -b command didn’t return any error, and getfacl afterwards returned the same output for both directories. I still cannot remove the subvolume.

Baffling. Can you try

sudo nsenter -t $(pgrep daemon.start) -m -- chmod g+rw,o+rw /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working

Does it work? If it does, is btrfs subvolume delete still returning a permission error? If it is, the storage is probably read-only and the permission error is just a misleading message.
Maybe try btrfs scrub then (still with nsenter, of course).
Or possibly restart lxd with sudo snap restart lxd. Maybe it’s as simple as that (it would be a bug, of course). Try this first.
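
To check the read-only theory without eyeballing the output, the Flags line of btrfs subvolume show can be tested in a script. A minimal sketch, assuming the output format shown earlier in this thread; subvolume_is_readonly is a made-up helper name.

```shell
# Hypothetical helper: succeed if the `btrfs subvolume show` output read on
# stdin lists "readonly" on its Flags line, fail otherwise.
subvolume_is_readonly() {
    awk '
        $1 == "Flags:" {
            for (i = 2; i <= NF; i++)
                if ($i == "readonly") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

It is meant to be fed through a pipe, for example btrfs subvolume show <path> | subvolume_is_readonly && echo read-only (with the nsenter prefix on the btrfs side). If the flag needs to be cleared, btrfs property set -ts <path> ro false does that. Root can normally delete a read-only snapshot anyway, so this is more of a diagnostic than a guaranteed fix.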

The first command didn’t work

% sudo nsenter -t $(pgrep daemon.start) -m -- chmod g+rw,o+rw /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working
chmod: changing permissions of '/var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working': Read-only file system

btrfs scrub didn’t return any error

% sudo nsenter -t $(pgrep daemon.start) -m -- /snap/lxd/current/bin/btrfs scrub start -B /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working
WARNING: cannot create scrub data file, mkdir /var/lib/btrfs failed: Read-only file system. Status recording disabled
WARNING: failed to open the progress status socket at /var/lib/btrfs/scrub.progress.de67eea8-b6fc-40c8-bc0e-55f293f9577e: No such file or directory. Progress cannot be queried
scrub done for de67eea8-b6fc-40c8-bc0e-55f293f9577e
        scrub started at Wed May 22 09:31:01 2019 and finished after 00:00:30
        total bytes scrubbed: 4.05GiB with 0 errors

So, I restarted lxd

% sudo snap restart lxd
Restarted.

I tried to delete the volume again from lxd, without success

% lxc storage volume delete default ntp-backup/working
Error: No such object

However, the btrfs commands are working again

% sudo nsenter -t $(pgrep daemon.start) -m -- /snap/lxd/current/bin/btrfs subvolume delete /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working
Delete subvolume (no-commit): '/var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/ntp-backup/working'

BUT, it is still listed in storage

% lxc storage volume list default
+----------------------+---------------------+-------------+---------+----------+
|         TYPE         |        NAME         | DESCRIPTION | USED BY | LOCATION |
+----------------------+---------------------+-------------+---------+----------+
| container (snapshot) | ntp-backup/working  |             | 1       | server   |
+----------------------+---------------------+-------------+---------+----------+
| container (snapshot) | template/2019051201 |             | 1       | server2  |
+----------------------+---------------------+-------------+---------+----------+

I did the same on server2 for the other snapshot. The subvolumes are indeed gone, but they are still listed in the lxd storage. So I cannot delete the default storage pool

% lxc storage delete default
Error: storage pool "default" has volumes attached to it

Did you try restarting lxd again after deleting the snapshots with btrfs? Maybe lxd needs to be informed that you deleted stuff.

I tried but it doesn’t update the storage status. I also tried to stop lxd on both servers at the same time, then start again, but it’s not working either.

So if you run

sudo nsenter -t $(pgrep daemon.start) -m -- /snap/lxd/current/bin/btrfs subvolume list /var/snap/lxd/common/lxd/storage-pools/default

you do not see your snapshots anymore, but they can still be seen with lxc storage volume list default?

yes, exactly
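
That mismatch can be checked mechanically by comparing the two listings. The sketch below assumes the table layout and the containers-snapshots/ path prefix seen in the outputs posted in this thread; the function names are made up for the example, and both listings are expected to have been saved to files first.

```shell
# Print the NAME column of the snapshot rows of `lxc storage volume list`
# output read on stdin.
lxd_snapshot_names() {
    awk -F'|' '/container \(snapshot\)/ { gsub(/^ +| +$/, "", $3); print $3 }'
}

# Print "container/snapshot" names from `btrfs subvolume list` output read
# on stdin, keeping only paths below containers-snapshots/.
btrfs_snapshot_names() {
    awk '{ p = $NF }
         index(p, "containers-snapshots/") == 1 {
             print substr(p, length("containers-snapshots/") + 1)
         }'
}

# Print the snapshots that LXD still lists but that have no subvolume.
# $1: file holding the lxc listing, $2: file holding the btrfs listing.
orphaned_snapshots() {
    lxd_snapshot_names < "$1" | while read -r name; do
        if ! btrfs_snapshot_names < "$2" | grep -qxF -- "$name"; then
            echo "$name"
        fi
    done
}
```

Any name it prints exists only as a record on the LXD side. In a cluster the btrfs listing only covers the local server, so the comparison has to be repeated on each member, and snapshots whose LOCATION is another server will show up as missing locally.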

Oh, yuck. I’m pretty sure that it worked for me.
At this point, I am at a loss for rational answers. Maybe restart the computers? Or try being a bit Conan-the-Barbarian with lxc storage edit default???

Conan-the-Barbarian it is. I exported all my containers, removed lxd on both servers and removed all subvolumes and /var/snap/lxd folders. I also removed the partition corresponding to my second storage. I reinstalled lxd without defining a storage and added my own afterward. It is ok now. I don’t know what happened.

For the record, looking at the LXD code, I think that what we did was not wrong, but it was not sufficient:

        // Delete the mountpoint.
        if shared.PathExists(customSubvolumeName) {
                err = os.Remove(customSubvolumeName)
                if err != nil {
                        return err
                }
        }

        err = s.s.Cluster.StoragePoolVolumeDelete(
                "default",
                s.volume.Name,
                storagePoolVolumeTypeCustom,
                s.poolID)
        if err != nil {
                logger.Errorf(`Failed to delete database entry for BTRFS storage volume "%s" on storage pool "%s"`, s.volume.Name, s.pool.Name)
        }

Deleting the mountpoint and the sqlite database entry (which is involved when using a cluster, as is the case for you) was necessary as well. That is more difficult than I thought; maybe it worked for me because I’m not using clusters.
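
For anyone who ends up here with the same leftover records, the database entry can at least be inspected before resorting to a reinstall. This is a hedged sketch, not a verified procedure: it assumes that the lxd sql global command is available in your LXD version and that volume records live in a storage_volumes table with a name column. volume_query is a made-up helper that only builds the query text and runs nothing.

```shell
# Hypothetical helper: build a read-only SQL query for the record of one
# storage volume. The table and column names (storage_volumes, name) are
# assumptions about the LXD database schema; check them on your version.
volume_query() {
    # Double any single quote so the name stays a valid SQL string literal.
    escaped=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "SELECT * FROM storage_volumes WHERE name = '%s';\n" "$escaped"
}
```

The result would be passed to the daemon's SQL console, for example lxd sql global "$(volume_query ntp-backup/working)". Starting with a SELECT is deliberate: it shows whether the row exists at all, and anything that modifies the cluster database should only be attempted after a backup.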