Custom storage volume not getting deleted

Error :

I’ve been trying to run travis-worker locally; however, I’m running into LXC storage volume issues.

root@travis-worker-anup:~# lxc storage volume list data
+--------+----------------------------------------------------+-------------+--------------+---------+
|  TYPE  |                        NAME                        | DESCRIPTION | CONTENT-TYPE | USED BY |
+--------+----------------------------------------------------+-------------+--------------+---------+
| custom | travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker |             | filesystem   | 0       |
+--------+----------------------------------------------------+-------------+--------------+---------+
root@travis-worker-anup:~# lxc storage volume delete data travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker
Storage volume travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker deleted
root@travis-worker-anup:~# lxc storage volume list data
+--------+----------------------------------------------------+-------------+--------------+---------+
|  TYPE  |                        NAME                        | DESCRIPTION | CONTENT-TYPE | USED BY |
+--------+----------------------------------------------------+-------------+--------------+---------+
| custom | travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker |             | filesystem   | 0       |
+--------+----------------------------------------------------+-------------+--------------+---------+
root@travis-worker-anup:~#

As you can see, even though the delete command reports that the storage volume was deleted, it still shows up in the list. It’s very annoying, since this causes travis-worker to fail with:

level=error msg="couldn't create the container Docker storage volume" err="Volume name \"travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker\" already exists."
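The mismatch shown above (delete reports success, list still shows the volume) can be checked mechanically instead of by eye. This is a minimal sketch, assuming the table layout printed by lxc storage volume list as pasted above; the helper name volume_listed is made up for illustration and is not part of LXD.

```shell
#!/bin/sh
# volume_listed NAME (hypothetical helper, not an LXD command)
# Reads the table printed by `lxc storage volume list <pool>` on stdin and
# exits 0 if some row has NAME in its NAME column, 1 otherwise.
volume_listed() {
    awk -F'|' -v want="$1" '
        NF >= 3 {
            # Row layout: "| TYPE | NAME | ..." so the NAME cell is field 3.
            name = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the cell padding
            if (name == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Intended use on the affected host (not executed by this block):
#   vol=travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker
#   lxc storage volume delete data "$vol"
#   if lxc storage volume list data | volume_listed "$vol"; then
#       echo "volume still listed after delete" >&2
#   fi
```

Parsing the table keeps the check tied to the output format already shown in this thread; border lines and the header row are ignored because they never match a volume name.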

Things I’ve Tried :

  • Restart the LXD daemon (systemctl restart snap.lxd.daemon)
  • Reboot the VM

Other details :

  • lxc version: 5.0.0
  • platform: ppc64le

I changed the topic type to LXD rather than LXC.

Can you show output of lxc storage show data?

root@travis-worker-anup:~# lxc storage show data
config:
  source: /mnt/travis-docker-data
description: ""
name: data
driver: dir
used_by:
- /1.0/storage-pools/data/volumes/custom/travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker
status: Created
locations:
- none

Can you show:

sudo ls -la /mnt/travis-docker-data/custom/

Can you also show:

sudo lxd sql global 'select * from storage_volumes'

root@travis-worker-anup:~# sudo ls -la /mnt/travis-docker-data/custom/
total 12
drwx--x--x 3 root root 4096 Apr 20 10:58 .
drwxr-xr-x 9 root root 4096 Apr  9 15:02 ..
drwx--x--x 2 root root 4096 Apr 20 10:58 default_travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker

root@travis-worker-anup:~# sudo lxd sql global 'select * from storage_volumes'
+-----+------------------------------------------------------------------+-----------------+---------+------+-------------+------------+--------------+
| id  |                               name                               | storage_pool_id | node_id | type | description | project_id | content_type |
+-----+------------------------------------------------------------------+-----------------+---------+------+-------------+------------+--------------+
| 4   | d8b2d2161a497b7e1a2d897eb4880db672b2f03c24e6ae940c5975276d0ecf4d | 1               | 1       | 1    |             | 1          | 0            |
| 39  | a51246c878bd670fc00c14c5dcc9a4b96d78b2ce97e9bf4f39174282bdb1526d | 1               | 1       | 1    |             | 1          | 0            |
| 44  | 0b2823d07a5b4e2dc7a92ec2f03b93702c59cea229fbe78697e1f88e1ea2c1ad | 1               | 1       | 1    |             | 1          | 0            |
| 45  | endless-doberman                                                 | 1               | 1       | 0    |             | 1          | 0            |
| 137 | travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker               | 2               | 1       | 2    |             | 1          | 0            |
+-----+------------------------------------------------------------------+-----------------+---------+------+-------------+------------+--------------+

I’ve not been able to recreate the issue:

lxd init --auto
mkdir /mnt/data
lxc storage create data dir source=/mnt/data
Storage pool data created
lxc storage volume create data travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker
Storage volume travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker created
lxc storage volume ls data
+--------+----------------------------------------------------+-------------+--------------+---------+
|  TYPE  |                        NAME                        | DESCRIPTION | CONTENT-TYPE | USED BY |
+--------+----------------------------------------------------+-------------+--------------+---------+
| custom | travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker |             | filesystem   | 0       |
+--------+----------------------------------------------------+-------------+--------------+---------+
ls /mnt/data/custom
default_travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker
lxc storage volume delete data travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker
Storage volume travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker deleted
ls /mnt/data/custom
lxc storage volume ls data
+------+------+-------------+--------------+---------+
| TYPE | NAME | DESCRIPTION | CONTENT-TYPE | USED BY |
+------+------+-------------+--------------+---------+

Can you manually remove /mnt/travis-docker-data/custom/default_travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker?

Nope. Doesn’t work. I’ll try reinstalling lxd.

I would try rebooting first.

What error do you get though?

I’ve tried rebooting, but unfortunately it doesn’t change anything.

What error do you get though?

Nothing. I removed the /mnt/travis-docker-data/custom/default_travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker directory myself and ran lxc storage volume delete data travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker just to be sure. It still doesn’t get rid of that storage volume.

OK, so don’t reinstall just yet, otherwise we may lose the ability to diagnose and fix the bug.

Sure. Also, couldn’t the problem actually lie here?

I don’t think so, as

lxc storage volume delete data travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker
Storage volume travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker deleted

is pure LXD and should have resulted in the DB record being removed.

Can you show output of:

sudo lxd sql global 'select * from storage_pools'

Can you also run a debug monitor, then try to delete the storage volume again and capture the events that occur during the delete?

lxc monitor --type=logging --pretty
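The request above (monitor running while the delete is retried) can be wrapped into one step so the log is not lost. This is only a sketch: the function name capture_delete_events, the LXC override variable, and the one-second pauses are assumptions added for illustration, while the two lxc commands are the ones quoted in this thread.

```shell
#!/bin/sh
# capture_delete_events POOL VOLUME LOGFILE (hypothetical helper)
# Starts the logging monitor in the background, retries the delete, then stops
# the monitor so LOGFILE holds the events emitted around the delete.
# LXC is an illustrative override for the client binary; it defaults to lxc.
capture_delete_events() {
    lxc_bin=${LXC:-lxc}

    # Same monitor command as requested above, with output saved to LOGFILE.
    "$lxc_bin" monitor --type=logging --pretty > "$3" 2>&1 &
    monitor_pid=$!
    sleep 1    # assumed to be enough time for the monitor to connect

    delete_status=0
    "$lxc_bin" storage volume delete "$1" "$2" || delete_status=$?

    sleep 1    # allow trailing events to be written
    kill "$monitor_pid" 2>/dev/null || true
    wait "$monitor_pid" 2>/dev/null || true
    return "$delete_status"
}

# Intended use on the affected host (not executed by this block):
#   capture_delete_events data \
#       travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker /tmp/lxd-delete.log
```

The function returns the exit status of the delete itself, so a caller can still tell whether the client reported success while reading the log for what the daemon actually did.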

So instead of deleting /mnt/travis-docker-data/custom/default_travis-job-xxxxxxxx-xxxxx-travis-job-721271-docker, I deleted the whole custom directory, as it only had “default_travis-job…”, and then made an empty custom directory, which seems to have resolved the issue.

Anyway, here’s the thing you asked for:

ubuntu@travis-worker-anup:~/worker$ sudo lxd sql global 'select * from storage_pools'
+----+-----------+--------+-------------+-------+
| id |   name    | driver | description | state |
+----+-----------+--------+-------------+-------+
| 3  | instances | zfs    |             | 1     |
| 4  | data      | dir    |             | 1     |
+----+-----------+--------+-------------+-------+
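The two lxd sql global outputs in this thread can be cross-checked against each other. In the tables as pasted, the volume rows reference storage_pool_id 1 and 2, while storage_pools lists ids 3 and 4. The outputs were captured at different times, so this may only reflect pools having been recreated in between, but it is cheap to verify. A minimal sketch, assuming both outputs have been saved to files in the table format shown; the function name orphan_volumes and its file arguments are made up for illustration.

```shell
#!/bin/sh
# orphan_volumes POOLS_FILE VOLUMES_FILE (hypothetical helper)
# POOLS_FILE:   saved output of  lxd sql global 'select * from storage_pools'
# VOLUMES_FILE: saved output of  lxd sql global 'select * from storage_volumes'
# Prints "<volume id> <volume name> <storage_pool_id>" for each volume row
# whose storage_pool_id is not an id present in POOLS_FILE.
orphan_volumes() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        # Skip border lines and header rows (first cell is not a numeric id).
        NF < 3 || trim($2) !~ /^[0-9]+$/ { next }

        # First file: remember every pool id.
        FILENAME == ARGV[1] { pool[trim($2)] = 1; next }

        # Second file: columns are id, name, storage_pool_id, ...
        !(trim($4) in pool) { print trim($2), trim($3), trim($4) }
    ' "$1" "$2"
}

# Intended use on the affected host (not executed by this block):
#   sudo lxd sql global 'select * from storage_pools'   > pools.txt
#   sudo lxd sql global 'select * from storage_volumes' > volumes.txt
#   orphan_volumes pools.txt volumes.txt
```

An empty result means every volume row points at an existing pool; any printed row is a candidate for closer inspection, not proof of a bug.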

OK, good. That said, I don’t think we can identify and resolve the bug in LXD now that you’ve made manual changes. I was really after the debug log I requested, to see why LXD thought it had removed the volume when it had left both the directory and the DB record behind.

I’ll try to reproduce the situation again. I ran into it after repeatedly cancelling travis jobs while they were executing, and I’ll keep the debug monitor running :smiley:

Thanks, it would be preferable to fix the issue rather than have to perform manual cleanup steps each time. :slight_smile: