HowTo: Delete Container with CEPH RBD volume giving Device or resource busy

I’ve run into this several times with CEPH; this time I was deleting some unused Docker containers.

Log messages look like this:

unmap container_docker-2: rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy
t=2019-10-09T21:45:36-0500 lvl=eror msg="Failed to delete RBD storage volume for container "docker-2" on storage pool "remote""
t=2019-10-09T21:45:36-0500 lvl=eror msg="Failed deleting container storage" err="Failed to delete RBD storage volume for container "docker-2" on storage pool "remote"" name=docker-2

I found an easy fix by piecing together hints from these forums and the ceph-users list.

Here’s the process. First, find the /dev path:

cat /proc/*/mountinfo | grep docker-2

1757 346 252:80 / /var/snap/lxd/common/shmounts/storage-pools/remote/containers/docker-2 rw,relatime shared:653 - ext4 /dev/rbd5 rw,discard,stripe=1024,data=ordered

sudo rbd unmap -o force /dev/rbd5
lxc delete docker-2
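The steps above can be wrapped in one small script (a sketch, not an official tool: the container name is an argument, the device is pulled from mountinfo, and `-o force` detaches the device even with open handles, so be sure nothing should still be writing):

```shell
#!/bin/sh
# Sketch: find the rbd device still mounted for a container,
# force-unmap it, then delete the container.
set -eu

# Pull the /dev/rbdN field out of mountinfo lines fed on stdin.
find_rbd_dev() {
    awk '{ for (i = 1; i <= NF; i++)
               if ($i ~ /^\/dev\/rbd[0-9]+$/) { print $i; exit } }'
}

if [ "$#" -ge 1 ]; then
    name="$1"
    dev=$(grep "containers/${name} " /proc/*/mountinfo 2>/dev/null | find_rbd_dev)
    [ -n "$dev" ] || { echo "no mapped rbd device found for ${name}" >&2; exit 1; }
    sudo rbd unmap -o force "$dev"   # force-detach despite "busy" holders
    lxc delete "$name"
fi
```

Run as e.g. `./rbd-force-delete.sh docker-2`.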


Hmm, the fact that the remaining reference is in common/shmounts suggests a bug in the mntns logic we have in the LXD snap. We’ve seen a few issues with that before but need to patch the tool to be more verbose on failures so that we may track those down for good.

Ya, I’m not complaining, just documenting a fix that is reliable and easy.

Pretty sure I hit this same issue when trying to resize disks too. The workaround for that is to set the default size and copy the container, if anyone is looking for that answer instead. Maybe this force unmap would work there too; need to test.
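For anyone searching for that resize workaround, the rough shape would be something like this (untested sketch: pool name `remote`, the size, and the container names are placeholders; the idea is that `volume.size` only applies to newly created volumes, hence the copy):

```shell
# Sketch (untested): grow a container by raising the pool's default
# volume size and copying, since in-place resize hits the same busy rbd.
lxc storage set remote volume.size 20GiB   # new default for fresh volumes
lxc copy docker-2 docker-2-resized         # copy is created at the new size
lxc delete docker-2
lxc move docker-2-resized docker-2         # rename back
```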

Thank you for the information, it is useful.

Same issue here… I cannot delete a container because its rbd device is “busy”.

Every time, the rbd device is mapped inside an OSD namespace launched with Podman. It happens about 1 time in 20, and every time the lock is held in a Podman OSD namespace.

Here I cannot remove the ec-5aca19ff container:

root@ceph06:~# rbd showmapped 
id  pool                              namespace  image                    snap  device   
4             container_ec-5aca19ff    -     /dev/rbd4
root@ceph06:~# grep rbd4 /proc/*/mountinfo
/proc/3377307/mountinfo:3897 3721 253:64 / /rootfs/var/lib/incus/storage-pools/default/containers/ec-5aca19ff rw,relatime - ext4 /dev/rbd4 rw,discard,stripe=16
/proc/3377309/mountinfo:3897 3721 253:64 / /rootfs/var/lib/incus/storage-pools/default/containers/ec-5aca19ff rw,relatime - ext4 /dev/rbd4 rw,discard,stripe=16
root@ceph06:~# ps -fauxw | grep 3377307
root     3377307  0.0  0.0   1072     4 ?        Ss   Apr02   0:00  \_ /run/podman-init -- /usr/bin/ceph-osd -n osd.24 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false
root@ceph06:~# ps -fauxw | grep 3377309
167      3377309  1.0  0.4 1921140 1280924 ?     Sl   Apr02  27:57      \_ /usr/bin/ceph-osd -n osd.24 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false
root@ceph06:~# nsenter -t 3377309 -m -- umount /rootfs/var/lib/incus/storage-pools/default/containers/ec-5aca19ff
root@ceph06:~# nsenter -t 3377307 -m -- umount /rootfs/var/lib/incus/storage-pools/default/containers/ec-5aca19ff
umount: /rootfs/var/lib/incus/storage-pools/default/containers/ec-5aca19ff: umount failed: No such file or directory.
root@ceph06:~# rbd unmap /dev/rbd4
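The manual hunt above can be scripted (a sketch: the device and mount point are passed in, and the `PROC_ROOT` override exists only so the parsing can be exercised outside a live host; note the fixed-string match means `/dev/rbd4` would also match `/dev/rbd40`):

```shell
#!/bin/sh
# Sketch: unmount a leaked RBD mount from every mount namespace still
# holding it, then unmap the device, automating the nsenter dance above.
set -eu

# Print the PID of every process whose mountinfo mentions the device.
pids_holding() {
    grep -lF "$1" "${PROC_ROOT:-/proc}"/[0-9]*/mountinfo 2>/dev/null |
        while read -r f; do basename "$(dirname "$f")"; done
}

if [ "$#" -ge 2 ]; then
    dev="$1"   # e.g. /dev/rbd4
    mnt="$2"   # e.g. the container's storage-pools mount point
    for pid in $(pids_holding "$dev"); do
        # A shared mount may already be gone once the first umount ran.
        nsenter -t "$pid" -m -- umount "$mnt" 2>/dev/null || true
    done
    rbd unmap "$dev"
fi
```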

I’m trying to find the root cause and a systematic way to reproduce it, and gotcha! It seems to be related to adding an OSD:

  • I had started a loop that creates / starts / adds interfaces / stops / deletes a container, and it works for thousands of iterations … If I add an OSD while the loop is running, it fails, and the namespace holding the rbd device is … the new OSD’s Podman!

This morning I will start another loop that creates an image / maps / formats / writes something / unmounts / unmaps / deletes, and see if adding an OSD breaks the chain.
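That second loop would look roughly like this (a sketch: the pool name `rbd`, the image size, and the mount point are assumptions, and the loop only runs when `RUN_LOOP=1` is set):

```shell
#!/bin/sh
# Sketch: exercise the create/map/format/write/unmount/unmap/delete cycle
# so that adding an OSD mid-run can be correlated with a stuck unmap.
set -eu

img_name() { printf 'stress-%04d' "$1"; }   # numbered test image names

if [ "${RUN_LOOP:-0}" = 1 ]; then
    mnt=$(mktemp -d)
    i=0
    while :; do
        i=$((i + 1))
        img="rbd/$(img_name "$i")"          # pool "rbd" is an assumption
        rbd create --size 128M "$img"
        dev=$(rbd map "$img")
        mkfs.ext4 -q "$dev"
        mount "$dev" "$mnt"
        echo "iteration $i" > "$mnt/marker"
        umount "$mnt"
        rbd unmap "$dev"                    # a failure here is the bug
        rbd rm "$img"
    done
fi
```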

So maybe more related to Ceph/RBD than Incus.