RBD issue when issuing LXC move


(Michael Hoyle) #1

This is a 3-node LXD cluster running the LXD snap, version 3.8. In this case the prometh container has been stopped; I’m just trying to rename it. Any guidance on clearing this up? I made a copy of prometh called prometh-tmp and was able to move that copy to another node with no issues.

root@lxd2-a:/home/choyle# lxc move prometh prometh-bak
Error: Failed to run: rbd --id admin --cluster ceph --pool rbd unmap container_prometh: rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy
root@lxd2-a:/home/choyle#


(Michael Hoyle) #2

I also tried:

root@lxd2-a:/dev/rbd/rbd# rbd unmap container_prometh
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy


(Stéphane Graber) #3

This suggests the rbd device is still active, most likely because of a mount of some kind.

If you can figure out the rbd device number, you can then grep for it in /proc/*/mountinfo. That will tell you which process is keeping the mount active and preventing the device from being unmapped.
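A minimal shell sketch of that lookup, assuming it runs as root on the node that has the image mapped, and that the udev symlink /dev/rbd/<pool>/<image> exists (the prompt later in this thread shows /dev/rbd/rbd, so it appears to here). The function name pids_holding_device is hypothetical. It matches the device against the mount source field instead of doing a bare substring grep, so /dev/rbd2 does not also match /dev/rbd20.

```shell
#!/usr/bin/env bash
# Sketch: list PIDs whose mount table still references a given block device.
# Assumption: run as root, so every /proc/<pid>/mountinfo is readable.

# Print each PID whose /proc/<pid>/mountinfo has the device as a mount source.
pids_holding_device() {
    local dev="$1"
    local f
    for f in /proc/[0-9]*/mountinfo; do
        # mountinfo lines look like "... - <fstype> <source> <super options>".
        # Matching " - <fstype> <dev> " avoids /dev/rbd2 matching /dev/rbd20.
        # -s hides errors from processes that exit while we scan.
        if grep -qsE " - [^ ]+ ${dev} " "$f"; then
            f="${f#/proc/}"
            echo "${f%/mountinfo}"
        fi
    done
}

# Example usage on the affected node:
#   dev=$(readlink -f /dev/rbd/rbd/container_prometh)   # resolves to /dev/rbdN
#   pids_holding_device "$dev"
```

Resolving the udev symlink avoids having to read the device number out of `rbd showmapped` by hand.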


(Michael Hoyle) #4

I used “rbd showmapped” to get the rbd device number, then grepped for it:

root@lxd2-a:/dev/rbd/rbd# grep rbd2 /proc/*/mountinfo
/proc/2256841/mountinfo:187 713 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered
/proc/2257062/mountinfo:187 713 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered
/proc/2257748/mountinfo:187 713 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered
/proc/2798897/mountinfo:476 812 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered
/proc/3874784/mountinfo:476 812 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered

Any guidance on which process I can kill without impacting anything else?


(Stéphane Graber) #5

This is unlikely to be a running process; it’s more likely just a mount namespace that still holds a reference to the mount, so running something like:

  • nsenter -t 2256841 -m -- umount /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh

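It should succeed, after which running your grep again should return nothing and you can unmap the rbd device. If some PIDs still show up, they are in a different mount namespace, so repeat the command with one of those PIDs.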
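Processes that share a mount namespace only need one nsenter call between them. A minimal sketch, assuming root on the affected node, that keeps one PID per namespace by comparing the /proc/<pid>/ns/mnt links. one_pid_per_mntns is a hypothetical helper name, and the umount loop is left commented out because it really unmounts.

```shell
#!/usr/bin/env bash
# Sketch: collapse a list of PIDs to one representative PID per mount
# namespace, so each namespace needs only a single nsenter ... umount.
one_pid_per_mntns() {
    local pid ns
    for pid in "$@"; do
        # /proc/<pid>/ns/mnt is a symlink such as "mnt:[4026531840]";
        # skip PIDs that have exited or that we may not inspect.
        ns=$(readlink "/proc/${pid}/ns/mnt" 2>/dev/null) || continue
        printf '%s %s\n' "$ns" "$pid"
    done | sort -u -k1,1 | awk '{ print $2 }'
}

# Example usage on the affected node (commented out; it really unmounts):
#   mp=/var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh
#   for pid in $(one_pid_per_mntns 2256841 2257062 2257748 2798897 3874784); do
#       nsenter -t "$pid" -m -- umount "$mp"
#   done
```

Comparing the namespace links rather than guessing from the mountinfo output makes it explicit which PIDs are separate namespaces.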

I pushed a change to the LXD snap last week which should help prevent such issues in the future, at least in the cases we now fully understand.