RBD issue when issuing LXC move

ncpe2001 · January 4, 2019, 1:41am

This is a 3-node LXD cluster installed via snap, version 3.8. The prometh container is stopped, and I’m just trying to rename it. Any guidance to clear this up? I made a copy of prometh called prometh-tmp and was able to move that copy to another node with no issues.

root@lxd2-a:/home/choyle# lxc move prometh prometh-bak
Error: Failed to run: rbd --id admin --cluster ceph --pool rbd unmap container_prometh: rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy
root@lxd2-a:/home/choyle#

ncpe2001 · January 4, 2019, 1:57am

Also tried

root@lxd2-a:/dev/rbd/rbd# rbd unmap container_prometh
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy

stgraber · January 7, 2019, 8:23pm

This suggests the rbd device is still active, most likely because of a mount of some kind.

If you can figure out the rbd device number, you could then grep for it in /proc/*/mountinfo which would tell you what process is keeping the mount active, preventing it from getting unmapped.
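A minimal sketch of that lookup, in case it helps. The function name pids_holding_device and its optional second argument are made up for illustration; the only thing it relies on is the mountinfo layout, where the mount source is the second field after the " - " separator. It compares the device path exactly, so /dev/rbd2 does not also match /dev/rbd20 the way a plain grep for "rbd2" would.

```shell
#!/bin/sh
# Hypothetical helper: print every PID whose mount table still references
# the given block device.
#   $1  device path, e.g. /dev/rbd2 (see "rbd showmapped" for the number)
#   $2  proc root, defaults to /proc (a parameter only so it can be tested)
pids_holding_device() {
    dev="$1"
    proc_root="${2:-/proc}"
    for f in "$proc_root"/[0-9]*/mountinfo; do
        # Skip processes that exited or that we are not allowed to read.
        [ -r "$f" ] || continue
        # In mountinfo, the fields after the "-" separator are:
        # filesystem type, mount source, superblock options.
        if awk -v dev="$dev" '
            { for (i = 1; i < NF; i++) if ($i == "-" && $(i + 2) == dev) found = 1 }
            END { exit !found }
        ' "$f" 2>/dev/null; then
            pid_dir="${f%/mountinfo}"
            echo "${pid_dir##*/}"
        fi
    done
}
```

Running pids_holding_device /dev/rbd2 as root should then list the PIDs to look at, one per line.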

ncpe2001 · January 12, 2019, 2:35pm

I used “rbd showmapped” to get the rbd device number, then grepped for it:

root@lxd2-a:/dev/rbd/rbd# grep rbd2 /proc/*/mountinfo
/proc/2256841/mountinfo:187 713 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered
/proc/2257062/mountinfo:187 713 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered
/proc/2257748/mountinfo:187 713 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered
/proc/2798897/mountinfo:476 812 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered
/proc/3874784/mountinfo:476 812 250:32 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh rw,relatime - ext4 /dev/rbd2 rw,discard,stripe=1024,data=ordered

Any guidance on which process to kill without impacting something else?

stgraber · January 13, 2019, 1:21pm

Unlikely to be a process, more likely to just be a mount namespace reference, so doing something like:

nsenter -t 2256841 -m -- umount /var/snap/lxd/common/lxd/storage-pools/remote/containers/prometh

This should succeed, and running your grep again should then return nothing, letting you unmap the rbd device.
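If the grep lists several PIDs, the same unmount can be repeated for each of them. The following is only a sketch: the function name unmount_in_namespaces and the RUN variable are invented here, and it assumes the PIDs were collected from the mountinfo grep. Several PIDs often share one mount namespace, so once the first unmount in a namespace succeeds, the later attempts for that namespace may fail harmlessly.

```shell
#!/bin/sh
# Hypothetical helper: enter the mount namespace of each given PID and
# unmount the given path there.
#   $1      mount point to release
#   $2...   PIDs taken from the /proc/*/mountinfo grep
# Set RUN=echo to print the commands instead of running them (dry run).
unmount_in_namespaces() {
    target="$1"
    shift
    for pid in "$@"; do
        # "--" ends nsenter's own options; everything after it is the command.
        ${RUN:-} nsenter -t "$pid" -m -- umount "$target" ||
            echo "pid $pid: umount failed (possibly already unmounted in that namespace)" >&2
    done
}
```

A dry run such as RUN=echo unmount_in_namespaces /path/to/mount 2256841 2798897 prints the nsenter commands first, which makes it easy to check them before running anything as root.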

I pushed a change to the LXD snap last week which should help prevent such issues in the future, at least for the cases that we fully understand now.

Maran · November 4, 2020, 11:08am

Sorry for the necro but I’m having the exact same issue.

# grep "rbd1 rw" /proc/*/mountinfo 2>/dev/null
/proc/3517630/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3517655/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3517692/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3537155/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3537175/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3554782/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3555055/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3555152/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3555169/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3556119/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3556346/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3556408/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3556722/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3556809/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3556919/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16
/proc/3556973/mountinfo:360 348 252:16 / /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us rw,relatime shared:76 - ext4 /dev/rbd1 rw,discard,stripe=16

Sadly for me the nsenter command does not appear to be doing anything.

# nsenter -t 3517630 -m - umount /var/snap/lxd/common/lxd/storage-pools/remote/containers/among-us
nsenter: failed to execute -: No such file or directory

Any other way to solve this?

stgraber · November 4, 2020, 2:44pm

Remove the lone dash between the PID and umount; nsenter is treating it as the command to run. Either drop it or use a double dash (--).

Maran · November 4, 2020, 3:50pm

Thanks that did the trick!

Is there anything you know of that could trigger this? I seem to hit it consistently, so perhaps there is something I should not be doing.