Error removing LVM logical volume

Hey,

Sometimes, I don’t know why, when, or how, LXD still considers the container to be in use.
As suggested on GitHub, I did some debugging on it.

I can reproduce this on different systems, all running Ubuntu 20.04 with different kernels. But so far, I have no idea what is causing it.

I would like to track it down, but I don’t know where to start.
Maybe someone has a suggestion; so far the only advice I got was to restart the container repeatedly until the problem shows up again.

I’ll try that later; if someone has a better suggestion, let me know.
Thanks, everyone.

Correction: as suggested, I tried to unmount the container:

root@lima:~# sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- umount /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31
umount: /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31: not mounted.
root@lima:~# sudo umount /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31
umount: /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31: not mounted.

However, it appears not to be mounted?
mountinfo still suggests it is mounted.

So there are two issues going on here:

  1. How to clear the existing mounts.
  2. What is causing it in the first place.

For 1:

Please can you provide the output of sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- cat /proc/mounts.

For 2:

Can you reboot the machine so you have a fresh mount table and then try starting and stopping the container until the problem arises. At that point can you provide a copy of /var/snap/lxd/common/lxd/logs/lxd.log for debugging.

For number 1:
https://pastebin.com/raw/h3HsQVJ6

For number 2, a small bash script will do.
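A minimal sketch of such a loop, assuming the container name from this thread and an arbitrary iteration count (both are placeholders to adjust):

```shell
#!/bin/bash
# Sketch of a reproducer loop: stop and start the container repeatedly
# until one of the operations fails. Container name and iteration count
# are assumptions; the lxc command can be overridden for a dry run.

restart_loop() {
    local container="$1" iterations="${2:-1000}" lxc_cmd="${3:-lxc}"
    local i
    for ((i = 1; i <= iterations; i++)); do
        $lxc_cmd stop "$container"  || { echo "stop failed on iteration $i";  return 1; }
        $lxc_cmd start "$container" || { echo "start failed on iteration $i"; return 1; }
    done
    echo "completed $iterations restarts without error"
}

# Example: restart_loop lxcc39dbb31 1000
```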

I’ll let you know when it fails again.
Going to play some games until then.

Thanks for now.

@stgraber any ideas on this one? The host sees the volume mounted but cannot unmount it, and the snap mount namespace doesn’t see the volume mounted (hence why it’s trying to mount it).

What does grep lxcc39dbb31 /proc/*/mountinfo show you?
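For reference, a sketch that extends that grep: it lists every PID whose mount table still mentions the volume, together with its mount namespace ID (processes sharing an ID share one mount table, so one umount per namespace is enough). The pattern is a parameter; the container name here is just the one from this thread:

```shell
#!/bin/bash
# List each PID whose mountinfo mentions the given pattern, together
# with its mount namespace id. Sketch only; root is needed to inspect
# other users' processes, and unreadable entries are skipped silently.

find_mount_ns() {
    local pattern="$1" f pid
    for f in /proc/[0-9]*/mountinfo; do
        pid="${f#/proc/}"; pid="${pid%/mountinfo}"
        if grep -q "$pattern" "$f" 2>/dev/null; then
            echo "$pid $(readlink "/proc/$pid/ns/mnt" 2>/dev/null)"
        fi
    done
}

# Example: find_mount_ns lxcc39dbb31
```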

See the output on Pastebin (created 2021-09-28T17:14:17).

Can you try nsenter -t 91906 -m umount /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31?

I rebooted the system as suggested and am trying to reproduce the bug right now by starting and stopping the container.
As soon as it hangs again, I’ll let you know.

So far the script has restarted the container 120 times, no issue yet.
I will check whether I can find a stuck container somewhere else and give you the output when I get to it.

The script restarted the container the entire night, no issue yet.
Either I am unlucky or the bug needs something else to trigger.

I think something else is triggering it, and the issue only becomes visible when you shut the container down.

The script ran for another day, can’t reproduce the error yet.
Will check for another stuck container and post the debug output here shortly.

Maybe related to:

I found another container:

@stgraber

Can you try:

nsenter -t 1471 -m umount /var/snap/lxd/common/shmounts/storage-pools/primary/containers/lxc14fc3901

It doesn’t seem to work.
https://pastebin.com/raw/dxnVfaQf

Can you show cat /proc/1471/mountinfo?

Sure
https://pastebin.com/raw/GG5UDYDj

Something caused some repeated overmounting of the shmounts directory somehow…
Can you show journalctl -u snap.lxd.daemon -n 500 as well as snap changes?

Yea.
https://pastebin.com/raw/kYxDujPg

The journalctl output looks significantly shorter than the requested 500 lines.

That’s all it returns.