Error removing LVM logical volume

Hey,

Sometimes, I don’t know why, when, or how, LXD still considers the container to be in use.
As suggested on GitHub, I did some debugging on it.

I can reproduce this on different systems, all running Ubuntu 20.04 with different kernels. But so far, I have no idea what is causing it.

I would like to track it down, but I don’t know where to start.
Maybe someone has a suggestion; so far the only advice I got was to restart the container repeatedly until the problem shows up again.

I’ll try that later; if someone has a better suggestion, let me know.
Thanks, everyone.

Correction: as suggested, I tried to unmount the container:

root@lima:~# sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- umount /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31
umount: /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31: not mounted.
root@lima:~# sudo umount /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31
umount: /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31: not mounted.

However, it appears not to be mounted?
mountinfo still suggests it is mounted.

So there are two issues going on here:

  1. How to clear the existing mounts.
  2. What is causing it in the first place.

For 1:

Please can you provide the output of sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- cat /proc/mounts.

For 2:

Can you reboot the machine so you have a fresh mount table and then try starting and stopping the container until the problem arises. At that point can you provide a copy of /var/snap/lxd/common/lxd/logs/lxd.log for debugging.

For number 1:
https://pastebin.com/raw/h3HsQVJ6

For number 2, a small bash script will do.
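A minimal sketch of such a loop, assuming the container name from this thread and an arbitrary iteration count (both are placeholders to adjust):

```shell
#!/bin/bash
# Sketch of a reproducer loop: stop and start the container repeatedly
# until one of the operations fails. Container name and iteration count
# are assumptions; the lxc command can be overridden for a dry run.

restart_loop() {
    local container="$1" iterations="${2:-1000}" lxc_cmd="${3:-lxc}"
    local i
    for ((i = 1; i <= iterations; i++)); do
        $lxc_cmd stop "$container"  || { echo "stop failed on iteration $i";  return 1; }
        $lxc_cmd start "$container" || { echo "start failed on iteration $i"; return 1; }
    done
    echo "completed $iterations restarts without error"
}

# Example: restart_loop lxcc39dbb31 1000
```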

I’ll let you know when it fails again.
Going to play some games until then.

Thanks for now.

@stgraber any ideas on this one? The host sees the volume mounted but cannot unmount it, and the snap mount namespace doesn’t see the volume mounted (hence why it’s trying to mount it).

What does grep lxcc39dbb31 /proc/*/mountinfo show you?
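For reference, a sketch that extends that grep: it lists every PID whose mount table still mentions the volume, together with its mount namespace ID (processes sharing an ID share one mount table, so one umount per namespace is enough). The pattern is a parameter; the container name here is just the one from this thread:

```shell
#!/bin/bash
# List each PID whose mountinfo mentions the given pattern, together
# with its mount namespace id. Sketch only; root is needed to inspect
# other users' processes, and unreadable entries are skipped silently.

find_mount_ns() {
    local pattern="$1" f pid
    for f in /proc/[0-9]*/mountinfo; do
        pid="${f#/proc/}"; pid="${pid%/mountinfo}"
        if grep -q "$pattern" "$f" 2>/dev/null; then
            echo "$pid $(readlink "/proc/$pid/ns/mnt" 2>/dev/null)"
        fi
    done
}

# Example: find_mount_ns lxcc39dbb31
```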

See the output on Pastebin (created 2021-09-28T17:14:17).

Can you try nsenter -t 91906 -m umount /var/snap/lxd/common/lxd/storage-pools/secondary/containers/lxcc39dbb31?

I rebooted the system as suggested and am trying to reproduce the bug right now by starting and stopping the container.
As soon as it hangs again, I’ll let you know.

So far the script has restarted the container 120 times, no issue yet.
I will check whether I can find a stuck container somewhere else and give you the output when I get to it.

The script restarted the container the entire night, no issue yet.
Either I am unlucky or the bug needs something else to trigger.

I think something else is triggering it, and the issue only becomes visible when you shut the container down.

The script ran for another day, can’t reproduce the error yet.
Will check for another stuck container and post the debug output here shortly.

Maybe related to:

I found another container:

@stgraber

Can you try:

nsenter -t 1471 -m umount /var/snap/lxd/common/shmounts/storage-pools/primary/containers/lxc14fc3901

It doesn’t seem to work.
https://pastebin.com/raw/dxnVfaQf

Can you show cat /proc/1471/mountinfo?

Sure
https://pastebin.com/raw/GG5UDYDj

Something caused some repeated overmounting of the shmounts directory somehow…
Can you show journalctl -u snap.lxd.daemon -n 500 as well as snap changes?

Yea.
https://pastebin.com/raw/kYxDujPg

The journalctl output looks significantly shorter than the requested 500 lines.

That’s all it returns.