lxc delete fails with "Failed to destroy ZFS filesystem: dataset is busy"

Hi,

lxc delete lxc1602
Error: Failed to destroy ZFS filesystem: Failed to run: zfs destroy -r lxc-vrtx-zfs-storage/containers/lxc1602: cannot destroy 'lxc-vrtx-zfs-storage/containers/lxc1602': dataset is busy

grep lxc-vrtx-zfs-storage/containers/lxc1602 /proc/*/mounts

/proc/11518/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/11768/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/11811/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/13965/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/15420/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/16135/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/17775/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/2219/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/2983/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/30212/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/32541/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/37696/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/46040/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/48152/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

/proc/48376/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

These are all lxc monitor processes of the other LXC containers, which also run on ZFS:

cat /proc/11768/cmdline
[lxc monitor] /var/snap/lxd/common/lxd/containers lxc1593

cat /proc/11811/cmdline
[lxc monitor] /var/snap/lxd/common/lxd/containers lxc1592

cat /proc/13965/cmdline
[lxc monitor] /var/snap/lxd/common/lxd/containers lxc1598

and

cat /proc/2219/cmdline
lxcfs/var/snap/lxd/common/var/lib/lxcfs-p/var/snap/lxd/common/lxcfs.pid
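
For reference, the grep and cat steps above can be combined into one helper. This is only a rough sketch: the function name mount_holders and the optional PROC_ROOT argument are made up for illustration and are not part of LXD or ZFS. The arguments in /proc/PID/cmdline are separated by NUL bytes, which is why the lxcfs line above looks glued together; the sketch turns them into spaces.

```shell
#!/bin/sh
# Sketch: list processes whose mount table still contains a given ZFS dataset.
# Usage: mount_holders DATASET [PROC_ROOT]
# PROC_ROOT is a hypothetical parameter (default /proc) so the function can be
# tried against a copy of the files instead of the live system.
mount_holders() {
    dataset=$1
    proc_root=${2:-/proc}
    for mounts in "$proc_root"/[0-9]*/mounts; do
        [ -r "$mounts" ] || continue
        # Field 1 of each line in a mounts file is the mount source.
        # Exact comparison avoids matching e.g. lxc16020 when asking for lxc1602.
        if awk -v ds="$dataset" '$1 == ds { found = 1 } END { exit !found }' "$mounts" 2>/dev/null; then
            pid_dir=${mounts%/mounts}
            pid=${pid_dir##*/}
            # cmdline arguments are NUL-separated; replace NULs with spaces
            # and drop the trailing space left by the final NUL.
            cmd=$(tr '\0' ' ' < "$pid_dir/cmdline")
            cmd=${cmd% }
            printf '%s\t%s\n' "$pid" "$cmd"
        fi
    done
}

# Example call (needs enough privileges to read other processes' files):
#   mount_holders lxc-vrtx-zfs-storage/containers/lxc1602
```

Each output line is a PID, a tab, and that process's command line, so the monitors and lxcfs can be told apart at a glance.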

Any idea how to fix this? Is that normal?

Thank you !
Greetings
Oliver

That’s not normal, but it’s also not unheard of. We have yet to find a good way to have ZFS properly get rid of mounts on delete…

A workaround should be:

nsenter -t 37696 -m -- umount /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602
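
If several processes still hold the mount, the same workaround would have to be repeated per PID. A minimal sketch of that, assuming the stale mount shows up in /proc/PID/mounts as in the grep output earlier in the thread: it only prints the nsenter commands so they can be reviewed before running anything. The function name print_umount_cmds and the PROC_ROOT argument are hypothetical, not part of LXD.

```shell
#!/bin/sh
# Sketch: print (do not run) one "nsenter ... umount" command per process
# that still has DATASET mounted.
# Usage: print_umount_cmds DATASET [PROC_ROOT]
# PROC_ROOT is a hypothetical parameter (default /proc) for trying the
# function against a copy of the files.
print_umount_cmds() {
    dataset=$1
    proc_root=${2:-/proc}
    for mounts in "$proc_root"/[0-9]*/mounts; do
        [ -r "$mounts" ] || continue
        pid_dir=${mounts%/mounts}
        pid=${pid_dir##*/}
        # Field 1 is the mount source, field 2 the mount point.
        # Mount points containing spaces appear escaped (\040) in this file
        # and would need decoding; this sketch does not handle that.
        awk -v ds="$dataset" -v pid="$pid" \
            '$1 == ds { printf "nsenter -t %s -m -- umount %s\n", pid, $2 }' \
            "$mounts" 2>/dev/null
    done
}

# Example call (run as root to see every process):
#   print_umount_cmds lxc-vrtx-zfs-storage/containers/lxc1602
```

Processes that share a mount namespace will each produce a line; after the first umount in that namespace succeeds, the remaining ones would just report that the path is not mounted.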

Hi,

Because I started the LXD daemon multiple times yesterday to analyse other issues, it is now down to one process blocking things:

grep lxc-vrtx-zfs-storage/containers/lxc1602 /proc/*/mounts

/proc/2219/mounts:lxc-vrtx-zfs-storage/containers/lxc1602 /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602 zfs rw,relatime,xattr,posixacl 0 0

nsenter -t 2219 -m -- umount /var/snap/lxd/common/lxd/storage-pools/lxc-vrtx-zfs-storage/containers/lxc1602

nsenter: failed to execute umount: No such file or directory

This PID 2219 is:

lxcfs/var/snap/lxd/common/var/lib/lxcfs-p/var/snap/lxd/common/lxcfs.pid

Any other ideas?

Edit: Also, after switching to the LXD 3.17 candidate release, nothing changed here.

I’m having a similar problem with lxc storage delete. I created a pool and dataset outside of lxc for testing, and then went to delete it when done:

root@uberphoenix:~# lxc storage delete pool1
Error: Failed to delete the ZFS pool: Failed to run: zfs destroy -r uberphoenix/lxc: cannot destroy 'uberphoenix/lxc': dataset is busy
root@uberphoenix:~# lxc storage list
+---------+-------------+--------+------------------------------------------------+---------+
|  NAME   | DESCRIPTION | DRIVER |                     SOURCE                     | USED BY |
+---------+-------------+--------+------------------------------------------------+---------+
| default |             | dir    | /var/snap/lxd/common/lxd/storage-pools/default | 4       |
+---------+-------------+--------+------------------------------------------------+---------+
| pool1   |             | zfs    | uberphoenix/lxc                                | 0       |
+---------+-------------+--------+------------------------------------------------+---------+
root@uberphoenix:~# zfs list
NAME                  USED  AVAIL     REFER  MOUNTPOINT
uberphoenix          1.46M  7.04T      192K  /
uberphoenix/lxc       192K  7.04T      192K  none
uberphoenix/testing   192K  7.04T      192K  /testing

(The testing dataset is unrelated.)

I was then able to manually delete the dataset with zfs destroy -r uberphoenix/lxc, and then delete the storage pool.

It’s as if the lxc process itself was opening a handle on the dataset, causing the destroy to fail.

Hi,

Yes, I think you are right:

This PID 2219 is:

lxcfs/var/snap/lxd/common/var/lib/lxcfs-p/var/snap/lxd/common/lxcfs.pid

LXC itself keeps it busy, so it cannot be deleted.

But as Stéphane already wrote, this is a somewhat known behaviour.

I hope there will be a fix very soon. Otherwise, using ZFS with LXD is simply too dangerous if you risk basic functions, like removing containers, not working properly.

I am also seeing this issue when multiple containers are created at the same time. Looks like the daemon gets into a weird state and that causes the error.