Cannot delete container on ZFS: dataset is busy

I’m running LXD 5.4 on Ubuntu 20.04 using ZFS as the storage backend:

config:
  source: storage
  volatile.initial_source: storage
  zfs.pool_name: storage
description: ""
name: default
driver: zfs

When I attempt to delete a container, it fails as follows:

lxc delete mycontainer
Error: Error deleting storage volume: Failed to run: zfs destroy -r storage/containers/mycontainer: cannot destroy 'storage/containers/mycontainer': dataset is busy
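
To see whether anything still had the dataset mounted, I checked along these lines (grepping for the container name):

grep mycontainer /proc/mounts
grep mycontainer /proc/*/mounts 2>/dev/null
sudo lsof 2>/dev/null | grep mycontainer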

I do not see the container listed in /proc/mounts or in any /proc/*/mounts, but I do see the following in the output of lsof:

lxd         48766                                root   24w      REG              259,2          0    1180352 /var/snap/lxd/common/lxd/logs/mycontainer/lxc.log.old (deleted)
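
(For reference, deleted-but-still-open files like this one can also be listed directly with lsof's +L1 option, which selects open files whose link count is less than 1:)

sudo lsof +L1 | grep lxd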

The lxc.log.old file shown in the lsof output contains the following:

lxc mycontainer 20220819195925.549 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc mycontainer 20220819195925.549 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc mycontainer 20220819195925.550 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc mycontainer 20220819195925.550 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing
lxc mycontainer 20220819195925.550 WARN     cgfsng - ../src/src/lxc/cgroups/cgfsng.c:fchowmodat:1611 - No such file or directory - Failed to fchownat(42, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc mycontainer 20220821161531.373 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3592 - newuidmap binary is missing
lxc mycontainer 20220821161531.373 WARN     conf - ../src/src/lxc/conf.c:lxc_map_ids:3598 - newgidmap binary is missing

Running systemctl restart snap.lxd.daemon makes the log file disappear from lsof's output, but subsequent attempts to delete the container still fail with dataset is busy. How can I successfully delete this container, and how can I prevent this from happening in the future? Thanks!

Have you tried rebooting? That might free up whatever resource is holding the mount open.


Rebooting does indeed fix the problem, but regularly rebooting a production server that hosts many LXD containers is not an option, so I need a different solution that doesn't impact production uptime.

Sounds like it's caused by this known issue with ZFS and the LXD snap: https://github.com/lxc/lxd-pkg-snap/issues/61#issuecomment-1238497150
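
If it is that mount-namespace leak, a workaround that avoids a reboot (untested here, and the mountpoint path is an assumption based on the default snap layout) is to find whichever process's mount namespace still holds the dataset mounted and unmount it there:

# Find PIDs whose mount namespace still references the dataset
grep -l containers/mycontainer /proc/*/mountinfo

# For each <PID> found above, lazily unmount the dataset inside that namespace
sudo nsenter -t <PID> -m umount -l /var/snap/lxd/common/lxd/storage-pools/default/containers/mycontainer

# Then retry the delete
lxc delete mycontainer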
