Why does LXD kill this import process, what does "Error: Post hook: In use" even mean, and how do I solve it?

Do you recall what error you got before the in-use errors? I think you can private message me whilst you're within your new-user daily limit.

I notice you were the author of How can I control the file system path used by lxc import commands to avoid failures?, suggesting you originally ran out of disk space. This gives me a useful hint as to what left behind the orphaned internal reference counter that is now preventing the import from completing.

The systemctl reload snap.lxd.daemon command allowed the import process to complete successfully, so whatever LXD was failing to clean up got cleaned up by reloading the process. That seems like a bug. (Also, more informative error messages would help.)

I’m answering the rest of the questions just for reference.

The only errors I had observed follow:

  1. Error about running out of disk space. This was before my first topic on how to get LXD to use the ZFS pool for imports. (Note that my previous comment specified that I hadn’t run out of space while using the ZFS pool. They’re also on different mountpoints.)

  2. Error about missing profile. (I manually created the profile and tried again. Would be nice if this could be checked very early in the process; I ran into this at least 15 minutes after launch.)

  3. The terminated error in the OP, which shows up whenever I detach from my byobu session.

  4. The Error: Post hook: In use.

I’ve not observed any other error messages besides these 4. Errors #1 and #2 are solved already, #3 has no explanation but is very easy to reproduce, and #4 is the one I was stuck on.

Yes, but that was when LXD was using an unexpected location in the root file system instead of the ZFS pool. Prior to using the ZFS pool/dataset, it’d get to ~14% before running the USB drive out of space.

Yes, I suspect the bug was caused when the import process was killed as you disconnected from the terminal. I'll try to recreate that using screen or tmux too. I suspect whatever is killing it is somehow preventing the cleanup of resources created during the transfer from taking place.

We could change the ‘in use’ error to ‘volume in use’, which is all LXD really knows about that error.

Glad you got it resolved though.

Thanks. I marked your command as the solution to the original issue.

Also, I noticed that if I enter a container, e.g. lxc exec <container> -- zsh, as soon as I press F6 to detach from my byobu session, something happens, and when I get back, the terminal I had left inside the container is all messed up. I basically have no choice but to exit and then open a new PTY via F2. I had seen this before (months ago) and it was somewhat rarer, but now it's almost predictable.

I’ve just tried using byobu to start a local terminal and then creating a container and installing zsh inside it.

I then ran lxc exec <container> -- zsh inside the byobu session and pressed F6 to detach.

When I reattached using the byobu command, the zsh shell inside the container was still running and usable. I repeated this a few times without being able to see any issues.

I also tried exporting and importing a ZFS-backed container using both optimized and unoptimized storage. While it was importing, I pressed F6 to detach and left it detached until the LXD debug logs showed the import had succeeded. I then reattached and the container was imported fine.

I’ll look into whether it is possible that a failed import can leave an orphaned volume in-use counter around, but I’ve not been able to recreate the issues with byobu.

What OS and version are you running? I'm wondering if you have a version of systemd that kills your running commands when you detach from the shell.

This bug was born out of a discussion on the #lxc channel recently, and I just thought it could be partially related: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989317

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

$ systemd --version
systemd 245 (245.4-4ubuntu3.6)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

I’ve recorded a demo of the problem. The unlisted video can be seen here: https://youtu.be/dJcaTYpZ4PQ

I tried one where I reconnected immediately after detaching and was surprised not to see the problem. For the demo clip, I waited ~23 seconds before reconnecting and the problem was observed. (I edited out the wait time between commands to save time, but made no other edits.)

I believe the profile error you got occurred during the import stage where the instance's config is restored into the LXD database. This stage occurs after the instance has been unpacked to the storage pool because, when the backup uses the optimized storage format, we cannot access the instance's backup.yaml any earlier: it is stored inside the optimized binary blob from the storage layer.

This is why that error currently cannot be reported earlier in the process.
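To make the ordering concrete, here is a minimal sketch of the import pipeline as described above; the function names are illustrative stand-ins, not LXD's actual API:

```go
package main

import "fmt"

// importBackup sketches the stage ordering described above. Because
// backup.yaml sits inside the optimized storage blob, the profile check in
// the DB stage cannot run until the unpack stage has finished.
func importBackup() error {
	unpackToStoragePool()        // stage 1: write the optimized blob into the pool
	cfg := readBackupYAML()      // stage 2: only now is backup.yaml readable
	return restoreDBRecords(cfg) // stage 3: profile lookup happens here and can fail
}

func unpackToStoragePool() { fmt.Println("unpacking blob into ZFS volume") }

func readBackupYAML() string {
	fmt.Println("mounting volume and reading backup.yaml")
	return "profiles: [default, missing-profile]"
}

func restoreDBRecords(cfg string) error {
	fmt.Println("recreating DB records from:", cfg)
	return fmt.Errorf("profile not found") // the error seen ~15 minutes in
}

func main() {
	if err := importBackup(); err != nil {
		fmt.Println("import failed:", err)
	}
}
```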

I also believe that is where the bug occurred that prevented future imports.
In order to prevent volumes from being incorrectly unmounted while there are ongoing operations occurring on them, we maintain an internal mount counter. It increments when an operation requests that a volume be mounted (even if it is already mounted) and decrements when the operation is done with the volume. If at that point the mount counter is zero, we also unmount the volume, as there are no other ongoing operations happening on it.
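As a rough sketch of that counter logic (my own illustration, not LXD's real internals; the volume type and ErrInUse here are made up):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInUse mirrors the "in use" condition described above.
var ErrInUse = errors.New("in use")

type volume struct {
	refCount int  // how many ongoing operations requested a mount
	mounted  bool // whether the filesystem is actually mounted
}

// Mount increments the counter even if the volume is already mounted.
func (v *volume) Mount() {
	if !v.mounted {
		v.mounted = true // real code would perform the actual mount here
	}
	v.refCount++
}

// Unmount decrements the counter; only the last user actually unmounts.
func (v *volume) Unmount() error {
	v.refCount--
	if v.refCount > 0 {
		return ErrInUse // other operations still need the volume
	}
	v.mounted = false // real code would perform the actual unmount here
	return nil
}

func main() {
	v := &volume{}
	v.Mount()
	v.Mount()                // a second operation uses the same volume
	fmt.Println(v.Unmount()) // "in use": another operation still holds it
	fmt.Println(v.Unmount()) // <nil>: last user, volume actually unmounted
}
```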

In this case there is a bug in the ZFS driver that is fixed with this PR:

Specifically, the following steps were happening:

  1. Import unpacked the backup into new ZFS volumes.
  2. Volume was mounted ready for accessing the backup.yaml (incrementing the mount counter).
  3. Restoration process completes (from a storage driver perspective) and a revert hook function is returned that can be called to clean up the imported ZFS volumes if a future error occurs.
  4. DB import process begins by accessing the backup.yaml on the mounted volume and tries to recreate the DB records.
  5. DB import process fails due to missing profile.
  6. Revert hook is run, removing the ZFS volumes (and, by implication, clearing the actual ZFS mount).
  7. However, because LXD wasn't aware of this implicit unmount, the mount counter is left at 1.
  8. Subsequent import succeeds, and after the DB import phase we try to unmount the volume in order to resize it to the size specified in the newly created DB records.
  9. This fails with the In Use error due to the stale mount counter (see the toy trace after this list).
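Compressed into a runnable toy (the counter variable and messages are made up; this just traces the sequence above):

```go
package main

import "fmt"

func main() {
	refCount := 0 // stands in for the internal per-volume mount counter

	// First (failed) import:
	refCount++ // step 2: volume mounted to access backup.yaml, counter = 1

	// Steps 5-7: DB import fails, the revert hook destroys the ZFS volume.
	// The destroy implicitly unmounts, but the counter is never decremented.
	fmt.Println("revert hook: zfs destroy (implicit unmount), counter still", refCount)

	// Second import against the same volume name:
	refCount++ // backup.yaml mounted again, counter = 2

	// Step 8: DB import succeeds; unmount requested before the resize.
	refCount--
	if refCount > 0 { // step 9: stale count from the first import remains
		fmt.Printf("Error: Post hook: In use (stale counter = %d)\n", refCount)
	}
}
```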

The fix in the PR above is to include the UnmountVolume call that was part of the local reverter in CreateVolumeFromBackup in the returned revert hook function. This way, if the DB import stage fails, the volume is unmounted by LXD before it is removed, so LXD can decrement the mount counter.
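So the shape of the fix is roughly the following (illustrative names, not the actual PR diff):

```go
package main

import "fmt"

// Illustrative stand-ins for the real driver calls.
func mountVolume(name string)   { fmt.Println("mount", name, "(increments counter)") }
func unmountVolume(name string) { fmt.Println("unmount", name, "(decrements counter)") }
func deleteVolume(name string)  { fmt.Println("zfs destroy", name) }

// createVolumeFromBackup sketches the fixed behaviour: the returned revert
// hook now unmounts through LXD, keeping the counter in sync, before the
// volume is removed.
func createVolumeFromBackup(name string) (revertHook func()) {
	mountVolume(name) // counter goes to 1 so backup.yaml can be read

	return func() {
		unmountVolume(name) // the fix: decrement the counter first...
		deleteVolume(name)  // ...then remove the ZFS volume
	}
}

func main() {
	revert := createVolumeFromBackup("container-vol")
	// Simulate the DB import stage failing (e.g. missing profile):
	revert() // cleanup now leaves the mount counter at zero
}
```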