Creating a new container results in zfs rename error!?

After creating a second zfs pool and re-creating my containers on the new pool, everything seemed to go smoothly. I then exported/re-imported my images and deleted the old pool. So far so good.
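
For reference, the export/re-import was done roughly along these lines (the paths and alias here are placeholders rather than the exact commands I ran, and the tarball name may differ depending on how the image was packed):

# export each image to a tarball, then import it again under an alias
lxc image export <fingerprint> /tmp/exported-image
lxc image import /tmp/exported-image.tar.gz --alias imported-image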

Now, when I try to create a new container from the imported image, the operation fails due to a zfs error, as shown below:

t=2019-02-11T16:25:23+0000 lvl=info msg="Creating container" ephemeral=false name=new_container
t=2019-02-11T16:25:24+0000 lvl=info msg="Created container" ephemeral=false name=new_container
t=2019-02-11T16:25:57+0000 lvl=eror msg="zfs rename failed: umount: /var/lib/lxd/storage-pools/pool00/containers/random_container: target is busy\n        (In some cases useful info about processes that\n         use the device is found by lsof(8) or fuser(1).)\ncannot unmount '/var/lib/lxd/storage-pools/pool00/containers/random_container': umount failed\n"
t=2019-02-11T16:25:57+0000 lvl=info msg="Deleting container" created=2019-02-11T16:25:23+0000 ephemeral=false name=new_container used=1970-01-01T00:00:00+0000
t=2019-02-11T16:25:57+0000 lvl=info msg="Deleted container" created=2019-02-11T16:25:23+0000 ephemeral=false name=new_container used=1970-01-01T00:00:00+0000

If I shut down the referenced “random_container”, another one is reported.

I am baffled as to why zfs would be trying to rename an existing dataset, let alone why it fails.

Any ideas?

What kernel are you running?

ZFS does very weird things when an image is renamed or moved, effectively causing all existing containers to get unmounted and remounted…

We did kernel work a while back to make this less of an issue, not sure if you’ve got a recent enough kernel for that though.
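
To illustrate (a hedged sketch; <fingerprint> stands in for the real dataset name on your pool): the move LXD performs behind the scenes is essentially a zfs rename, and on older kernels completing it can force the other mounted datasets on the pool, i.e. the containers, to be unmounted and remounted, so a busy container mount makes the whole operation fail with “target is busy”.

# roughly the kind of dataset move LXD performs when an image is renamed or moved
zfs rename pool00/images/<fingerprint> pool00/deleted/images/<fingerprint>
# on older kernels this can force other mounted datasets (the containers) to be
# unmounted and remounted; any busy container mount then fails the rename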

I just tried to create a container from an Ubuntu image and it worked fine, so I got suspicious about my image import. I found two things:

  1. It does not show up under my new_pool/images/

  2. If I try to edit the image, lxc monitor spits out this:

     metadata:
       context:
         ip: '@'
         method: PUT
         url: /1.0/images/db29f6f1afde60a2d86ff554712baa0a92fbc4bed2f2ff2910213e4e8bde4bde
       level: dbug
       message: handling
     timestamp: "2019-02-11T17:07:08.788745941Z"
     type: logging
    
    
     metadata:
       context: {}
       level: dbug
       message: 'Database error: &errors.errorString{s:"sql: no rows in result set"}'
     timestamp: "2019-02-11T17:07:08.795259536Z"
     type: logging
    

Kernel is 4.4.0-138-generic (Ubuntu 16.04.5 running LXD 3.0.3 from xenial-backports)

OK, you’d indeed need the 4.15 kernel from bionic (the HWE kernel) to have the kernel change that helps with this.
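
If you go that route, something like this should pull it in on 16.04 (assuming the standard HWE meta-package name):

# install the 16.04 HWE stack, which brings in the 4.15 kernel, then reboot
sudo apt update
sudo apt install --install-recommends linux-generic-hwe-16.04
sudo reboot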

Can you show zfs list -t all?

Your symptoms sound like you have the right image in deleted/images and that LXD is attempting to move it back to images, hitting that annoying ZFS behavior…
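
For instance, something along these lines should show where the image dataset currently lives (the grep pattern is just the image fingerprint from your logs):

# list all datasets and snapshots, filtering on the image fingerprint
zfs list -t all | grep db29f6f1afde60a2d86ff554712baa0a92fbc4bed2f2ff2910213e4e8bde4bde
# if it shows up under pool00/deleted/images/ rather than pool00/images/,
# that matches the rename LXD keeps attempting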

If that’s the case, you pretty much have two solutions:

  • Stop all containers, launch the new container, then restart all containers; that should take care of that image for good (a rough sketch of the sequence follows this list)
  • Upgrade to the HWE kernel (4.15) and reboot, which should then help ZFS deal with that weird behavior
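
A rough sketch of that first sequence (container and image names are placeholders; adjust to your setup, and note the final loop will also try to start containers that were already running or deliberately stopped):

# stop every container so nothing keeps the pool's mounts busy
for c in $(lxc list -c n --format csv); do lxc stop "$c"; done
# launch a container from the imported image; with nothing busy, LXD can
# move the image dataset back into place
lxc launch <image-alias> <new-container-name>
# bring the containers back up (starting the already-running one just errors harmlessly)
for c in $(lxc list -c n --format csv); do lxc start "$c"; done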

Spot on! My image shows up under pool00/deleted/images.

I was hoping to avoid a reboot/service interruption, but it looks like it’s the only way.

So, bad news. I restarted on kernel 4.15.0-45 and still get the same behavior.

If I try to destroy the dataset after deleting the image, I get:

zfs destroy pool00/deleted/images/db29f6f1afde60a2d86ff554712baa0a92fbc4bed2f2ff2910213e4e8bde4bde
cannot destroy 'pool00/deleted/images/db29f6f1afde60a2d86ff554712baa0a92fbc4bed2f2ff2910213e4e8bde4bde': filesystem has children
use '-r' to destroy the following datasets:
pool00/deleted/images/db29f6f1afde60a2d86ff554712baa0a92fbc4bed2f2ff2910213e4e8bde4bde@readonly
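
I assume the recursive form the error suggests would be the following, though I haven’t run it yet since I’m not sure whether anything else still depends on that snapshot:

# destroy the dataset together with its @readonly snapshot, as the error suggests
zfs destroy -r pool00/deleted/images/db29f6f1afde60a2d86ff554712baa0a92fbc4bed2f2ff2910213e4e8bde4bde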

Any thoughts?

This actually fixed it. Unfortunately the kernel on its own didn’t do it.