Container with ZFS-backed LXD takes a while to start

When using LXD (LTS release v2.0.11) with a ZFS storage backend, there’s a noticeable delay when starting a container. The delay does not happen with the dir backend. Is this expected behavior with the ZFS backend?

After a container starts, I see this error message in the LXD log:

INFO[01-11|00:26:47] Starting container                       action=start creation date=2018-01-10T23:12:07+0000 ephemeral=false name=trusty stateful=false
EROR[01-11|00:26:58] zfs mount failed                         driver=storage/zfs output="cannot mount 'lxd/containers/trusty': filesystem already mounted\n"
INFO[01-11|00:26:58] Started container                        action=start creation date=2018-01-10T23:12:07+0000 ephemeral=false name=trusty stateful=false

There’s roughly a ten-second delay between when lxc start is run and when the “zfs mount failed” error shows up and the container ultimately starts.
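For reference, I can see the delay just by timing the start while tailing the daemon log in another terminal (the log path here assumes the deb package, so adjust if yours differs):

time lxc start trusty            # shows the ~10s wall-clock delay
tail -f /var/log/lxd/lxd.log     # run in a second terminal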

When I trace exec syscalls on the system (using bcc execsnoop), I see that /sbin/zfs mount is being called repeatedly up until the “zfs mount failed” error message appears in the log.
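The trace itself was along these lines (the execsnoop path depends on how bcc was installed, so adjust accordingly):

sudo /usr/share/bcc/tools/execsnoop -n zfs   # show only zfs execs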

Given that the LXD error message says “filesystem already mounted”, I tried manually unmounting the container’s zfs mountpoint. After unmounting, there is no delay when starting the container and the zfs error message doesn’t happen. Stopping and starting the container again brings back the delay, since the zfs mountpoint remains mounted even after the container has stopped.
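Concretely, the workaround looks like this, with the dataset name taken from the log above:

sudo zfs unmount lxd/containers/trusty
lxc start trusty                 # now starts without the delay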

Was that a manually built version of LXD 2.0.11?

What you’re describing is reminding me of a bug that we found when we packaged LXD 2.0.11 in Ubuntu and for which we cherry-picked a fix from the LXD stable-2.0 branch.

4a736e3e5c4d9b03b872f1ef5a9406c1bc3fffee is the commit in question.

It’s LXD 2.0.11 from Ubuntu 14.04 trusty-backports.

Looks like that’s the fix I’m missing. I was looking at the code from the git tag lxd-2.0.11, which is earlier than commit 4a736e3, and didn’t come across the fix. Does that mean there’s a difference between the versioned git tag reference, the source release, and the packaged Ubuntu version (e.g., trusty-backports)?
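One way I can check this is to ask git directly from a checkout of the lxd repository, something like:

git tag --contains 4a736e3e5c4d9b03b872f1ef5a9406c1bc3fffee   # lists tags that include the fix
git merge-base --is-ancestor 4a736e3e5c4d9b03b872f1ef5a9406c1bc3fffee lxd-2.0.11 && echo in-tag || echo not-in-tag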

That fix is in the package though, so it must be something else then…

I’m not confident that the fix is in the package, because the logged path doesn’t include the “.zfs” suffix: “cannot mount 'lxd/containers/trusty'”.

Well, I found the patch in debian/patches in the source of the package, so it’s there :)
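For anyone who wants to double-check, the package source can be pulled and inspected directly (assuming a deb-src entry for trusty-backports is enabled):

apt-get source lxd/trusty-backports
ls lxd-*/debian/patches/         # the cherry-picked fix should be listed here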

Now maybe we’re somehow dealing with something else.

ZFS seems to hang for me (and fail to start at all) on LXD 2.21 (5408, stable) on snapd 16-2.30 (3748, stable), Ubuntu 17.10, Linux 4.13.0-25. pcdummy was helping me on IRC, but they thought there was something wrong with my ZFS loopback or something: LXD wasn’t starting, just hanging. I rebooted, and that didn’t help. I ran snap remove lxd, snap install lxd, and lxd init and, again, with the ZFS backend, LXD just hangs after “What IPv6 address should be used (CIDR subnet notation, “auto” or “none”) [default=auto]?” (the last init question). The dir backend works fine.

Feel free to split this into a new post if appropriate…

That’s very odd, is that on Ubuntu or some other distro with ZFS support?

Can you get a full process list (ps fauxww) when the hang occurs?

This is on Ubuntu 17.10.

Strangely, I just ran sudo snap remove lxd, sudo snap install lxd, and lxd init, and now the only available storage backends are dir, btrfs, ceph, and lvm; ZFS is no longer offered, though it was there before…

I’m now on LXD 2.21 (5522, stable) and core 16-2.30 (3887, stable), so there have been patches since I encountered the bug.

Did you restart your system recently? If not, it could be that your system is simply incapable of loading the zfs kernel module because the modules for the running kernel were removed as part of a kernel update.
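An easy way to check that theory, roughly:

uname -r                                       # the running kernel
find /lib/modules/$(uname -r) -name 'zfs.ko*'  # empty output = modules gone

If the find comes back empty, the modules for the running kernel are missing and rebooting into the updated kernel should sort it out.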

Yes, I had restarted my system recently, but I’ve just restarted again and the same four backends come up… This is a new issue and probably should be split into a new thread. I’m pretty sure Ubuntu 17.10 does support ZFS?!

Yeah, it should… what happens if you do sudo modprobe zfs?
If that succeeds, does the ZFS option show up again?
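That is, something like:

sudo modprobe zfs
lsmod | grep zfs                 # confirms the module actually loaded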

Yes, and after reinstalling LXD and running lxd init again with the ZFS option selected this time around, I get “LXD has been successfully configured.” and can no longer reproduce the bug I came across before.

Edit: Though… I had been typing zfs in manually rather than leaving it as the ZFS default… and now I get “error: Failed to create the ZFS pool: cannot create 'default': pool already exists”, which persists even on reinstall. How do I remove this?

Looks like you have an existing zpool called “default” on your system.
If that pool was created by LXD, reinstalling the snap and rebooting the system should get rid of it; if it’s somehow persisting, then it means it’s backed by some device/file outside of the snap.

In this case, the easiest way to deal with this would be to install the zfsutils-linux package on your system and then use zpool destroy default to remove it (probably worth triple checking that nothing actually needs it though :)).
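That is, something along these lines:

sudo apt install zfsutils-linux
sudo zpool status default        # triple-check nothing is using it
sudo zpool destroy default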

Ah, I didn’t realize rebooting cleared that. OK, it’s cleared, and this time I ran lxd init and just used the default settings, and all was fine. Hopefully the bug I came across was inadvertently fixed in a patch and wasn’t an edge case that people will come across again… Thanks for the help! :)