I cannot find a config drive in the output of mount on the host. The only config-related entry is configfs on /sys/kernel/config type configfs (rw,relatime), which I think is unrelated?
The config.iso file does not exist (at /var/snap/lxd/common/lxd/virtual-machines/vm3/) when I get this error.
I believe we already have a function to determine the backing filesystem of a given path, and we have magic numbers for all the relevant filesystems too, so it should be pretty straightforward to special-case files on ZFS and avoid AIO in that case.
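As a rough illustration of that kind of check, here is a minimal sketch using statfs(2) and the ZFS super magic (0x2fc12fc1). The function names are hypothetical, not LXD's actual helpers:

```go
package main

import (
	"fmt"
	"syscall"
)

// zfsSuperMagic is the f_type value that statfs(2) reports for ZFS
// (ZFS_SUPER_MAGIC).
const zfsSuperMagic = 0x2fc12fc1

// isZFSMagic reports whether a statfs f_type value identifies ZFS.
func isZFSMagic(fsType int64) bool {
	return fsType == zfsSuperMagic
}

// pathIsZFS checks the backing filesystem of path via statfs(2).
// (Hypothetical helper; LXD's real detection code may differ.)
func pathIsZFS(path string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return false, err
	}
	return isZFSMagic(int64(st.Type)), nil
}

func main() {
	// Example: check the root filesystem.
	onZFS, err := pathIsZFS("/")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Println("on ZFS:", onZFS)
}
```

If the check returns true for the VM's disk path, the caller would fall back to aio=threads instead of native AIO.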
I just tested on a 4.15 kernel and re-created the problem. A quick test of aio=threads didn’t help.
I need to spend a bit of time getting my graphics drivers and networking working in 4.15 so I can work in that environment for a bit (in a resolution >800x600 :))
However, that environment seemingly suffers from the kernel crash we saw on ZFS loopback files, which froze my whole system.
There’s a useful table in the qemu manual:
cache=cache
cache is "none", "writeback", "unsafe", "directsync" or "writethrough" and controls how the host cache is used to access block data. This is a shortcut that sets the cache.direct and cache.no-flush options (as in -blockdev), and additionally cache.writeback, which provides a default for the write-cache option of block guest devices (as in -device). The modes correspond to the following settings:

             │ cache.writeback   cache.direct   cache.no-flush
─────────────┼─────────────────────────────────────────────────
writeback    │ on                off            off
none         │ on                on             off
writethrough │ off               off            off
directsync   │ off               on             off
unsafe       │ on                off            on
The default mode is cache=writeback.
We have it set to “cache=none” right now, and “aio=native”.
However, we need to disable cache.direct, which also requires setting “aio=threads”.
I will try writethrough, which seems to turn the most stuff off.
So, using kernel 4.15.0-74-generic on a ZFS loop file, the only settings that work are:
cache = "unsafe"
aio = "threads"
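For reference, expressed as a drive section in the -readconfig file that LXD generates, the working combination would look something like the following. The section name and paths are illustrative, not LXD's exact output:

```
[drive "lxd_root"]
file = "/var/snap/lxd/common/lxd/virtual-machines/vm3/root.img"
format = "raw"
if = "virtio"
cache = "unsafe"
aio = "threads"
```

Per the table above, cache=unsafe turns cache.direct off (avoiding O_DIRECT entirely) and cache.no-flush on, and aio=threads avoids native AIO.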
Note: this isn’t just for the config drive ISO; it affects all drives. If cache.direct is enabled, the VM fails to start, complaining about O_DIRECT. If you remove the config drive, it starts to boot, but then qemu uses 100% CPU, the entire I/O system halts, and I have to power off my PC as the kernel crashes.
This happens even with sync=disabled set in the ZFS LXD driver.
@stgraber should I use these options when detecting ZFS backend?
I tried both cases: ZFS backed by a loop file, and ZFS on a real device. They had the same outcome, relating to O_DIRECT support missing from ZFS (v0.7.5-1ubuntu16.6, i.e. Ubuntu 18.04.3).
I edited the vm profile not to include the config drive, and I still get the same error.
$ lxc launch ubuntu:18.04 vm4 --vm --profile default --profile vm
Creating vm4
Starting vm4
Error: Failed to run: qemu-system-x86_64 -S -name vm4 -uuid b34ce8e1-5cf8-4374-58f1-a7647d9e1357 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-reboot -no-user-config -readconfig /var/snap/lxd/common/lxd/logs/vm4/qemu.conf -pidfile /var/snap/lxd/common/lxd/logs/vm4/qemu.pid -D /var/snap/lxd/common/lxd/logs/vm4/qemu.log -chroot /var/snap/lxd/common/lxd/virtual-machines/vm4 -runas lxd:
Try `lxc info --show-log local:vm4` for more info
$ lxc info --show-log local:vm4
qemu-system-x86_64:/var/snap/lxd/common/lxd/logs/vm4/qemu.conf:150: file system may not support O_DIRECT
qemu-system-x86_64:/var/snap/lxd/common/lxd/logs/vm4/qemu.conf:150: Could not open '/var/snap/lxd/common/lxd/virtual-machines/vm4/config.iso': Invalid argument
Can you double-check with lxc config show v1 --expanded that the config disk isn’t being used? The qemu config is regenerated each time a VM is started, so if you’re still getting that error (specifically about the config.iso file), then the config disk is still being attached somehow.
I’m working on a fix for detecting ZFS in various guises and then switching to async I/O.
There is a mistake on my part above: although I created a vmnoconfig profile, I used the vm profile when creating vm4 (per my shell history). Therefore, let’s try again.
I am running Ubuntu 18.04.3 (v0.7.5-1ubuntu16.6) with ZFS on a real drive, on one computer.
The other computer with ZFS on a loop file is running on Ubuntu 19.10 (which already has a newer ZFS with O_DIRECT).
That’s interesting. On the 18.04.3 box, can you give the output of lxc info v1 --show-log when starting the VM without the config drive? I want to see whether you still get the message about lack of direct I/O support. Thanks
Cool, so it seems that using direct I/O mode on Ubuntu 18.04 with kernel 4.15 doesn’t cause issues as long as the disk is backed by a real device rather than an image file.
This means my patch should work fine, as it only applies to root disks on loop-file-backed ZFS storage pools and to disks attached pointing directly at files on ZFS filesystems.
If you run snap info lxd, you can see that LXD 3.19 has been released to the stable channel.
Here is how to switch back from the candidate channel to stable. While snap refresh was running, I pressed Enter a few times in order to capture some of the snapd progress messages.
$ snap switch lxd --channel stable
"lxd" switched to the "stable" channel
$ snap refresh
Download snap "lxd" (13073) from channel "stable"
Mount snap "lxd" (13073)
Stop snap "lxd" services
Setup snap "lxd" (13073) security profiles
Start snap "lxd" (13073) services
Consider re-refresh of "lxd"
lxd 3.19 from Canonical✓ refreshed
$
I am marking this post as the Solution so that others can find the instructions for switching back to stable.