Hmm. The VM stops immediately after reaching cloud-init. Trying to start it doesn’t do much. I see the qemu process being spawned and immediately going away. Any way to debug it?
Using the candidate snap on Ubuntu 19.10.
gabriel@rossak:~$ qemu-system-x86_64 --version
QEMU emulator version 4.0.0 (Debian 1:4.0+dfsg-0ubuntu9.2)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
gabriel@rossak:~$ uname -a
Linux rossak 5.3.0-26-generic #28-Ubuntu SMP Wed Dec 18 05:37:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
I’d be happy to open an issue on GitHub if you prefer.
The only thing in dmesg when I do an lxc start v1 is:
[230661.714954] lxdbr0: port 8(vethcf6f97a4) entered blocking state
[230661.714956] lxdbr0: port 8(vethcf6f97a4) entered disabled state
[230661.715083] device vethcf6f97a4 entered promiscuous mode
[230661.730502] lxdbr0: port 8(vethcf6f97a4) entered blocking state
[230661.730504] lxdbr0: port 8(vethcf6f97a4) entered forwarding state
[230663.456580] lxdbr0: port 8(vethcf6f97a4) entered disabled state
Not sure if it’s relevant, but the storage pool is btrfs.
Thanks, I’ll attempt to reproduce this on one of our test systems. It’s odd that btrfs doesn’t work but dir does, as the two are very similar in the way they store VMs.
Have moved this onto its own topic. I’ve been playing around with qemu a bit but haven’t yet found any reason why this is failing on btrfs and not on dir.
The failing instruction is likely coming from cloud-init growing the root partition on first boot. That apparently results in a truncate request reaching qemu (unclear why, as the backing size shouldn’t change), which hits an assert in qemu and crashes it.
Then go to gdisk’s recovery and transformation options (experts only) by typing r, and select load main partition table from disk (rebuilding backup). Save your changes, then remove the loopback mapping:
sudo losetup -d /dev/loop30
and the VM should now start. This may just be treating the symptom, though. Any reason why you are converting the qcow2 images to raw? You could just use qcow2 with qemu. You could even create COW root disks for VMs using the downloaded image as a backing file; that would work on any storage backend.
Right, so this looks like a qemu bug: qemu shouldn’t be hitting such an assert and should handle a file of any size on any underlying filesystem.
That being said, forcing our file size to align on 1k boundaries seems to fix the issue. It’s effectively as if qemu doesn’t accept a block device that isn’t using a traditional 512-byte or larger block size.
We can work around that in LXD. For ZFS we need to meet an 8k boundary anyway, so the easiest fix is likely to change our logic to always round up to the nearest 8k boundary when creating a block device or file that backs a VM. That way we know it will work equally well with all backends.
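The rounding itself is trivial. A sketch of the idea in shell (not LXD’s actual implementation), using a hypothetical root.img:

```shell
# Round a disk image's size up to the next 8 KiB boundary.
IMG=root.img
SIZE=$(stat -c %s "$IMG")                       # current size in bytes
ROUNDED=$(( (SIZE + 8191) / 8192 * 8192 ))      # ceil to a multiple of 8192
truncate -s "$ROUNDED" "$IMG"                   # grow the file (sparse, no data copied)
```

An already-aligned image is left at its current size, since the ceiling of an exact multiple is the multiple itself.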
The logic around our root.img handling is a bit sparse and may be incorrect, so I’ve asked @tomp to look into it tomorrow morning. Once that’s more solid, we can tweak that logic and our other storage drivers to always align on 8k, which should fix the issue regardless of qemu version.
Also worth pointing out that if you’ve previously launched a VM on BTRFS then you’ll also need to delete the VM and the cached VM image snapshot after applying the patch.
lxc image ls -c Fda
lxc storage volume delete <poolname> image/<vm image fingerprint>
As it will have been created with the problematic size.
So directory and btrfs storage pools do not offer block device support directly (unlike the lvm and zfs backends), so to support VMs we create a raw disk image file on top of the respective filesystem. With btrfs, if you are also using a loop-file-backed storage pool, you would end up with a VM image inside a loopback image, which wouldn’t be optimal.
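For illustration, such a raw image is just a sparse file on the pool’s filesystem (hypothetical name and size; not LXD’s exact layout):

```shell
# Create a 10 GiB sparse raw disk image; it consumes almost no space
# on disk until the guest actually writes to it.
truncate -s 10G root.img

# Apparent size is 10G, but the allocated size is near zero.
ls -lsh root.img
```

With a loop-backed btrfs pool, this file would itself live inside another loop image, hence the double-indirection mentioned above.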