VM failure on btrfs

Would you mind posting the “vm” profile you use?

I tried the following:

lxc init ubuntu:18.04 v1 --vm
lxc config device add v1 config disk source=cloud-init:config
lxc start v1

The instance starts up, gets to this point:

https://paste.ubuntu.com/p/3vXyrrYN3n/

and stops. Any attempt to start it again returns success, but the qemu process dies shortly after and the VM shows as “STOPPED” in lxc list.


This is what I’m using currently; it sets the login user/pass to “ubuntu”:

lxc profile show vm
config:
  user.user-data: |
    #cloud-config
    ssh_pwauth: yes

    users:
      - name: ubuntu
        passwd: "$6$s.wXDkoGmU5md$d.vxMQSvtcs1I7wUG4SLgUhmarY7BR.5lusJq1D9U9EnHK2LJx18x90ipsg0g3Jcomfp0EoGAZYfgvT22qGFl/"
        lock_passwd: false
        groups: lxd
        shell: /bin/bash
        sudo: ALL=(ALL) NOPASSWD:ALL
description: Default LXD profile
devices:
  config:
    source: cloud-init:config
    type: disk
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: vm
used_by:
- /1.0/instances/v1?project=test
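
The passwd value above is just a SHA-512 crypt hash of “ubuntu”; if you want a different password, either of these should generate a suitable hash to paste in (assuming openssl 1.1.1+ or the whois package for mkpasswd):

openssl passwd -6 mypassword
mkpasswd --method=sha-512 mypassword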

Oh, good point, I’ll update the example to show the vm profile too.

Hmm. The VM stops immediately after reaching cloud-init. Trying to start it doesn’t do much. I see the qemu process being spawned and immediately going away. Any way to debug it?

Using the candidate snap on Ubuntu 19.10.

gabriel@rossak:~$ qemu-system-x86_64 --version
QEMU emulator version 4.0.0 (Debian 1:4.0+dfsg-0ubuntu9.2)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

gabriel@rossak:~$ uname -a
Linux rossak 5.3.0-26-generic #28-Ubuntu SMP Wed Dec 18 05:37:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I’d be happy to open an issue on github if you prefer.

Anything useful looking in dmesg or /var/snap/lxd/common/lxd/logs/lxd.log?
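
For the VM itself, the per-instance log directory may also have something; roughly this (the qemu.log name/path is from memory, so treat it as an assumption):

lxc info v1 --show-log
ls /var/snap/lxd/common/lxd/logs/v1/
cat /var/snap/lxd/common/lxd/logs/v1/qemu.log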

Nothing in /var/snap/lxd/common/lxd/logs/lxd.log.

The only thing in dmesg when I do an lxc start v1 is:

[230661.714954] lxdbr0: port 8(vethcf6f97a4) entered blocking state
[230661.714956] lxdbr0: port 8(vethcf6f97a4) entered disabled state
[230661.715083] device vethcf6f97a4 entered promiscuous mode
[230661.730502] lxdbr0: port 8(vethcf6f97a4) entered blocking state
[230661.730504] lxdbr0: port 8(vethcf6f97a4) entered forwarding state
[230663.456580] lxdbr0: port 8(vethcf6f97a4) entered disabled state

Not sure if it’s relevant, but the storage pool is btrfs.

Apparently it works if I choose dir as the storage pool type. If I choose btrfs, this happens:

https://asciinema.org/a/a8jjkusE4cZJSe55mSGQb5QNO
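
For reference, the repro is essentially this (pool name is arbitrary):

lxc storage create btrfs-pool btrfs
lxc init ubuntu:18.04 v1 --vm -s btrfs-pool
lxc config device add v1 config disk source=cloud-init:config
lxc start v1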

Thanks, I’ll attempt to reproduce this on one of our test systems. It’s odd that btrfs doesn’t work but dir does, as they are very similar in the way they store VMs.

Reproduced the issue, will investigate, thanks.

qemu-system-x86_64: block/io.c:1871: bdrv_co_write_req_prepare: Assertion `end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE' failed.

Have moved this onto its own topic. I’ve been playing around with qemu a bit but haven’t yet found any reason why this is failing on btrfs and not on dir.

The operation in question likely comes from cloud-init growing the root partition on first boot. That apparently results in a truncate request reaching qemu (unsure why, as the backing size shouldn’t change), which then hits an assert in qemu and crashes it…

It seems that the backup GPT header is corrupted. The VM finally booted after I recovered it by doing:

root@rossak:~$ sudo losetup -f 
/dev/loop30
root@rossak:~$ sudo losetup /dev/loop30 /var/snap/lxd/common/lxd/virtual-machines/v1/root.img
root@rossak:~$ sudo gdisk /dev/loop30

Then go to the recovery and transformation options (experts only) by typing r, select “load main partition table from disk (rebuilding backup)”, and write your changes. Then remove the loopback mapping:

sudo losetup -d /dev/loop30

and the VM should now start. This may just be treating the symptom though. Any reason why you are converting the qcow2 images to raw? You could just use qcow2 with qemu. You could even create COW root disks for VMs using the downloaded image as a backing file. That would work on any storage backend.
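
By COW root disks I mean something along these lines (paths purely illustrative):

# qcow2 overlay whose backing file is the cached image; writes go to root.qcow2 only
qemu-img create -f qcow2 -b /path/to/downloaded-image.qcow2 root.qcow2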


Yeah, the corruption is coming from the qemu crash happening during the resize operation done by cloud-init.

We’re storing as raw as it’s slightly faster and is the format we need to use on block-based storage drivers like zfs, lvm and ceph anyway.
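
That is, on unpack we effectively do the equivalent of this conversion (simplified, not the exact code path):

qemu-img convert -f qcow2 -O raw downloaded-image.qcow2 root.img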

Right, so this looks like a qemu bug, we shouldn’t be hitting such an assert and it should be handling whatever size file on whatever underlying filesystem.

That being said, forcing our file size to align on 1k boundaries seems to fix the issue. It’s effectively as if qemu doesn’t accept a block device that isn’t using a traditional 512-byte or larger block size.

We can work around that in LXD. For ZFS we need to meet an 8k boundary anyway, so the easiest fix is likely to change our logic to always round to the nearest 8k boundary when creating a block device or file that’s used to back a VM. That way we know it will work equally well with all backends.
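
In shell terms the rounding is roughly this (a sketch, not the actual LXD code):

size=$(stat -c %s root.img)                  # current image size in bytes
aligned=$(( (size + 8191) / 8192 * 8192 ))   # round up to the next 8 KiB boundary
truncate -s "$aligned" root.img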

The logic around our root.img handling is a bit sparse and may be incorrect, so I’ve asked @tomp to look into it tomorrow morning. Once that’s more solid, we can tweak that logic and our other storage drivers to always line up on 8k, which should fix the issue regardless of qemu version.


I can confirm this bug; when growroot is disabled in cloud-init, the VM starts without any issues:

#cloud-config
write_files:
  - content: |
      hello
    path: /etc/growroot-disabled
growpart:
  mode: auto
  devices: ["/"]
  ignore_growroot_disabled: false
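
For quick testing, I believe the same effect can be had by just turning the module off (not verified on my side; “off” is quoted so YAML doesn’t parse it as a boolean):

#cloud-config
growpart:
  mode: "off"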

The pull-request for this change is here: https://github.com/lxc/lxd/pull/6734

Awesome! Great work everyone! :slight_smile:

Also worth pointing out that if you’ve previously launched a VM on BTRFS then you’ll also need to delete the VM and the cached VM image snapshot after applying the patch.

lxc image ls -c Fda
lxc storage volume delete <poolname> image/<vm image fingerprint>

As it will have been created with the problematic size.


So directory and btrfs storage pools do not offer block device support directly (unlike the lvm and zfs backends), so to support VMs we create a raw disk image file on top of the respective filesystem. With btrfs, if you are also using a loop-file-backed storage pool, then you would end up with a VM image inside a loopback image, which wouldn’t be optimal.
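
You can tell whether your btrfs pool is loop-backed from its source property, e.g. (the paths below are just what a typical snap setup looks like):

lxc storage show default
# a loop-backed pool shows something like:
#   source: /var/snap/lxd/common/lxd/disks/default.img
# whereas a pool on a dedicated partition shows a block device path instead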

Not sure if it’s related or not, but I posted a similar alignment-related bug from MAAS creating KVMs on ZFS.

https://bugs.launchpad.net/bugs/1858201