Error starting instance from custom published image

subpop · February 23, 2021, 6:02pm

I’m experimenting with building a custom image from which I can start ephemeral instances. I created a VM following this tutorial. The resulting VM works exactly as expected. I can then publish it as an image.

When I try to start a new instance using that image, it gives me an error about the disk.

Creating the instance
Error: Create instance from image: Failed to run: /snap/lxd/19389/bin/sgdisk --move-second-header /var/snap/lxd/common/lxd/storage-pools/default/images/6ba8557f5c0f8ba1c8123a72324544505f9c415ad625019059bdaf6a0c395c3b/root.img: Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Warning! Error 25 reading partition table for CRC check!
Warning! One or more CRCs don't match. You should repair the disk!

Aborting write operation!

Should I be doing something else to my VM image before publishing it?

Edit: lxc init also throws the above error, so I can never get to the root.img disk file to try to repair it.

stgraber · February 23, 2021, 6:04pm

Hmm, this is odd. Can you confirm your VM was properly stopped prior to publishing it as an image?

Also, what’s the disk size of your VM?

@tomp can you look into this one?

subpop · February 23, 2021, 6:05pm

I definitely stopped it before publishing. lxc publish refused at first because I had forgotten to stop the VM.

The original VM size is 16GB.

subpop · February 23, 2021, 6:12pm

I think these are the exact steps I used to create the original VM. I’ll try this again to make sure.

lxc init rhel-8-3 --empty --vm
lxc config device override rhel-8-3 root size=16GB
lxc config device add rhel-8-3 cdrom disk source=/home/link/Downloads/rhel-8.3-x86_64-boot.iso
lxc start rhel-8-3

stgraber · February 23, 2021, 6:13pm

What storage backend are you using?

tomp · February 23, 2021, 6:14pm

Can you check if you see the same errors when publishing a standard images:ubuntu/focal VM image?

That way we know whether it’s an image-specific issue or a general one.

subpop · February 23, 2021, 6:14pm

Looks like it’s btrfs.

subpop · February 23, 2021, 6:29pm

Looks like the ubuntu/focal image publishes fine.

link@subpop:~% lxc launch images:ubuntu/focal --vm
Creating the instance
Instance name is: obliging-mollusk
Starting obliging-mollusk
link@subpop:~% lxc stop obliging-mollusk
link@subpop:~% lxc list
+-----------------------------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
|            NAME             |  STATE  |          IPV4          |                      IPV6                       |      TYPE       | SNAPSHOTS |
+-----------------------------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
| obliging-mollusk            | STOPPED |                        |                                                 | VIRTUAL-MACHINE | 0         |
+-----------------------------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
| rhel-8-3                    | RUNNING | 10.65.208.199 (eth0)   | fd42:a809:27f8:7ea4:e3b4:5619:5ca5:5ac4 (eth0)  | VIRTUAL-MACHINE | 0         |
|                             |         |                        | fd42:a809:27f8:7ea4:216:3eff:fec0:59c1 (eth0)   |                 |           |
+-----------------------------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
link@subpop:~% lxc publish obliging-mollusk --alias test-publish
Instance published with fingerprint: d93bb7352734f5efa1df0509668507189df5bf068b103c01c2d57ccc679fe73b
link@subpop:~% lxc launch test-publish -e
Creating the instance
Instance name is: prepared-ox
Starting prepared-ox
link@subpop:~%

stgraber · February 23, 2021, 6:41pm

How much free space do you have in df -h /var/snap/lxd/common/lxd?

subpop · February 23, 2021, 6:44pm

Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  228G   14G  203G   7% /

Also, my default storage pool is on a separate physical device.

link@subpop:~% lxc storage info default
info:
  description: ""
  driver: btrfs
  name: default
  space used: 36.92GB
  total space: 600.13GB

stgraber · February 23, 2021, 6:59pm

Ok, that shouldn’t be an issue then.

Basically, when you publish, LXD first creates a temporary qcow2 file and transfers all the data into it (the file is stored directly in /var/snap/lxd/common/lxd/images). Once that’s done, the qcow2 is wrapped into a tarball along with some metadata and compressed (again as a file in /var/snap/lxd/common/lxd/images).

Once that is all done, the image is uncompressed as a new volume to create instances from. That uncompressed volume is stored in your storage pool.

What’s the size of the image in lxc image info?
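
To get a feel for why a published image can be much smaller than the VM's disk, here is a minimal sketch of the wrap-and-compress step using only plain files. It is an illustration under assumptions, not what LXD actually runs: a raw sparse file stands in for the qcow2, the one-line metadata.yaml is a placeholder, and the scratch directory and sizes are made up for the example.

```shell
#!/bin/sh
# Illustration only: plain files stand in for the VM disk and the image.
set -eu

work=$(mktemp -d)   # hypothetical scratch directory
cd "$work"

# A 16 MiB sparse "disk" holding 1 MiB of real (incompressible) data.
truncate -s 16M rootfs.img
head -c 1048576 /dev/urandom | dd of=rootfs.img conv=notrunc 2>/dev/null

# Placeholder metadata; the file LXD writes carries more fields.
printf 'architecture: x86_64\n' > metadata.yaml

# Wrap disk and metadata into a single compressed tarball.
tar -czf image.tar.gz metadata.yaml rootfs.img

disk_bytes=$(wc -c < rootfs.img)
image_bytes=$(wc -c < image.tar.gz)
echo "disk apparent size: ${disk_bytes} bytes"
echo "compressed image:   ${image_bytes} bytes"

cd /
rm -rf "$work"
```

The unused part of the disk is all zeros and compresses to almost nothing, so the tarball size tracks the space actually used rather than the configured disk size.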

subpop · February 23, 2021, 7:00pm

Size: 1371.82MB

stgraber · February 23, 2021, 7:01pm

Does that roughly line up with how much space you had used in there?

subpop · February 23, 2021, 7:02pm

Yea, the root filesystem has about 1.8GB used. So compressed, that seems reasonable.

subpop · February 23, 2021, 7:17pm

I just did all the same steps creating a custom VM with a CentOS 8.3 ISO, and it throws the same error.

tomp · February 23, 2021, 8:04pm

OK great, that gives me some steps I can use to try to reproduce this myself. What partition layout are you using inside the VM?

subpop · February 23, 2021, 8:06pm

The default, which I believe is LVM. I also had to set the boot priority of the cdrom disk to 1 to get it to boot off the ISO over the local disk.

tomp · February 24, 2021, 12:14pm

Thanks, I am looking into this now.

tomp · February 24, 2021, 2:54pm

OK, I have recreated this, but it appears to be a BTRFS-specific issue.

I have successfully performed the following:

Created a VM on a dir storage pool and installed CentOS 8.3 on it.
Exported it to a tarball and then reimported it into a BTRFS storage pool.
Published the image from the BTRFS pool.
Created a new VM from the published image on both a dir and a ZFS storage pool.

But creating a VM from the published image on a BTRFS storage pool fails with the error you mentioned above. Investigating if this happens with any other storage pools.

tomp · February 24, 2021, 4:24pm

OK, I have a fix for this: LXD was truncating the optimized volume’s block file to the default 10GB block volume size. This explains why it works for dir (as no optimized volume is created) and for other storage pools that are not block-file based. But I need to check why our automated tests didn’t pick this up, as there is a dedicated check for exactly this scenario (publishing a VM as an image with a root disk larger than the default 10GB size), which runs daily for all storage drivers (including BTRFS).

github.com/lxc/lxc-ci/blob/master/bin/test-lxd-vm#L146-L166

echo "==> Checking copied VM root disk size is 11GB"
lxc exec v2 -- df -B1000000000 | grep sda2 | grep 11
lxc delete -f v2
echo "==> Publishing larger VM"
lxc publish v1 --alias vmbig
lxc delete -f v1
lxc storage set "${poolName}" volume.size 9GB
echo "==> Check VM create fails when image larger than volume.size"
! lxc init vmbig v1 --vm -s "${poolName}" || false
echo "==> Check VM create succeeds when no volume.size set"
lxc storage unset "${poolName}" volume.size
lxc init vmbig v1 --vm -s "${poolName}"
lxc start v1
sleep 90
lxc info v1
echo "==> Checking new VM root disk size is 11GB"
lxc exec v1 -- df -B1000000000 | grep sda2 | grep 11
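
For anyone wondering why truncating the block file leads to the "invalid backup GPT header" message: GPT keeps its backup header in the last sector of the disk, so cutting a 16GB image down to 10GB discards it. The sketch below reproduces that effect on a small scratch file. It is an illustration under assumptions, not LXD's code path: the file is a hypothetical stand-in for root.img, the sizes are scaled down from GB to MiB, and only the 8-byte GPT signature is written rather than a full header.

```shell
#!/bin/sh
# Illustration only: uses a plain scratch file, not a real LXD volume.
set -eu

img=$(mktemp)            # hypothetical stand-in for root.img
sector=512
orig_sectors=32768       # 16 MiB, standing in for the 16GB root disk
small_sectors=20480      # 10 MiB, standing in for the 10GB default size

# Create a sparse "disk" and write the GPT signature into its last
# sector, which is where GPT stores the backup header.
truncate -s $((orig_sectors * sector)) "$img"
printf 'EFI PART' | dd of="$img" bs="$sector" \
    seek=$((orig_sectors - 1)) conv=notrunc 2>/dev/null

last_sector() {
    # Print the first 8 bytes of the file's final sector, NULs removed.
    size=$(wc -c < "$1")
    dd if="$1" bs="$sector" skip=$((size / sector - 1)) count=1 2>/dev/null |
        head -c 8 | tr -d '\0'
}

before=$(last_sector "$img")

# Shrink the file, as happens when the volume is sized too small.
truncate -s $((small_sectors * sector)) "$img"
after=$(last_sector "$img")

echo "last sector before truncation: '${before}'"
echo "last sector after truncation:  '${after}'"
rm -f "$img"
```

After the shrink, the last sector no longer contains the signature, which is consistent with sgdisk finding a valid main header but no usable backup header.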