Error starting instance from custom published image

I’m experimenting with building a custom image from which I can start ephemeral instances. I created a VM following this tutorial. The resulting VM works exactly as expected. I can then publish it as an image.

When I try to start a new instance using that image, it gives me an error about the disk.

Creating the instance
Error: Create instance from image: Failed to run: /snap/lxd/19389/bin/sgdisk --move-second-header /var/snap/lxd/common/lxd/storage-pools/default/images/6ba8557f5c0f8ba1c8123a72324544505f9c415ad625019059bdaf6a0c395c3b/root.img: Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Warning! Error 25 reading partition table for CRC check!
Warning! One or more CRCs don't match. You should repair the disk!

Aborting write operation!

Should I be doing something else to my VM image before publishing it?

Edit: lxc init also throws the above error, so I never get access to the root.img disk file to try to repair it.

Hmm, this is odd. Can you confirm your VM was properly stopped prior to publishing it as an image?

Also, what’s the disk size of your VM?

@tomp can you look into this one?

I definitely stopped it before publishing. lxc publish refused at first because I had forgotten to stop the VM.

The original VM size is 16GB.

I think these are the exact steps I used to create the original VM. I’ll try this again to make sure.

lxc init rhel-8-3 --empty --vm
lxc config device override rhel-8-3 root size=16GB
lxc config device add rhel-8-3 cdrom disk source=/home/link/Downloads/rhel-8.3-x86_64-boot.iso
lxc start rhel-8-3

What storage backend are you using?

Can you check if you see the same errors when publishing a standard images:ubuntu/focal VM image?

That way we’ll know whether it’s an image-specific issue or a general one.

Looks like it’s btrfs.

Looks like the ubuntu/focal image publishes fine.

link@subpop:~% lxc launch images:ubuntu/focal --vm
Creating the instance
Instance name is: obliging-mollusk
Starting obliging-mollusk
link@subpop:~% lxc stop obliging-mollusk
link@subpop:~% lxc list
|            NAME             |  STATE  |          IPV4          |                      IPV6                       |      TYPE       | SNAPSHOTS |
| obliging-mollusk            | STOPPED |                        |                                                 | VIRTUAL-MACHINE | 0         |
| rhel-8-3                    | RUNNING | (eth0)   | fd42:a809:27f8:7ea4:e3b4:5619:5ca5:5ac4 (eth0)  | VIRTUAL-MACHINE | 0         |
|                             |         |                        | fd42:a809:27f8:7ea4:216:3eff:fec0:59c1 (eth0)   |                 |           |
link@subpop:~% lxc publish obliging-mollusk --alias test-publish
Instance published with fingerprint: d93bb7352734f5efa1df0509668507189df5bf068b103c01c2d57ccc679fe73b
link@subpop:~% lxc launch test-publish -e
Creating the instance
Instance name is: prepared-ox
Starting prepared-ox

How much free space does df -h /var/snap/lxd/common/lxd show?

Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  228G   14G  203G   7% /

Also, my default storage pool is on a separate physical device.

link@subpop:~% lxc storage info default
  description: ""
  driver: btrfs
  name: default
  space used: 36.92GB
  total space: 600.13GB

Ok, that shouldn’t be an issue then.

Basically, when you publish, LXD first creates a temporary qcow2 file and transfers all the data into it (the file is stored directly in /var/snap/lxd/common/lxd/images). Once that’s done, the qcow2 is wrapped into a tarball along with some metadata and compressed (again as a file in /var/snap/lxd/common/lxd/images).

Once that’s all done, the image is uncompressed into a new volume to create instances from. That uncompressed volume is stored in your storage pool.
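As a rough sketch of the packaging step described above, the following wraps a placeholder disk file and a metadata file into a compressed tarball and lists the result. The member names metadata.yaml and rootfs.img are an assumption about the image layout, and the file contents are dummies; a real rootfs.img would be the qcow2 file that LXD produces.

```shell
#!/bin/sh
# Stand-in for the packaging step: wrap a disk file and its metadata in a
# compressed tarball. The contents are placeholders, not a bootable image.
work=$(mktemp -d)

# Assumed member names for a VM image tarball (not confirmed in this thread).
printf 'architecture: x86_64\ncreation_date: 0\n' > "$work/metadata.yaml"
truncate -s 1M "$work/rootfs.img"            # placeholder for the qcow2 file

# Wrap both files into one gzip-compressed tarball.
tar -C "$work" -czf "$work/image.tar.gz" metadata.yaml rootfs.img

# List what ended up inside the tarball.
members=$(tar -tzf "$work/image.tar.gz")
echo "$members"

rm -rf "$work"
```

The compressed tarball is what lxc image info reports the size of, which is why it can be noticeably smaller than the space used inside the VM.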

What’s the size of the image in lxc image info?

Size: 1371.82MB

Does that roughly line up with how much space you had used in there?

Yeah, the root filesystem has about 1.8GB used. So compressed, that seems reasonable.

I just did all the same steps creating a custom VM with a CentOS 8.3 ISO, and it throws the same error.

OK great that gives some steps I can try to reproduce myself. What partition layout are you using inside the VM?

The default, which I believe is LVM. I also had to set the boot priority of the cdrom disk to 1 to get it to boot off the ISO over the local disk.
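For anyone trying to reproduce this, the steps mentioned so far in the thread can be collected into one sketch. It is a dry run by default and only prints the commands. The alias custom-image and the helper names run, create_vm and publish_and_launch are made up for illustration, and boot.priority=1 is assumed to be the device key behind "set the boot priority of the cdrom disk to 1".

```shell
#!/bin/sh
# Dry run by default: each command is printed instead of executed.
# With DRY_RUN=0 nothing runs automatically; source this file and call
# create_vm, install the OS by hand, then call publish_and_launch.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

vm=rhel-8-3
iso=/home/link/Downloads/rhel-8.3-x86_64-boot.iso
alias_name=custom-image          # made-up alias for the published image

create_vm() {
    run lxc init "$vm" --empty --vm
    run lxc config device override "$vm" root size=16GB
    run lxc config device add "$vm" cdrom disk source="$iso"
    # Assumed key: make the ISO boot ahead of the (empty) local disk.
    run lxc config device set "$vm" cdrom boot.priority=1
    run lxc start "$vm"
}

publish_and_launch() {
    run lxc stop "$vm"                         # publish refuses a running VM
    run lxc publish "$vm" --alias "$alias_name"
    run lxc launch "$alias_name" -e            # the step that failed on btrfs
}

if [ "$DRY_RUN" = 1 ]; then
    plan=$(create_vm; publish_and_launch)
    echo "$plan"
fi
```

The OS installation between the two functions is interactive, which is why the script never runs both halves back to back.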

Thanks, I am looking into this now.

OK, I have recreated this, and it appears to be a BTRFS-specific issue.

I have successfully performed the following:

  • Created a VM on a dir storage pool and installed CentOS 8.3 on it.
  • Exported it to a tarball and then reimported it into a BTRFS storage pool.
  • Published the image from the BTRFS pool.
  • Created a new VM from the published image on both a dir and a ZFS storage pool.

But creating a VM from the published image on a BTRFS storage pool fails with the error you mentioned above. I’m investigating whether this happens with any other storage pools.

OK, I have a fix for this: LXD was truncating the optimized volume’s block file to the default 10GB block volume size. This explains why it works for dir (as no optimized volume is created) and for other storage pools that are not block-file based. But I need to check why our automated tests didn’t pick this up, as there is a dedicated check just for this scenario (publishing a VM as an image with a root disk larger than the default 10GB size), which runs daily for all storage drivers (including BTRFS).
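To see why truncating the block file leads to the sgdisk complaint about an invalid backup GPT header: GPT keeps its backup header in the last sector of the disk, so cutting a 16GB image down to 10GB throws that sector away. The following is a scaled-down sketch, not LXD’s actual code path. It uses MiB instead of GB, assumes 512-byte sectors, and writes only the 8-byte GPT signature as a marker instead of a real header; the helper name last_sector_sig is made up.

```shell
#!/bin/sh
# Scaled-down illustration: MiB stand in for the GB sizes in this thread.
# No real GPT is written; only the signature "EFI PART" is placed where the
# backup header would live (the last 512-byte sector of the disk).

# Hypothetical helper: print the first 8 bytes of the last sector of file $1,
# with NUL bytes removed so an empty sector prints as an empty string.
last_sector_sig() {
    size=$(wc -c < "$1")
    dd if="$1" bs=512 skip=$(( size / 512 - 1 )) count=1 2>/dev/null \
        | head -c 8 | tr -d '\000'
}

img=$(mktemp)
truncate -s 16M "$img"                        # sparse stand-in for the 16GB disk
last=$(( 16 * 1024 * 1024 / 512 - 1 ))        # index of the final sector
printf 'EFI PART' | dd of="$img" bs=512 seek="$last" conv=notrunc 2>/dev/null

before=$(last_sector_sig "$img")              # marker found in the last sector
truncate -s 10M "$img"                        # shrink to the "default" size
after=$(last_sector_sig "$img")               # new last sector holds no marker

echo "before: '$before'  after: '$after'"
rm -f "$img"
```

The main header at the start of the disk survives the truncation, which matches the "invalid backup GPT header, but valid main header" part of the error.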