I’m experimenting with building a custom image from which I can start ephemeral instances. I created a VM following this tutorial. The resulting VM works exactly as expected. I can then publish it as an image.
When I try to start a new instance using that image, it gives me an error about the disk.
Creating the instance
Error: Create instance from image: Failed to run: /snap/lxd/19389/bin/sgdisk --move-second-header /var/snap/lxd/common/lxd/storage-pools/default/images/6ba8557f5c0f8ba1c8123a72324544505f9c415ad625019059bdaf6a0c395c3b/root.img: Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
Warning! Error 25 reading partition table for CRC check!
Warning! One or more CRCs don't match. You should repair the disk!
Aborting write operation!
Should I be doing something else to my VM image before publishing it?
Edit: lxc init throws the same error, so I never get as far as a root.img disk file that I could try to repair.
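For reference, this is roughly the sequence of commands involved (instance, image, and alias names here are placeholders, not my exact ones):

```shell
# Rough sketch of the workflow: create a VM, customize it, publish it,
# then try to start a new instance from the published image.
lxc init images:centos/8 my-vm --vm        # create the VM
lxc start my-vm                            # boot it and do the customization
lxc stop my-vm                             # publish requires a stopped instance
lxc publish my-vm --alias my-custom-image  # publish the VM as an image
lxc launch my-custom-image test1           # this is the step that fails with the sgdisk error
```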
Basically, when you publish, LXD first creates a temporary qcow2 file and transfers all the data into it (that file is stored directly in /var/snap/lxd/common/lxd/images). Once that's done, the qcow2 is wrapped into a tarball along with some metadata and compressed (again as a file in /var/snap/lxd/common/lxd/images).
When an instance is then created from that image, the image is uncompressed into a new volume to create instances from; that uncompressed volume is stored in your storage pool.
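On a snap install, the two locations involved look roughly like this (pool name "default" assumed, fingerprint is a placeholder):

```shell
# Compressed published image tarballs live here:
ls -lh /var/snap/lxd/common/lxd/images/

# The uncompressed per-pool volumes (the root.img from the error message)
# live under the storage pool, keyed by image fingerprint:
ls /var/snap/lxd/common/lxd/storage-pools/default/images/
```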
OK, I have recreated this, but it appears to be a BTRFS-specific issue.
I have successfully performed the following:
Created a VM on a dir storage pool and installed CentOS 8.3 on it.
Exported it to a tarball and then re-imported it into a BTRFS storage pool.
Published the image from the BTRFS pool.
Created a new VM from the published image on both a dir and a ZFS storage pool.
But creating a VM from the published image on a BTRFS storage pool fails with the error you mentioned above. I'm investigating whether this happens with any other storage pools.
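The steps above, sketched as commands (pool and instance names are placeholders, and the `-s` flag on import assumes a recent LXD that supports choosing the target pool):

```shell
# Reproduction sketch for the cross-pool publish test.
lxc storage create dir-pool dir
lxc storage create btrfs-pool btrfs

lxc init images:centos/8 v1 --vm -s dir-pool   # install CentOS 8.3 in here
lxc export v1 v1.tar.gz                        # export the instance to a tarball
lxc delete v1
lxc import v1.tar.gz -s btrfs-pool             # re-import onto the BTRFS pool

lxc publish v1 --alias centos-custom           # publish from the BTRFS pool
lxc init centos-custom v2 --vm -s dir-pool     # works
lxc init centos-custom v3 --vm -s btrfs-pool   # fails with the sgdisk error
```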
OK, I have a fix for this: LXD was truncating the optimized volume's block file to the default 10GB block volume size. That explains why it works for dir (no optimized volume is created there) and for other storage pools that aren't backed by block files. But I need to check why our automated tests didn't pick this up, as there is a dedicated check for exactly this scenario (publishing a VM as an image with a root disk larger than the default 10GB size), which runs daily for all storage drivers, including BTRFS.
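The sgdisk error also makes sense given GPT's on-disk layout: the backup GPT header lives in the disk's last sector, so truncating the block file chops it off, leaving only the main header valid. A quick illustration with a plain file standing in for the disk (sizes scaled down, no LXD involved):

```shell
# Illustrate why truncating a block file breaks the GPT: the backup header
# sits at the very end of the disk, so shrinking the file destroys it.
f=$(mktemp)
truncate -s 20M "$f"                           # a 20M "disk"
printf 'BACKUP-GPT-HDR' | \
  dd of="$f" bs=1 seek=$((20*1024*1024 - 14)) conv=notrunc 2>/dev/null
tail -c 14 "$f"; echo                          # prints: BACKUP-GPT-HDR
truncate -s 10M "$f"                           # the truncation bug, in miniature
tail -c 14 "$f" | od -An -c | head -n 1        # marker gone, only NUL bytes remain
rm -f "$f"
```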