Isn't the virtual machine's disk dynamically growing?

itviewer · September 29, 2024, 9:37am

What I learned when using VMware is that the size allocated to a virtual machine's disk is only a logical limit. For example, if you set the disk size to 100GB, the storage the VM actually occupies on the physical disk is only the size of its actual content, not 100GB.

When I used Incus to create a virtual machine, I specified a 20GB disk size, and then I saw a 20GB root.img in the storage pool. Doesn’t the physical disk space used by the virtual machine grow dynamically with its content?

candlerb · September 29, 2024, 11:56am

That’s called “thin provisioning” - allocating space on demand, whenever a block is first written to.

The fact that its “size” is 20GB does not mean it consumes 20GB of disk space. It could be a sparse file, with unallocated space within it.

Try these commands:

du -k root.img
ls -s root.img

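du -k gives you the space actually used on disk, in KB (or, to be entirely accurate, KiB: kibibytes). ls -s prints the allocated size in blocks (with GNU ls these are 1024 bytes by default). By contrast, ls -l shows the apparent size, which for a sparse file is the full 20GB even if almost nothing is allocated.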
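To see the difference for yourself without touching the real image, here is a small sketch. It assumes GNU coreutils (truncate, stat -c, du --apparent-size) and a host filesystem that supports sparse files, such as ext4 or XFS; demo-root.img is just a hypothetical scratch file, not the root.img Incus created.

```shell
#!/bin/sh
# Sketch: compare "apparent" vs "allocated" size of a sparse file.
# Assumes GNU coreutils; demo-root.img is a hypothetical scratch file,
# not the root.img that Incus created.
set -e

# Create a 20 GiB file without writing any data blocks (a sparse file).
truncate -s 20G demo-root.img

ls -lh demo-root.img                  # apparent size, what ls -l reports (20G)
ls -sh demo-root.img                  # allocated size, typically 0
du -h demo-root.img                   # allocated size again, typically 0
du -h --apparent-size demo-root.img   # apparent size again (20G)

# Raw numbers: %s is the apparent size in bytes, %b the allocated 512-byte units.
stat -c 'apparent=%s bytes allocated=%b x 512 bytes' demo-root.img
```

Running the same ls -l / ls -s / du commands on the real root.img should show the same pattern: a 20G apparent size, with only the blocks the VM has actually written counted as allocated.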

itviewer · September 29, 2024, 1:15pm

Thanks, I learned a lot. Now I am trying to understand the differences between the following commands:

df -h in VMs
ls -sh
ls -lh
du -sh
qemu-img info

My lack of storage knowledge let ls -lh deceive me!

candlerb · September 29, 2024, 1:36pm

Inside a VM, all it sees is a “block device” - a virtual hard drive. It has a fixed number of sectors and a fixed capacity, and the operations permitted are just “read block N” and “write block N”.

On that hard drive, the guest creates a filesystem, which is an ordered way of storing files and directories. It initially writes some blocks to carve up the space (metadata), but leaves most of the rest untouched. Think of it like buying a blank notebook and writing numbers at the bottom of the pages. You’re only writing to a small part of each page, but you’ve organized the pages so you can find stuff more easily when you need to.

Now, inside the VM:

blockdev --getsize64 /dev/vda

will give you the full size of the virtual disk (assuming it appears as /dev/vda; it might be /dev/sda or something else).

df [-h] tells you how much space the filesystem is currently using for files and metadata. It doesn’t count unallocated blocks, nor blocks which were written but are no longer needed because the files were rm’d.

Next, think about it from the point of view of the VM host.

It needs to provide this virtual hard drive to the VM using host storage. There are lots of ways it could do this - it could map the VM’s virtual disk to a partition or logical volume, a raw file, a specially formatted disk file like qcow2 or vmdk, and so on. All it needs to ensure is that when the VM reads or writes block number N, it performs the corresponding action on the underlying storage.

With thin provisioning, no space is allocated for blocks that haven’t yet been written. It’s assumed that the block device has an initial state of all zeros. If the VM reads block N before it has been written, the hypervisor returns a block of zeros. If the VM writes block N for the first time, the hypervisor allocates storage on the host if necessary, by allocating part of a sparse file, or by updating a qcow2 or vmdk file, etc.
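Here is a small host-side sketch of that read-zeros / allocate-on-first-write behavior, using a plain sparse file as the backing store (the simplest of the options above). It only imitates what the hypervisor does and doesn’t involve a VM at all. It assumes GNU coreutils and a filesystem with sparse-file support; demo-disk.img, BS and N are hypothetical names, and "block N" here just means the 4 KiB chunk at offset N × 4096.

```shell
#!/bin/sh
# Sketch of thin provisioning with a plain sparse file on the host.
# demo-disk.img is a hypothetical 1 GiB stand-in for a raw VM disk;
# "block N" is simply the 4 KiB chunk at offset N * BS.
set -e
BS=4096
N=1000

truncate -s 1G demo-disk.img
echo "allocated before write: $(stat -c %b demo-disk.img) x 512-byte units"

# Reading a block that was never written returns zeros:
# strip the zero bytes and count what is left.
nonzero=$(dd if=demo-disk.img bs=$BS skip=$N count=1 2>/dev/null | tr -d '\000' | wc -c)
echo "non-zero bytes in unwritten block $N: $nonzero"

# The first write to block N makes the host filesystem allocate space for it.
# conv=notrunc keeps dd from truncating the rest of the "disk".
dd if=/dev/urandom of=demo-disk.img bs=$BS seek=$N count=1 conv=notrunc 2>/dev/null
echo "allocated after write:  $(stat -c %b demo-disk.img) x 512-byte units"
echo "apparent size unchanged: $(stat -c %s demo-disk.img) bytes"
```

Only the written block costs real space; every other block of the 1 GiB "disk" is still a hole that reads back as zeros.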

HTH!

itviewer · September 29, 2024, 2:14pm

Now I understand: for raw disk files, ls -l and du report different sizes, whereas the qcow2 format doesn’t seem to have that confusion.

candlerb · September 30, 2024, 10:27am

That’s correct: qcow2 files (and vmdk and vdi) are specifically designed for VM images and have their own header and mapping between block number and location within the file, without holes.

Since they are ordinary files without holes, they are easy to copy between systems without worrying about losing the “sparseness”.
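For contrast, here is a sketch of the copying problem with a raw sparse image. A copy that isn’t hole-aware reads the holes as zeros and writes them all out, so the copy occupies its full size on the destination. GNU cp --sparse=always (or rsync --sparse) recreates the holes. This assumes GNU coreutils, and all file names are hypothetical scratch files.

```shell
#!/bin/sh
# Sketch: a raw sparse image can lose its "sparseness" when copied.
# Assumes GNU coreutils; all file names are hypothetical scratch files.
set -e

# 64 MiB raw image with nothing written: fully sparse.
truncate -s 64M sparse.img

# dd without conv=sparse reads the holes as zeros and writes them out,
# so every block of the copy gets allocated.
dd if=sparse.img of=dense-copy.img bs=1M 2>/dev/null

# cp --sparse=always turns runs of zeros back into holes.
cp --sparse=always sparse.img sparse-copy.img

# Same apparent size for all three, very different allocated sizes.
du -h --apparent-size sparse.img dense-copy.img sparse-copy.img
du -h sparse.img dense-copy.img sparse-copy.img
```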