Can't start virtual machine, disk quota exceeded. Can't grow disk quota or attach new storage pool either

I am using Debian 11 as the host and have installed LXD via snap, using the BTRFS storage type since that was the default when I initially ran lxd init.

When I try to run an ubuntu virtual machine, I get the following error:

$ lxc start ubuntu
Error: open /var/snap/lxd/common/lxd/virtual-machines/ubuntu/config/server.crt: disk quota exceeded
Try `lxc info --show-log ubuntu` for more info

When I run that for more info, I get:

Name: ubuntu
Status: STOPPED
Type: virtual-machine
Architecture: x86_64
Created: REDACTED
Last Used: REDACTED
Error: open /var/snap/lxd/common/lxd/logs/ubuntu/qemu.log: no such file or directory

When I try to edit the disk size from 15GB to something larger, I get the following error:

$ lxc config edit ubuntu
Config parsing error: Failed to update device "root": Failed resizing disk image "/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/ubuntu/root.img" to size 20000006144: Failed to create sparse file /var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/ubuntu/root.img: truncate /var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/ubuntu/root.img: disk quota exceeded
Press enter to open the editor again or ctrl+c to abort change

Also note that just opening the config without making any changes gives me an error:

$ lxc config edit ubuntu
Config parsing error: Failed to write backup file: Failed to create file "/var/snap/lxd/common/lxd/virtual-machines/ubuntu/backup.yaml": open /var/snap/lxd/common/lxd/virtual-machines/ubuntu/backup.yaml: disk quota exceeded
Press enter to open the editor again or ctrl+c to abort change

I don't really understand how all this works, but after skimming the documentation again to try to find info on how to increase the disk size, I tried attaching a new storage volume with:

$ lxc storage create tst3 btrfs
$ lxc storage volume create tst3 tstblockvol1 --type=block
$ lxc storage volume attach tst3 tstblockvol1 ubuntu
Error: Failed to write backup file: Failed to create file "/var/snap/lxd/common/lxd/virtual-machines/ubuntu/backup.yaml": open /var/snap/lxd/common/lxd/virtual-machines/ubuntu/backup.yaml: disk quota exceeded

I'm very new to LXD, don't work in a tech field, and am pretty out of my depth. I just wanted something more flexible/powerful than VirtualBox, and I could not get Virtual Machine Manager working the way I wanted it to, so any advice/suggestions would be appreciated.

Please see the 4th bullet point here

It's about setting the size.state setting on the root disk when using BTRFS (or, even better, don't use BTRFS for VMs).

Also check the storage pool size and check you've not just run out of disk space in the pool.

Please see the 4th bullet point here

I don't understand any of that. I don't work in a technology field. I see that it says "please ensure that the instance root disk's size.state property is set to 2x the size of the root disk's size to allow all blocks in the disk image file to be rewritten without reaching the qgroup quota", but I don't know what to do with that information. There is no "size.state" entry in the config of that particular virtual machine, and I am not sure what to set it to or how to change it.

Also check the storage pool size and check you've not just run out of disk space in the pool.

I have not. Or at least, when I run 'lxc storage edit default' and jack up the size of the disk, it does nothing to change the error message.

or, even better, don't use BTRFS for VMs).

Apologies for using the default configuration. If it's not recommended, I would suggest changing the default configuration.

In LXD, a VM is made up of two volumes: a small state volume (which contains config and metadata about the VM) and the data volume (which contains the VM's root disk).

By default in LXD, BTRFS storage pools don't have per-filesystem volume size limits, so the state volume won't be size-limited. The VM root volume itself will be an image file sized to the pool's volume.size setting, or 10GiB if that is not set.
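If you want newly created VMs to get a larger root disk by default, you can also raise that default on the pool itself; something like this should do it (it only affects volumes created afterwards):

lxc storage set default volume.size=20GiB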

If you grow the VM's root disk size after creation using:

lxc config device set <instance> root size=20GiB

Then this will resize the root image file to the specified size and enable BTRFS quotas, with a quota set at the specified size plus the state volume's default size (which is 100MiB).

However, because of the nature of BTRFS' quota tracking (as explained in that 4th bullet point), it is possible for the BTRFS quota to be exceeded due to the way BTRFS tracks the blocks that are changing inside the VM's root image file. The absolute worst case is that it could track up to 2x the allowed quota for a VM, so we encourage setting the state volume size to 2x the specified root volume size to be safe.

This can be done with:

lxc config device set <instance> root size.state=40GiB

This will cause the BTRFS quota to be set at 20+40GiB, allowing the quota to track block changes in the VM's root disk.
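If you want to double-check the resulting qgroup limits on the host, something along these lines should show them (the path assumes the default snap, loop-backed setup):

sudo btrfs qgroup show -re /var/snap/lxd/common/lxd/storage-pools/default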

This is all rather complicated, which is why we actually suggest not using BTRFS for VMs.

On your point regarding LXD suggesting BTRFS as the default: the default pool type is actually ZFS, but it's only suggested if your system has it available. The "next best" option (although we've seen now that "best" is somewhat nuanced, based on your workload and preferences) is suggested as BTRFS, because it is the next most flexible and efficient for containers (which LXD started out only supporting).

In both cases LXD will, by default, create BTRFS and ZFS pools on a fixed-size loop file. This is to allow users to get up and running quickly, without having to provide a dedicated disk or partition for their instances. This works great for testing or development; however, stacking filesystems on top of the host's filesystem is not as efficient, so for production workloads we recommend not using loop-backed pools, see:
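In practice that means pointing the pool at a spare disk or partition instead of a loop file. A rough sketch with placeholder names (double-check the device before running this, as it will be wiped):

lxc storage create pool2 btrfs source=/dev/sdb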

Please can I see the output of the following commands to better ascertain the state of your system:

  • lxc storage ls
  • lxc storage show default
  • lxc config show <instance> --expanded

Thanks

Thank you for the explanation. It still does not make a whole lot of sense, but I don't think this is the time or place to try to educate me on what a volume is. I think I got the gist of it, though.

Please can I see the output of the following commands to better ascertain the state of your system:

It's difficult, since I would need to redact anything that could identify me, and there are too many options that I don't recognize.

Regardless, I ran the command you suggested:

lxc config device set ubuntu root size.state=40GiB

which did not work, so I just tried upping the size to 60, and now at least I can edit the config without getting errors. The size parts now read:

size: 20GB
size.state: 80GiB

But when I attempt to start the VM, it starts but won't allow me to connect with lxc exec ubuntu bash; it just spits out Error: LXD VM agent isn't currently running. Googling that error turns up only the LXD GitHub repo and no troubleshooting help.

Attempting to use the SPICE window thing just shows the LXD logo forever. I also can't stop the VM unless I use the -f flag. Jacking up the size to 80 did not help either.

Which image did you use to create the VM?

A volume is just a real or virtual hard disk

Hello,

I'm a long-time reader, first-time poster. I got some valuable help from this thread today, so I've decided to pay it forward.

I evidently ran into the same issue experienced by the OP today. Here are my findings.

Like the OP, I was attempting to use BTRFS in production (I also didn't know ZFS was the default, so long as it was installed, or that BTRFS was the runner-up), and had also used this command to stop a VM a few times:

lxc stop -f <instancename>

What I didn't realise was that running that command to effectively yank its virtual power cord, coupled with the fact that the file system in that VM had gone read-only (due to the aforementioned issue of using BTRFS), had caused the VM's file system to become corrupt. In other words, it couldn't start.

I had already applied this command:

lxc storage set default btrfs.mount_options=compress-force

Where 'default' is the name of my storage pool (hey, no judging).

I had also applied this command:

lxc config device set <instancename> root size.state=50GiB

Where 50GiB is twice the volume size of 25GiB.

(As an aside, this size.state command is the one needed to rescue the VM. You can do both, but this command helps on a per-case basis, without the performance impact of forced compression across the entire BTRFS volume; we don't all have M.2 drives or SSDs.)

lxc config device set <instancename> root size.state=50GiB

(As a further aside, if you still want to push forward with BTRFS, I'd suggest starting fresh and applying the following command before deploying any VMs).

lxc storage set default btrfs.mount_options=compress-force

I didn't stick around (I'm sorry) to determine whether the command retrospectively fixes existing virtual machines. Perhaps someone else can chime in on this? Also, containers don't seem to have these issues; just VMs.
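(I haven't tried it, but my understanding is that a new mount option only compresses data written after the pool is remounted. To recompress what is already there, I believe BTRFS can do it in place with something along these lines, run on the host; the path assumes the default snap layout, and zstd is just my choice of algorithm. Be aware that defragmenting can break reflinks to snapshots and temporarily use extra space, so take a backup first.)

sudo btrfs filesystem defragment -r -czstd /var/snap/lxd/common/lxd/storage-pools/default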

Back on track…

Still, when I started the VM, it 'started' but never became available.

lxc start <instancename>

+----------------+---------+-------------+-------------+-----------------+---+
| <instancename> | RUNNING | <blankIPv4> | <blankIPv6> | Virtual Machine | 0 |
+----------------+---------+-------------+-------------+-----------------+---+

Where 'blankIPv4' is the undetected IPv4 address
Where 'blankIPv6' is the undetected IPv6 address

lxc shell <instancename>

As per the OP, I got this message:

Error: LXD VM agent isn't currently running

The LXD VM agent isn't running. It can't run because the OS can't start. The OS can't start because the file system is corrupt. The file system is corrupt because it was shut down uncleanly. It was shut down uncleanly because the disk had gone read-only. The disk was read-only because the underlying disk quota had been exceeded by the metadata!

The rest, as they say, is history, well documented here: Linux Containers - LXD - Has been moved to Canonical

(read the fourth bullet point!)

These two posts also provided some valuable insight:

https://github.com/lxc/lxd/issues/9124

I made my discovery when I used the following command to get a virtual/serial terminal to my VM:

lxc console <instancename>

I pressed the enter key 1-2 times and discovered the VM was at the following boot prompt:

(initramfs)
(initramfs)

At this point I had already used the following command to take a backup of my VM from the host:

lxc export <instancename> instancename-backup.tar.gz

Without much more to lose, I tried to exit the prompt with:

(initramfs) exit

(I was expecting to reboot and return to the same prompt; I was hoping it would boot, though!) I got the following output:

rootfs contains a file system with errors, check forced.
rootfs:
Inodes that were part of a corrupted orphan linked list found.

rootfs: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sda2 requires a manual fsck

BusyBox v1.30.1 (Ubuntu 1:1.30.1-7ubuntu3) built-in shell (ash)
Enter 'help' for a list of built-in commands.

I know this one all too well. I followed the instructions:

(initramfs) fsck /dev/sda2

I pressed the 'y' key a few times:

fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
rootfs contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix<y>? yes
Inode 18643 was part of the orphaned inode list. FIXED.
Inode 18666 was part of the orphaned inode list. FIXED.
Inode 18675 was part of the orphaned inode list. FIXED.
Inode 18704 was part of the orphaned inode list. FIXED.
Inode 266152 extent tree (at level 1) could be narrower. Optimize<y>? yes
Pass 1E: Optimizing extent trees
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (1705636, counted=1128276).
Fix<y>? yes
Inode bitmap differences: -18632 -18643 -18666 -18675 -18704
Fix<y>? yes
Free inodes count wrong for group #1 (2036, counted=2041).
Fix<y>? yes
Free inodes count wrong (2749440, counted=2744590).
Fix<y>? yes

rootfs: ***** FILE SYSTEM WAS MODIFIED *****
rootfs: 228434/2973024 files (0.5% non-contiguous), 4949379/6077655 blocks
(initramfs)

I attempted another exit:

(initramfs) exit

The VM started and came back online!

rootfs: clean, 228434/2973024 files, 4949379/6077655 blocks

Ubuntu 22.04.1 LTS <instancename> ttyS0

<instancename> login:
Password:

From here I was able to take a full backup of my data (I'm using Virtualmin) and start again. I wanted a fresh start on ZFS, so after backing up all the data in my VMs and exporting them onto the host, I removed LXD:

sudo snap remove lxd

2022-11-03T12:50:11+11:00 INFO Waiting for "snap.lxd.daemon.service" to stop.
Save data of snap "lxd" in automatic snapshot set #3
lxd removed

This took a little while. Then I listed the saved snapshot (I didn't want a rerun of what I had just experienced when I reinstalled LXD).

sudo snap saved

Set  Snap  Age    Version      Rev    Size    Notes
3    lxd   27.1m  5.7-c62733b  23889  13.9GB  auto

I noted the number of the snapshot (3 in my case) and issued the following command:

sudo snap forget 3

Snapshot #3 forgotten.

I checked that it was really gone:

sudo snap saved

No snapshots found.

I'm using Pop!_OS, so I needed to install ZFS and reboot.

sudo apt install zfsutils-linux zfs-dkms

The zfs-dkms package is especially important. Many articles on the web specify only zfsutils-linux when you Google how to install ZFS on Pop!_OS.

You will receive a notice regarding the licence model of the kernel vs ZFS. Be mindful and thoughtful, and then continue.
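(If you want to confirm the module built correctly before carrying on, something like this should work; I'm assuming a recent OpenZFS here.)

sudo modprobe zfs
zfs version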

Once the installation was complete, I rebooted and installed LXD again.

sudo snap install lxd --channel=latest/stable

lxd 5.7-c62733b from Canonical✓ installed

I rebooted once more and then ran the lxd init utility.

lxd init

ZFS was available! Yay!

Would you like to use LXD clustering? (yes/no) [default=no]: no
Do you want to configure a new storage pool? (yes/no) [default=yes]: yes
Name of the new storage pool [default=default]: default

--

Name of the storage backend to use (ceph, cephobject, dir, lvm, zfs, btrfs) [default=zfs]: zfs
Create a new ZFS pool? (yes/no) [default=yes]: yes

--

Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]: no
Size in GiB of the new loop device (1GiB minimum) [default=30GiB]: 450GiB
Would you like to connect to a MAAS server? (yes/no) [default=no]: no
Would you like to create a new local network bridge? (yes/no) [default=yes]: yes
What should the new bridge be called? [default=lxdbr0]: lxdbr0
What IPv4 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: 1.2.3.4/24
Would you like LXD to NAT IPv4 traffic on your bridge? [default=yes]: yes
What IPv6 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: none
Would you like the LXD server to be available over the network? (yes/no) [default=no]: no
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: yes
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: yes

Finally, I imported my VMs:

lxc import instancename-backup.tar.gz

Importing instance: 100% (141.10MB/s)

These took a while (spinning rust, I know, right?).

I needed to recreate my networks and a few other things afterwards, but you get the idea.
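For example, to reattach an imported VM to the new bridge, something like this should do the trick (the device and interface names are just my guesses; adjust them to your profile setup):

lxc network attach lxdbr0 <instancename> eth0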

What did I learn today?

Read the fine manual. Read it well.

P.S. This is great too:


Thanks for the great write-up, really useful!

Also see Btrfs - btrfs - LXD documentation, where we recommend against using VMs on BTRFS for this reason.