ZFS quota set for VM dataset by LXD?

Hi All,

I just spent the last few hours trying to determine why my LXD VMs would boot but not find network interfaces. I’ve solved the issue, but I’m left with a question about how LXD handles ZFS storage for its VMs.

I see that when I run lxc launch ubuntu:focal --vm -p default my-test, LXD generates two ZFS objects: a dataset, which in my case lives at data-pool/lxd-store/virtual-machines/default_my-test and appears to hold config data and other QEMU-related things, and a zvol at data-pool/lxd-store/virtual-machines/default_my-test.block. The dataset has a ZFS quota of 95.4M applied to it, but I didn’t set this quota, so I assume that LXD did somehow? (My actual issue was warning: tap: open vhost char device failed: Permission denied at the end of lxc info my-test --show-log; the result was that commands hung and no VMs could use the network.)

The strange thing is that older VMs on the same system (LXD 4.15, has always been v4.x) do not have this quota set. I assume that, because of my snapshot policy (@daily), this quota was responsible for all the VMs failing to boot with networking and for lots of other weird issues, such as hung lxc stop my-test commands, once my snapshots filled up the 95.4M of space on that dataset.
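For anyone wanting to check the same thing on their own system, here is a minimal sketch of how the quota and snapshot usage on the VM's dataset could be inspected. The dataset naming is inferred from the paths in this post, the helper name vm_state_dataset is made up for illustration, and the zfs commands usually need root.

```shell
# Dataset naming as observed in this thread (an inference, not documented behavior):
#   <pool>/virtual-machines/<project>_<instance>        config/state filesystem
#   <pool>/virtual-machines/<project>_<instance>.block  root disk zvol
vm_state_dataset() {
    # $1 = parent ZFS dataset of the LXD pool, $2 = project, $3 = instance
    printf '%s/virtual-machines/%s_%s\n' "$1" "$2" "$3"
}

ds=$(vm_state_dataset "data-pool/lxd-store" "default" "my-test")

# Only query ZFS when the tool is actually installed.
if command -v zfs >/dev/null 2>&1; then
    # quota/refquota show the limit; usedbysnapshots shows space held by snapshots.
    zfs get -H -o property,value quota,refquota,used,usedbysnapshots,available "$ds"
    # List each snapshot of the dataset with its own usage.
    zfs list -H -t snapshot -o name,used -r "$ds"
else
    echo "zfs not found; would inspect: $ds"
fi
```

Both quota and refquota are queried because this post does not say which of the two properties carries the 95.4M limit.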

My two questions are:

  1. Did LXD set this quota, and how do I control that or set a new default value?
  2. Is there any long-term problem with removing the quota, or should I leave it at 1G, which is how I set it to get things working again?

All my LXD containers ran flawlessly throughout this incident.
Thanks for any tips or help you all can offer!

You can set size.state to increase the size of the config/state volume.
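A rough sketch of how that could look on the command line, assuming size.state is set on the instance's root disk device. The instance name and the 1GiB value are taken from the question above; the run wrapper and the APPLY variable are made up here so that the commands are only printed unless you opt in.

```shell
# Illustrative values; adjust to your setup.
INSTANCE="my-test"
STATE_SIZE="1GiB"

# Print each command by default; set APPLY=1 in the environment to execute it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Per instance: override the root disk inherited from the profile.
# (If the instance already has its own root device, use
#  "lxc config device set" instead of "override".)
run lxc config device override "$INSTANCE" root size.state="$STATE_SIZE"

# For every instance using the default profile:
run lxc profile device set default root size.state="$STATE_SIZE"
```

Whether an existing VM picks up the new size immediately or only after a restart is not covered in this thread, so check the dataset's quota afterwards.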

Assuming data-pool/lxd-store/virtual-machines/default_my-test holds the configuration information, how did this become greater than 100MB? I am curious because I just started to use LXD VMs.

I’ve been thinking about this myself today. I suspect that copying the lxd-agent into the config dir each time the instance is started may be causing the extra snapshot usage. Will investigate.

I did a quick test: I created a VM, installed apache and mysql-server, and then created a snapshot. I then uninstalled them, and the snapshot went to 95MB, but that config file for the VM did not increase in size.

Yes, that’s the issue:

lxc launch images:ubuntu/focal v1 -s zfs --vm
sudo zfs list | grep test_v1
NAME                                                                                  USED  AVAIL     REFER  MOUNTPOINT
zfs/virtual-machines/test_v1                                                         10.5M  84.9M     10.5M  /var/lib/lxd/storage-pools/zfs/virtual-machines/test_v1
zfs/virtual-machines/test_v1.block                                                      1K   223G      383M  -

lxc stop v1
lxc snapshot v1
lxc start v1

sudo zfs list | grep test_v1
NAME                                                                                  USED  AVAIL     REFER  MOUNTPOINT
zfs/virtual-machines/test_v1                                                         20.5M  74.8M     10.0M  /var/lib/lxd/storage-pools/zfs/virtual-machines/test_v1
zfs/virtual-machines/test_v1.block                                                   1.51M   223G      384M  -

So just over 10M difference in the AVAIL space for the config volume (meaning that we’ve lost 10M space tracking the new snapshot), and the lxd-agent is just over 10M in size:

du  /var/lib/lxd/storage-pools/zfs/virtual-machines/test_v1/config/lxd-agent 
10711	/var/lib/lxd/storage-pools/zfs/virtual-machines/test_v1/config/lxd-agent

@stgraber I’m going to look at whether we can use a shared lxd-agent binary directly or be more clever about when we copy it into the config dir.

You are correct; because I did not restart the VM, I did not see it. However, after restarting, the size jumped from 10MB to 21MB. I then restarted again, but this time it did not increase. I then created another snapshot and restarted, and now it jumped to 31MB.
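As a rough sanity check on the figures posted in this thread, the arithmetic can be sketched as below. It assumes the du output is in 1 KiB blocks and that each snapshot followed by a restart costs about the same amount of space; the variable names are only for illustration.

```shell
# Figures copied from the zfs list and du output earlier in this thread.
avail_before=84.9   # M available on the config volume before the snapshot
avail_after=74.8    # M available after snapshot + restart
agent_kib=10711     # du output for config/lxd-agent (assumed 1 KiB blocks)

# Space lost per snapshot-and-restart cycle.
lost=$(awk -v a="$avail_before" -v b="$avail_after" 'BEGIN { printf "%.1f", a - b }')

# Size of the lxd-agent binary converted to M.
agent_mib=$(awk -v k="$agent_kib" 'BEGIN { printf "%.1f", k / 1024 }')

# Rough number of such cycles that fit in the initially available space.
cycles=$(awk -v a="$avail_before" -v l="$lost" 'BEGIN { print int(a / l) }')

echo "lost per cycle:    ${lost}M"
echo "lxd-agent size:    ${agent_mib}M"
echo "cycles until full: about ${cycles}"
```

With these inputs the loss per cycle (10.1M) is close to the agent's size (10.5M), and roughly 8 cycles fit before the volume is full, which would match a daily snapshot policy filling the quota in a little over a week if the VM is restarted between snapshots.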

This helps quite a bit:

There is still some minor usage each snapshot due to regenerating the certificates and other small text files, but certainly the big usage jump of the lxd-agent has been resolved.

Hey, thanks everyone! I understand the problem better now, and it’s awesome to have a fix so soon! Much appreciated.
