Lxc import: Disk quota exceeded when filesystem is not full (ZFS)

dutchy76 · January 3, 2022, 5:08pm

Hi there,

I’m currently trying to import a virtual machine created with lxc export. However I’m running into some ZFS-related trouble (I think):

Error: Create instance from backup: Error starting unpack: Failed to run: tar -zxf - --xattrs-include=* -C /var/snap/lxd/common/lxd/storage-pools/storage-pool/virtual-machines/the-vm --strip-components=2 backup/virtual-machine: tar: state: Cannot write: Disk quota exceeded

This seems odd to me, as neither my root filesystem nor my ZFS volume is full:

Filesystem                         Size  Used Avail Use% Mounted on
udev                                40G     0   40G   0% /dev
tmpfs                              7.9G  4.5M  7.9G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv  365G   37G  310G  11% /

NAME              USED   AVAIL  REFER  MOUNTPOINT
storage-pool      2.08T  623G   39.3K  /storage-pool
storage-pool/k8s  39.3K  623G   39.3K  /storage-pool/k8s
storage-pool/lxd  384G   623G   39.3K  /mnt/tmp/

I don’t have a quota, refquota, reservation or refreservation set on any ZFS (sub)volume either. The VM’s configured root disk size is 30GB; the backup itself is 2.2GB.

System details

uname -a:

Linux artemis 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

zfs -V:

zfs-0.8.3-1ubuntu12.9
zfs-kmod-0.8.3-1ubuntu12.12

lxc --version:

4.20

The original export was created with lxc export on a btrfs filesystem, which wasn’t full either and doesn’t use quotas. lxc copy fails with the same error (this is what I initially tried).

Any help would be appreciated. Thanks!

kevinmoilar · January 3, 2022, 8:12pm

Hi @dutchy76. Did you check the disk space inside the LXD mount namespace?
sudo nsenter -t $(cat /var/snap/lxd/common/lxd.pid) -m
Then you can see the space used on the storage backend with df -h.

That path applies to the snap package; otherwise the path passed to cat will be different.

dutchy76 · January 3, 2022, 9:02pm

Hi there, thanks for your response!

After entering the LXD namespace and checking with df -h for the usage of the storage-pool filesystem, it is shown to have 1% usage with 624GB of free space left.

Thanks.

kevinmoilar · January 3, 2022, 9:58pm

How about the inodes on your host system?
df -hi

dutchy76 · January 3, 2022, 9:59pm

Inodes are at 2% usage for / and 1% usage for storage-pool

stgraber · January 4, 2022, 2:44am

The failure would likely have caused the dataset to be deleted so not much to look at.
I’d recommend posting zfs list -t all as well as running the command during the lxc copy or lxc import to get a better idea of what’s running out of space.
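For anyone following along, here is a minimal sketch of one way to watch the pool while the import runs. It assumes the `zfs list` flags `-H` (no header, tab-separated) and `-p` (exact byte values); the helper name `low_space_datasets` and the 1 MiB threshold are made up for illustration and are not part of LXD or ZFS.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD or ZFS): reads tab-separated
# "name<TAB>used<TAB>avail" lines on stdin, as printed by
#   zfs list -H -p -t all -o name,used,avail
# and prints every dataset whose available space is below LIMIT bytes.
low_space_datasets() {
    limit="$1"
    # Snapshots report "-" for avail, so those rows are skipped.
    awk -F '\t' -v limit="$limit" \
        '$3 != "-" && ($3 + 0) < (limit + 0) { print $1, $3 }'
}

# Example usage while `lxc import` or `lxc copy` runs in another terminal
# (pool name taken from this thread; 1048576 bytes = 1 MiB):
#   while sleep 1; do
#       zfs list -H -p -t all -o name,used,avail -r storage-pool \
#           | low_space_datasets 1048576
#   done
```

Polling matters here because a failed import likely removes the dataset again, so the only chance to see which dataset runs out of space is while the unpack is still in progress.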

dutchy76 · January 4, 2022, 12:32pm

I’ve given that a go. What seems to be happening is that the VM gets allocated a block of storage, which doesn’t grow past 95.5 MB for some reason, eventually leaving the VM’s allocated storage with 0B free.

I’ve tried manually creating the subvolume, which isn’t possible because LXD tries to create it as well. I’ve also tried to reserve space using zfs set reservation=5G <subvolume>, but this fails with "size is greater than available space", which is odd to me.

dutchy76 · January 4, 2022, 12:47pm

So the issue seems to be on the metadata volume. After removing its quota, it seems to be importing fine.

However, the metadata volume seemed oddly large, coming in at just over 300MB, whereas my other VMs didn’t go much bigger than 10MB. After mounting that dataset, the largest file turned out to be the state file; the VM has stateful migration enabled.

I think this is related to https://github.com/lxc/lxd/issues/9723 although I’m not 100% certain.
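A rough sketch of how a quota on a per-instance dataset could be spotted, since a check on the top-level datasets alone can miss it. It assumes `zfs get` with `-H -p -r` and `-o name,property,value`; the helper name `limits_in_effect` is hypothetical, and the values treated as "not set" (`0`, `none`, `-`) are an assumption about how unset limits are printed.

```shell
#!/bin/sh
# Hypothetical helper: reads tab-separated "name<TAB>property<TAB>value"
# lines on stdin, as printed by
#   zfs get -H -p -o name,property,value quota,refquota,...
# and prints only the rows where a limit is actually set.
limits_in_effect() {
    # Assumption: an unset limit shows up as 0, "none", or "-" (snapshots).
    awk -F '\t' '$3 != "0" && $3 != "none" && $3 != "-" { print $1, $2, $3 }'
}

# Example usage, recursing through every dataset in the pool
# (pool name taken from this thread):
#   zfs get -H -p -r -o name,property,value \
#       quota,refquota,reservation,refreservation storage-pool \
#       | limits_in_effect
```

The `-r` is the important part: the dataset is created during the import, so a quota on it only appears in a recursive listing taken while the import is running.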

tomp · January 4, 2022, 1:10pm

Was the VM running when you exported it?
Can you show the output of lxc config show <instance> --expanded on the source?

dutchy76 · January 4, 2022, 1:15pm

No, the VM was not running. The config did have migration.stateful: true at shutdown though. I have since removed it, and the export and re-import now succeed (lxc copy succeeds too).

This is the config as it is now, the only difference being that migration.stateful has been removed:

architecture: x86_64
config:
  image.architecture: amd64
  image.description: Ubuntu focal amd64 (20210318_07:42)
  image.os: Ubuntu
  image.release: focal
  image.serial: "20210318_07:42"
  image.type: disk-kvm.img
  image.variant: cloud
  limits.cpu: "1"
  limits.memory: 1GB
  migration.stateful: "true"
  security.secureboot: "false"
  user.user-data: |
    #cloud-config
    package_upgrade: true
    packages:
    - openssh-server
    - curl
    - wget
    - nano
    - htop
    users:
    - name: tobias
      sudo: ALL=(ALL) NOPASSWD:ALL
      shell: /bin/bash
      lock_passwd: false
      ssh_authorized_keys:
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJvgqqT0ZmhYxcq1ZbOkGG53Pe5GDlWNXBdiZc8zodLQ github@thedutchmc.nl
  volatile.base_image: 712bf2caa78d3a72377c34c9b52198f0db4bebfb646b284aad5a9d560580295d
  volatile.eth0.host_name: tapdb102bf8
  volatile.eth0.hwaddr: 00:16:3e:58:dd:b4
  volatile.last_state.power: RUNNING
  volatile.uuid: d2e1c072-0db9-4855-a930-bbf18ff66d63
  volatile.vsock_id: "107"
devices:
  eth0:
    nictype: bridged
    parent: br1
    type: nic
  root:
    path: /
    pool: storage-pool
    size: 30GB
    type: disk
ephemeral: false
profiles:
- default
- cloud-init
- limits.tiny
stateful: false
description: ""

tomp · January 4, 2022, 1:21pm

Had you done a stateful stop before exporting?

dutchy76 · January 4, 2022, 1:22pm

I did lxc stop originally, when migration.stateful: true was still present, so I think it was, yes.

tomp · January 5, 2022, 9:55am

You would have had to run lxc stop <instance> --stateful though to do that.

Can you get me a list of the contents of the exported tarball, please?

dutchy76 · January 5, 2022, 9:00pm

Of course, I’ve attached it as a screenshot.

tomp · January 6, 2022, 9:20am

OK, so it looks like at some point you performed a stateful stop and then disabled migration.stateful, which left the state file in place. That file was then included in the export, but there isn’t enough room to recreate it on import because the root disk device doesn’t have size.state set to a large enough value.

My suggestion would be:

Start the source VM again. With migration.stateful off, this should remove the state file.
Export the VM again.
Import it. This time, without the state file, it should work.
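The steps above can be sketched roughly as follows, with a check that the new export no longer carries the state file. The instance and file names are placeholders, the helper `has_state_file` is hypothetical, and the in-tarball path `backup/virtual-machine/state` is inferred from the tar error in the first post rather than from LXD documentation.

```shell
#!/bin/sh
# Hypothetical helper: reads a tarball listing (one path per line, as
# printed by `tar -tzf`) on stdin and succeeds if the VM state file is
# present. The path is inferred from the tar error earlier in the thread.
has_state_file() {
    grep -qx 'backup/virtual-machine/state'
}

# Example usage ("the-vm" and "the-vm.tar.gz" are placeholder names):
#   lxc start the-vm      # with migration.stateful off, this should drop the state file
#   lxc stop the-vm
#   lxc export the-vm the-vm.tar.gz
#   if tar -tzf the-vm.tar.gz | has_state_file; then
#       echo "state file is still present in the export" >&2
#   fi
#   lxc import the-vm.tar.gz      # on the target host
```

Checking the listing before importing avoids another failed unpack on the target if the state file is still there.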

@stgraber should LXD perhaps remove the state file if migration.stateful is disabled?

stgraber · January 6, 2022, 2:57pm

Hmm, I don’t think so, that may lead to data loss if say, someone modifies a profile by accident and wipes saved state from a bunch of VMs.

In general our logic to wipe any state file during start should work pretty well.