I am experiencing a problem while moving one of my LXD containers to a different (larger) storage pool.
My initial goal was to increase the size of the storage pool used by my container. To that end, I tried to follow the steps described here: Change Storage (Size and Driver).
My container was initially using the default storage pool (30GB). I first created a new storage pool, bigstorage, with 130GB, and then did the following:
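For context, the steps I followed were roughly these (a reconstruction; the pool name, size, and the final move command are from my session, but the exact flags for the create step are my best recollection):

```shell
# Create a new, larger loop-backed btrfs pool (size is a pool config key)
lxc storage create bigstorage btrfs size=130GB

# Keep a copy of the container, then move the copy onto the new pool,
# renaming it back to the original name
lxc copy Mycontainer MycontainerCopy
lxc delete Mycontainer
lxc move MycontainerCopy Mycontainer --storage=bigstorage
```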
During the execution of the last command, I obtained the following error: Error: Create instance from copy: Create instance volume from copy failed: [Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/MycontainerCopy/snap-Jan-17-2020/rootfs/var/lib/docker/btrfs/subvolumes/381c436496c9289c6464401febbede4ff21dc0c5e45e8d79d73f678a5636e0d8 ro true: ERROR: failed to set flags for /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/MycontainerCopy/snap-Jan-17-2020/rootfs/var/lib/docker/btrfs/subvolumes/381c436496c9289c6464401febbede4ff21dc0c5e45e8d79d73f678a5636e0d8: No space left on device Failed to run: btrfs receive -e /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/migration.207101739: ERROR: empty stream is not considered valid]
It seems the problem occurred while transferring a snapshot of my container called snap-Jan-17-2020.
Mycontainer is taking 16GB of disk space:
$ lxc storage info default
info:
description: ""
driver: btrfs
name: default
space used: 16.59GB
total space: 30.00GB
used by:
images:
- bbe2058f62ee0778bba9427feba2f75fd8902995cd4867732d3211bdd3904db8
instances:
- MycontainerCopy
profiles:
- default
I also noticed that the free space on my disk decreased. I have the feeling that part of the data generated while executing lxc move MycontainerCopy Mycontainer --storage=bigstorage was not deleted. Is it possible to free this space?
Take a look in /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers and see if there are any leftover migration directories. You can try deleting them, although they may be write-protected.
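Since this is a snap-installed LXD, the storage paths are only visible inside the snap's mount namespace. A sketch of how to check (the nsenter invocation matches the one used later in this thread; the migration.* naming is taken from the error messages above):

```shell
# List the pool's containers directory from inside the snap mount namespace
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- \
  ls -la /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers

# Leftover directories are named like migration.XXXXXXXXX. On a btrfs pool
# they may be subvolumes, in which case plain rm -rf fails and you need:
#   sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- \
#     btrfs subvolume delete <path-to-migration-directory>
```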
Thanks for the hint! Actually this directory and its default counterpart are empty, I must have missed something (probably the decrease in free space is due to something else I did).
The default storage is btrfs (did I understand correctly what you mean by “type”?)
I tried more or less the same procedure on a different LXD host and something really weird happened. Initially, I had two containers on the default storage, taking only 18GB out of 30GB:
info:
description: ""
driver: btrfs
name: default
space used: 18.45GB
total space: 30.00GB
used by:
images:
- d42adae80c7bbe780b23d225d73dec2003b4f17fed0690706d4f1baf7e2c9d10
instances:
- container1
- container2
profiles:
- default
The total disk size is 160GB, and outside LXD only 2GB were used (total free space before the procedure: 160 - 18 - 2 = 140GB).
I created a new storage pool, bigstorage, of 100GB. I then tried to move container2 to the new storage pool. The procedure took quite a while, and I could see a progress bar showing:
Once the progress bar reached 89GB of transferred data, the same No space left on device error appeared, and now I only have 40GB of free space left on the disk! I am wondering why LXD tried to transfer more than 90GB of data from the default storage pool (which is 30GB in size and holds only 18GB of data, roughly half of it for container2, so I would expect at most 9GB to be transferred). Where did it find all this volume, and where is it stored now?
There is only 36GB left on /dev/sda1 because I already tried to move container2 from the default pool to bigstorage (for some reason 89GB of data were transferred even though container2 was only taking something like 10GB of disk space, and then the No space left on device error showed up); before that, the available space was roughly 140GB.
where __ increases up to some really high number. Last time I stayed in front of it and it stopped at 89GB; this time I was checking from time to time and saw it go above 50GB (which is already much, much bigger than the space taken by MycontainerCopy), and I guess it also reached 89GB this time. At the end the progress bar disappears and the following error is returned:
Error: Create instance from copy: Create instance volume from copy failed: [Failed sending volume MycontainerCopy:/rootfs/var/lib/docker/btrfs/subvolumes/e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619: Btrfs send failed: [signal: killed context canceled] (At subvol /var/snap/lxd/common/lxd/storage-pools/default/containers/migration.502227479/.migration-send/rootfs/var/lib/docker/btrfs/subvolumes/e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619
) Failed to run: btrfs receive -e /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/migration.442894474: At subvol e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619
ERROR: writing to opt/pyenv/versions/3.8.6/bin/python3.8 failed: No space left on device]
Can it be related to the content of my container? I don't know if it is relevant, but I have Docker containers running inside the LXD container.
Yes, this seems likely. What storage driver/layer do you use Docker with? Perhaps there is some expansion going on with BTRFS subvolumes?
Out of interest, have you tried deleting the new storage pool, creating one with a non-BTRFS driver (say LVM or ZFS), and trying to copy to that?
It would also be interesting for you to re-try copying using a BTRFS target with debug mode enabled and show the output of the logs:
According to docker info, I am also using the btrfs storage driver for the Docker containers inside the LXD container.
I tried with LVM driver and obtained a similar error:
Error: Create instance from copy: Create instance volume from copy failed: [Rsync send failed: MycontainerCopy, /var/snap/lxd/common/lxd/storage-pools/default/containers/MycontainerCopy/: [exit status 11 read unix @lxd/c6aad517-cb61-4e40-aab4-8c5e333048b6->@: use of closed network connection] (rsync: write failed on "/var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/Mycontainer/rootfs/var/lib/docker/btrfs/subvolumes/178a37a7d94c4037f4656fea51117cd5bda6197d7773fc7365c61fef953177ce/openedx/venv/lib/python3.8/site-packages/_sass.cpython-38-x86_64-linux-gnu.so": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]
) Rsync receive failed: /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/Mycontainer/: [exit status 11] ()]
I am currently trying with zfs; it is taking a lot of time, and the amount of data transferred is already bigger than 30GB, so I expect the same error.
I also tried with debug mode and obtained the following logs:
t=2021-01-27T15:59:53+0000 lvl=dbug msg="Failure for task operation: 517395a3-f316-42cb-a24e-e0d11cec100e: Create instance from copy: Create instance volume from copy failed: [Failed sending volume MycontainerCopy:/rootfs/var/lib/docker/btrfs/subvolumes/e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619: Btrfs send failed: [signal: killed context canceled] (At subvol /var/snap/lxd/common/lxd/storage-pools/default/containers/migration.152091799/.migration-send/rootfs/var/lib/docker/btrfs/subvolumes/e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619\n) Failed to run: btrfs receive -e /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/migration.081846296: At subvol e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619\nERROR: writing to openedx/edx-platform/common/test/data/manual-testing-complete/static/1.pdf failed: No space left on device]"
Is there a way to increase the size of the default storage instead of creating a bigger storage and moving the container to that new storage?
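For a loop-backed pool under the snap package, one in-place approach is to grow the backing image file, refresh the loop device, and resize the filesystem. This is a sketch under the assumption that default is a loop-backed btrfs pool at the usual snap path; double-check each path on your system and back up first:

```shell
# 1. Grow the backing image file by 100GB (path assumes the snap package)
sudo truncate -s +100G /var/snap/lxd/common/lxd/disks/default.img

# 2. Tell the kernel the loop device changed size
#    (find the right loopX with: sudo losetup -a | grep default.img)
sudo losetup -c /dev/loopX

# 3. Grow the btrfs filesystem to fill the enlarged device,
#    from inside the snap's mount namespace
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- \
  btrfs filesystem resize max /var/snap/lxd/common/lxd/storage-pools/default
```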
So the issue is on the sending side (i.e. generating too much data).
Can you send me the full debug logs from the moment you start the operation, rather than just the failure messages? (I want to see if it logs anything about BTRFS subvolumes.) Thanks.
Ok, so actually I was wrong: it worked with zfs, but on the new storage pool bigstorage the space used is 50GB, which is really high compared to the space the moved container used on the original default pool.
lxc storage info bigstorage
info:
description: ""
driver: zfs
name: bigstorage
space used: 49.98GB
total space: 96.74GB
used by:
instances:
- Mycontainer
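To see where that 50GB actually went inside the ZFS pool, the per-dataset accounting can be listed (a sketch; I am assuming the ZFS pool/dataset is named after the LXD pool, bigstorage, as is usual for loop-backed pools, so adjust the name if zpool list shows otherwise):

```shell
# Recursively list all datasets in the pool with their space usage,
# including how much is pinned by snapshots
sudo zfs list -r -o name,used,referenced,usedbysnapshots bigstorage
```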
I think the size usage you're seeing on the source pool is somewhat misleading, as it's taking into account optimizations used by BTRFS snapshots. These will not necessarily be replicated as efficiently when moving pools.
I would have expected ZFS and LVM to be reasonably similar, but running du should help to see what's happening.
@tomp The debug logs are really long and I do not know if I am able to differentiate the various commands I launched
The lines output by sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- du -h --threshold=1GB /var/snap/lxd/common/lxd/storage-pools/default (only greater than 1GB) are:
(container1 is the one that stayed on the default storage; the other container has now moved to bigstorage, since the move succeeded with the zfs driver)
This is already surprising, since lxc storage info default indicates only 10GB of used space… (which corresponds only to the space used by the container without its snapshots). On the other hand, sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- du -h /var/snap/lxd/common/lxd/storage-pools/bigstorage gives only: