No space left on device while changing the BTRFS storage pool of an LXD container with docker

Hello all,

I am experiencing a problem while moving one of my LXD containers to a different (larger) storage pool.
My initial goal was to increase the size of the storage pool used by my container. To that end, I tried to follow the steps described here: Change Storage (Size and Driver).
My container was initially using the default storage pool (30GB). I first created a new storage pool, bigstorage, of 130GB and then did the following:

lxc stop Mycontainer
lxc move Mycontainer MycontainerCopy
lxc move MycontainerCopy Mycontainer --storage=bigstorage
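
For reference, the new pool was created roughly like this (a loop-backed btrfs pool; command reproduced from memory):

lxc storage create bigstorage btrfs size=130GB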

During the execution of the last command, I obtained the following error:
Error: Create instance from copy: Create instance volume from copy failed: [Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/MycontainerCopy/snap-Jan-17-2020/rootfs/var/lib/docker/btrfs/subvolumes/381c436496c9289c6464401febbede4ff21dc0c5e45e8d79d73f678a5636e0d8 ro true: ERROR: failed to set flags for /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/MycontainerCopy/snap-Jan-17-2020/rootfs/var/lib/docker/btrfs/subvolumes/381c436496c9289c6464401febbede4ff21dc0c5e45e8d79d73f678a5636e0d8: No space left on device Failed to run: btrfs receive -e /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/migration.207101739: ERROR: empty stream is not considered valid]

It seems the problem occurred while transferring a snapshot of my container called snap-Jan-17-2020.
Mycontainer (currently named MycontainerCopy after the first move) is taking about 16GB of disk space:

$ lxc storage info default
info:
  description: ""
  driver: btrfs
  name: default
  space used: 16.59GB
  total space: 30.00GB
used by:
  images:
  - bbe2058f62ee0778bba9427feba2f75fd8902995cd4867732d3211bdd3904db8
  instances:
  - MycontainerCopy
  profiles:
  - default

I tried to increase volume.size on both default and bigstorage to 100GB, as suggested here: LXC copy runs out of disk space and here: “lxc launch” fails due to “no space left on device”, but I obtained the same error. The total space of my disk is 160GB.
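
Roughly what I ran, from memory:

lxc storage set default volume.size 100GB
lxc storage set bigstorage volume.size 100GB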

I also noticed that the free space on my disk decreased; I have the feeling that part of the data generated while executing lxc move MycontainerCopy Mycontainer --storage=bigstorage was not deleted. Is it possible to free this space?

Thank you in advance!

Take a look in /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers and see if there are leftover migration directories. You can try deleting them, although they may be write protected.
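
Something like this should show them from the host (going via the snap mount namespace so the pool contents are visible); if a leftover subvolume is read-only you can clear the flag first, then delete it. A rough sketch, with migration.XXXX standing in for whatever directory you find:

sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- ls /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/migration.XXXX ro false
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- btrfs subvolume delete /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/migration.XXXX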

What storage type is your original default pool?

Thanks for the hint! Actually this directory and its default counterpart are empty; I must have missed something (the decrease in free space is probably due to something else I did).

The default storage is btrfs (did I understand correctly what you mean by “type”?)

I tried more or less the same procedure on a different LXD host and something really weird happened. Initially, I had two containers on the default storage pool, taking only 18GB out of 30GB:

info:
  description: ""
  driver: btrfs
  name: default
  space used: 18.45GB
  total space: 30.00GB
used by:
  images:
  - d42adae80c7bbe780b23d225d73dec2003b4f17fed0690706d4f1baf7e2c9d10
  instances:
  - container1
  - container2
  profiles:
  - default

The total space of the disk is 160GB and only 2GB were used outside LXD (total free space before the procedure: 160 - 18 - 2 = 140GB).

I created a new storage pool bigstorage of 100GB. I then tried to move container2 to the new storage pool. The procedure took quite a while and I could see a progress bar showing:

 Transferring instance: container2: 89.01GB (35MB/s)

Once the progress bar reached 89GB of transferred data, the same No space left on device error appeared, and now I only have 40GB of free space left on the disk! I am wondering why LXD tried to transfer nearly 90GB of data from the default storage pool (30GB in size and only filled with 18GB of data, roughly half of which belongs to container2, so I would expect at most 9GB to be transferred). Where did it find all this data and where is it stored now?

Please can you show the output of:

  • lxc storage show <pool> for source and target pools.
  • lxc config show <instance> --expanded for the instance.
  • lxc info <instance> for the instance.
  • df -h inside the container.

The source pool:

$ lxc storage show default
config:
  size: 30GB
  source: /var/snap/lxd/common/lxd/disks/default.img
description: ""
name: default
driver: btrfs
used_by:
- /1.0/images/d42adae80c7bbe780b23d225d73dec2003b4f17fed0690706d4f1baf7e2c9d10
- /1.0/instances/container1
- /1.0/instances/container2
- /1.0/profiles/default
status: Created
locations:
- none

The target pool:

$ lxc storage show bigstorage
config:
  size: 100GB
  source: /var/snap/lxd/common/lxd/disks/bigstorage.img
description: ""
name: bigstorage
driver: btrfs
used_by: []
status: Created
locations:
- none

The container I want to move from source to target pool:

 $ lxc config show container2 --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 20.04 LTS amd64 (release) (20210105)
  image.label: release
  image.os: ubuntu
  image.release: focal
  image.serial: "20210105"
  image.type: squashfs
  image.version: "20.04"
  limits.kernel.memlock: unlimited
  security.nesting: "true"
  security.privileged: "false"
  user.network-config: |
    version: 2
    ethernets:
        eth0:
            addresses:
            - IP_ADDRESS/32
            nameservers:
                addresses:
                - DNS_IP
                search: []
            routes:
            -   to: 0.0.0.0/0
                via: 169.254.0.1
                on-link: true
  volatile.base_image: 21da67063730fc446ca7fe090a7cf90ad9397ff4001f69907d7db690a30897c3
  volatile.eth0.host_name: veth1d6e677c
  volatile.eth0.hwaddr: 00:16:3e:3e:81:b6
  volatile.eth0.last_state.created: "false"
  volatile.eth0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    ipv4.address: IP_ADDRESS
    nictype: routed
    parent: ens3
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
- routed
stateful: false
description: ""

$ lxc info container2
Name: container2
Location: none
Remote: unix://
Architecture: x86_64
Created: 2021/01/06 17:44 UTC
Status: Stopped
Type: container
Profiles: default, routed
Snapshots:
  snap-Jan-17-2020 (taken at 2021/01/17 18:49 UTC) (stateless)

From inside container2:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop5       28G   23G  4.6G  84% /
none            492K  4.0K  488K   1% /dev
udev            3.8G     0  3.8G   0% /dev/tty
tmpfs           100K     0  100K   0% /dev/lxd
tmpfs           100K     0  100K   0% /dev/.lxd-mounts
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           778M  212K  777M   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup

Thanks.

Can you show the output of df -h on the host too, please?

df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.8G     0  3.8G   0% /dev
tmpfs           778M  1.0M  777M   1% /run
/dev/sda1       155G  120G   36G  78% /
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/sda15      105M  3.9M  101M   4% /boot/efi
/dev/loop0       56M   56M     0 100% /snap/core18/1932
/dev/loop1       56M   56M     0 100% /snap/core18/1944
/dev/loop2       32M   32M     0 100% /snap/snapd/10492
/dev/loop3       32M   32M     0 100% /snap/snapd/10707
/dev/loop4       68M   68M     0 100% /snap/lxd/18150
tmpfs           1.0M     0  1.0M   0% /var/snap/lxd/common/ns
/dev/loop6       70M   70M     0 100% /snap/lxd/19032
tmpfs           778M     0  778M   0% /run/user/1001

There is only 36GB left on /dev/sda1 because I already tried to move container2 from the default pool to bigstorage (for some reason 89GB of data were transferred even though container2 only takes something like 10GB on disk, and then the No space left on device error showed up); before that, the available space was roughly 140GB.

Can you show me the output of this command from the host:

sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- df -h

This should show the utilisation of the BTRFS loopback filesystems mounted inside the snap mount namespace.

$ sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       155G  120G   36G  78% /var/lib/snapd/hostfs
tmpfs           778M 1020K  777M   1% /var/lib/snapd/hostfs/run
tmpfs           5.0M     0  5.0M   0% /var/lib/snapd/hostfs/run/lock
/dev/sda15      105M  3.9M  101M   4% /var/lib/snapd/hostfs/boot/efi
/dev/loop0       56M   56M     0 100% /snap/core18/1932
/dev/loop1       56M   56M     0 100% /
/dev/loop2       32M   32M     0 100% /snap/snapd/10492
/dev/loop3       32M   32M     0 100% /snap/snapd/10707
/dev/loop4       68M   68M     0 100% /snap/lxd/18150
udev            3.8G     0  3.8G   0% /dev
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
tmpfs           1.0M     0  1.0M   0% /var/snap/lxd/common/ns
tmpfs           1.0M     0  1.0M   0% /var/snap/lxd/common/shmounts
tmpfs           3.8G     0  3.8G   0% /run
tmpfs           3.8G  112K  3.8G   1% /etc
tmpfs           3.8G     0  3.8G   0% /usr/share/misc
tmpfs           100K     0  100K   0% /var/snap/lxd/common/shmounts/instances
tmpfs           100K     0  100K   0% /var/snap/lxd/common/lxd/devlxd
/dev/loop5       28G   23G  4.6G  84% /var/snap/lxd/common/lxd/storage-pools/default
/dev/loop6       70M   70M     0 100% /snap/lxd/19032
/dev/loop7       94G   24M   86G   1% /var/snap/lxd/common/lxd/storage-pools/bigstorage
tmpfs           778M     0  778M   0% /var/lib/snapd/hostfs/run/user/1001

OK so we have these two storage pools mounted:

Filesystem      Size  Used Avail Use% Mounted on
/dev/loop5       28G   23G  4.6G  84% /var/snap/lxd/common/lxd/storage-pools/default
/dev/loop7       94G   24M   86G   1% /var/snap/lxd/common/lxd/storage-pools/bigstorage

Out of interest, what happens if you do:

lxc copy MycontainerCopy Mycontainer --storage=bigstorage --instance-only

This just copies the instance and not the snapshot.

A progress bar is displayed:

Transferring instance: MycontainerCopy: __GB

where __ increases up to some really high number (last time I watched it and it stopped at 89GB; this time I only checked from time to time and saw it go above 50GB, which is already much bigger than the space taken by MycontainerCopy, so I guess it reached 89GB again). At the end the progress bar disappears and the following error is returned:

Error: Create instance from copy: Create instance volume from copy failed: [Failed sending volume MycontainerCopy:/rootfs/var/lib/docker/btrfs/subvolumes/e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619: Btrfs send failed: [signal: killed context canceled] (At subvol /var/snap/lxd/common/lxd/storage-pools/default/containers/migration.502227479/.migration-send/rootfs/var/lib/docker/btrfs/subvolumes/e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619
) Failed to run: btrfs receive -e /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/migration.442894474: At subvol e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619
ERROR: writing to opt/pyenv/versions/3.8.6/bin/python3.8 failed: No space left on device]

Could it be related to the content of my container? I don’t know if it is relevant, but I have Docker containers inside the LXD container.

Yes, this seems likely. What storage driver/layer do you use Docker with? Perhaps there is some expansion going on with the BTRFS subvolumes?
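
You can check from inside the container with something like:

docker info --format '{{.Driver}}'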

Out of interest, have you tried deleting the new storage pool and creating one with a non-BTRFS driver, say LVM or ZFS instead, and then copying to that?
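
Something along these lines (a sketch; adjust the size to whatever you need):

lxc storage delete bigstorage
lxc storage create bigstorage zfs size=100GB   # or: lxc storage create bigstorage lvm size=100GB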

It would also be interesting for you to re-try copying using a BTRFS target with debug mode enabled and show the output of the logs:

sudo snap set lxd daemon.debug=true; sudo systemctl reload snap.lxd.daemon
sudo tail -f /var/snap/lxd/common/lxd/logs/lxd.log

According to docker info, I am also using the btrfs storage driver for the Docker containers inside the LXD container.

I tried with the LVM driver and obtained a similar error:

Error: Create instance from copy: Create instance volume from copy failed: [Rsync send failed: MycontainerCopy, /var/snap/lxd/common/lxd/storage-pools/default/containers/MycontainerCopy/: [exit status 11 read unix @lxd/c6aad517-cb61-4e40-aab4-8c5e333048b6->@: use of closed network connection] (rsync: write failed on "/var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/Mycontainer/rootfs/var/lib/docker/btrfs/subvolumes/178a37a7d94c4037f4656fea51117cd5bda6197d7773fc7365c61fef953177ce/openedx/venv/lib/python3.8/site-packages/_sass.cpython-38-x86_64-linux-gnu.so": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]
) Rsync receive failed: /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/Mycontainer/: [exit status 11] ()]

I am currently trying with ZFS; it is taking a lot of time and the amount of data transferred is already bigger than 30GB, so I expect the same error.
I also tried with debug mode and obtained the following logs:

t=2021-01-27T15:59:53+0000 lvl=dbug msg="Failure for task operation: 517395a3-f316-42cb-a24e-e0d11cec100e: Create instance from copy: Create instance volume from copy failed: [Failed sending volume MycontainerCopy:/rootfs/var/lib/docker/btrfs/subvolumes/e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619: Btrfs send failed: [signal: killed context canceled] (At subvol /var/snap/lxd/common/lxd/storage-pools/default/containers/migration.152091799/.migration-send/rootfs/var/lib/docker/btrfs/subvolumes/e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619\n) Failed to run: btrfs receive -e /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/migration.081846296: At subvol e867eb55145b59ec709732c99bb73fd5d423b5133682d86006c452c938634619\nERROR: writing to openedx/edx-platform/common/test/data/manual-testing-complete/static/1.pdf failed: No space left on device]"

Is there a way to increase the size of the default storage pool instead of creating a bigger pool and moving the container to it?

So the issue is on the sending side (i.e. it is generating too much data).

Can you send me the full debug logs from the moment you start the operation, rather than just the failure messages (I want to see if it logs anything about BTRFS subvolumes). Thanks

@stgraber any ideas on this one? It feels like an issue with the Docker BTRFS subvolumes inside the BTRFS container rootfs.

But it seems to be occurring even when using rsync between different storage drivers (not just optimized migration between BTRFS pools).

@rfruit another option would be to export the container as a tarball and then reimport it.

E.g.

lxc export <container> /path/to/a/tarball.tar.gz

It would be interesting to see how big that file got too.
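
And to bring it back in afterwards (note you would need to delete or rename the original first, since the name must be free):

lxc import /path/to/a/tarball.tar.gz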

According to my tests, we should be able to see the effective disk usage (excluding BTRFS optimizations from snapshots) by doing:

sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- du -h /var/snap/lxd/common/lxd/storage-pools/default

This may show us where the problem is.


OK, so actually I was wrong: it worked with ZFS, but on the new bigstorage pool the space used is 50GB, which is really high compared to the space the moved container was using on the original default pool.

lxc storage info bigstorage
info:
  description: ""
  driver: zfs
  name: bigstorage
  space used: 49.98GB
  total space: 96.74GB
used by:
  instances:
  - Mycontainer

I think the size usage you’re seeing on the source pool is somewhat misleading, as it’s taking into account optimizations used by BTRFS snapshots. These will not necessarily be replicated as efficiently when moving pools.

I would have expected ZFS and LVM to be reasonably similar, but running du should help to see what’s happening.
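
If you want to see how much of the snapshot data is actually shared with the container, the Total / Exclusive / Set shared columns from btrfs filesystem du should show it, e.g. (a sketch):

sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- btrfs filesystem du -s /var/snap/lxd/common/lxd/storage-pools/default/containers/container1 /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1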

@tomp The debug logs are really long and I do not know if I am able to differentiate the various commands I launched :worried:

The lines output by
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- du -h --threshold=1GB /var/snap/lxd/common/lxd/storage-pools/default (only greater than 1GB) are:

 1.4G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/opt/freeswitch
1.4G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/opt
1.4G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/usr/lib
1.4G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/usr/share
3.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/usr
3.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/var/lib/docker/btrfs/subvolumes
3.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/var/lib/docker/btrfs
3.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/var/lib/docker
4.0G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/var/lib
4.9G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs/var
9.9G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1/rootfs
9.9G    /var/snap/lxd/common/lxd/storage-pools/default/containers/container1
9.9G    /var/snap/lxd/common/lxd/storage-pools/default/containers
1.4G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-14-2021/rootfs/usr/lib
1.4G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-14-2021/rootfs/usr/share
3.2G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-14-2021/rootfs/usr
1.4G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-14-2021/rootfs/var/cache/apt/archives
1.5G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-14-2021/rootfs/var/cache/apt
1.5G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-14-2021/rootfs/var/cache
2.0G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-14-2021/rootfs/var
5.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-14-2021/rootfs
5.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-14-2021
1.4G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/usr/lib
1.4G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/usr/share
3.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/usr
1.5G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/var/cache/apt/archives
1.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/var/cache/apt
1.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/var/cache
3.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/var/lib/docker/btrfs/subvolumes
3.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/var/lib/docker/btrfs
3.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/var/lib/docker
4.0G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/var/lib
5.6G    /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs/var
11G     /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021/rootfs
11G     /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1/snap-Jan-15-2021
16G     /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots/container1
16G     /var/snap/lxd/common/lxd/storage-pools/default/containers-snapshots
26G     /var/snap/lxd/common/lxd/storage-pools/default

(container1 is the one that stayed on the default storage pool; the other container has now moved to bigstorage since the move succeeded with the ZFS driver)

This is already surprising, since lxc storage info default indicates only 10GB of used space (which corresponds only to the space used by the container without its snapshots). On the other hand, sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- du -h /var/snap/lxd/common/lxd/storage-pools/bigstorage gives only:

4.0K    /var/snap/lxd/common/lxd/storage-pools/bigstorage/custom-snapshots
4.0K    /var/snap/lxd/common/lxd/storage-pools/bigstorage/virtual-machines-snapshots
4.0K    /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers-snapshots/Mycontainer/snap-Jan-17-2020
8.0K    /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers-snapshots/Mycontainer
12K     /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers-snapshots
4.0K    /var/snap/lxd/common/lxd/storage-pools/bigstorage/custom
4.0K    /var/snap/lxd/common/lxd/storage-pools/bigstorage/virtual-machines
4.0K    /var/snap/lxd/common/lxd/storage-pools/bigstorage/images
4.0K    /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers/Mycontainer
8.0K    /var/snap/lxd/common/lxd/storage-pools/bigstorage/containers
44K     /var/snap/lxd/common/lxd/storage-pools/bigstorage

Whereas lxc storage info bigstorage is indicating 49GB of used space. I am a bit lost to be honest :smile:
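
Presumably du simply cannot see the ZFS data because the datasets do not seem to be mounted under that path; would listing them directly show where that 49GB sits? A sketch of what I would run, assuming zfsutils-linux is available on the host and that LXD named the zpool after the storage pool:

sudo zfs list -r -o name,used,referenced bigstorage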