Btrfs error when moving specific lxd container

Hello,

I am moving my lxd containers to a new storage pool using the following commands:
lxc move $containername tmp -s pool1
lxc move tmp $containername

However, this migration failed on some of the containers. I am using btrfs for both the old and the new storage pool. For most of the containers, lxc move worked fine.

The error I get is the following:

Error: Create instance from copy: Create instance volume from copy failed: [Failed sending volume unifi:/: Btrfs send failed: [exit status 1] (At subvol /var/snap/lxd/common/lxd/storage-pools/default/containers/migration.290847417/.migration-send
ERROR: send ioctl failed with -5: Input/output error
) Failed to run: btrfs receive -e /var/snap/lxd/common/lxd/storage-pools/pool1/containers/migration.229283394: At subvol .migration-send
ERROR: unexpected EOF in stream]

I already tried exporting the container, but this failed, too:

Error: Create backup: Backup create: Error adding “/var/snap/lxd/common/lxd/storage-pools/default/backup.386912943/unifi/rootfs/var/lib/unifi/db/diagnostic.data/metrics.2021-01-14T07-29-22Z-00000” as “backup/container/rootfs/var/lib/unifi/db/diagnostic.data/metrics.2021-01-14T07-29-22Z-00000” to tarball: Failed to copy file content “/var/snap/lxd/common/lxd/storage-pools/default/backup.386912943/unifi/rootfs/var/lib/unifi/db/diagnostic.data/metrics.2021-01-14T07-29-22Z-00000”: read /var/snap/lxd/common/lxd/storage-pools/default/backup.386912943/unifi/rootfs/var/lib/unifi/db/diagnostic.data/metrics.2021-01-14T07-29-22Z-00000: input/output error

Do you have an idea why I can’t move this container or know a fix for this issue?

Thanks

Anything nasty looking in dmesg?

input/output error usually suggests something’s pretty wrong.

Thanks for the quick answer!
dmesg indeed gives me some errors:

[16731.980757] BTRFS warning (device loop7): csum failed root 523 ino 19242 off 4096 csum 0xe3778fec expected csum 0xe51656c6 mirror 1
[16731.980794] BTRFS warning (device loop7): csum failed root 523 ino 19242 off 4096 csum 0xe3778fec expected csum 0xe51656c6 mirror 1

Do you have any idea how to fix this? To me, these lines don’t seem very helpful.

Those lines indicate data corruption on the disk. As you’re using a loop device, this may point to corruption on the underlying disk that went undetected.

Did your system recently crash, suffer power loss, or have its disk completely filled up?

In any case, you can try something like:

  • mount /dev/loop7 /mnt
  • btrfs scrub start /mnt

This will scan the entire filesystem to look for any issues.
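
If it helps to narrow things down, the root and ino fields in the "csum failed" lines identify the subvolume ID and the inode of the affected file. Below is a minimal sketch, assuming the pool is mounted at /mnt as in the commands above. The function name csum_failed_inodes is made up for illustration, and the btrfs inspect-internal commands in the comments need root.

```shell
# Print unique "subvolume-id inode" pairs from btrfs "csum failed"
# kernel messages read on stdin (for example, piped from dmesg).
csum_failed_inodes() {
    sed -n 's/.*csum failed root \([0-9][0-9]*\) ino \([0-9][0-9]*\) .*/\1 \2/p' | sort -u
}

# Usage (as root, with the pool mounted at /mnt, an assumed mount point):
#   dmesg | csum_failed_inodes
#   btrfs inspect-internal subvolid-resolve 523 /mnt
#   btrfs inspect-internal inode-resolve 19242 /mnt/<subvolume path from the previous command>
```

Files found this way cannot be repaired by scrub on a single-copy filesystem; they would have to be restored from a backup or deleted.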

The commands gave me the following output:

$ sudo btrfs scrub start -B /mnt
scrub done for b58f82af-7fe2-42ce-9066-58d22be8abb2
Scrub started: Sat Mar 20 23:15:28 2021
Status: finished
Duration: 0:00:06
Total to scrub: 12.57GiB
Rate: 453.79MiB/s
Error summary: csum=2
Corrected: 0
Uncorrectable: 2
Unverified: 0
ERROR: there are uncorrectable errors

I didn’t find any verbose output options. Is there a way to find and resolve the issues?
The root fs of the container is also mounted read-only. Is there a way to fix this?
I did not have any crashes or power loss. The only thing I have done recently is move my containers to the new storage pool.

I tried deleting the container to reinstall it on the new pool with the backup data, but this also gives me a read-only-filesystem error:

sudo lxc delete unifi
Error: Error deleting storage volume: Failed setting subvolume writable “/var/snap/lxd/common/lxd/storage-pools/default/containers/unifi”: Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/default/containers/unifi ro false: ERROR: failed to set flags for /var/snap/lxd/common/lxd/storage-pools/default/containers/unifi: Read-only file system

How can I force the deletion of the container?

You’re going to need to reboot your system; btrfs flipped itself read-only due to the corruption.

Thanks for trying to help!

After rebooting the system, the output of lxc list first looked pretty good, every container was started. The container started successfully, I was able to connect to the vpn running on the container and ping its ip address.
However, I wasn’t able to run lxc exec $containername bash or any other command on the containers I was unable to move; it just returned to the host bash without any output.
When I try to run lxc stop $containername or lxc info --show-log $containername, I just get the error:

Error: allocating the container failed

lxc list now shows “ERROR” as the state of the two containers I wasn’t able to move. Because of the error mentioned above, I cannot even run lxc delete $containername. Is there a way to manually delete the containers or the storage pool as a whole?

I was now able to reboot without the container autostart and got the following errors when trying to delete the container:

$ sudo lxc delete wireguard
Error: Error deleting storage volume: lstat /var/snap/lxd/common/lxd/storage-pools/default/containers/wireguard/rootfs/usr/include/x86_64-linux-gnu/bits/types/struct_sigstack.h: input/output error

dmesg output:

[ 305.861840] BTRFS error (device loop6): parent transid verify failed on 976879616 wanted 124037 found 202015
[ 305.863005] BTRFS error (device loop6): parent transid verify failed on 976879616 wanted 124037 found 202015
[ 305.863059] BTRFS error (device loop6): error loading props for ino 266089 (root 261): -5
[ 305.863243] BTRFS error (device loop6): parent transid verify failed on 976879616 wanted 124037 found 202015
[ 305.863367] BTRFS error (device loop6): parent transid verify failed on 976879616 wanted 124037 found 202015
----[long stacktrace path…]----
[ 331.798368] BTRFS: error (device loop6) in btrfs_run_delayed_refs:2227: errno=-5 IO failure
[ 331.798412] BTRFS info (device loop6): forced readonly
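
To check after a reboot whether a pool has flipped to read-only again, the "forced readonly" message above can be filtered out of the kernel log. This is only a sketch: the function name forced_readonly_devices is hypothetical, and it assumes the message wording matches the dmesg output shown here, which may differ between kernel versions.

```shell
# Print each btrfs device that the kernel reported as "forced readonly",
# reading kernel log lines on stdin (for example, piped from dmesg).
forced_readonly_devices() {
    sed -n 's/.*BTRFS info (device \([^)]*\)): forced readonly.*/\1/p' | sort -u
}

# Usage (as root):
#   dmesg | forced_readonly_devices
#   losetup /dev/loop6    # shows which backing file the loop device maps to
```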