@depam what are the steps to reproduce this?
Hi @monstermunchkin,
I had the error first time but i am able to repro when trying to delete a big container. But easily reproducible on my end. What worked for me is to remove container on /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/storpool1/containers.
Hi @monstermunchkin , sorry after reading the other thread I realized we someone have same case. Containers with snapshots on it. At first i tried to do lxc copy [container] remote:[container] --refresh but it failed with Error: Error transferring instance data: Got error reading source.
I then tried to delete the container and decided to make it clean copy everything with snapshots but got the reported error.
As discussed on the topic Tom linked this affects delete and copy operations.
However so far I am unable to reproduce on freshly created containers or copies of those freshly created containers even when a snapshot has been applied.
OK I think I sussed out what the issue is (at least in my setup).
The path lxc is attempting to use in the btrfs command (as shown in the error) is:
/var/snap/lxd/common/lxd/storage-pools/default/containers/container
However the correct path on my system is:
/srv/lxd/containers/container
As can be seen here:
lxc storage list
+---------+--------+----------+-------------+---------+---------+
| NAME | DRIVER | SOURCE | DESCRIPTION | USED BY | STATE |
+---------+--------+----------+-------------+---------+---------+
| default | btrfs | /srv/lxd | | 16 | CREATED |
+---------+--------+----------+-------------+---------+---------+
/var/snap/lxd/common/lxd/storage-pools/default/ exists but is blank and isn’t even on a btrfs filesystem.
At first I thought maybe the path could be wrong as it was copied from the containers original server but I checked and that was on /srv as well.
@depam can you confirm when you got the error the path used in the command was also wrong?
@monstermunchkin where does it get this path from?
Is it stored in the containers metadata some how?
I’m wondering as it only seems to happen with my older containers not anything I newly create, which is why I’m guessing you can’t reproduce it?
OK here we go, I’ve got closer, this is reproducible with all 15 of my containers.
It seems the error I was missing/dismissing is:
Error: Error transferring instance data: Got error reading migration source: websocket: close 1006 (abnormal closure): unexpected EOF
This appears to completely wipe out the container in the filesystem on the target but not the entry in the lxd database, thus triggering the lxc delete command to use the wrong path. [Edit I don’t think its actually wiping out the data just unmounting the BTRFS subvols from the filesystem]
Just to provide some detail…
Source Machine
⸾ snap list | grep lxd
lxd 5.0.0-b0287c1 22923 latest/stable canonical* -
⸾ lxc stop test
⸾ lxc copy test target:test --mode=relay --storage=default
Destination Machine
⸾ snap list | grep lxd
lxd 5.0.0-b0287c1 22923 latest/stable canonical* -
⸾ lxc storage list
+---------+--------+----------+-------------+---------+---------+
| NAME | DRIVER | SOURCE | DESCRIPTION | USED BY | STATE |
+---------+--------+----------+-------------+---------+---------+
| default | btrfs | /srv/lxd | | 16 | CREATED |
+---------+--------+----------+-------------+---------+---------+
⸾ lxc storage info default
info:
description: ""
driver: btrfs
name: default
space used: 4.17TiB
total space: 10.92TiB
used by:
instances:
- ..
- ..
- test
- ..
- ..
profiles:
- default
⸾ lxc info test
Name: test
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/04/27 19:04 BST
Snapshots:
+--------------------+----------------------+------------+----------+
| NAME | TAKEN AT | EXPIRES AT | STATEFUL |
+--------------------+----------------------+------------+----------+
| 2022-03-01-monthly | 2022/03/01 00:16 GMT | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-01-monthly | 2022/04/01 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-03-weekly | 2022/04/03 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-10-weekly | 2022/04/10 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-17-weekly | 2022/04/17 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-21-daily | 2022/04/21 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-22-daily | 2022/04/22 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-23-daily | 2022/04/23 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-24-daily | 2022/04/24 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-24-weekly | 2022/04/24 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-25-daily | 2022/04/25 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-26-daily | 2022/04/26 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-27-daily | 2022/04/27 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
⸾ ll /var/snap/lxd/common/lxd/storage-pools/default/
total 8.0K
drwx--x--x 2 root root 4.0K Mar 22 2021 ./
drwx--x--x 3 root root 4.0K Apr 13 09:18 ../
⸾ ll /srv/lxd/containers/test
total 44K
d--x------ 1 root root 78 Apr 27 21:19 ./
drwx--x--x 1 root root 180 Apr 27 21:19 ../
-r-------- 1 root root 38K Apr 27 20:50 backup.yaml
-rw-r--r-- 1 root root 1.1K Aug 4 2020 metadata.yaml
drwxr-xr-x 1 1000000 1000000 174 Aug 29 2020 rootfs/
drwxr-xr-x 1 root root 186 Aug 4 2020 templates/
Source Machine
⸾ lxc start test
⸾ lxc copy test target:test --mode=relay --storage=default --refresh
Error: Error transferring instance data: Got error reading migration source: websocket: close 1006 (abnormal closure): unexpected EOF
Destination Machine
⸾ ll /srv/lxd/containers/test
⸾ ls: cannot access '/srv/lxd/containers/test': No such file or directory
⸾ lxc info test
Name: test
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/04/27 20:54 BST
Snapshots:
+--------------------+----------------------+------------+----------+
| NAME | TAKEN AT | EXPIRES AT | STATEFUL |
+--------------------+----------------------+------------+----------+
| 2022-03-01-monthly | 2022/03/01 00:16 GMT | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-01-monthly | 2022/04/01 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-03-weekly | 2022/04/03 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-10-weekly | 2022/04/10 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-17-weekly | 2022/04/17 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-21-daily | 2022/04/21 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-22-daily | 2022/04/22 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-23-daily | 2022/04/23 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-24-daily | 2022/04/24 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-24-weekly | 2022/04/24 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-25-daily | 2022/04/25 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-26-daily | 2022/04/26 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-27-daily | 2022/04/27 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
⸾ ll /srv/lxd/containers/test
⸾ ls: cannot access '/srv/lxd/containers/test': No such file or directory
⸾ lxc delete test
Error: Error deleting storage volume: Failed setting subvolume writable "/var/snap/lxd/common/lxd/storage-pools/default/containers/test": Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/default/containers/test ro false: ERROR: Could not get subvolume flags: Invalid argument
⸾ ll /srv/lxd/containers/test
total 4.0K
d--x------ 1 root root 22 Apr 27 21:29 ./
drwx--x--x 1 root root 180 Apr 27 21:29 ../
-r-------- 1 root root 2.6K Apr 27 21:30 backup.yaml
⸾ rm -fr /srv/lxd/containers/test
⸾ lxc delete test
@monstermunchkin it seems there are a few things going on here.
-
There is still an issue with btrfs --refresh which seems to not affect any new containers I make but affects remote copies of all 15 of my existing containers, perhaps related to this?
LXD init --preseed not working - Error: websocket: close 1006 (abnormal closure): unexpected EOF - #13 by stgraber -
There is an undesirable side effect the first time an attempt is made to delete a container which has no filesystem present (but lxc thinks it should be there) whereby lxc creates a folder in the correct storage pool location but then attempts to perform btrfs operations at the default path (likely because the metadata.yaml file is missing in the folder it just created) even if the container was never in the default location, there is no container currently in the default location and there’s not even a storage pool or btrfs filesystem at this location. I guess a recommendation would be to not create a folder in the path if its missing during delete operations, then it can just delete the container from the database as it does on the second attempt?
-
Its also possible to get the same issue with btrfs operations attempting to be performed on the default path (where there is no container, there never was a container and there’s not even a storage pool or btrfs filesystem) during a --refresh operation if the container has issues (I think maybe caused by point 1).
Hopefully this is helpful.
Sorry its taking so long to get decent info, the containers take ages to copy to my remote.
Hello,
I am unable to reproduce this on new container and even when doing multiple snaps. It only happens on existing container with snapshot attached to it and copying via refresh. @DanielBull Here’s my storage list and i am using block device on btrfs.
lxc@lxc2:~$ lxc storage list
+-----------+--------+----------+-------------+---------+---------+
| NAME | DRIVER | SOURCE | DESCRIPTION | USED BY | STATE |
+-----------+--------+----------+-------------+---------+---------+
| storpool1 | btrfs | /dev/vdb | | 52 | CREATED |
+-----------+--------+----------+-------------+---------+---------+
What I tried:
1.) Copy to remote
lxc@lxc2:~$ for i in `seq 141 143`
> do
> echo "Copying POLOVM153$i..."
> lxc copy POLOVM153$i lxcbackup:POLOVM153$i --refresh --verbose
> echo
> done
Copying POLOVM153141...
Error: Failed instance creation: Error transferring instance data: Got error reading migration source: websocket: close 1000 (normal)
Copying POLOVM153142...
Error: Failed instance creation: Error transferring instance data: Got error reading migration source: websocket: close 1000 (normal)
Copying POLOVM153143...
Error: User signaled us three times, exiting. The remote operation will keep running
2.) When deleting on the remote server, i got below in logs
time="2022-04-28T00:48:16Z" level=debug msg="Event listener server handler started" listener=54056dbd-baf6-49ab-abae-624e8e6558a5 local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2022-04-28T00:50:08Z" level=debug msg="Receiving volume" driver=btrfs name=POLOVM153143 path=/var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153143 pool=storpool1 receivePath=/var/snap/lxd/common/lxd/storage-pools/storpool1/containers/migration.866599223
time="2022-04-28T00:50:10Z" level=debug msg="CreateInstanceFromMigration finished" args="{POLOVM153143 map[] [drop18] {BTRFS [migration_header header_subvolumes header_subvolume_uuids]} true true false 0 }" instance=POLOVM153143 project=default
time="2022-04-28T00:52:03Z" level=debug msg="DeleteInstanceSnapshot finished" instance=POLOVM153141/drop17.5 project=default
time="2022-04-28T00:52:03Z" level=debug msg="UpdateInstanceBackupFile started" instance=POLOVM153141 project=default
time="2022-04-28T00:52:03Z" level=info msg="Deleted container" created="2021-10-15 09:01:06 +0000 UTC" ephemeral=false instance=POLOVM153141/drop17.5 instanceType=container project=default used="0001-01-01 00:00:00 +0000 UTC"
time="2022-04-28T00:52:03Z" level=info msg="Action: instance-snapshot-deleted, Source: /1.0/instances/POLOVM153141/snapshots/drop17.5"
time="2022-04-28T00:52:03Z" level=debug msg="UpdateInstanceBackupFile finished" instance=POLOVM153141 project=default
time="2022-04-28T00:52:03Z" level=debug msg="DeleteInstance started" instance=POLOVM153141 project=default
time="2022-04-28T00:52:03Z" level=debug msg="Deleting instance volume" instance=POLOVM153141 project=default volName=POLOVM153141
time="2022-04-28T00:52:03Z" level=debug msg="DeleteInstance finished" instance=POLOVM153141 project=default
time="2022-04-28T00:52:03Z" level=debug msg="Failure for task operation: 01b5dc4f-63b0-41b2-901c-4a30f0f1c7b1: Error deleting storage volume: Failed setting subvolume writable \"/var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153141\": Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153141 ro false: ERROR: Could not get subvolume flags: Invalid argument"
time="2022-04-28T00:52:03Z" level=info msg="ID: 01b5dc4f-63b0-41b2-901c-4a30f0f1c7b1, Class: task, Description: Deleting instance" CreatedAt="2022-04-28 00:47:54.308997761 +0000 UTC" Err="Error deleting storage volume: Failed setting subvolume writable \"/var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153141\": Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153141 ro false: ERROR: Could not get subvolume flags: Invalid argument" Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/containers/POLOVM153141] instances:[/1.0/instances/POLOVM153141]]" Status=Failure StatusCode=Failure UpdatedAt="2022-04-28 00:47:54.308997761 +0000 UTC"
time="2022-04-28T00:52:03Z" level=debug msg="Event listener server handler stopped" listener=55a91c3a-d699-49e7-893b-670ed543f969 local=/var/snap/lxd/common/lxd/unix.socket remote=@
3.) On remote, i removed the container manually in /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/storpool1/containers and issue lxc delete -f [container_name] which worked
4.) Copied the container again using step 2 and it worked without issues
I can probably share the full logs if needed. Or let me know if you need me to do specific tests or information needed so i can reproduce it. Thanks in advance.
I’m able to reproduce this:
- lxc init images:alpine/edge c1
- lxc copy c1 l2:c1
- lxc snapshot c1
- lxc snapshot c1
- lxc copy c1 l2:c1 --refresh
- lxc copy c1 l2:c1 --refresh
On the source I get the following message. Also, LXD crashes.
Error: Failed instance creation: Error transferring instance data: Got error reading migration source: websocket: close 1000 (normal)
When trying to delete the target container (l2:c1), I get:
Error: Error deleting storage volume: Failed setting subvolume writable "/home/thomas/work/lxd/server/storage-pools/btrfs/containers/c1": Failed to run: btrfs property set -f -ts /home/thomas/work/lxd/server/storage-pools/btrfs/containers/c1 ro false: ERROR: Could not get subvolume flags: Invalid argument
That is because the subvolume c1
does not exist on the target anymore, as LXD deletes it before receiving it again. That’s not ideal in an error case. Perhaps we should make a copy of this subvolume before deleting it. In case of an error, we can then restore the old subvolume.
Some more information on this…
I’m now getting this BTRFS attempting to use the wrong path error when deleting regular snapshots on the source machine.
Basic container with snapshorts
⸾ lxc info test
Name: test
Status: RUNNING
Type: container
Architecture: x86_64
PID: 5719
Created: 2021/11/03 20:46 GMT
Last Used: 2022/04/21 02:03 BST
Resources:
Processes: 66
CPU usage:
CPU usage (in seconds): 8880
Memory usage:
Memory (current): 621.41MiB
Memory (peak): 1.37GiB
Network usage:
eth0:
Type: broadcast
State: UP
Host interface: veth220d516d
MAC address: ......
MTU: 1500
Bytes received: 1.39GB
Bytes sent: 3.23GB
Packets received: 3876266
Packets sent: 2170725
IP addresses:
inet: ......
inet6: ......
lo:
Type: loopback
State: UP
MTU: 65536
Bytes received: 3.51MB
Bytes sent: 3.51MB
Packets received: 42090
Packets sent: 42090
IP addresses:
inet: 127.0.0.1/8 (local)
inet6: ::1/128 (local)
Snapshots:
+--------------------+----------------------+------------+----------+
| NAME | TAKEN AT | EXPIRES AT | STATEFUL |
+--------------------+----------------------+------------+----------+
| 2022-03-01-monthly | 2022/03/01 00:16 GMT | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-01-monthly | 2022/04/01 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-10-weekly | 2022/04/10 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-17-weekly | 2022/04/17 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-24-weekly | 2022/04/24 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-25-daily | 2022/04/25 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-26-daily | 2022/04/26 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-27-daily | 2022/04/27 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-28-daily | 2022/04/28 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-29-daily | 2022/04/29 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-04-30-daily | 2022/04/30 00:16 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-05-01-daily | 2022/05/01 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
| 2022-05-01-monthly | 2022/05/01 00:17 BST | | NO |
+--------------------+----------------------+------------+----------+
Two storage pools, one in the default location one on a disk array in /srv
⸾ lxc storage list
+---------+--------+------------------------------------------------+-------------+---------+---------+
| NAME | DRIVER | SOURCE | DESCRIPTION | USED BY | STATE |
+---------+--------+------------------------------------------------+-------------+---------+---------+
| default | btrfs | /var/snap/lxd/common/lxd/storage-pools/default | | 7 | CREATED |
+---------+--------+------------------------------------------------+-------------+---------+---------+
| srv | btrfs | /srv | | 11 | CREATED |
Container in question is on /srv
⸾ lxc storage info srv
info:
description: ""
driver: btrfs
name: srv
space used: 678.39GiB
total space: 1.82TiB
used by:
images:
- 779036dd7c484bfef5f747f53eaffd15b9f38a12f504bb07fc64b13059d815d2
instances:
- ..
- ..
- test
- ..
- ..
Check filesystem, its present and good
⸾ ll /srv/containers-snapshots/test/2022-03-01-monthly/
total 44K
d--x------+ 1 1000000 root 78 Nov 3 21:34 ./
drwx------ 1 root root 468 May 1 00:18 ../
-r-------- 1 root root 40K Mar 1 00:16 backup.yaml
-rw-r--r-- 1 root root 1.1K Sep 11 2018 metadata.yaml
drwxr-xr-x 1 1000000 1000000 152 Sep 28 2018 rootfs/
drwxr-xr-x 1 root root 186 Sep 11 2018 templates/
Attempt to delete
⸾ lxc delete test/2022-03-01-monthly
Error: Failed setting subvolume writable "/var/snap/lxd/common/lxd/storage-pools/srv/containers-snapshots/test/2022-03-01-monthly": Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/srv/containers-snapshots/test/2022-03-01-monthly ro false: ERROR: Could not set subvolume flags: Read-only file system
Wrong path is used?
Unless I’m misunderstanding how the path system works?
⸾ ll /var/snap/lxd/common/lxd/storage-pools/srv
total 0
drwx--x--x 1 root root 0 Nov 1 18:00 ./
drwx--x--x 1 root root 20 Nov 1 18:00 ../
Is it possible I’m just waiting on some already completed fixes?
⸾ snap list | grep lxd
lxd 5.0.0-b0287c1 22923 latest/stable canonical* -
So something to be aware of is that LXD will mount storage pools from other locations into its own storage-pools
directory, when using the snap package this will be at /var/snap/lxd/common/lxd/storage-pools/
. Additionally when using the snap package, that mount will not be directly visible on the host as it will be inside the snap package’s mount namespace.
You should be able to see it using:
sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- mount
Thanks Tom, I did wonder if I was missing something.
I’ve just run that command and I can’t see any applicable mounts but it does explain the incorrect paths.
So if we presume it is mounted, do we know what is triggering these “btrfs property set -ts” errors?
The work you guys do is great and I really appreciate it but unfortunately this new version has left me without offsite backups and unable to delete snapshots.
I’ve managed to clean up the target but this is the source.
All the containers seem to be working OK.
It looks like since I posted this my lxd version updated via snap and all the problems have gone away.
Thank you for you help…
There were a couple of recent optimized BTRFS refresh fixes that may have helped which are in LXD 5.1
I have been facing the very same situation for 3 days (both my local and remote hosts are running LXD 5.3 as a snap on Debian 11). So I am going to manually clean the install. But contrary to what @tomp said, the issue seems to still be there…
If it works on a fresh install it suggests the issue is now fixed, but that your system got into an unknown state due to the bug, that needed manual fixing.