Deleting container throws error ERROR: Could not get subvolume flags: Invalid argument

@depam what are the steps to reproduce this?

Hi @monstermunchkin,

I had the error first time but i am able to repro when trying to delete a big container. But easily reproducible on my end. What worked for me is to remove container on /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/storpool1/containers.

Hi @monstermunchkin , sorry after reading the other thread I realized we someone have same case. Containers with snapshots on it. At first i tried to do lxc copy [container] remote:[container] --refresh but it failed with Error: Error transferring instance data: Got error reading source.

I then tried to delete the container and decided to make it clean copy everything with snapshots but got the reported error.

@depam what are the exact steps to reproduce this issue? (from container creation to error)

As discussed on the topic Tom linked this affects delete and copy operations.
However so far I am unable to reproduce on freshly created containers or copies of those freshly created containers even when a snapshot has been applied.

OK I think I sussed out what the issue is (at least in my setup).

The path lxc is attempting to use in the btrfs command (as shown in the error) is:
/var/snap/lxd/common/lxd/storage-pools/default/containers/container

However the correct path on my system is:
/srv/lxd/containers/container

As can be seen here:

lxc storage list
+---------+--------+----------+-------------+---------+---------+
|  NAME   | DRIVER |  SOURCE  | DESCRIPTION | USED BY |  STATE  |
+---------+--------+----------+-------------+---------+---------+
| default | btrfs  | /srv/lxd |             | 16      | CREATED |
+---------+--------+----------+-------------+---------+---------+

/var/snap/lxd/common/lxd/storage-pools/default/ exists but is blank and isn’t even on a btrfs filesystem.

At first I thought maybe the path could be wrong as it was copied from the containers original server but I checked and that was on /srv as well.

@depam can you confirm when you got the error the path used in the command was also wrong?

@monstermunchkin where does it get this path from?
Is it stored in the containers metadata some how?
I’m wondering as it only seems to happen with my older containers not anything I newly create, which is why I’m guessing you can’t reproduce it?

OK here we go, I’ve got closer, this is reproducible with all 15 of my containers.

It seems the error I was missing/dismissing is:
Error: Error transferring instance data: Got error reading migration source: websocket: close 1006 (abnormal closure): unexpected EOF

This appears to completely wipe out the container in the filesystem on the target but not the entry in the lxd database, thus triggering the lxc delete command to use the wrong path. [Edit I don’t think its actually wiping out the data just unmounting the BTRFS subvols from the filesystem]

Just to provide some detail…

Source Machine

⸾ snap list | grep lxd
lxd     5.0.0-b0287c1  22923  latest/stable  canonical*    -
⸾ lxc stop test
⸾ lxc copy test target:test --mode=relay --storage=default

Destination Machine

⸾ snap list | grep lxd
lxd     5.0.0-b0287c1  22923  latest/stable  canonical*    -
⸾ lxc storage list
+---------+--------+----------+-------------+---------+---------+
|  NAME   | DRIVER |  SOURCE  | DESCRIPTION | USED BY |  STATE  |
+---------+--------+----------+-------------+---------+---------+
| default | btrfs  | /srv/lxd |             | 16      | CREATED |
+---------+--------+----------+-------------+---------+---------+
⸾ lxc storage info default
info:
description: ""
driver: btrfs
name: default
space used: 4.17TiB
total space: 10.92TiB
used by:
instances:
- ..
- ..
- test
- ..
- ..
profiles:
- default
⸾ lxc info test
Name: test
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/04/27 19:04 BST

Snapshots:
+--------------------+----------------------+------------+----------+
|        NAME        |       TAKEN AT       | EXPIRES AT | STATEFUL |
+--------------------+----------------------+------------+----------+
| 2022-03-01-monthly | 2022/03/01 00:16 GMT |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-01-monthly | 2022/04/01 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-03-weekly  | 2022/04/03 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-10-weekly  | 2022/04/10 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-17-weekly  | 2022/04/17 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-21-daily   | 2022/04/21 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-22-daily   | 2022/04/22 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-23-daily   | 2022/04/23 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-24-daily   | 2022/04/24 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-24-weekly  | 2022/04/24 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-25-daily   | 2022/04/25 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-26-daily   | 2022/04/26 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-27-daily   | 2022/04/27 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
⸾ ll /var/snap/lxd/common/lxd/storage-pools/default/
total 8.0K
drwx--x--x 2 root root 4.0K Mar 22  2021 ./
drwx--x--x 3 root root 4.0K Apr 13 09:18 ../
⸾ ll /srv/lxd/containers/test
total 44K
d--x------ 1 root    root      78 Apr 27 21:19 ./
drwx--x--x 1 root    root     180 Apr 27 21:19 ../
-r-------- 1 root    root     38K Apr 27 20:50 backup.yaml
-rw-r--r-- 1 root    root    1.1K Aug  4  2020 metadata.yaml
drwxr-xr-x 1 1000000 1000000  174 Aug 29  2020 rootfs/
drwxr-xr-x 1 root    root     186 Aug  4  2020 templates/

Source Machine

⸾ lxc start test
⸾ lxc copy test target:test --mode=relay --storage=default --refresh
Error: Error transferring instance data: Got error reading migration source: websocket: close 1006 (abnormal closure): unexpected EOF

Destination Machine

⸾ ll /srv/lxd/containers/test
⸾ ls: cannot access '/srv/lxd/containers/test': No such file or directory
⸾ lxc info test
Name: test
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2022/04/27 20:54 BST

Snapshots:
+--------------------+----------------------+------------+----------+
|        NAME        |       TAKEN AT       | EXPIRES AT | STATEFUL |
+--------------------+----------------------+------------+----------+
| 2022-03-01-monthly | 2022/03/01 00:16 GMT |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-01-monthly | 2022/04/01 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-03-weekly  | 2022/04/03 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-10-weekly  | 2022/04/10 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-17-weekly  | 2022/04/17 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-21-daily   | 2022/04/21 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-22-daily   | 2022/04/22 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-23-daily   | 2022/04/23 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-24-daily   | 2022/04/24 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-24-weekly  | 2022/04/24 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-25-daily   | 2022/04/25 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-26-daily   | 2022/04/26 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-27-daily   | 2022/04/27 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
⸾ ll /srv/lxd/containers/test
⸾ ls: cannot access '/srv/lxd/containers/test': No such file or directory
⸾ lxc delete test
Error: Error deleting storage volume: Failed setting subvolume writable "/var/snap/lxd/common/lxd/storage-pools/default/containers/test": Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/default/containers/test ro false: ERROR: Could not get subvolume flags: Invalid argument
⸾ ll /srv/lxd/containers/test
total 4.0K
d--x------ 1 root root   22 Apr 27 21:29 ./
drwx--x--x 1 root root  180 Apr 27 21:29 ../
-r-------- 1 root root 2.6K Apr 27 21:30 backup.yaml
⸾ rm -fr /srv/lxd/containers/test
⸾ lxc delete test

@monstermunchkin it seems there are a few things going on here.

  1. There is still an issue with btrfs --refresh which seems to not affect any new containers I make but affects remote copies of all 15 of my existing containers, perhaps related to this?
    LXD init --preseed not working - Error: websocket: close 1006 (abnormal closure): unexpected EOF - #13 by stgraber

  2. There is an undesirable side effect the first time an attempt is made to delete a container which has no filesystem present (but lxc thinks it should be there) whereby lxc creates a folder in the correct storage pool location but then attempts to perform btrfs operations at the default path (likely because the metadata.yaml file is missing in the folder it just created) even if the container was never in the default location, there is no container currently in the default location and there’s not even a storage pool or btrfs filesystem at this location. I guess a recommendation would be to not create a folder in the path if its missing during delete operations, then it can just delete the container from the database as it does on the second attempt?

  3. Its also possible to get the same issue with btrfs operations attempting to be performed on the default path (where there is no container, there never was a container and there’s not even a storage pool or btrfs filesystem) during a --refresh operation if the container has issues (I think maybe caused by point 1).

Hopefully this is helpful.
Sorry its taking so long to get decent info, the containers take ages to copy to my remote.

Hello,

I am unable to reproduce this on new container and even when doing multiple snaps. It only happens on existing container with snapshot attached to it and copying via refresh. @DanielBull Here’s my storage list and i am using block device on btrfs.

lxc@lxc2:~$ lxc storage list
+-----------+--------+----------+-------------+---------+---------+
|   NAME    | DRIVER |  SOURCE  | DESCRIPTION | USED BY |  STATE  |
+-----------+--------+----------+-------------+---------+---------+
| storpool1 | btrfs  | /dev/vdb |             | 52      | CREATED |
+-----------+--------+----------+-------------+---------+---------+

What I tried:

1.) Copy to remote

lxc@lxc2:~$ for i in `seq 141 143`
> do
> echo "Copying POLOVM153$i..."
> lxc copy POLOVM153$i lxcbackup:POLOVM153$i --refresh --verbose
> echo
> done
Copying POLOVM153141...
Error: Failed instance creation: Error transferring instance data: Got error reading migration source: websocket: close 1000 (normal)

Copying POLOVM153142...
Error: Failed instance creation: Error transferring instance data: Got error reading migration source: websocket: close 1000 (normal)

Copying POLOVM153143...
Error: User signaled us three times, exiting. The remote operation will keep running

2.) When deleting on the remote server, i got below in logs

time="2022-04-28T00:48:16Z" level=debug msg="Event listener server handler started" listener=54056dbd-baf6-49ab-abae-624e8e6558a5 local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2022-04-28T00:50:08Z" level=debug msg="Receiving volume" driver=btrfs name=POLOVM153143 path=/var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153143 pool=storpool1 receivePath=/var/snap/lxd/common/lxd/storage-pools/storpool1/containers/migration.866599223
time="2022-04-28T00:50:10Z" level=debug msg="CreateInstanceFromMigration finished" args="{POLOVM153143  map[] [drop18] {BTRFS [migration_header header_subvolumes header_subvolume_uuids]} true true false 0 }" instance=POLOVM153143 project=default
time="2022-04-28T00:52:03Z" level=debug msg="DeleteInstanceSnapshot finished" instance=POLOVM153141/drop17.5 project=default
time="2022-04-28T00:52:03Z" level=debug msg="UpdateInstanceBackupFile started" instance=POLOVM153141 project=default
time="2022-04-28T00:52:03Z" level=info msg="Deleted container" created="2021-10-15 09:01:06 +0000 UTC" ephemeral=false instance=POLOVM153141/drop17.5 instanceType=container project=default used="0001-01-01 00:00:00 +0000 UTC"
time="2022-04-28T00:52:03Z" level=info msg="Action: instance-snapshot-deleted, Source: /1.0/instances/POLOVM153141/snapshots/drop17.5"
time="2022-04-28T00:52:03Z" level=debug msg="UpdateInstanceBackupFile finished" instance=POLOVM153141 project=default
time="2022-04-28T00:52:03Z" level=debug msg="DeleteInstance started" instance=POLOVM153141 project=default
time="2022-04-28T00:52:03Z" level=debug msg="Deleting instance volume" instance=POLOVM153141 project=default volName=POLOVM153141
time="2022-04-28T00:52:03Z" level=debug msg="DeleteInstance finished" instance=POLOVM153141 project=default
time="2022-04-28T00:52:03Z" level=debug msg="Failure for task operation: 01b5dc4f-63b0-41b2-901c-4a30f0f1c7b1: Error deleting storage volume: Failed setting subvolume writable \"/var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153141\": Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153141 ro false: ERROR: Could not get subvolume flags: Invalid argument"
time="2022-04-28T00:52:03Z" level=info msg="ID: 01b5dc4f-63b0-41b2-901c-4a30f0f1c7b1, Class: task, Description: Deleting instance" CreatedAt="2022-04-28 00:47:54.308997761 +0000 UTC" Err="Error deleting storage volume: Failed setting subvolume writable \"/var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153141\": Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/storpool1/containers/POLOVM153141 ro false: ERROR: Could not get subvolume flags: Invalid argument" Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/containers/POLOVM153141] instances:[/1.0/instances/POLOVM153141]]" Status=Failure StatusCode=Failure UpdatedAt="2022-04-28 00:47:54.308997761 +0000 UTC"
time="2022-04-28T00:52:03Z" level=debug msg="Event listener server handler stopped" listener=55a91c3a-d699-49e7-893b-670ed543f969 local=/var/snap/lxd/common/lxd/unix.socket remote=@

3.) On remote, i removed the container manually in /var/snap/lxd/common/mntns/var/snap/lxd/common/lxd/storage-pools/storpool1/containers and issue lxc delete -f [container_name] which worked

4.) Copied the container again using step 2 and it worked without issues

I can probably share the full logs if needed. Or let me know if you need me to do specific tests or information needed so i can reproduce it. Thanks in advance.

I’m able to reproduce this:

  1. lxc init images:alpine/edge c1
  2. lxc copy c1 l2:c1
  3. lxc snapshot c1
  4. lxc snapshot c1
  5. lxc copy c1 l2:c1 --refresh
  6. lxc copy c1 l2:c1 --refresh

On the source I get the following message. Also, LXD crashes.

Error: Failed instance creation: Error transferring instance data: Got error reading migration source: websocket: close 1000 (normal)

When trying to delete the target container (l2:c1), I get:

Error: Error deleting storage volume: Failed setting subvolume writable "/home/thomas/work/lxd/server/storage-pools/btrfs/containers/c1": Failed to run: btrfs property set -f -ts /home/thomas/work/lxd/server/storage-pools/btrfs/containers/c1 ro false: ERROR: Could not get subvolume flags: Invalid argument

That is because the subvolume c1 does not exist on the target anymore, as LXD deletes it before receiving it again. That’s not ideal in an error case. Perhaps we should make a copy of this subvolume before deleting it. In case of an error, we can then restore the old subvolume.

Some more information on this…
I’m now getting this BTRFS attempting to use the wrong path error when deleting regular snapshots on the source machine.

Basic container with snapshorts

⸾ lxc info test
Name: test
Status: RUNNING
Type: container
Architecture: x86_64
PID: 5719
Created: 2021/11/03 20:46 GMT
Last Used: 2022/04/21 02:03 BST

Resources:
Processes: 66
CPU usage:
CPU usage (in seconds): 8880
Memory usage:
Memory (current): 621.41MiB
Memory (peak): 1.37GiB
Network usage:
eth0:
Type: broadcast
State: UP
Host interface: veth220d516d
MAC address: ......
MTU: 1500
Bytes received: 1.39GB
Bytes sent: 3.23GB
Packets received: 3876266
Packets sent: 2170725
IP addresses:
inet:  ......
inet6: ......
lo:
Type: loopback
State: UP
MTU: 65536
Bytes received: 3.51MB
Bytes sent: 3.51MB
Packets received: 42090
Packets sent: 42090
IP addresses:
inet:  127.0.0.1/8 (local)
inet6: ::1/128 (local)

Snapshots:
+--------------------+----------------------+------------+----------+
|        NAME        |       TAKEN AT       | EXPIRES AT | STATEFUL |
+--------------------+----------------------+------------+----------+
| 2022-03-01-monthly | 2022/03/01 00:16 GMT |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-01-monthly | 2022/04/01 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-10-weekly  | 2022/04/10 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-17-weekly  | 2022/04/17 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-24-weekly  | 2022/04/24 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-25-daily   | 2022/04/25 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-26-daily   | 2022/04/26 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-27-daily   | 2022/04/27 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-28-daily   | 2022/04/28 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-29-daily   | 2022/04/29 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-04-30-daily   | 2022/04/30 00:16 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-05-01-daily   | 2022/05/01 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+
| 2022-05-01-monthly | 2022/05/01 00:17 BST |            | NO       |
+--------------------+----------------------+------------+----------+

Two storage pools, one in the default location one on a disk array in /srv

⸾ lxc storage list
+---------+--------+------------------------------------------------+-------------+---------+---------+
|  NAME   | DRIVER |                     SOURCE                     | DESCRIPTION | USED BY |  STATE  |
+---------+--------+------------------------------------------------+-------------+---------+---------+
| default | btrfs  | /var/snap/lxd/common/lxd/storage-pools/default |             | 7       | CREATED |
+---------+--------+------------------------------------------------+-------------+---------+---------+
| srv     | btrfs  | /srv                                           |             | 11      | CREATED |

Container in question is on /srv

⸾ lxc storage info srv
info:
description: ""
driver: btrfs
name: srv
space used: 678.39GiB
total space: 1.82TiB
used by:
images:
- 779036dd7c484bfef5f747f53eaffd15b9f38a12f504bb07fc64b13059d815d2
instances:
- ..
- ..
- test
- ..
- ..

Check filesystem, its present and good

⸾ ll /srv/containers-snapshots/test/2022-03-01-monthly/
total 44K
d--x------+ 1 1000000 root      78 Nov  3 21:34 ./
drwx------  1 root    root     468 May  1 00:18 ../
-r--------  1 root    root     40K Mar  1 00:16 backup.yaml
-rw-r--r--  1 root    root    1.1K Sep 11  2018 metadata.yaml
drwxr-xr-x  1 1000000 1000000  152 Sep 28  2018 rootfs/
drwxr-xr-x  1 root    root     186 Sep 11  2018 templates/

Attempt to delete

⸾ lxc delete test/2022-03-01-monthly
Error: Failed setting subvolume writable "/var/snap/lxd/common/lxd/storage-pools/srv/containers-snapshots/test/2022-03-01-monthly": Failed to run: btrfs property set -ts /var/snap/lxd/common/lxd/storage-pools/srv/containers-snapshots/test/2022-03-01-monthly ro false: ERROR: Could not set subvolume flags: Read-only file system

Wrong path is used?
Unless I’m misunderstanding how the path system works?

⸾ ll /var/snap/lxd/common/lxd/storage-pools/srv
total 0
drwx--x--x 1 root root  0 Nov  1 18:00 ./
drwx--x--x 1 root root 20 Nov  1 18:00 ../

Is it possible I’m just waiting on some already completed fixes?

⸾ snap list | grep lxd
lxd     5.0.0-b0287c1  22923  latest/stable  canonical*    -

So something to be aware of is that LXD will mount storage pools from other locations into its own storage-pools directory, when using the snap package this will be at /var/snap/lxd/common/lxd/storage-pools/. Additionally when using the snap package, that mount will not be directly visible on the host as it will be inside the snap package’s mount namespace.

You should be able to see it using:

sudo nsenter --mount=/run/snapd/ns/lxd.mnt -- mount

Thanks Tom, I did wonder if I was missing something.
I’ve just run that command and I can’t see any applicable mounts but it does explain the incorrect paths.

So if we presume it is mounted, do we know what is triggering these “btrfs property set -ts” errors?
The work you guys do is great and I really appreciate it but unfortunately this new version has left me without offsite backups and unable to delete snapshots.

@monstermunchkin are you able to help with the clean up of a failed btrfs optimized sync? Thanks

I’ve managed to clean up the target but this is the source.
All the containers seem to be working OK.

It looks like since I posted this my lxd version updated via snap and all the problems have gone away.
Thank you for you help…

There were a couple of recent optimized BTRFS refresh fixes that may have helped which are in LXD 5.1

I have been facing the very same situation for 3 days (both my local and remote hosts are running LXD 5.3 as a snap on Debian 11). So I am going to manually clean the install. But contrary to what @tomp said, the issue seems to still be there…

If it works on a fresh install it suggests the issue is now fixed, but that your system got into an unknown state due to the bug, that needed manual fixing.

If you still have issues please file a bug over at Issues · lxc/lxd · GitHub