Btrfs storage pool reverted to /dev/loopX device on system restart(?!)

mingus · January 2, 2025, 8:48pm

Hopefully that was a clear enough title.
I have a 7-node Incus cluster with a default btrfs storage pool on each node. Each pool was created by incus admin init and then extended, as below, by adding a storage device and removing the default loop device:

btrfs device add /dev/mapper/ContainerGroup1-IncusVol /var/lib/incus/storage-pools/local1
btrfs device remove /dev/loop0 /var/lib/incus/storage-pools/local1

and this has worked fine on 6 hosts for around 6 months:

incus storage show local1 --target host-4
config:
  source: 385bd68f-7740-4848-8ce2-4319664bc0b3
  volatile.initial_source: /dev/mapper/ContainerGroup1-IncusVol
description: Default store. Can be extended
name: local1
driver: btrfs
used_by:
- /1.0/images/148785f452598166182874850047ff2244476425b8e98a234fc88afb6f5fbdb3?target=host-4
- /1.0/images/70b0411e6a7d29c2a6e3cb7e684d077d173fc07813dae9d22a351dedb796d108?target=host-4
- /1.0/instances/brean
- /1.0/instances/brean2
- /1.0/instances/bam-u2204c118-2
- /1.0/instances/instance-7279-g4
- /1.0/instances/instance-7279-g4-0
- /1.0/instances/instance-VE-4771-g4
- /1.0/instances/instance-VE-5342-g4
- /1.0/instances/instance-VE-5420-g4
- /1.0/instances/jf-benchmark
- /1.0/instances/trt8
- /1.0/profiles/cuda11_8
- /1.0/profiles/default
- /1.0/profiles/packer-base
- /1.0/profiles/ttest
- /1.0/profiles/bamboo
- /1.0/profiles/vs
- /1.0/storage-pools/local1/volumes/image/1351cc82f499466ce338276e18e321b2dd27f42385526a9c472d8c2015280e01?target=host-4
status: Created
locations:
- host-7
- host-1
- host-2
- host-3
- host-4
- host-5
- host-6

and so on. After a platform reboot, however, one node (and only one) is failing to mount the pool, and its logs are full of:

time="2025-01-02T16:58:02Z" level=error msg="Failed mounting storage pool" err="Failed to mount \"/dev/loop0\" on \"/var/lib/incus/storage-pools/local1\" using \"btrfs\": invalid argument" pool=local1
time="2025-01-02T16:59:12Z" level=error msg="Failed mounting storage pool" err="Failed to mount \"/dev/loop0\" on \"/var/lib/incus/storage-pools/local1\" using \"btrfs\": invalid argument" pool=local1

If I’m reading it right, the error is accurate, since there is no /dev/loop0 device in that btrfs filesystem:

btrfs filesystem show local1
Label: 'local1'  uuid: 2252cf3f-aa1e-425c-a318-3f5010cc6599
	Total devices 1 FS bytes used 304.25GiB
	devid    2 size 600.00GiB used 372.06GiB path /dev/mapper/ContainerGroup1-IncusVol
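To compare the filesystem's member devices with the device Incus is trying to mount, the output above can be parsed with a couple of small helpers. This is only a sketch: the function names btrfs_fs_uuid and btrfs_fs_devices are made up for illustration, and it assumes the btrfs filesystem show layout stays as printed above.

```shell
# Hypothetical helpers (illustrative names, not part of btrfs-progs).
# Both read `btrfs filesystem show` output on stdin.

# Print the filesystem UUID from the "Label: ... uuid: ..." line.
btrfs_fs_uuid() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "uuid:") { print $(i + 1); exit } }'
}

# Print the path of every member device (one per "devid" line).
btrfs_fs_devices() {
    awk '$1 == "devid" { for (i = 1; i < NF; i++) if ($i == "path") print $(i + 1) }'
}

# Example use on the affected node (not run here):
#   btrfs filesystem show local1 | btrfs_fs_devices | grep -qx /dev/loop0 \
#       || echo "/dev/loop0 is not a member of local1"
```

The UUID printed by btrfs_fs_uuid is the value that would be expected in the pool's source key if this node followed the same pattern as the others.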

The correct device is mounted, though: I have files under /var/lib/incus/storage-pools/local1 as expected. Before I edited it, the Incus database had the original loop image file as the pool source for that node (node_id 7):

+----+-----------------+---------+-------------------------+--------------------------------------+
| id | storage_pool_id | node_id |           key           |                value                 |
+----+-----------------+---------+-------------------------+--------------------------------------+
| 9  | 1               | 5       | source                  | 385bd68f-7740-4848-8ce2-4319664bc0b3 |
| 10 | 1               | 5       | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 11 | 1               | 3       | source                  | /dev/mapper/ContainerGroup1-IncusVol |
| 12 | 1               | 3       | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 13 | 1               | 2       | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 14 | 1               | 2       | source                  | /dev/mapper/ContainerGroup1-IncusVol |
| 15 | 1               | 4       | source                  | 047ce48b-719c-4dd2-84be-2dc2938b9e25 |
| 16 | 1               | 4       | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 21 | 1               | 7       | size                    | 30GiB                                |
| 22 | 1               | 7       | source                  | /var/lib/incus/disks/local1.img      |
| 31 | 1               | 12      | source                  | 3a322826-457a-4b74-adbe-ae380fbd1117 |
| 32 | 1               | 12      | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 38 | 1               | 14      | source                  | fbb501f4-03d1-4297-aabc-0b0ac7408b72 |
| 39 | 1               | 14      | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
+----+-----------------+---------+-------------------------+--------------------------------------+

I manually set rows 21 and 22 to the logical volume and the filesystem UUID respectively, then restarted, but the error message persists even though the database now has:

incus admin sql global "SELECT * FROM storage_pools_config;"
+----+-----------------+---------+-------------------------+--------------------------------------+
| id | storage_pool_id | node_id |           key           |                value                 |
+----+-----------------+---------+-------------------------+--------------------------------------+
| 9  | 1               | 5       | source                  | 385bd68f-7740-4848-8ce2-4319664bc0b3 |
| 10 | 1               | 5       | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 11 | 1               | 3       | source                  | /dev/mapper/ContainerGroup1-IncusVol |
| 12 | 1               | 3       | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 13 | 1               | 2       | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 14 | 1               | 2       | source                  | /dev/mapper/ContainerGroup1-IncusVol |
| 15 | 1               | 4       | source                  | 047ce48b-719c-4dd2-84be-2dc2938b9e25 |
| 16 | 1               | 4       | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 21 | 1               | 7       | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 22 | 1               | 7       | source                  | 2252cf3f-aa1e-425c-a318-3f5010cc6599 |
| 31 | 1               | 12      | source                  | 3a322826-457a-4b74-adbe-ae380fbd1117 |
| 32 | 1               | 12      | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
| 38 | 1               | 14      | source                  | fbb501f4-03d1-4297-aabc-0b0ac7408b72 |
| 39 | 1               | 14      | volatile.initial_source | /dev/mapper/ContainerGroup1-IncusVol |
+----+-----------------+---------+-------------------------+--------------------------------------+
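With this many rows it is easy to misread which values belong to which node, so a small filter over the table output can help. The helper below is a sketch only: pool_config_for_node is a made-up name, and it assumes the ASCII table layout shown above (columns id, storage_pool_id, node_id, key, value).

```shell
# Hypothetical helper: read the ASCII table printed by
# `incus admin sql global "SELECT * FROM storage_pools_config;"` on stdin
# and print "key=value" for every row whose node_id matches $1.
pool_config_for_node() {
    awk -F'|' -v node="$1" '
        NF >= 6 {
            id = $4; key = $5; val = $6
            # Strip the padding around each cell.
            gsub(/^[ \t]+|[ \t]+$/, "", id)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (id == node) print key "=" val
        }'
}

# Example use (not run here):
#   incus admin sql global "SELECT * FROM storage_pools_config;" | pool_config_for_node 7
```

For node 7 this should list only the two edited rows, which makes it easier to confirm that nothing in this table still points at the loop image.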

Any pointers on where to look (I have no idea where the loop0 device setting is coming from) gratefully received!
If it’s relevant, I’m running Incus 6.0.2 everywhere.