Incus export container with a snapshot not working: "structure needs cleaning"

Hello,

We have one container that we are trying to snapshot and then export.
The export is not working, as if the ext4 filesystem of the snapshot has problems.
We have tried many times (with the container both running and stopped).

Below are the commands and the error we are getting.

incus snapshot create CONTAINERNAME snapforbackup

incus export CONTAINERNAME /mnt/foo/CONTAINERNAME.tar --compression=none
Error: Create backup: Backup create: Failed to mount LVM snapshot volume: Failed to mount "/dev/default/containers_CONTAINERNAME-snapforbackup" on "/var/lib/incus/storage-pools/default/containers-snapshots/CONTAINERNAME/snapforbackup" using "ext4": structure needs cleaning

If we look at dmesg output while the incus export is running, we see many messages like those below:

kernel: EXT4-fs (dm-7): recovery complete
kernel: EXT4-fs error (device dm-7): ext4_mark_recovery_complete:6249: comm incusd: Orphan file not empty on read-only fs.
kernel: EXT4-fs (dm-7): mount failed
kernel: EXT4-fs (dm-7): write access unavailable, skipping orphan cleanup
kernel: EXT4-fs (dm-7): recovery complete
kernel: EXT4-fs error (device dm-7): ext4_mark_recovery_complete:6249: comm incusd: Orphan file not empty on read-only fs.
kernel: EXT4-fs (dm-7): mount failed
kernel: EXT4-fs (dm-7): write access unavailable, skipping orphan cleanup
kernel: EXT4-fs (dm-7): recovery complete
kernel: EXT4-fs error (device dm-7): ext4_mark_recovery_complete:6249: comm incusd: Orphan file not empty on read-only fs.
kernel: EXT4-fs (dm-7): mount failed

We are using LVM storage.

Is this something known?
Any ideas on how to fix this?

Thank you very much.

This looks like an issue with your LV or underlying disk.
First you’ll want to make sure that your VG still has some space available, as running out of space can cause issues like these.

If it has plenty of free space, then you’ll likely need to stop the container, manually activate the LV, and run fsck.ext4 against it. That may find some issues which will need to be fixed, after which creating a new snapshot and exporting should work.

It’s not impossible that your LV is currently in good shape and you just had some bad luck with snapshot timing, basically creating the snapshot while some data was in flight, though LVM has some logic that usually prevents that from happening.
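As a sketch, the recovery steps above might look like the following, assuming the pool is named "default" (as in the error message) and a classic (non-thin) LVM layout; adjust names to match your setup:

```shell
# Stop the container so its LV is no longer in use
incus stop CONTAINERNAME

# Activate the container's LV (-K ignores the activation-skip flag)
lvchange -ay -K default/containers_CONTAINERNAME

# Check and repair the ext4 filesystem
fsck.ext4 -f /dev/default/containers_CONTAINERNAME

# Deactivate the LV again and restart the container
lvchange -an default/containers_CONTAINERNAME
incus start CONTAINERNAME
```

Once fsck comes back clean, retry the snapshot and export.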

We have plenty of space on the VG (the first thing we checked).

We ran fsck -fy on the volume ( incus stop CONTAINERNAME; lvchange -ay -Ky default/containers_CONTAINERNAME; fsck -fy /dev/default/containers_CONTAINERNAME ).
We get the same errors afterwards on retry.
(In fact, my first message here was written after fsck-ing the LV of the snapshot.)

What we did, and what seems to work so far without this problem, is the following:

incus copy CONTAINERNAME CONTAINERNAME-COPY
incus export CONTAINERNAME-COPY ...

This time the export was OK.
Afterwards we tried incus import using the exported .tar, and the imported container seems to be working fine.

After the above, we also kept the copy as the new original and tried a snapshot on it, but we got the same errors again.

We also see no errors on incus export when no snapshot was taken beforehand.

So it seems the problem always occurs when we take a snapshot (and the export tries to mount the snapshot).

Thank you.

I wonder if ext4 somehow introduced a property similar to xfs where it effectively prevents two concurrent mounts of the same fsid even when they’re distinct superblocks, though if that were the case, you’d really expect a better error…

Can you try incus copy CONTAINERNAME/SNAPSHOT CONTAINERNAME-COPY and then incus export CONTAINERNAME-COPY? Basically the goal is to see if the snapshot becomes fine once it is actually copied into a new instance and can be written to.
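Concretely, the check might look like this (using the snapshot name from the first post; the export path is just an example):

```shell
# Copy the snapshot into a fresh, writable instance, then export that instance
incus copy CONTAINERNAME/snapforbackup CONTAINERNAME-COPY
incus export CONTAINERNAME-COPY /mnt/foo/CONTAINERNAME-COPY.tar --compression=none

# Remove the temporary instance afterwards
incus delete CONTAINERNAME-COPY
```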

Just tried that, and incus export completed without any errors this way!

(note: this is incus running on Fedora 40 on kernel 6.8.7-300.fc40.x86_64 / incus version 6.0.0-0.1.fc40)