What is the right way to fix ext4 errors/corruption on volumes stored on Ceph?

After a problem with my cluster, I found that 4 volumes (3 rootfs volumes and one data volume) across 3 container instances show multiple errors like the ones below (dmesg output from the Incus cluster node):

[ 9055.177230] EXT4-fs error (device rbd7): ext4_lookup:1857: inode #786433: comm prometheus: deleted inode referenced: 786440
[ 9150.545493] EXT4-fs error (device rbd7): ext4_validate_block_bitmap:423: comm ext4lazyinit: bg 127: bad block bitmap checksum
[ 9152.366853] EXT4-fs error (device rbd7): ext4_lookup:1857: inode #786433: comm prometheus: deleted inode referenced: 786440
[ 9152.380677] EXT4-fs error (device rbd7): __ext4_find_entry:1693: inode #786568: comm prometheus: checksumming directory block 0
[ 9152.503874] EXT4-fs error (device rbd7): ext4_validate_inode_bitmap:105: comm prometheus: Corrupt inode bitmap - block_group = 96, inode_bitmap = 3145744
[ 9152.979961] EXT4-fs error (device rbd7): ext4_lookup:1857: inode #786433: comm prometheus: deleted inode referenced: 786440
[ 9152.991895] EXT4-fs error (device rbd7): __ext4_find_entry:1693: inode #786568: comm prometheus: checksumming directory block 0
[ 9157.503979] EXT4-fs error (device rbd7): ext4_validate_block_bitmap:423: comm kworker/u8:5: bg 111: bad block bitmap checksum
[ 9159.739104] EXT4-fs error (device rbd7): ext4_lookup:1857: inode #786433: comm prometheus: deleted inode referenced: 786440
[  266.401467] EXT4-fs error (device rbd2): ext4_validate_block_bitmap:423: comm incusd: bg 1: bad block bitmap checksum
[  266.412380] EXT4-fs error (device rbd2) in ext4_mb_clear_bb:6542: Filesystem failed CRC
[  266.423542] EXT4-fs error (device rbd2): ext4_validate_block_bitmap:423: comm incusd: bg 15: bad block bitmap checksum
[  268.710419] EXT4-fs error (device rbd2): ext4_validate_block_bitmap:423: comm ext4lazyinit: bg 49: bad block bitmap checksum
[  271.081652] EXT4-fs error (device rbd3): ext4_validate_block_bitmap:423: comm ext4lazyinit: bg 1: bad block bitmap checksum
[  273.589060] EXT4-fs error (device rbd7): ext4_validate_block_bitmap:423: comm ext4lazyinit: bg 127: bad block bitmap checksum
[  277.987183] EXT4-fs error (device rbd7): ext4_lookup:1857: inode #786433: comm prometheus: deleted inode referenced: 786440
[  278.004281] EXT4-fs error (device rbd7): __ext4_find_entry:1693: inode #786568: comm prometheus: checksumming directory block 0
[  278.131252] EXT4-fs error (device rbd7): ext4_validate_inode_bitmap:105: comm prometheus: Corrupt inode bitmap - block_group = 96, inode_bitmap = 3145744
[  278.531097] EXT4-fs error (device rbd3): ext4_validate_inode_bitmap:105: comm samba: Corrupt inode bitmap - block_group = 0, inode_bitmap = 137
[  278.550398] EXT4-fs error (device rbd3) in ext4_free_inode:362: Filesystem failed CRC
[  278.721024] EXT4-fs error (device rbd7): ext4_lookup:1857: inode #786433: comm prometheus: deleted inode referenced: 786440
[  278.747239] EXT4-fs error (device rbd7): __ext4_find_entry:1693: inode #786568: comm prometheus: checksumming directory block 0
[  281.316261] EXT4-fs error (device rbd3): ext4_lookup:1857: inode #16: comm samba: deleted inode referenced: 25
[  281.326675] EXT4-fs error (device rbd3): ext4_lookup:1857: inode #16: comm samba: deleted inode referenced: 25
[  282.974984] EXT4-fs error (device rbd7): ext4_validate_block_bitmap:423: comm kworker/u8:22: bg 111: bad block bitmap checksum
[  339.957217] EXT4-fs error (device rbd7): ext4_lookup:1857: inode #786433: comm prometheus: deleted inode referenced: 786440

All affected rootfs and data volumes attached to the instances are located on Ceph storage (actually MicroCeph).

What is the right way to fix these filesystem errors?

Stop the instance, manually map the instance's RBD image using rbd map, and then run fsck.ext4 on the resulting /dev/rbdX device.

Steps for fixing filesystem errors on a Ceph block device:

  1. Identify the RBD image you need to repair. Use sudo rbd ls --pool <pool name for incus storage>.
  2. Stop the instance.
  3. Map the image on the Incus host using sudo rbd map <image name> --pool <pool name for incus storage>. The command returns a block device name, /dev/rbdX.
  4. Run e2fsck /dev/rbdX or fsck -t ext4 /dev/rbdX and follow the instructions.
  5. Unmap the image using sudo rbd unmap /dev/rbdX.
  6. Start the instance.
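
The steps above could be scripted roughly as follows. This is only a sketch, not a tested recovery tool: the function name repair_rbd_image, the helper run, and the DRY_RUN switch are made up for illustration, and the pool, image and instance names are placeholders you must supply yourself. It assumes the script runs as root (or that you prefix the commands with sudo) and that the instance is managed with incus stop / incus start. With DRY_RUN=1 (the default here) the commands are only printed so you can review them first.

```shell
#!/bin/sh
# Sketch of the repair sequence for a single RBD image.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: repair_rbd_image <pool> <image> <instance>
repair_rbd_image() {
    pool="$1"
    image="$2"
    instance="$3"

    # Step 2: stop the instance so the filesystem is not in use.
    run incus stop "$instance"

    # Step 3: map the image; rbd map prints the block device name.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ rbd map $image --pool $pool"
        dev="/dev/rbdX"   # placeholder, the real name comes from rbd map
    else
        dev="$(rbd map "$image" --pool "$pool")" || return 1
    fi

    # Step 4: check and repair; -f forces a check even if marked clean.
    # e2fsck exits non-zero when it fixed errors, so do not abort on that.
    run e2fsck -f "$dev"

    # Steps 5 and 6: unmap the device and start the instance again.
    run rbd unmap "$dev"
    run incus start "$instance"
}
```

For example, DRY_RUN=1 repair_rbd_image <pool> <image name> <instance> prints the five commands in order without touching anything. Running e2fsck interactively, as in step 4, is deliberate: it lets you decide on each proposed fix rather than accepting everything automatically.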