Unable to repair bad disk using fsck

I had a VM that was acting strangely, so I rebooted it. It would no longer come up, and I couldn't get into it with incus exec, so I used incus console instead, which dropped me into an initramfs. From there I ran fsck /dev/sda2 -y. It finds files and things to repair, but at the end it reports an error and can't write the changes. I've tried running fsck many times and can't get past that.

Inode 11 ref count is 156, should be 135.  Fix? yes

Inode 1584842 ref count is 69, should be 56.  Fix? yes

Pass 5: Checking group summary information
Free blocks count wrong (11599040, counted=11602313).
Fix? yes

Free inodes count wrong (6998947, counted=7000323).
Fix? yes

Error writing file system info: Input/output error

rootfs: ***** FILE SYSTEM WAS MODIFIED *****
(initramfs)

Is there a way to repair the disk by accessing the drive from the host using ZFS? I can see the VM filesystem using zfs list, but I can't mount it (or I should say, I don't know how to mount it).

I was actually trying to back this VM up right before this failure occurred, so I don’t have any backups, sadly.

Any suggestions on how to recover that drive?

You can use zfs set volmode=full XYZ/XYZ.block, which will make the volume show up under /dev/zvol/POOL/. You can then run your host's fsck.ext4 against that device to see if you've got any more luck.
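A minimal sketch, assuming a pool named mypool and a VM named myvm (both hypothetical; check zfs list for the real names), with the root filesystem on the second partition as it was inside the guest:

zfs set volmode=full mypool/virtual-machines/myvm.block
# the zvol's partitions appear with a -part<N> suffix
ls /dev/zvol/mypool/virtual-machines/
# sda2 inside the guest corresponds to -part2 here
fsck.ext4 -f /dev/zvol/mypool/virtual-machines/myvm.block-part2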

You may also want to look at the dmesg output on the host, as well as zpool status; it could be that you're dealing with a ZFS error. You should also make sure that you're not running out of disk space, either in the ZFS pool or, if your pool uses a loop-backed device, on your host's filesystem too.
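Roughly, on the host (pool name and path are placeholders; adjust to match your setup):

dmesg | tail -n 50        # recent kernel messages, watch for I/O or ZFS errors
zpool status -v mypool    # pool health; -v also lists any files affected by errors
zpool list mypool         # overall pool size, allocation and free space
df -h /path/to/disks/     # if loop-backed, free space on the host filesystem holding the file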


OK, I tried the zfs set... command, and the behavior of fsck.ext4 seems to be the same. It says:

Free inodes count wrong (6998947, counted=7000323).
Fix? yes

Error writing file system info: Input/output error

rootfs: ***** FILE SYSTEM WAS MODIFIED *****

I did as you suggested and looked at zpool status. It does indicate errors, but I don't know what, if anything, can be done. I've already tried zpool scrub and zpool clear, but that does not appear to have helped.

# zpool status
  pool: default2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 02:56:15 with 133 errors on Sun Dec  1 00:13:09 2024
config:

	NAME                                                                  STATE     READ WRITE CKSUM
	default2                                                              ONLINE       0     0     0
	  /media/xrd/734204/737204/var/snap/lxd/common/lxd/disks/default.img  ONLINE       0     0 2.59K


@stgraber How do you tell if you are out of disk space? When I run df -h I see nothing even close to 100%, and zfs list does not show anything that appears to indicate the pool is out of space. I'm not sure I understand what being out of disk space means for something like zfs/zpool. What is the correct zfs/zpool command to see disk space?

df -h /media/xrd/734204/737204/var/snap/lxd/common/lxd/disks/ shows that you’ve got plenty of space left?
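For the pool itself, something like this shows the space situation (a sketch; default2 taken from your zpool status output above):

zpool list default2           # SIZE, ALLOC, FREE and CAP for the whole pool
zfs list -r -o space default2 # per-dataset used/available breakdown

Note that a full pool can surface as I/O errors on writes to a zvol even when df inside the guest shows free space, since df only sees the ext4 filesystem's own accounting, not the pool's.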

Also note that if you're using LXD, you should probably reach out to LXD support instead 🙂

Thanks so much for this troubleshooting. I tried to move the storage device to a different pool. That failed. Then I restarted, dropped into the initramfs, and ran fsck again. This time it worked, and I was able to boot the VM and back up the files. I'm not sure what I did differently, but I appreciate the support.