Disk health information in IncusOS?

Hi,

On our IncusOS test server I set up an additional ZFS pool of type zfs-raidz1 consisting of multiple disks. I can see the state of the pool via the storage API, no problem.
Is it possible to get additional health data for the disks in that storage pool, e.g. SMART data or the information that smartctl reports? On our production servers we monitor our NVMe drives to see when they approach their maximum TBW (terabytes written) rating and replace them before they fail. I know there is no SSH access, but is any additional health information about the disks available via the API?

Thanks.

We have some pretty basic (did tests pass) SMART data available in the storage API (drives list). We can certainly extend that to include other commonly useful information.

Otherwise, /1.0/metrics on the Incus API includes all of the node-exporter metrics for the system too, so assuming that node-exporter picked up the SMART data for your drives, it should be in there.
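
As a rough illustration of checking whether any SMART data shows up there, here is a minimal Python sketch that filters Prometheus-style text (the format node-exporter metrics are normally exposed in) for metric names containing "smart". The sample text and the metric name example_smart_power_on_hours are hypothetical; the real names depend on what the system actually exports. The sketch also assumes samples carry no trailing timestamp, and it leaves out fetching the text from /1.0/metrics with an authenticated client.

```python
def filter_metrics(text, needle="smart"):
    """Return (series, value) pairs whose metric name contains `needle`."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token (no timestamp assumed).
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if needle in name.lower():
            results.append((series, float(value)))
    return results


# Hypothetical sample; real metric names will differ.
SAMPLE = """\
# HELP example_smart_power_on_hours Hypothetical SMART metric
# TYPE example_smart_power_on_hours gauge
example_smart_power_on_hours{device="nvme0"} 1234
node_cpu_seconds_total{cpu="0",mode="idle"} 99.5
"""

if __name__ == "__main__":
    for series, value in filter_metrics(SAMPLE):
        print(series, value)
```

If the filter returns nothing for the real metrics output, SMART data is probably not being exported for those drives.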

Yeah, currently this is only some very basic information. I would be happy to see the following additions, so that we can take preventive action if necessary:

For NVMe drives
smartctl -a /dev/disk/by-id/NVME

  • power_on_hours
  • Data Units Read
  • Data Units Written
  • available_spare
  • percentage_used

For SSDs
smartctl -a /dev/disk/by-id/SSD

  • power_on_hours
  • Data Units Read
  • Data Units Written
  • available_spare
  • percentage_used

For HDDs
smartctl -a /dev/disk/by-id/HDD

  • Raw_Read_Error_Rate
  • Seek_Error_Rate
  • Power_On_Hours

It would be great to see some of these values in future releases. Thanks.
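
For reference, this is roughly how the values above can be pulled out on a host where smartctl is available (so not on IncusOS itself), using its JSON output (smartctl -j -a DEVICE). It is a minimal Python sketch: the key names reflect my understanding of smartmontools' JSON format and should be checked against the installed version, and the sample report is made up. The TBW conversion relies on NVMe data units being counted in thousands of 512-byte blocks.

```python
# One NVMe "data unit" is 1000 blocks of 512 bytes.
NVME_UNIT_BYTES = 512 * 1000


def nvme_health(report):
    """Extract wear-related fields from a parsed `smartctl -j -a` NVMe report."""
    log = report["nvme_smart_health_information_log"]
    return {
        "power_on_hours": log["power_on_hours"],
        "available_spare": log["available_spare"],
        "percentage_used": log["percentage_used"],
        "tb_read": log["data_units_read"] * NVME_UNIT_BYTES / 1e12,
        "tb_written": log["data_units_written"] * NVME_UNIT_BYTES / 1e12,
    }


def ata_attributes(report, wanted=("Raw_Read_Error_Rate", "Seek_Error_Rate", "Power_On_Hours")):
    """Return raw values of selected ATA SMART attributes, keyed by name."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {row["name"]: row["raw"]["value"] for row in table if row["name"] in wanted}


# Made-up sample data, shaped like the assumed JSON layout.
SAMPLE_NVME = {
    "nvme_smart_health_information_log": {
        "power_on_hours": 8760,
        "available_spare": 100,
        "percentage_used": 3,
        "data_units_read": 1_000_000,
        "data_units_written": 2_000_000,
    }
}

if __name__ == "__main__":
    print(nvme_health(SAMPLE_NVME))
```

In practice the report would come from json.loads() on the smartctl output, and tb_written would be compared against the drive's rated TBW.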
