On our IncusOS test server I set up an additional ZFS pool (raidz1) spanning multiple disks. I can see the state of the pool via the storage API, no problem.
Is it possible to get some additional health data about the disks in that storage pool, e.g. SMART data or information from smartctl? On our production servers we monitor our NVMe drives to see when they approach their maximum TBW values and replace them before they fail. I know there is no SSH access, but is there some additional (health) information about the disks available via the API?
We have some pretty basic (did tests pass) SMART data available in the storage API (drives list). We can certainly extend that to include other commonly useful information.
Otherwise, /1.0/metrics on the Incus API includes all of the node-exporter metrics for the system too, so assuming that node-exporter picked up the SMART data for your drives, it should be in there.
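To illustrate, the metrics endpoint returns Prometheus exposition text (fetchable with e.g. `curl --unix-socket /var/lib/incus/unix.socket http://incus/1.0/metrics`), which can be filtered for SMART series. This is a minimal sketch; the `smartmon_` prefix matches the naming used by node-exporter's smartmon textfile collector, and whether those series exist at all depends on your setup, so treat both the prefix and the sample line as assumptions:

```python
# Sketch: filter Prometheus text from Incus's /1.0/metrics down to
# SMART-related series. The "smartmon_" prefix is an assumption based
# on node-exporter's textfile smartmon collector naming.

def smart_metrics(metrics_text: str, prefix: str = "smartmon_") -> dict:
    """Map each matching metric series (name + labels) to its float value."""
    result = {}
    for line in metrics_text.splitlines():
        if line.startswith(prefix):
            series, _, value = line.rpartition(" ")
            result[series] = float(value)
    return result

# Fabricated sample payload for illustration only.
sample = """\
# HELP smartmon_power_on_hours_raw_value SMART attribute power_on_hours
smartmon_power_on_hours_raw_value{disk="/dev/nvme0",type="nvme"} 8423
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""
print(smart_metrics(sample))
```

The non-SMART `node_cpu_seconds_total` line is skipped, so only the drive series survive the filter.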
Yeah, currently that is only very basic information. I would be happy to see the following additions, so we can take preventive action when necessary:
For NVMe drives (`smartctl -a /dev/disk/by-id/NVME`):
- power_on_hours
- Data Units Read
- Data Units Written
- available_spare
- percentage_used
For SSDs (`smartctl -a /dev/disk/by-id/SSD`):
- power_on_hours
- Data Units Read
- Data Units Written
- available_spare
- percentage_used
For HDDs (`smartctl -a /dev/disk/by-id/HDD`):
- Raw_Read_Error_Rate
- Seek_Error_Rate
- Power_On_Hours
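For what it's worth, smartctl (7.0+) can emit these fields as JSON via `smartctl --json -a <device>`, which would make them easy to surface in an API. A rough sketch for the NVMe case, where the wear fields live under `nvme_smart_health_information_log` in smartctl's JSON output; the sample payload below is fabricated for illustration:

```python
# Sketch: extract the NVMe wear fields listed above from a
# `smartctl --json -a <device>` report. In real use the report would
# come from subprocess.run(["smartctl", "--json", "-a", dev], ...);
# here a fabricated sample dict stands in for it.
import json

NVME_FIELDS = ("power_on_hours", "data_units_read", "data_units_written",
               "available_spare", "percentage_used")

def nvme_health(report: dict) -> dict:
    """Pull the wear-related fields from an NVMe smartctl JSON report."""
    log = report.get("nvme_smart_health_information_log", {})
    return {k: log.get(k) for k in NVME_FIELDS}

# Fabricated sample values for illustration only.
sample = json.loads("""{
  "nvme_smart_health_information_log": {
    "available_spare": 100,
    "percentage_used": 3,
    "data_units_read": 1804520,
    "data_units_written": 2250112,
    "power_on_hours": 8423
  }
}""")
print(nvme_health(sample))
```

The ATA case is similar, except the attributes sit in the `ata_smart_attributes` table of the same JSON report.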
It would be great to see some of these values in future releases. Thanks.