I believe I’ve tracked down the cause, and I suspect it’s related to the holes I’m seeing in my metrics. If I graph node_scrape_collector_duration_seconds, I see regular jumps for the hwmon collector to 2.5+ seconds, so I suspect the timeout is causing the connection to be aborted. I’m not sure whether this should be filed against Incus or IncusOS, so I’m making this topic instead.
Yeah, looking at the code, Incus expects node-exporter to provide the beginning of a response (HTTP headers) within 3s of the request starting.
Is there a reason for the timeout at all? Is the metrics gathering time-sensitive or similar?
Yeah, the metrics gathering happens whenever we’re being scraped.
We often see scrape intervals of around 30s and support a minimum of 15s.
That means we need to be able to return a full set within 15s, and the Incus side of it can take a little while, so allowing 5s here is actually pretty generous.
If we see that this still doesn’t work, we’ll have to go for something fancier like trying to pull that data in parallel to Incus putting together its own metrics, effectively giving it closer to 12s as a timeout.
I’m pretty sure that node-exporter also only gathers its metrics when it is scraped, which might mean it’d be better to run the two in parallel. Alternatively, reconfigure node-exporter to be proxied via another /os endpoint, allowing separate scraping in Prometheus.