LXD 4.19 introduced support for a metrics interface, which exposes an endpoint compatible with the OpenMetrics format.
Combined with a tool like Prometheus, a standalone or clustered LXD deployment can be scraped at a fixed interval and the resulting metrics stored in a time series database.
This enables all kinds of interesting queries to find performance issues, potentially misbehaving workloads or abuse, or even to gather usage metrics for billing purposes.
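To give a feel for the kind of per-project query this makes possible, here is a minimal Python sketch that sums one metric by its `project` label from a block of exposition text. The metric name `example_memory_bytes`, the label set and all values below are made up for illustration and are not actual LXD output; the regular expressions also assume simple label values without escaped quotes or braces. In a real deployment you would run the equivalent aggregation as a query in Prometheus rather than parse the text yourself.

```python
import re
from collections import defaultdict

# Hypothetical sample in the text exposition format.
# Metric name, labels and values are invented for illustration only.
SAMPLE = """\
# TYPE example_memory_bytes gauge
example_memory_bytes{name="c1",project="default"} 1000
example_memory_bytes{name="c2",project="default"} 500
example_memory_bytes{name="web",project="staging"} 250
# EOF
"""

# metric_name{label="value",...} number
LINE_RE = re.compile(
    r'^(?P<metric>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def sum_by_project(text, metric):
    """Sum the samples of `metric`, grouped by their `project` label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip TYPE/HELP/EOF comment lines
        match = LINE_RE.match(line)
        if match is None or match.group("metric") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        totals[labels.get("project", "")] += float(match.group("value"))
    return dict(totals)


print(sum_by_project(SAMPLE, "example_memory_bytes"))
```

With the sample above, the two instances in `default` are added together and `staging` is reported separately.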
Details on how to set up the LXD side of it can be found here:
Since then, we've been hard at work on a Grafana dashboard, both as a showcase of what can be done with those metrics and as a good starting point for anyone looking to integrate our metrics into their own dashboards.
As of today, we have something we feel is of pretty good quality. It can be found at:
It provides a per-project resource consumption overview:
As well as detailed per-instance data:
To set it up, import dashboard ID 15726 into Grafana and select the Prometheus server you configured per the instructions above as the data source.
- You need Grafana 8.4 or higher (required by our disk/network graphs)
- You need Prometheus 2.22 or higher (required for proper handling of some intervals)
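If you want to check these minimums from a script, a small sketch along these lines would do. The function names and the `MINIMUMS` table are hypothetical helpers written for this post, and the sketch assumes version strings of the form `MAJOR.MINOR[.PATCH]` with an optional leading `v`; how you obtain the version strings from your own servers is left out.

```python
# Minimum versions taken from the requirements listed above.
MINIMUMS = {"grafana": (8, 4), "prometheus": (2, 22)}


def parse_version(version):
    """Return (major, minor) as integers from a string like 'v2.22.1'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def meets_minimum(tool, version):
    """True if `version` is at least the minimum required for `tool`."""
    return parse_version(version) >= MINIMUMS[tool]


print(meets_minimum("grafana", "8.4.0"))     # True
print(meets_minimum("prometheus", "2.9.1"))  # False
```

Comparing integer tuples instead of raw strings matters here: as strings, "2.9" would sort after "2.22" and wrongly pass the check.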
Let us know how well it works for your setup and whether there's anything we should consider adding. We've tried to focus on what an LXD cluster operator may need to identify problems in a multi-project environment.