Official Grafana dashboard for LXD

LXD 4.19 introduced support for a metrics interface, which exposes an endpoint compatible with the OpenMetrics format.

Combined with a tool like Prometheus, a standalone or clustered LXD deployment can be scraped at a fixed interval and those metrics stored in a time series database.

This enables all kinds of interesting queries to find performance issues, potentially misbehaving workloads or abuse, or even to gather usage metrics for billing purposes.

Details on how to set up the LXD side of it can be found here:

We’ve since been hard at work on a Grafana dashboard, both as a showcase of what can be done with those metrics and as a good starting point for anyone looking to integrate our metrics into their dashboards.

Starting today, we have something that we feel is of pretty good quality and it can be found at:

It provides a per-project resource consumption overview:

As well as detailed per-instance data:

To set it up, import dashboard ID 15726 into your Grafana and pick your Prometheus server, configured per the instructions above, as the data source.

Note that:

  • You need Grafana 8.4 or higher (required by our disk/network graphs)
  • You need Prometheus 2.22 or higher (required for proper handling of some intervals)
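
For reference, the Prometheus side of that setup might look roughly like the sketch below. It is a small shell helper that prints a scrape job for one LXD server. The helper name `print_lxd_scrape_job`, the host `lxd01.example.com:8443` and the server name `lxd01` are made up for illustration, and the `tls/` file names are assumptions about where the LXD server certificate and the metrics client certificate and key were copied, relative to the Prometheus configuration directory.

```shell
#!/bin/sh
# print_lxd_scrape_job TARGET SERVER_NAME
# Hypothetical helper: prints a Prometheus scrape job for a single LXD server.
# TARGET is host:port of the LXD API, SERVER_NAME is the name used for TLS verification.
print_lxd_scrape_job() {
    cat <<EOF
scrape_configs:
  - job_name: lxd
    metrics_path: '/1.0/metrics'
    scheme: 'https'
    static_configs:
      - targets: ['$1']
    tls_config:
      ca_file: 'tls/server.crt'     # assumed path: copy of the LXD server certificate
      cert_file: 'tls/metrics.crt'  # assumed path: client certificate added to the LXD trust store
      key_file: 'tls/metrics.key'   # assumed path: matching private key
      server_name: '$2'
EOF
}

# Example values only; replace with your own server.
print_lxd_scrape_job "lxd01.example.com:8443" "lxd01"
```

The output can be merged into an existing prometheus.yml. Once Prometheus has been restarted, the lxd job should show up under its targets.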

Let us know how well that works for your setup and if there’s anything that we should consider adding. We’ve tried to focus on what an LXD cluster operator may need to identify problems in a multi-project environment.

Wow, nice job! Will try it soon :wink:

When using the separate metrics listener (core.metrics_address), could we have an option that makes the metrics public, so that playing with certs is not necessary?

That’d probably be fine to add, though indeed restricted to the metrics-only listener.
Feel free to file a feature request at https://github.com/lxc/lxd/issues

Excellent job with this!

Just set mine up with Prometheus & Grafana as docker containers inside a nested LXC container.

Looks as though these metrics are going to be quite useful.

Thanks for all the work you do around here. Very much appreciated!

Looks great! Now if only there was an LXD image or cloud-init template for setting up the Grafana/Prometheus services… :slight_smile:

I’m running LXD 4.0.9 through the snap.
I’m running Prometheus through the snap as well:

sudo snap list
lxd         4.0.9-8e2046b  22753  4.0/stable/…   canonical✓  -
prometheus  2.28.1         53     2/stable       canonical✓  -

I’ve tried this: Instance metrics - LXD documentation

There are a number of issues:

  • lxc config trust add metrics.crt --type=metrics
    … Doesn’t work since --type isn’t recognized.

  • ca_file: 'tls/lxd.crt'
    … is a typo and should be ca_file: 'tls/server.crt'

After fixing those issues, I’m still not having any luck, and my Prometheus shows this after a restart:

Also, I can’t grab stats with curl:

 curl -k https://127.0.0.1:8443/1.0/metrics/
{"type":"error","status":"","status_code":0,"operation":"","error_code":404,"error":"not found","metadata":null}

Prometheus seems OK (also from logs)

systemctl is-active snap.prometheus.prometheus.service
active

What am I doing wrong?

4.0.9 doesn’t have the metrics API.
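
A quick way to rule this out before debugging certificates is to compare the installed version against 4.19, the release that introduced the metrics endpoint. The sketch below is only an illustration: `version_ge` and `lxd_has_metrics` are hypothetical helper names, the comparison relies on GNU `sort -V`, and the version string is passed in by hand (on a real host it could be taken from the `snap list` output).

```shell
#!/bin/sh
# Minimum LXD release with the metrics endpoint, per the announcement at the top of the thread.
MIN_LXD_VERSION="4.19"

# version_ge A B: succeeds if version A is greater than or equal to version B.
# Relies on GNU `sort -V` (version sort), so 4.0.9 sorts before 4.19.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# lxd_has_metrics VERSION: hypothetical helper.
# Strips a snap-style "-<hash>" suffix (e.g. 4.0.9-8e2046b) before comparing.
lxd_has_metrics() {
    version_ge "${1%%-*}" "$MIN_LXD_VERSION"
}

# Example using the version string from the snap list output quoted earlier.
if lxd_has_metrics "4.0.9-8e2046b"; then
    echo "metrics API available"
else
    echo "metrics API missing, upgrade needed"
fi
```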

On focal, what’s the process of switching to lxd 5 through snapd? Can this be done with running containers?

snap refresh lxd --channel=5.0 will move you to the 5.0 LTS track from the 4.0 LTS track.
Instances will keep running during the update; the only thing that goes offline is the LXD API itself, so it’s no different from a normal minor bugfix update.

Thanx @stgraber - I can confirm it worked now. I tried the dashboard (LXD dashboard for Grafana | Grafana Labs) and it also loads up.

The values are however a bit strange, but I’ll open a separate thread for that. Thanx!