Retrieve LXD metrics with Telegraf and InfluxDB 2.x

ruskofd · October 3, 2021, 3:22pm

Hello

With the arrival of the LXD metrics endpoint feature in LXD 4.19, I thought it would be interesting to share a basic configuration to retrieve these metrics via the Telegraf agent. The metrics will be stored in an InfluxDB 2.x instance inside of an Ubuntu 21.10 container.

LXD configuration

Configure the LXD metrics endpoint to listen locally :

$ lxc config set core.metrics_address "127.0.0.1:9100"

InfluxDB configuration

Since I used InfluxDB to test this new LXD feature, here are the basic steps to make sure you can send metrics into this time series database.

Install InfluxDB : see "Install InfluxDB" in the InfluxDB OSS 2.0 documentation (influxdata.com)
Once InfluxDB 2 is installed, connect to the web interface on port 8086 (you may want to use an LXD proxy to access it) and initialize the instance. On my side, the organization name is home.lab and the bucket name is metrics.
You need to generate a token to allow Telegraf to send metrics into your InfluxDB instance :
- In the web interface, go to Data, then Tokens
- Then select Read/Write Token in the Generate Token drop-down menu
- Create the token, and then click on Save
- Finally, you can view your token by clicking on it on the Tokens page
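
For the LXD proxy mentioned above, a minimal sketch could look like this. It assumes the InfluxDB container is named influxdb and that the web interface answers on 127.0.0.1:8086 inside it. The function name, the instance name and the device name influx-ui are placeholders of mine, not part of the original setup.

```shell
# Hypothetical helper: forward port 8086 on the host to the InfluxDB
# web interface inside a container, using an LXD proxy device.
# "influxdb" (instance) and "influx-ui" (device) are placeholder names.
expose_influxdb_ui() {
  instance="${1:-influxdb}"
  lxc config device add "$instance" influx-ui proxy \
    listen=tcp:0.0.0.0:8086 \
    connect=tcp:127.0.0.1:8086
}
```

Calling expose_influxdb_ui with your own instance name adds the device. Listening on 0.0.0.0 exposes the interface on every host address, so a more specific listen address may be preferable.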

Telegraf configuration

Install Telegraf on your LXD host(s) : see "Install Telegraf" in the Telegraf 1.20 documentation (influxdata.com)

Generate a metrics certificate to allow Telegraf to query the LXD metrics endpoint :

$ openssl req -x509 -newkey rsa:2048 -keyout /etc/telegraf/lxd-metrics.key -nodes -out /etc/telegraf/lxd-metrics.crt -subj "/CN=lxd.local"

Add the newly generated certificate to the LXD trust store (the key stays with Telegraf) :

$ lxc config trust add /etc/telegraf/lxd-metrics.crt --type=metrics

Copy the LXD server certificate into the Telegraf configuration directory :

$ cp /var/snap/lxd/common/lxd/server.crt /etc/telegraf/lxd-metrics-ca.crt

Change the ownership of the previous files to ensure Telegraf can read them :

$ chown telegraf:telegraf /etc/telegraf/{lxd-metrics-ca.crt,lxd-metrics.key,lxd-metrics.crt}
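
Before wiring up Telegraf, it can help to query the endpoint by hand with the same three files. The following is only a sketch: the function name is a placeholder of mine, and it assumes curl is installed and that you run it as root or as the telegraf user, since the files now belong to telegraf.

```shell
# Hypothetical helper: fetch the LXD metrics with the client certificate
# generated above, verifying the server against the copied LXD certificate.
check_lxd_metrics() {
  url="${1:-https://127.0.0.1:9100/1.0/metrics}"
  curl --silent --show-error --fail \
    --cacert /etc/telegraf/lxd-metrics-ca.crt \
    --cert /etc/telegraf/lxd-metrics.crt \
    --key /etc/telegraf/lxd-metrics.key \
    "$url"
}
```

Something like check_lxd_metrics | head should print the first metric lines. If curl complains about the server certificate, its --insecure flag skips that verification, at the cost of no longer authenticating the server.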

Create an input and an output configuration to scrape and send LXD metrics to InfluxDB :

# /etc/telegraf/telegraf.d/input_lxd.conf
[[inputs.prometheus]]
  urls = ["https://127.0.0.1:9100/1.0/metrics"]
  tls_ca = "/etc/telegraf/lxd-metrics-ca.crt"
  tls_cert = "/etc/telegraf/lxd-metrics.crt"
  tls_key = "/etc/telegraf/lxd-metrics.key"

# /etc/telegraf/telegraf.d/output_influxdb.conf
[[outputs.influxdb_v2]]
  urls = ["http://<IP address of your InfluxDB instance>:8086"]
  token = "<token generated on the web interface>"
  organization = "home.lab"
  bucket = "metrics"
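
To check the input without sending anything to InfluxDB, Telegraf has a --test mode that gathers metrics once and prints them to stdout. A minimal sketch, assuming the default package paths used in this tutorial (the function name is a placeholder of mine):

```shell
# Hypothetical helper: run only the prometheus input once and print the
# gathered metrics to stdout; outputs are not used in --test mode.
telegraf_dry_run() {
  telegraf --config /etc/telegraf/telegraf.conf \
    --config-directory /etc/telegraf/telegraf.d \
    --input-filter prometheus \
    --test
}
```

If the certificates or the URL are wrong, the error should show up here directly instead of in the service logs.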

Restart Telegraf to apply configurations :
```
$ systemctl restart telegraf
```
Wait a few minutes and the metrics should show up in the InfluxDB Data Explorer. Go to Explore in the InfluxDB interface to create a basic query.

That’s it, hope this tutorial will be useful

stgraber · October 3, 2021, 3:26pm

Thanks!
I moved it over to our Tutorials section.

On my side, I’ve configured my Prometheus server to start scraping the LXD servers in my production cluster. I still need to figure out a good Grafana dashboard to show the data though; then we can publish another tutorial for a setup using Prometheus and Grafana.

Dnegreira · October 15, 2021, 3:53pm

@stgraber I’m actually looking into monitoring LXD with Prometheus atm, can you share what you already achieved on Prometheus side? I’m starting from 0, so any pointers / configs to look at would be nice.

stgraber · October 15, 2021, 5:52pm

For the scrape config I’m using:

-   job_name: lxd
    metrics_path: /1.0/metrics
    scheme: https
    scrape_interval: 30s
    scrape_timeout: 5s
    static_configs:
    -   targets:
        - '[2602:fc62:a:101::100]:8444'
        - '[2602:fc62:a:101::101]:8444'
        - '[2602:fc62:a:101::102]:8444'
    tls_config:
        cert_file: /var/snap/prometheus/current/keys/lxd-metrics.crt
        insecure_skip_verify: true
        key_file: /var/snap/prometheus/current/keys/lxd-metrics.key

This scrapes my servers every 30s (the scrape interval needs to be longer than 15s, or you may get the same values twice from the cache).

On the Grafana front, I’ve not done much yet. I have a dashboard which lists all projects and instances and shows the process and network usage. So still a long long way to go. I’d love for it to show an overview of resource usage for the selected projects at the top and then show the per-instance usage underneath.

https://dl.stgraber.org/LXD-grafana.json

Dnegreira · October 15, 2021, 8:07pm

Thanks much, I’ll explore the new metrics with prometheus and see what I can do on the grafana side of things.

derrickmehaffy · January 26, 2022, 5:15am

ruskofd:

# /etc/telegraf/telegraf.d/input_lxd.conf
[[inputs.prometheus]]
  urls = ["https://127.0.0.1:9100/1.0/metrics"]
  tls_ca = "/etc/telegraf/lxd-metrics-ca.crt"
  tls_cert = "/etc/telegraf/lxd-metrics.crt"
  tls_key = "/etc/telegraf/lxd-metrics.key"

One suggestion here: add insecure_skip_verify = true, as the SSL certs we just generated aren’t technically valid since they are self-signed. It took me a few minutes to figure out how to get Telegraf to actually ask LXD for the metrics.

Something like this:

[[inputs.prometheus]]
  urls = ["https://127.0.0.1:9100/1.0/metrics"]
  insecure_skip_verify = true
  tls_ca = "/etc/telegraf/lxd-metrics-ca.crt"
  tls_cert = "/etc/telegraf/lxd-metrics.crt"
  tls_key = "/etc/telegraf/lxd-metrics.key"

Also, if you don’t want Telegraf spamming your syslog with useless messages about not being able to connect to a local InfluxDB, you can comment out the default [[outputs.influxdb]] in /etc/telegraf/telegraf.conf.
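
A sketch of how that could be scripted. It assumes the options under [[outputs.influxdb]] are already commented out, as they usually are in the packaged default file; if some are not, they need commenting too. The function name is a placeholder of mine.

```shell
# Hypothetical helper: comment out the [[outputs.influxdb]] table header
# in a Telegraf config, keeping a .bak copy of the original file.
# [[outputs.influxdb_v2]] is left untouched.
disable_default_influxdb_output() {
  conf="${1:-/etc/telegraf/telegraf.conf}"
  sed -i.bak 's/^[[:space:]]*\[\[outputs\.influxdb]]/# &/' "$conf"
}
```

Restart Telegraf afterwards so the change is picked up.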

stgraber · January 26, 2022, 3:25pm

Ah yeah. It’s definitely possible to have LXD run a valid certificate but it’s pretty uncommon so I suspect most folks will have to rely on disabling the verification

deviantintegral · April 10, 2023, 12:12pm

I followed this guide and it worked great… for 30 days! By default, openssl generates certs with a 30-day expiry. You can extend this when generating the certificate with -days, such as:

openssl req -x509 -days 3650 -newkey rsa:2048 -keyout ./lxd-metrics.key -nodes -out ./lxd-metrics.crt -subj "/CN=lxd.local"
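
To see when an existing certificate runs out, openssl can print the end date and check whether it is still valid some time from now. A minimal sketch (the function name and the 14-day default are my own choices):

```shell
# Hypothetical helper: print a certificate's expiry date, then return 0 if
# it is still valid N days from now (default 14) and non-zero otherwise.
cert_valid_for() {
  cert="$1"
  days="${2:-14}"
  openssl x509 -in "$cert" -noout -enddate
  openssl x509 -in "$cert" -noout -checkend "$((days * 86400))"
}
```

For example, cert_valid_for /etc/telegraf/lxd-metrics.crt 7 fails once the certificate is within a week of expiring, which makes it easy to hook into a cron job or a monitoring check.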