Best Practices for Monitoring Incus via Prometheus and Grafana?

AJRepo · December 12, 2024, 8:51am

I was reading the Incus documentation on monitoring (How to monitor metrics - Incus documentation), which recommends Prometheus and Grafana, and was wondering whether there are any “best practices” for Incus + Prometheus + Grafana.

In the discussion at Official Grafana dashboard for LXD - #6 by stgraber, I see that some people run them with Docker inside a nested container, some via snap, etc.

Any thoughts about which is best?

Thanks in advance!

candlerb · December 12, 2024, 9:23am

There’s no real best practice: just run Prometheus and Grafana wherever is most convenient for you. I put them in their own Incus containers (Ubuntu with snapd removed), with Prometheus installed from the release tarball and Grafana installed from Grafana's apt repository.

Then point Prometheus at your Incus server to scrape its metrics.

For ease of use, I just expose the metrics on port 8444 without authentication or certificate verification. Prometheus scrape config:

scrape_configs:
  - job_name: incus
    scrape_interval: 1m
    metrics_path: /1.0/metrics
    scheme: https
    tls_config:
      # Don't verify the Incus server's (self-signed) certificate
      insecure_skip_verify: true
    static_configs:
      - targets:
          - myhost.example.com:8444

This is arguably “worst practice”, but it’s on a private network and it only exposes metrics. (The API on 8443 is secured properly.)
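Before pointing Prometheus at the endpoint, it can help to fetch it by hand. The sketch below does what the scrape config above does (HTTPS to `/1.0/metrics` with certificate verification disabled) and parses the Prometheus text exposition format. It assumes the unauthenticated endpoint was set up with `incus config set core.metrics_address :8444` and `incus config set core.metrics_authentication false`; `myhost.example.com` is the placeholder from the config, and `fetch_metrics` / `parse_metrics` are hypothetical helpers with a deliberately simplified parser (label escapes are not decoded).

```python
import re
import ssl
import urllib.request

# Placeholder host from the scrape config; replace with your Incus server.
METRICS_URL = "https://myhost.example.com:8444/1.0/metrics"

# Matches one label pair such as name="c1" (tolerates escaped quotes in values).
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url: str = METRICS_URL) -> str:
    """Fetch the metrics text without verifying TLS, like insecure_skip_verify."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Simplified text-format parser returning (metric name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            label_part, value_part = rest.rsplit("}", 1)
            labels = dict(LABEL_RE.findall(label_part))
        else:
            name, value_part = line.split(None, 1)
            labels = {}
        # The value is the first field; an optional timestamp may follow it.
        samples.append((name, labels, float(value_part.split()[0])))
    return samples
```

From a machine that can reach port 8444, `parse_metrics(fetch_metrics())` should return the same series Prometheus scrapes, e.g. `incus_memory_MemAvailable_bytes` with a `name` label per instance.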

jarrodu · December 12, 2024, 10:10am

I also just run them in Incus containers. It is pretty easy.

Just make sure you back up your data.

AJRepo · December 12, 2024, 2:37pm

Do you put Grafana and Prometheus each in their own container or one container for both?

jarrodu · December 12, 2024, 2:50pm

I put them into different containers.

candlerb · December 12, 2024, 4:28pm

Currently I use two different containers, but in the past I’ve had them in the same container. Either works, but since Grafana can be used with other backends (e.g. Loki), I think it makes sense to keep them separate, so that, for example, I can rebuild Prometheus without affecting Grafana, or vice versa.

AJRepo · December 14, 2024, 5:08am

Are you backing up just the data (e.g. Integrations | Prometheus) or the full container?

Did you set up an OCI (application) container or a full Incus system container?

Are you backing up just the Prometheus data via remote write, or the entire container as a snapshot?

jarrodu · December 14, 2024, 8:41am

I am using Incus System Containers.

I back up the full container by copying it to a different Incus remote. This might not be the best solution, but it works for me. A better way would be to create a custom storage volume for the data and attach it to the container. Then you could copy that volume to a different Incus remote or export it as an archive.

Snapshots are not really a backup. They serve a different purpose and live on the same storage pool as the instance, so they don’t help you if that storage fails.
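The data-volume approach described above could look roughly like this sketch, which drives the `incus` CLI from Python. The pool `default`, volume `prometheus-data`, instance `prometheus`, remote `backup` and mount path `/var/lib/prometheus` are all hypothetical placeholders (the path should match whatever you pass to Prometheus as `--storage.tsdb.path`). With the default `dry_run=True` the functions only return the commands, so you can review them before running anything.

```python
import datetime
import subprocess

# Hypothetical names; adjust to your own pool, volume, instance and remote.
POOL = "default"
VOLUME = "prometheus-data"
INSTANCE = "prometheus"
REMOTE = "backup"
MOUNT_PATH = "/var/lib/prometheus"  # should match Prometheus's --storage.tsdb.path


def run(cmds: list[list[str]], dry_run: bool = True) -> list[list[str]]:
    """Run each incus command in order, or just return them when dry_run is set."""
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return cmds


def setup_volume(dry_run: bool = True) -> list[list[str]]:
    """Create a custom storage volume and attach it to the Prometheus container."""
    return run([
        ["incus", "storage", "volume", "create", POOL, VOLUME],
        ["incus", "storage", "volume", "attach", POOL, VOLUME, INSTANCE, MOUNT_PATH],
    ], dry_run)


def copy_to_remote(dry_run: bool = True) -> list[list[str]]:
    """Copy the volume to the same pool/name on another Incus remote."""
    return run([
        ["incus", "storage", "volume", "copy",
         f"{POOL}/{VOLUME}", f"{REMOTE}:{POOL}/{VOLUME}"],
    ], dry_run)


def export_archive(day: datetime.date, dry_run: bool = True) -> list[list[str]]:
    """Export the volume to a dated tarball in the current directory."""
    archive = f"{VOLUME}-{day.isoformat()}.tar.gz"
    return run([["incus", "storage", "volume", "export", POOL, VOLUME, archive]], dry_run)
```

A plain copy may refuse to overwrite an existing volume on the remote, so the dated export is the simpler one to repeat from cron. For a fully consistent copy of the TSDB you may also want to stop Prometheus first.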

Luken · December 16, 2024, 11:16am

I found it really useful to configure Grafana with a single stat panel that shows the lowest available RAM among all containers (with the container’s name) and the lowest available disk space among all containers. That way I only need a few alerts that cover every container and fire when one of the most important resources falls below its threshold.
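The “lowest value across all containers” idea can be sketched outside Grafana like this. The hypothetical helpers `lowest` and `below_threshold` take one value per container (for example `incus_memory_MemAvailable_bytes`, keyed by its `name` label; for disk you would first pick one mountpoint from `incus_filesystem_avail_bytes`) and return the worst container, which is what lets a single alert cover all of them. In Grafana, a query along the lines of `bottomk(1, incus_memory_MemAvailable_bytes)` should give a similar result, though the exact panel setup is an assumption.

```python
from typing import Optional


def lowest(samples: dict[str, float]) -> Optional[tuple[str, float]]:
    """Return (container name, value) with the smallest value, like bottomk(1, ...)."""
    if not samples:
        return None
    name = min(samples, key=samples.__getitem__)
    return name, samples[name]


def below_threshold(samples: dict[str, float], threshold: float) -> Optional[tuple[str, float]]:
    """One check for every container: report the worst one if it is under threshold."""
    worst = lowest(samples)
    if worst is not None and worst[1] < threshold:
        return worst
    return None


# Example: available memory in bytes per container (made-up numbers).
mem_available = {"web": 1.2e9, "db": 3.0e8, "cache": 6.4e8}
alert = below_threshold(mem_available, 512 * 1024**2)  # 512 MiB threshold
```

Here `alert` is `("db", 300000000.0)`, so one rule both fires and names the offending container.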

pycaw · December 17, 2024, 2:41pm

For simpler monitoring, you might consider just doing this, with the marvelous bpytop utility installed in the container:

ctname=dmain; screen -dmS $ctname-mon bash -c "incus exec $ctname -- bpytop -lc"

Check how things are going:

screen -r $ctname-mon

Detach with Ctrl-a d.

Luken · December 17, 2024, 5:01pm

I just came up with a query that shows CPU usage per container/VM relative to the CPU limit set on that container/VM:

sum by (name) (rate(incus_cpu_seconds_total{mode=~"user|system", instance="${incus_instance}"}[$__rate_interval])) / count without (cpu) (sum by (name, cpu) (incus_cpu_seconds_total{mode=~"user|system", instance="${incus_instance}"} > 10))

The denominator counts, per container/VM, only the CPUs whose cumulative user or system time exceeds 10 seconds, which filters out the CPUs the instance isn’t using. The query then divides the instance’s CPU usage rate by that number of active CPUs. I hope it’ll be useful to someone.
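To make the arithmetic of that query concrete, here is a sketch that applies the same logic to two snapshots of `incus_cpu_seconds_total` counters. The function name `cpu_usage_ratio` and the snapshot layout are hypothetical, and it uses a plain delta divided by the interval instead of PromQL's `rate()`, which additionally handles counter resets and extrapolation.

```python
from collections import defaultdict

# Key: (instance name, cpu, mode); value: cumulative seconds, as in
# incus_cpu_seconds_total{name=..., cpu=..., mode=...}.
Counters = dict[tuple[str, str, str], float]

BUSY_MODES = ("user", "system")  # mirrors mode=~"user|system"
ACTIVE_THRESHOLD = 10.0          # mirrors the "> 10" filter in the query


def cpu_usage_ratio(before: Counters, after: Counters, interval_s: float) -> dict[str, float]:
    """Per-instance CPU usage divided by the number of active CPUs."""
    busy_rate: dict[str, float] = defaultdict(float)  # sum by (name) (rate(...))
    active: dict[str, set[str]] = defaultdict(set)    # CPUs passing the > 10 filter
    for (name, cpu, mode), value in after.items():
        if mode not in BUSY_MODES:
            continue
        # Series missing from the earlier snapshot contribute nothing.
        busy_rate[name] += (value - before.get((name, cpu, mode), value)) / interval_s
        if value > ACTIVE_THRESHOLD:
            active[name].add(cpu)  # a CPU counts once even if both modes pass
    return {name: rate / len(active[name])
            for name, rate in busy_rate.items() if active[name]}
```

For instance, if a container’s two active CPUs accumulate 54 s of user+system time over a 60 s window, the ratio is 0.9 / 2 = 0.45, i.e. it is using about 45 % of the CPUs it actually runs on.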