Question on LXD dashboard values

I’m trying out the official LXD dashboard for Grafana (Grafana Labs).

But I can’t understand the storage parts of this.

I have a disk of about 500GB, but the graphs seem to show a quota of 4TB, and I don’t seem to be able to get any I/O information.

$ zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
juju-zfs    80M   159K  79.8M        -         -     5%     0%  1.00x    ONLINE  -
lxdhosts   556G   356G   200G        -         -    46%    64%  1.00x    ONLINE  -

Am I doing something wrong?

Unfortunately, I/O reporting doesn’t work on ZFS, because ZFS doesn’t go through the kernel code path that this reporting relies on.

The source of the high total storage is likely that your containers don’t have a disk limit, so each one reports the total pool size as its disk size. The dashboard then sums those values, and the result ends up significantly larger than the actual underlying storage.
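As a rough illustration of that effect, here is a minimal sketch in Python. The pool size is taken from the zpool output above; the number of containers (8) is purely an assumption for the example, since the thread does not say how many instances exist.

```python
# Hypothetical illustration: each container without a disk limit reports
# the whole pool as its own disk size, and the dashboard sums those values.

POOL_SIZE_GIB = 556  # size of the "lxdhosts" pool from `zpool list`


def aggregated_size_gib(pool_size_gib: int, unlimited_containers: int) -> int:
    """Total a dashboard would show when it sums per-container sizes."""
    return pool_size_gib * unlimited_containers


# ASSUMPTION: 8 containers without a disk limit (not stated in the thread).
total_gib = aggregated_size_gib(POOL_SIZE_GIB, 8)
print(f"{total_gib} GiB, about {total_gib / 1024:.1f} TiB")
```

With eight such containers the sum is 4448 GiB, roughly 4.3 TiB, which is in the same range as the 4TB the dashboard shows. Setting a disk limit on each container should make the per-container values, and therefore the sum, more meaningful.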


Thanks a lot for this information!

The followup questions would be:

  • Is there a way to get these ZFS I/O metrics somehow on an Ubuntu OS?
  • Is there a Prometheus exporter for ZFS that you would recommend and that I can use with the prometheus snap?

I’m really impressed by how polished LXD is getting. I’m using it a lot in many areas at the moment.

I’ve made some progress here. But I’m looking for advice as described here.

Hi!

We run Prometheus in a LXD container and want to scrape from the LXD host. We have created certificates according to the Instance metrics page of the LXD documentation, but we get an error about the certificate.


Do you know what we should do differently to make this work?

Thanks!

Prometheus performs proper CN/SAN validation against the name provided in the targets list. When that doesn’t match what’s in the cert, you can provide a server_name to Prometheus. I have a similar setup where the LXD server cert looks like:

# openssl x509 -noout -text -in c2d.crt | grep c2d
        Issuer: O = linuxcontainers.org, CN = root@c2d
        Subject: O = linuxcontainers.org, CN = root@c2d
                DNS:c2d, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1

And I configure Prometheus this way:

  - job_name: "lxd-c2d"
    metrics_path: '/1.0/metrics'
    scheme: 'https'
    static_configs:
      - targets: ['c2d.mgmt.sdeziel.info:9101']
    tls_config:
      ca_file: '/var/snap/prometheus/common/certs/c2d.crt'
      cert_file: '/var/snap/prometheus/common/certs/metrics.crt'
      key_file: '/var/snap/prometheus/common/certs/metrics.key'
      server_name: 'c2d'

PR: “doc: document how to handle SAN vs target name mismatch in Prometheus sample” by simondeziel (Pull Request #10368, lxc/lxd on GitHub)
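The name mismatch described above can be modelled with a small Python sketch. This is a deliberately simplified stand-in for real TLS name checking (exact matches only, no wildcard handling), and the function name is made up for illustration. The SAN entries are the ones from the c2d certificate shown above; the name being checked is the host part of the target, without the port.

```python
import ipaddress


def name_matches_sans(name: str, dns_sans: list[str], ip_sans: list[str]) -> bool:
    """Simplified model of TLS name checking (exact matches only, no wildcards)."""
    try:
        wanted_ip = ipaddress.ip_address(name)
    except ValueError:
        # Not an IP literal: compare against the DNS SANs, case-insensitively.
        return name.lower() in {san.lower() for san in dns_sans}
    # IP literal: compare against the IP SANs in parsed form.
    return any(wanted_ip == ipaddress.ip_address(san) for san in ip_sans)


# SAN entries taken from the c2d certificate shown above.
DNS_SANS = ["c2d"]
IP_SANS = ["127.0.0.1", "0:0:0:0:0:0:0:1"]

# The host name used in `targets` is not in the certificate...
print(name_matches_sans("c2d.mgmt.sdeziel.info", DNS_SANS, IP_SANS))
# ...while the name passed as `server_name` is.
print(name_matches_sans("c2d", DNS_SANS, IP_SANS))
```

The first check fails and the second succeeds, which is why setting server_name to a name that is actually present in the certificate lets the scrape go through.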


Thanks @sdeziel! Adding the server_name solved that error.

Now, when I click the endpoint, e.g. https://192.168.1.2:8443/1.0/metrics, I get “not authorized”.

I guess something is wrong with the metrics certificate. How did you create yours?

If you followed the instructions on the Instance metrics page of the LXD documentation, I’d think the cert would be good. Can you check whether the cert shows up with the right type using:

lxc config trust ls

The type is correct. I have used the command from the documentation to create it.

Should I change "/CN=metrics.local"? I did try the server name, the same value I put in server_name, but that made no difference.

Could you share the metrics.crt content as well as the output of lxc config trust ls? I’d like to try and import it here.

lxc config trust ls

metrics_dwellir1.crt

-----BEGIN CERTIFICATE-----
MIIBwTCCAUigAwIBAgIUXTMJP2qyRy1B6cTUW+8OwqVAIP8wCgYIKoZIzj0EAwMw
GDEWMBQGA1UEAwwNbWV0cmljcy5sb2NhbDAeFw0yMjA1MDkxOTIzMzZaFw0zMjA1
MDYxOTIzMzZaMBgxFjAUBgNVBAMMDW1ldHJpY3MubG9jYWwwdjAQBgcqhkjOPQIB
BgUrgQQAIgNiAATpKlMy2blge/aseDuixG/bNzQXxlORbjxrHptsLkLrhd5khhoX
Wk6yDeCFLKI18nhSw8QwDOJyvyPbmME1AZIvBGsV1+BcVpHBhWXjO/Nzpoc9FnlH
LllqKJd8qan/LkajUzBRMB0GA1UdDgQWBBS7aEnj+JykJhCGhHBygYYfun/CtTAf
BgNVHSMEGDAWgBS7aEnj+JykJhCGhHBygYYfun/CtTAPBgNVHRMBAf8EBTADAQH/
MAoGCCqGSM49BAMDA2cAMGQCMGRPftolgMyX33QvhjDtE7WX7MMdWv3cXpXg6BSP
qh1VdERVDorkGKBX3hX6cq/apQIwexlO6U6jCZKnm/UimS3O5HiVwRzJ9WUH/Gku
LA/BqZ2F5zC9MFrHKIkqNyQygllg
-----END CERTIFICATE-----

OK it imports just fine here:

$ lxc config trust ls | grep 900e79c7b44d
| metrics | metrics_dwellir1.crt | metrics.local | 900e79c7b44d | May 9, 2022 at 7:23pm (UTC) | May 6, 2032 at 7:23pm (UTC) |

Since you are getting a not authorized error, maybe the private key you are using doesn’t match that public key? You can share the output of running this on the private key:

openssl ec -pubout -in metrics_dwellir1.key

The output is OK to share as it’s just the public portion.

openssl ec -pubout -in metrics_dwellir1.key

output:

read EC key
writing EC key
-----BEGIN PUBLIC KEY-----
MHYwEAYHKoZIzj0CAQYFK4EEACIDYgAE6SpTMtm5YHv2rHg7osRv2zc0F8ZTkW48
ax6bbC5C64XeZIYaF1pOsg3ghSyiNfJ4UsPEMAzicr8j25jBNQGSLwRrFdfgXFaR
wYVl4zvzc6aHPRZ5Ry5ZaiiXfKmp/y5G
-----END PUBLIC KEY-----
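One way to compare the two without eyeballing base64 is sketched below in Python. It relies on the fact that an X.509 certificate embeds the public key as a DER-encoded SubjectPublicKeyInfo structure, which is also what `openssl ec -pubout` prints in PEM form, so the decoded public key should appear byte for byte inside the decoded certificate. The function names are made up for illustration, and the check assumes both files use the same key encoding; treat it as a quick sanity check rather than full certificate parsing.

```python
import base64


def pem_body_to_der(pem_text: str) -> bytes:
    """Decode the first PEM block found in the text into DER bytes.

    Lines outside the BEGIN/END markers (such as "read EC key") are ignored.
    """
    inside = False
    body = []
    for raw_line in pem_text.splitlines():
        line = raw_line.strip()
        if line.startswith("-----BEGIN "):
            inside = True
        elif line.startswith("-----END "):
            break
        elif inside:
            body.append(line)
    if not body:
        raise ValueError("no PEM block found")
    return base64.b64decode("".join(body))


def public_key_is_in_cert(cert_pem: str, pubkey_pem: str) -> bool:
    """True if the DER public key occurs verbatim inside the DER certificate."""
    return pem_body_to_der(pubkey_pem) in pem_body_to_der(cert_pem)


# Usage sketch (file names as used in this thread):
#   cert_pem = open("metrics_dwellir1.crt").read()
#   pubkey_pem = <output of `openssl ec -pubout -in metrics_dwellir1.key`>
#   print(public_key_is_in_cert(cert_pem, pubkey_pem))
```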

OK so the .crt and .key files do match. Could you show the output of lxc config trust show 900e79c7b44d?

Sorry for the back and forth, I’m really puzzled by the “not authorized” you are getting.

No worries! I’m really grateful for the help with this!

lxc config trust show 900e79c7b44d

name: metrics_dwellir1.crt
type: metrics
restricted: false
projects: []
certificate: |
  -----BEGIN CERTIFICATE-----
  MIIBwTCCAUigAwIBAgIUXTMJP2qyRy1B6cTUW+8OwqVAIP8wCgYIKoZIzj0EAwMw
  GDEWMBQGA1UEAwwNbWV0cmljcy5sb2NhbDAeFw0yMjA1MDkxOTIzMzZaFw0zMjA1
  MDYxOTIzMzZaMBgxFjAUBgNVBAMMDW1ldHJpY3MubG9jYWwwdjAQBgcqhkjOPQIB
  BgUrgQQAIgNiAATpKlMy2blge/aseDuixG/bNzQXxlORbjxrHptsLkLrhd5khhoX
  Wk6yDeCFLKI18nhSw8QwDOJyvyPbmME1AZIvBGsV1+BcVpHBhWXjO/Nzpoc9FnlH
  LllqKJd8qan/LkajUzBRMB0GA1UdDgQWBBS7aEnj+JykJhCGhHBygYYfun/CtTAf
  BgNVHSMEGDAWgBS7aEnj+JykJhCGhHBygYYfun/CtTAPBgNVHRMBAf8EBTADAQH/
  MAoGCCqGSM49BAMDA2cAMGQCMGRPftolgMyX33QvhjDtE7WX7MMdWv3cXpXg6BSP
  qh1VdERVDorkGKBX3hX6cq/apQIwexlO6U6jCZKnm/UimS3O5HiVwRzJ9WUH/Gku
  LA/BqZ2F5zC9MFrHKIkqNyQygllg
  -----END CERTIFICATE-----
fingerprint: 900e79c7b44dfc780e3acda7e507f27c2def3e49a91dae15c0d5598d40abc87a

This is how the job is configured in the prometheus.yaml:

  - job_name: lxd-dwellir1
    metrics_path: '/1.0/metrics'
    scheme: 'https'
    static_configs:
      - targets: ['192.168.111.2:8443']
    tls_config:
      ca_file: 'tls/server_dwellir1.crt'
      cert_file: 'tls/metrics_dwellir1.crt'
      key_file: 'tls/metrics_dwellir1.key'
      server_name: 'dwellir1'

The JSON you showed made me realize that you might be going to your Prometheus web UI under /targets and clicking the link to 192.168.111.2:8443/1.0/metrics. Is that right?

If so, that can’t work, because it tells your browser to make a direct connection to the target in question, and your browser doesn’t have the metrics cert/key to authenticate with.
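To test the endpoint by hand, the request has to present the metrics certificate the way Prometheus does. Below is a minimal sketch using only the Python standard library; the function names are made up for illustration, and it assumes the server certificate file can be used as the trust anchor, as in the tls_config above. It returns the raw HTTP response (status line, headers and body) without parsing it.

```python
import socket
import ssl


def build_request(host: str, path: str = "/1.0/metrics") -> bytes:
    """Plain HTTP/1.0 GET, so the server closes the connection when it is done."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def fetch_metrics(host, port, server_name, ca_file, cert_file, key_file,
                  timeout=10.0) -> bytes:
    """Fetch the metrics endpoint over TLS using a client certificate."""
    # Verify the server against ca_file, checking the name in `server_name`
    # rather than the address we connect to (like server_name in Prometheus).
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=ca_file)
    # Present the metrics certificate, which a browser does not have.
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=server_name) as tls:
            tls.sendall(build_request(host))
            chunks = []
            while True:
                data = tls.recv(65536)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


# Usage sketch, with the values from the job configuration above:
#   response = fetch_metrics("192.168.111.2", 8443, "dwellir1",
#                            "tls/server_dwellir1.crt",
#                            "tls/metrics_dwellir1.crt",
#                            "tls/metrics_dwellir1.key")
#   print(response.decode("utf-8", errors="replace")[:500])
```

If the certificate and key are accepted, the response should start with a 200 status line followed by the metrics text. A “not authorized” response here, unlike in the browser, would point at the certificate setup.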

You are totally right! I realize now that it actually started working after fixing the server cert. I can see all the metrics for the containers in Prometheus and Grafana now. Thanks a lot for the help, and sorry for the unnecessary troubleshooting with the metrics cert!


I’m glad you’ve got it working in the end!


@sdeziel this is huge, and we’ll need to write up some kind of guide on this. It’s massively useful. If only we could now get those ZFS I/O metrics in there…
