I have a LXD host with 128G of RAM, monitored by Prometheus and Grafana.
Grafana raised an alert that memory usage is high (94%).
When I query all running containers on the host, they report a total of 40G used.
When I query the host, however, it reports 115G used.
I understand this may be tricky to sort out, but what metric or tool should I use to monitor my host for potential memory capacity problems?
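One possible contributor to the gap, assumed here rather than established by the numbers in this post, is how "used" is computed on the host. A naive `MemTotal - MemFree` counts reclaimable memory such as page cache as used, while `MemTotal - MemAvailable` uses the kernel's own estimate of memory that can still be handed out. The sketch below parses `/proc/meminfo` and prints both figures side by side. The function names are hypothetical. If the host runs Prometheus node_exporter (also an assumption), the same fields are exposed as `node_memory_MemTotal_bytes` and `node_memory_MemAvailable_bytes`.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values mostly in kB)."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            # First token is the number; the optional second token is the unit ("kB").
            values[key.strip()] = int(fields[0])
    return values


def used_percentages(info: dict[str, int]) -> tuple[float, float]:
    """Return (naive %, available-based %) of memory considered 'used'."""
    total = info["MemTotal"]
    # Naive: everything not completely free counts as used, including page cache.
    naive = 100.0 * (total - info["MemFree"]) / total
    # MemAvailable-based: excludes memory the kernel estimates it can reclaim.
    available = 100.0 * (total - info["MemAvailable"]) / total
    return naive, available


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    naive, avail = used_percentages(info)
    print(f"MemTotal - MemFree:      {naive:.1f}% used")
    print(f"MemTotal - MemAvailable: {avail:.1f}% used")
```

Running this on the host and comparing the two lines gives a rough sense of how much of the reported usage is reclaimable rather than actually committed. It does not by itself explain what the remaining memory is used for.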
- Grafana: 94%
- Host: 115G used, i.e. 115/128 ≈ 90%
- Sum of all containers: 40G used, i.e. 40/128 ≈ 31%
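The percentages above can be reproduced with a few lines of arithmetic, which also makes the unexplained gap explicit: 115G - 40G = 75G, or roughly 59% of the host's 128G, is in use but not attributed to any container. The variable names are illustrative only.

```python
TOTAL_GB = 128           # host RAM
host_used_gb = 115       # "used" as reported on the host
containers_used_gb = 40  # sum over all running containers


def pct(part_gb: float, total_gb: float = TOTAL_GB) -> float:
    """Express part_gb as a percentage of total_gb."""
    return 100.0 * part_gb / total_gb


gap_gb = host_used_gb - containers_used_gb  # memory not attributed to any container

print(f"Host:       {pct(host_used_gb):.1f}%")
print(f"Containers: {pct(containers_used_gb):.1f}%")
print(f"Gap:        {gap_gb}G ({pct(gap_gb):.1f}%)")
```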
Here is a picture that describes the situation a bit better.