LXD monitoring using check_mk

Hi,
I am looking for a way to monitor LXD cluster performance using check_mk. Has anyone already tackled this? I could use some hints on what is important to check.

What are you looking at monitoring specifically?

I personally tend to just treat containers as machines and have them monitored from the inside, monitoring the services they run as well as basics (mem, cpu, disk, …).

From the LXD point of view, you’d want to monitor the LXD cluster port (8443) to make sure it’s not down. You could also look at the memory and CPU consumption of LXD itself, or execute and time a pretty standard request like lxc info. There aren’t any request rate or average response time metrics currently collected by LXD, though, so if that’s what you’re after, it’s not something that we have at this point.
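
As a rough illustration of the two checks suggested above (port reachability and timing a standard request), here is a minimal sketch of a check_mk local check. The host address, the thresholds, and the service names LXD_port and LXD_request are assumptions made for the example, not anything LXD or check_mk prescribes.

```shell
#!/bin/bash
# Sketch of a check_mk local check for the LXD daemon itself.
# LXD_HOST, LXD_PORT and the thresholds below are assumed values.
LXD_HOST="${LXD_HOST:-127.0.0.1}"
LXD_PORT="${LXD_PORT:-8443}"
WARN_MS=2000
CRIT_MS=5000

# Succeed if a TCP connection to host $1, port $2 opens within 3 seconds.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Map an elapsed time in milliseconds to a check_mk state (0 OK, 1 WARN, 2 CRIT).
state_for_ms() {
    if [ "$1" -ge "$CRIT_MS" ]; then
        echo 2
    elif [ "$1" -ge "$WARN_MS" ]; then
        echo 1
    else
        echo 0
    fi
}

# Time one "lxc info" call (server information) in milliseconds (GNU date).
time_lxc_info() {
    local start end
    start=$(date +%s%N)
    lxc info >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Only run the checks on a host that actually has the lxc client.
if command -v lxc >/dev/null 2>&1; then
    if port_open "$LXD_HOST" "$LXD_PORT"; then
        echo "0 LXD_port - port $LXD_PORT is reachable"
    else
        echo "2 LXD_port - port $LXD_PORT is not reachable"
    fi
    MS=$(time_lxc_info)
    echo "$(state_for_ms "$MS") LXD_request request_ms=$MS lxc info took $MS ms"
fi
```

Each output line follows the local check layout of state, service name, performance data (or - for none) and free text, so the request time can be graphed over time.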

Do you know if node_exporter from Prometheus works well inside an LXC container?

I do not know. I am in Nagios and Graylog land.

Thank you,

Well, I was thinking about requests and speed, and something like an SCVMM panel: just containers and their IPs. I am a noob at programming, so I followed your idea.

#!/bin/bash
# check_mk local check: prints one line per LXD container.
# Line layout: <state> <service name> <perfdata> <status text>
for DIR in /var/lib/lxd/containers/*; do
    [ -e "$DIR" ] || continue   # no containers: the glob matched nothing
    CONT=$(basename "$DIR")
    # Query LXD once per container and parse the saved output.
    INFO=$(/usr/bin/lxc info "$CONT")
    PID=$(echo "$INFO" | grep "Pid" | awk '{print $2}')
    STATUS=$(echo "$INFO" | grep "Status" | awk '{print $2}')
    IP=$(echo "$INFO" | grep "inet" | awk '{print $3}' | head -1)
    PROC=$(echo "$INFO" | grep "Processes" | awk '{print $2}')
    MEMORY=$(echo "$INFO" | grep "Memory" | tail -n 2 | head -1 | awk '{print $3}')
    # POWER_ON_TIME is 1 for a running container and 0 otherwise.
    if [ "$STATUS" != "Running" ]; then
        STATE="0"
    else
        STATE="1"
    fi
    echo "0 CONTAINER_$CONT POWER_ON_TIME=$STATE NAME:$CONT;PID:$PID;STATUS:$STATUS;IP:$IP;PROCESSES:$PROC;RAM:$MEMORY"
done

It’s horrible but works.
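
A possible alternative to parsing lxc info for every container is to ask lxc list for CSV output in a single call. The sketch below assumes the column letters n (name), s (state), p (PID) and N (number of processes) are available in your LXD version. The IP column is left out on purpose, because a container with several addresses would produce a multi-line CSV field that this simple parser does not handle. The helper name format_line is made up for the example.

```shell
#!/bin/bash
# Turn one CSV line "name,STATE,pid,processes" into a check_mk local check line.
format_line() {
    local name state pid procs power
    IFS=, read -r name state pid procs <<EOF
$1
EOF
    # lxc list reports the state in upper case, e.g. RUNNING or STOPPED.
    if [ "$state" = "RUNNING" ]; then
        power=1
    else
        power=0
    fi
    echo "0 CONTAINER_$name POWER_ON_TIME=$power NAME:$name;PID:$pid;STATUS:$state;PROCESSES:$procs"
}

# Only query LXD on a host that actually has the lxc client.
if command -v lxc >/dev/null 2>&1; then
    lxc list --format csv -c nspN | while IFS= read -r line; do
        [ -n "$line" ] && format_line "$line"
    done
fi
```

This keeps the same service names and POWER_ON_TIME metric as the script above, so the two could be swapped without changing what check_mk sees apart from the missing IP and RAM fields.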

It would be nice if users could get load averages for individual containers.

Back in 2017 you said it was too costly an operation, but has anything changed since then?

It would allow user applications to auto-scale infrastructure as required. I understand that on AWS you can do this based on many metrics, including:

  • CPU Utilization (%)
  • Memory Utilization (%)
  • Network Out Utilization (MB)
  • Memory Used (MB)
  • Memory Available (MB)
  • Swap Utilization (%)
  • Swap Used (MB)
  • Disk Space Utilization (%)
  • Disk Space Used (GB)
  • Disk Space Available (GB)

Many of these could be obtained by querying LXD, but CPU utilization seems the most insightful for typical workloads.

We have loadavg support in the current git version of lxcfs, albeit under a flag as that feature may be costly.

I was hoping to see it “natively” in lxd/lxc, as I don’t currently use lxcfs (maybe I do and just don’t know it, but all my backends are ZFS). Also, most installs probably won’t use lxcfs as their backend (or maybe they do; subjectively, I see a lot more posts about ZFS and btrfs).

lxcfs isn’t a storage backend, it’s the virtual filesystem that’s used by lxc/lxd to render files in /proc which the kernel doesn’t render properly for containers. That’s what currently gets you your per-container uptime, memory information, cpu, …
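
Since those files live in /proc, one way to use them for the auto-scaling idea discussed earlier is to read /proc/loadavg from inside the container. The following is only a sketch: the threshold LOAD_LIMIT and the helper names are assumptions, and the value is per-container only when lxcfs runs with its loadavg feature enabled (otherwise the container sees the host’s load).

```shell
#!/bin/bash
# Sketch: compare a container's 1-minute load average against a limit.
# LOAD_LIMIT is an assumed threshold; pass the container name as $1.
LOAD_LIMIT="${LOAD_LIMIT:-2.0}"

# Print the first field (1-minute load) of a /proc/loadavg line read from stdin.
load1() {
    awk '{ print $1; exit }'
}

# Succeed (exit 0) when the load in $1 is strictly above the limit in $2.
over_limit() {
    awk -v l="$1" -v m="$2" 'BEGIN { exit !(l > m) }'
}

# Only run against LXD when lxc exists and a container name was given.
if command -v lxc >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    LOAD=$(lxc exec "$1" -- cat /proc/loadavg | load1)
    if over_limit "$LOAD" "$LOAD_LIMIT"; then
        echo "$1: load $LOAD is above $LOAD_LIMIT, consider scaling up"
    else
        echo "$1: load $LOAD is within the limit of $LOAD_LIMIT"
    fi
fi
```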

Thanks for explaining that. I have managed to get lxcfs showing the container CPU loads.

  1. Is the intended method of running LXCFS to have it run as a service on boot?

  2. It appears to work by mounting the folder into the container with lxc config device add <container> proc disk source=/var/lib/lxcfs/proc/ path=/proc/. Is this correct?

  3. The CPU information isn’t available with lxc info <container>. Will it ever be?

Thanks

lxcfs is auto-detected and used by LXC and LXD, so no LXD configuration should be needed for it at all.

lxc info does show CPU information: it shows you the used CPU time in seconds.
Once we can really rely on LXCFS’ availability of the container loadavg, we may be able to pull that into that API too, but right now it’s not something that we can rely on.
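
The cumulative CPU time can be turned into a utilization figure by sampling it twice and dividing the difference by the elapsed time. Below is a minimal sketch of that arithmetic. It assumes lxc info prints a line labelled "CPU usage (in seconds)", which may differ between LXD versions, and the helper names and the 10 second interval are made up for the example. Here 100 means one CPU core fully busy.

```shell
#!/bin/bash
# Sketch: estimate CPU utilization from two "lxc info <container>" samples.
INTERVAL="${INTERVAL:-10}"   # assumed sampling interval in seconds

# Extract the CPU seconds from "lxc info <container>" output read on stdin.
cpu_seconds() {
    awk -F': *' '/CPU usage \(in seconds\)/ { print $2; exit }'
}

# Utilization in percent between samples $1 and $2 taken $3 seconds apart.
cpu_percent() {
    awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN { printf "%.1f\n", (b - a) / t * 100 }'
}

# Only run against LXD when lxc exists and a container name was given as $1.
if command -v lxc >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    FIRST=$(lxc info "$1" | cpu_seconds)
    sleep "$INTERVAL"
    SECOND=$(lxc info "$1" | cpu_seconds)
    PCT=$(cpu_percent "$FIRST" "$SECOND" "$INTERVAL")
    echo "0 CPU_$1 cpu_percent=$PCT CPU utilization over ${INTERVAL}s: $PCT%"
fi
```

Because the counter is reported in whole seconds, short intervals give a coarse result; a longer interval smooths that out at the cost of a slower check.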

When running lxcfs, it seems to always run in the foreground (unless I am misunderstanding again). I run the command sudo lxcfs -s -l allow_other /var/lib/lxcfs/

I get the output:

mount namespace: 5
hierarchies:
  0: fd:   6: hugetlb
  1: fd:   7: cpuset
  2: fd:   8: rdma
  3: fd:   9: devices
  4: fd:  10: pids
  5: fd:  11: blkio
  6: fd:  12: memory
  7: fd:  13: freezer
  8: fd:  14: perf_event
  9: fd:  15: cpu,cpuacct
 10: fd:  16: net_cls,net_prio
 11: fd:  17: name=systemd
 12: fd:  18: unified

LXD doesn’t find lxcfs, so I have to manually add the devices to /proc on the container.
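
One way to see whether a container is actually getting its /proc files from lxcfs is to look for fuse.lxcfs entries in the container’s /proc/mounts. This is only a diagnostic sketch; the helper name count_lxcfs_mounts is made up, and it assumes lxcfs mounts show up with the filesystem type fuse.lxcfs.

```shell
#!/bin/bash
# Count lxcfs-backed mounts in a /proc/mounts listing read from stdin.
count_lxcfs_mounts() {
    awk '$3 == "fuse.lxcfs" { n++ } END { print n + 0 }'
}

# Only run against LXD when lxc exists and a container name was given as $1.
if command -v lxc >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    N=$(lxc exec "$1" -- cat /proc/mounts | count_lxcfs_mounts)
    if [ "$N" -gt 0 ]; then
        echo "$1: $N paths are served by lxcfs"
    else
        echo "$1: no lxcfs mounts found"
    fi
fi
```

A count of zero on a freshly restarted container would suggest that lxcfs was not running, or not detected, when the container started.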

Yes, node_exporter works well inside an LXC container.