Looking to see if any headway has been made to provide per-container stats using LXD. Given the new 3.0 clustering option, we can now pool lots of physical servers into a large cluster potentially running hundreds/thousands of containers. As a result, tracking usage for so many containers is essential to managing large clusters.
We still don’t have anything more than what’s available in lxc info NAME. Recording historical values is tricky because the kernel APIs to retrieve them are very expensive; recording the data can cause more load than the container itself.
There is a third-party tool called Sysdig that can collect aggregate data and understands containers.
Sysdig installs a kernel module (through DKMS) and uses it to get visibility into the running processes.
Thanks for the sysdig info. I will check it out.
As you guys can imagine, operational maintenance and life-cycle management of large-scale clusters will become a hot topic in the future. This is why tools that can gather container stats to identify misbehaving containers (an “lxd-top”) are so important.
As we know, it is easy to spin up a few LXD nodes and start running workloads. The problems start when you run lots of workloads and users start complaining of performance issues.
Maybe these tools already exist and I just have not seen them yet?
My understanding is that sysdig is meant to do exactly that and they have a kernel module to grab the needed information without all the overhead that LXD would otherwise have when interacting with a clean mainline kernel.
You can create a plugin for Glances using the LXD Python API, for example by extending the existing Docker monitoring code. For details, take a look at http://glances.readthedocs.io/en/stable/aoa/docker.html
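To illustrate the idea, here is a minimal sketch of what such a collector could look like. The flattening helper works on the state document that LXD’s REST API returns (GET /1.0/containers/NAME/state, the same data lxc info shows); the pylxd calls in collect_all (Client, containers.all(), state()) are assumptions based on pylxd’s documented interface, and running it requires a live LXD daemon.

```python
def summarize_state(name, state):
    """Reduce an LXD container state mapping to a flat metrics dict.

    `state` mirrors the nested structure of the REST API's state
    document: a "memory" dict with a "usage" byte counter, a "cpu"
    dict with a cumulative "usage" in nanoseconds, and a process count.
    """
    return {
        "name": name,
        "memory_bytes": state.get("memory", {}).get("usage", 0),
        "cpu_ns": state.get("cpu", {}).get("usage", 0),
        "processes": state.get("processes", 0),
    }


def collect_all():
    # Requires a running LXD daemon and the third-party pylxd package;
    # the attribute names below mirror the REST API's state document
    # and are assumptions here, not verified against every pylxd version.
    from pylxd import Client
    client = Client()
    rows = []
    for c in client.containers.all():
        st = c.state()
        rows.append(summarize_state(c.name, {
            "memory": st.memory,
            "cpu": st.cpu,
            "processes": st.processes,
        }))
    return rows
```

A Glances plugin would then only need to call something like collect_all() periodically and feed the resulting dicts into its display loop, the same way the Docker plugin does.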
No link to your own blog post about sysdig? It’s a nice short explanation focused on LXD containers.
I am longing for the moment to see someone else post any of my blog posts :-).
On request, here is a blog post about using sysdig to monitor containers… or actually troubleshoot them: HOWTO: Install and use Sysdig/Falco (troubleshooting and monitoring)
In the meantime I am trying to figure out which cgroup data from /sys/fs/cgroup/… and beyond can be used to get the interesting data. It should be in there somewhere, right? Then it would just be a matter of reading those files, which should be even less intrusive, I guess.
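Reading those files directly could look roughly like the sketch below. Note the paths are an assumption based on a typical cgroup v1 layout where LXD places containers under lxc/NAME in each controller hierarchy; the exact location varies by distro, LXD version, and cgroup v1 vs. v2, so the root is a parameter.

```python
import os

# Assumed default mount point for cgroup v1 controller hierarchies.
CGROUP_ROOT = "/sys/fs/cgroup"


def read_counter(path):
    """Read a single integer counter from a cgroup file, or None if
    the file is missing or unreadable (container gone, wrong layout)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def container_cgroup_stats(name, root=CGROUP_ROOT):
    # cgroup v1: one hierarchy per controller; the lxc/<name> scope
    # is an assumption about where LXD parks its containers.
    return {
        "memory_bytes": read_counter(os.path.join(
            root, "memory", "lxc", name, "memory.usage_in_bytes")),
        "cpu_ns": read_counter(os.path.join(
            root, "cpuacct", "lxc", name, "cpuacct.usage")),
    }
```

Since this is just file reads, polling it for hundreds of containers should indeed be much cheaper than walking the per-process kernel APIs.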