Looking for advice on monitoring resource consumption of LXC containers

Dear LXD colleagues,

What is a recommended way of tracking per-container resource usage?

I am hosting a number of servers as LXC containers on a single host, and I am noticing that within a few days of starting the containers, CPU usage grows until it hits a ceiling of around 95%.

With classic Linux system tools such as top and htop, I can see that the /sbin/init processes (the root process of every container) consume CPU, but I cannot tell which containers, or which tasks within them, are the source. I am therefore struggling to decide where to invest in optimization.

Can anybody here recommend any monitoring tools supporting cgroups, or ways to query LXD?

Someone else can go into all the different options and upcoming features intended to help here, but a “poor man’s solution” to help right now might be to look at the “CPU usage (in seconds)” value reported by lxc info. A bash script like the following might give a hint:

# Print the cumulative CPU time (in seconds) reported for each container.
mapfile -t HOSTS < <(lxc list -c n --format csv)
for HOST in "${HOSTS[@]}"
do
  echo "${HOST} ..."
  lxc info "${HOST}" | grep "CPU usage (in seconds)"
done
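Because that figure is cumulative since the container started, it can help to sort containers by it, or to run the script twice a few minutes apart and compare. A minimal sketch, assuming `lxc info` prints a line of the form "CPU usage (in seconds): N"; the function names `cpu_seconds` and `rank_by_cpu` are made up for illustration:

```bash
#!/usr/bin/env bash
# Sketch: rank containers by cumulative CPU seconds as reported by `lxc info`.
# Assumes the output contains a line like "CPU usage (in seconds): 123".

# Read `lxc info` output on stdin and print only the CPU seconds value.
cpu_seconds() {
  awk -F': *' '/CPU usage \(in seconds\)/ { print $2; exit }'
}

# Print "<seconds> <name>" for every container, busiest first.
# Containers without a CPU line (e.g. stopped ones) are reported as 0.
rank_by_cpu() {
  local name secs
  lxc list -c n --format csv | while IFS= read -r name; do
    secs=$(lxc info "$name" < /dev/null | cpu_seconds)
    printf '%s %s\n' "${secs:-0}" "$name"
  done | sort -rn
}
```

Calling `rank_by_cpu` on the host prints one line per container. Saving the output twice (say, ten minutes apart) and comparing the two files shows which containers are burning CPU right now instead of which ones have simply been running the longest.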

You can also use htop and enable the ‘CGROUP’ column so that you can see which container each process belongs to.
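The same information htop shows is available from /proc/PID/cgroup, so a busy PID spotted in top can be mapped back to its container by hand. A rough sketch, assuming the cgroup path contains either `lxc.payload.NAME` (newer LXC) or `lxc/NAME` (older layouts); the exact layout depends on your LXC version and cgroup setup, and the function name `container_of_cgroup` is made up:

```bash
#!/usr/bin/env bash
# Sketch: derive the container name from the contents of /proc/<pid>/cgroup.
# Assumption: the cgroup path looks like ".../lxc.payload.<name>/..." or
# ".../lxc/<name>/...". Prints nothing for processes outside a container.

container_of_cgroup() {
  sed -n -E \
    -e 's#.*/lxc\.payload\.([^/]+).*#\1#p' \
    -e 's#.*/lxc/([^/]+).*#\1#p' | head -n1
}
```

For example, `container_of_cgroup < /proc/1234/cgroup` would print the container that PID 1234 runs in, if the path layout matches the assumptions above.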


We’re currently working on a new /1.0/metrics API which will be compatible with solutions like Prometheus and Grafana, so that usage can be tracked over time.

Thank you all!

Very much looking forward to the upcoming monitoring features. I wasn’t aware htop could enable a CGROUP column, which is a good start when it comes to tracking down the worst CPU hogs at least.

There is also the Python-based tool ctop, which can be installed via pip. However, the console display flickers quite heavily :)