Our API exposes a number of performance metrics, specifically:
Total used CPU time
Disk usage (for root device)
Memory usage (current and peak)
Swap usage (current and peak)
Network usage (bytes/packets sent/received)
These are available at the /1.0/containers/NAME/state endpoint and shown in the CLI by “lxc info NAME”.
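As a rough illustration of what can be pulled out of that endpoint, here is a minimal Python sketch that flattens a state document into simple name/value metrics. The field names in SAMPLE (cpu.usage, disk.root.usage, memory.usage_peak, network.*.counters and so on) and the assumption that cpu.usage is cumulative CPU time in nanoseconds are my reading of the state payload, and the numbers are made up, so check them against what your own server returns.

```python
import json

# Hypothetical, trimmed-down state document; the values are invented.
SAMPLE = json.loads("""
{
  "cpu": {"usage": 4500000000},
  "disk": {"root": {"usage": 1024}},
  "memory": {"usage": 10, "usage_peak": 20, "swap_usage": 0, "swap_usage_peak": 0},
  "network": {
    "eth0": {"counters": {"bytes_received": 100, "bytes_sent": 200,
                          "packets_received": 3, "packets_sent": 4}}
  }
}
""")


def flatten_state(state):
    """Flatten a container state document into a {metric_name: number} dict."""
    metrics = {}
    # Assumption: cpu.usage is cumulative CPU time in nanoseconds.
    metrics["cpu_seconds"] = (state.get("cpu") or {}).get("usage", 0) / 1e9
    # "disk" may be empty or null when the backend reports no usage.
    for dev, info in (state.get("disk") or {}).items():
        metrics[f"disk_{dev}_bytes"] = info.get("usage", 0)
    memory = state.get("memory") or {}
    for key in ("usage", "usage_peak", "swap_usage", "swap_usage_peak"):
        metrics[f"memory_{key}"] = memory.get(key, 0)
    for nic, info in (state.get("network") or {}).items():
        for counter, value in (info.get("counters") or {}).items():
            metrics[f"net_{nic}_{counter}"] = value
    return metrics


for name, value in sorted(flatten_state(SAMPLE).items()):
    print(name, value)
```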
We can add more to this API as needed, as long as it’s something we can reasonably extract from cgroups or similar. Note that fetching this data is unfortunately very expensive: fetching it every 5 minutes or so is fine, but fetching it every second or so (as top would) would put so much load on your system that your results would be seriously affected.
I understand the issue with fetching data every second. However, is it possible to set up a mechanism (like RRDtool) that collects container stats periodically (e.g. every 3 or 5 minutes) so they can be viewed offline?
Sure, you can use the “lxc info” output or the raw data from our API, poll that every x minutes and feed that to munin/prometheus/whatever other system you may want to use for graphing.
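A minimal sketch of that kind of poller in Python, assuming the “lxc query” subcommand is available and prints the state as JSON. The function names, the chosen fields (cpu.usage and memory.usage) and the CSV layout are illustrative choices, not part of LXD; a real setup would more likely hand the rows to munin, prometheus or similar.

```python
import csv
import json
import subprocess
import time


def fetch_state(name):
    """Fetch the raw state of one container through the lxc client."""
    out = subprocess.run(
        ["lxc", "query", f"/1.0/containers/{name}/state"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def sample_row(name, state, now):
    """Build one row: timestamp, container, cumulative CPU usage, memory usage."""
    cpu = state.get("cpu") or {}
    memory = state.get("memory") or {}
    return [int(now), name, cpu.get("usage", 0), memory.get("usage", 0)]


def append_csv(path):
    """Return a callback that appends one row to a CSV file."""
    def emit(row):
        with open(path, "a", newline="") as fh:
            csv.writer(fh).writerow(row)
    return emit


def poll(names, emit, interval=300, rounds=None,
         fetch=fetch_state, sleep=time.sleep, clock=time.time):
    """Sample every container once per interval (default: 5 minutes).

    rounds=None polls forever; fetch, sleep and clock can be swapped out
    for testing.
    """
    done = 0
    while rounds is None or done < rounds:
        for name in names:
            emit(sample_row(name, fetch(name), clock()))
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
```

Something like poll(["NAME"], append_csv("stats.csv")) would then append one row per container every 5 minutes, which stays well clear of the once-per-second polling that was warned about earlier.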
My containers are using a non-default storage pool in ZFS; is that why the disk usage is blank? Also, what does CPU usage represent: total processing time in seconds? Is there a way to get load, such as what top might show?
OK, looking at the code, disk usage reporting on btrfs requires a quota group, and since those can be expensive, LXD only sets them up when you set a size on your root device.
So it looks like if you were to set a size on your root device, the usage reporting should work.
This is btrfs-specific. ZFS doesn’t have that problem, and all other backend types simply lack usage reporting.
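For reference, setting a size on the root device could look roughly like the sketch below. It only builds and prints the command rather than running it. It assumes the root device is defined on the container itself (a root device inherited from a profile would need to be handled differently), and the container name and the 10GB value are placeholders.

```python
import shlex


def root_size_command(container, size):
    """Build the lxc command that sets a size limit on a container's root device."""
    # Assumption: the device is called "root" and is defined on the container itself.
    return ["lxc", "config", "device", "set", container, "root", "size", size]


# Print the command for review instead of executing it.
print(shlex.join(root_size_command("NAME", "10GB")))
```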
Right as usual! I thought I was running my containers on the ZFS storage pool, but they were in the btrfs one. Moving them to the ZFS pool shows the disk usage. Thanks again!