Per container performance metrics

rkelleyrtp · April 28, 2017, 2:07pm

Does anybody know how we can get per-container performance metrics? I have a large number of containers and need to get metrics like:

Top container users (CPU, RAM, memory)
Sorted list of all containers based on the criteria above
Container usage as a relative percentage of overall activity
etc

This is on LXD 2.12 (U16.04 and U17.04)

stgraber · April 28, 2017, 3:16pm

Our API exposes a number of performance metrics, specifically:

Total used CPU time
Disk usage (for root device)
Memory usage (current and peak)
Swap usage (current and peak)
Network usage (bytes/packets sent/received)

That’s at the /1.0/containers/NAME/state endpoint and exposed in CLI by “lxc info NAME”.

We can add more stuff to this API as needed, so long as it’s something we can reasonably extract from CGroups or similar. Note that fetching this stuff is unfortunately very expensive, so fetching the information every 5 minutes or so is fine, but fetching it every second or so (like top would do) would cause so much load on your system that your results would be seriously affected.
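
As a rough illustration of how the sorted list and relative percentages from the original question could be built on top of that endpoint, here is a minimal sketch. It assumes the state objects have already been fetched and that they carry `cpu.usage` and `memory.usage` keys; the function name `rank_containers` and the sample container names are made up for the example.

```python
from typing import Dict, List, Tuple


def rank_containers(
    states: Dict[str, dict], section: str, field: str = "usage"
) -> List[Tuple[str, int, float]]:
    """Return (name, value, percent_of_total) tuples, highest value first.

    `states` maps a container name to its state object (assumed shape:
    {"cpu": {"usage": ...}, "memory": {"usage": ...}, ...}).
    Missing sections or fields count as 0.
    """
    values = {
        name: int(state.get(section, {}).get(field, 0))
        for name, state in states.items()
    }
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, value, (100.0 * value / total) if total else 0.0)
        for name, value in ranked
    ]


# Hypothetical sample data standing in for real API responses.
sample_states = {
    "web": {"cpu": {"usage": 100}, "memory": {"usage": 300}},
    "db": {"cpu": {"usage": 400}, "memory": {"usage": 700}},
}

top_memory = rank_containers(sample_states, "memory")
top_cpu = rank_containers(sample_states, "cpu")
```

Since fetching is expensive, it probably makes sense to fetch each container's state once per interval and run all rankings against that one snapshot.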

rkelleyrtp · April 28, 2017, 6:53pm

Thanks Stephane,

I understand the issue about getting data every second. However, is it possible to create a mechanism (like RRDtool) to collect container stats (e.g. every 3 or 5 minutes) that can be viewed offline?

stgraber · April 28, 2017, 7:19pm

Sure, you can use the “lxc info” output or the raw data from our API, poll that every x minutes and feed that to munin/prometheus/whatever other system you may want to use for graphing.
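
A minimal sketch of what such a poller could look like, assuming it is run from cron (or a timer) every few minutes and appends one CSV row per container. How the state is fetched is left to a caller-supplied `fetch_state` function, which is hypothetical here; the field names and the `collect_once` helper are also just illustrative, and the `cpu.usage` / `memory.usage` keys are assumed.

```python
import csv
import io
import time
from typing import Callable, Iterable, Optional, TextIO

FIELDS = ["timestamp", "container", "cpu_usage", "memory_usage", "memory_usage_peak"]


def sample_row(name: str, state: dict, timestamp: float) -> dict:
    """Flatten one container state object into a CSV row."""
    memory = state.get("memory", {})
    return {
        "timestamp": int(timestamp),
        "container": name,
        "cpu_usage": state.get("cpu", {}).get("usage", 0),
        "memory_usage": memory.get("usage", 0),
        "memory_usage_peak": memory.get("usage_peak", 0),
    }


def collect_once(
    names: Iterable[str],
    fetch_state: Callable[[str], dict],
    out: TextIO,
    timestamp: Optional[float] = None,
) -> None:
    """Fetch each container's state once and append one row per container."""
    if timestamp is None:
        timestamp = time.time()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    for name in names:
        writer.writerow(sample_row(name, fetch_state(name), timestamp))


# Demo with a fake fetcher; a real one would call the API or parse "lxc info".
def fake_fetch(name: str) -> dict:
    return {
        "cpu": {"usage": 229143956967},
        "memory": {"usage": 512438272, "usage_peak": 671420416},
    }


buffer = io.StringIO()
collect_once(["c1"], fake_fetch, buffer, timestamp=1000)
```

The resulting file can then be graphed offline or imported into whatever system you prefer.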

aaronvegh · January 27, 2019, 9:35pm

I’m interested in this as well. Unfortunately I’m not getting everything from the API call to /containers/NAME/state that I’m hoping for:

"disk": {},
"memory": {
		"usage": 512438272,
		"usage_peak": 671420416,
		"swap_usage": 0,
		"swap_usage_peak": 0
	},
"cpu": {
		"usage": 229143956967
	}

My containers are using a non-default storage pool in ZFS; is that why this is blank? Also, what does CPU usage represent: total processing time in seconds? Is there a way to get load, such as what top might show?
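
On the CPU part of the question, one way to approximate a top-like percentage is to compare two samples of the counter. The sketch below assumes the `cpu.usage` value is cumulative CPU time in nanoseconds (an assumption, not something confirmed in this thread); the `cpu_percent` helper and its parameters are hypothetical.

```python
def cpu_percent(
    prev_usage_ns: int, curr_usage_ns: int, interval_seconds: float, cpus: int = 1
) -> float:
    """Average CPU utilisation between two samples, as a percentage.

    Assumes both usage values are cumulative CPU time in nanoseconds and
    that `interval_seconds` is the wall-clock time between the samples.
    `cpus` scales the result so that 100% means all CPUs fully busy.
    """
    if interval_seconds <= 0 or cpus <= 0:
        raise ValueError("interval_seconds and cpus must be positive")
    used_seconds = (curr_usage_ns - prev_usage_ns) / 1e9
    return 100.0 * used_seconds / (interval_seconds * cpus)


# 150 s of CPU time consumed over a 300 s polling interval.
one_cpu = cpu_percent(0, 150_000_000_000, 300)
two_cpus = cpu_percent(0, 150_000_000_000, 300, cpus=2)
```

This gives an average over the polling interval rather than an instantaneous load figure, so short spikes will be smoothed out.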

stgraber · January 28, 2019, 6:19pm

@aaronvegh can you show the output of “lxc config show --expanded NAME” and “lxc storage list”?

aaronvegh · January 28, 2019, 6:31pm

Sure thing!

$ lxc config show --expanded aaron-project-manager 
architecture: x86_64
config:
  limits.cpu: "2"
  limits.memory: 256MB
  limits.memory.enforce: soft
  user.token: 26547235-6aff-4e06-9f02-fbf7d555f63d
  volatile.base_image: 00fda1c45adafff66d5cdd3e8917b29083e17a18a1d0befb8ae8beb82cd44734
  volatile.eth0.hwaddr: 00:16:3e:bc:2d:7a
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

$ lxc storage list
+---------+-------------+--------+--------------------------------------------+---------+
|  NAME   | DESCRIPTION | DRIVER |                   SOURCE                   | USED BY |
+---------+-------------+--------+--------------------------------------------+---------+
| beta    |             | zfs    | BetaPool                                   | 2       |
+---------+-------------+--------+--------------------------------------------+---------+
| default |             | btrfs  | /var/snap/lxd/common/lxd/disks/default.img | 10      |
+---------+-------------+--------+--------------------------------------------+---------+

stgraber · January 28, 2019, 6:41pm

Ok, so looking at the code, the way disk usage works on btrfs is that it requires a quota group and as those can be expensive, LXD only sets them up when you set a size on your root device.

So it looks like if you set a size on your root device, the usage reporting should work.

This is btrfs-specific; zfs doesn’t have that problem, and all other backend types just plain lack usage reporting.

aaronvegh · January 28, 2019, 6:57pm

Right as usual! I thought I was running my containers on the ZFS storage pool, but they were in the btrfs one. Moving to the ZFS pool shows the disk usage. Thanks again!

A.