I will use LXD containers to share the resources of my 8-GPU server. Each user will have a container in which all GPU cards are mounted. As the server administrator, I want to monitor who is using the most GPU resources. I have found the following approaches, but each has a problem.
1. Run nvidia-smi on the host machine. I can find the pid of each process, but the mapping between the host pid and the container pid is not direct. I cannot convert one to the other with a formula.
2. Check the username of the process. But the usernames of those processes are all 296608, so processes from different containers all show the same username on the host machine.
3. Run lxc info container_name. The results list CPU, memory and network usage, and in LXD 3.12 GPU resources are also included. However, the GPU entry only has some basic info, with no GPU memory, temperature, or GPU-Util.
    Card 0:
      Vendor: NVIDIA Corporation (10de)
      Product: GK208B [GeForce GT 730] (1287)
      PCI address: 0000:00:07.0
      Driver: nvidia (418.56)
      NUMA node: 0
      NVIDIA information:
        Architecture: 3.5
        Brand: GeForce
        Model: GeForce GT 730
        CUDA Version: 10.1
        NVRM Version: 418.56
        UUID: GPU-6ddadebd-dafe-2db9-f10f-125719770fd3
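For the pid problem in the first approach, one possible direction is to skip the formula and ask the kernel for each host pid instead. The sketch below is a minimal illustration, not a tested solution for this setup. It assumes the host kernel exposes the NSpid field in /proc/<pid>/status, and that the cgroup path in /proc/<pid>/cgroup contains the container name in one of the layouts I believe LXC has used (/lxc/<name>, /lxc.payload/<name>, /lxc.payload.<name>). The function names are hypothetical helpers, and the cgroup layout may differ on a given LXD version.

```python
import re
from pathlib import Path


def parse_nspid(status_text):
    """Return the pids on the NSpid line of /proc/<pid>/status.

    The first entry is the pid as seen from the namespace that read the file
    (the host, when run on the host); later entries are the pid inside each
    nested pid namespace. Returns [] if the field is absent.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(p) for p in line.split()[1:]]
    return []


def parse_lxc_container(cgroup_text):
    """Guess the container name from the contents of /proc/<pid>/cgroup.

    Assumption: the path looks like /lxc/<name>, /lxc.payload/<name> or
    /lxc.payload.<name>. Returns None when no such path is found.
    """
    match = re.search(r"/lxc(?:\.payload[./]|/)([^/\s]+)", cgroup_text)
    return match.group(1) if match else None


def describe_host_pid(pid):
    """Read /proc for a host pid and return (container_name, pids_per_namespace)."""
    proc = Path("/proc") / str(pid)
    status_text = (proc / "status").read_text()
    cgroup_text = (proc / "cgroup").read_text()
    return parse_lxc_container(cgroup_text), parse_nspid(status_text)
```

On the host, calling describe_host_pid with a pid taken from the nvidia-smi process list would then return the guessed container name and the list of pids, where the last pid is the one the process has inside its innermost namespace. Reading another user's /proc entries normally requires root on the host.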
Any suggestions? Thanks in advance.