I plan to use LXD containers to share the resources of my 8-GPU server. Each user will get a container with all GPU cards mounted. As the server administrator, I want to monitor who is using the most GPU resources. I have found the following approaches, but each has a problem.
- Use `nvidia-smi` on the host machine. I can find the `pid` of each process, but the mapping between the `host pid` and the `container pid` is not direct; I cannot convert one to the other with a formula.
- Find the `username` of the process. But the username of all those processes is `296608`; processes from different containers have the same `user` property on the host machine.
- Use `lxc info container_name`. The results list CPU, memory, and network usage, and in LXD 3.12 GPU resources are also included. However, the GPU entry only has some basic info, with no GPU memory, temperature, or GPU-Util:
Card 0:
Vendor: NVIDIA Corporation (10de)
Product: GK208B [GeForce GT 730] (1287)
PCI address: 0000:00:07.0
Driver: nvidia (418.56)
NUMA node: 0
NVIDIA information:
Architecture: 3.5
Brand: GeForce
Model: GeForce GT 730
CUDA Version: 10.1
NVRM Version: 418.56
UUID: GPU-6ddadebd-dafe-2db9-f10f-125719770fd3
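One direction that might address the PID mapping, as a minimal sketch rather than a tested solution: on the host, `/proc/<pid>/cgroup` usually names the container a process belongs to, and the `NSpid:` line of `/proc/<pid>/status` lists the PID as seen inside the container. The helper names below are hypothetical, and the cgroup path layouts (`/lxc/<name>`, `/lxc.payload/<name>`, `/lxc.payload.<name>`) are assumptions that differ between LXC/LXD versions, so check them on the actual host.

```python
import re
import subprocess
from collections import defaultdict

# Assumption: container processes live in a cgroup whose path contains
# "/lxc/<name>", "/lxc.payload/<name>" or "/lxc.payload.<name>".
_CGROUP_RE = re.compile(r"/lxc(?:\.payload[./]|/)([^/\s]+)")


def container_from_cgroup(cgroup_text):
    """Return the container name found in /proc/<pid>/cgroup text, or None."""
    match = _CGROUP_RE.search(cgroup_text)
    return match.group(1) if match else None


def container_pid_from_status(status_text):
    """Return the innermost-namespace PID from the NSpid line, or None."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return int(line.split()[-1])
    return None


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' rows into (host_pid, MiB) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        try:
            apps.append((int(fields[0]), int(fields[1])))
        except (ValueError, IndexError):
            continue  # skip rows such as "[N/A]" or malformed lines
    return apps


def read_proc_cgroup(pid):
    """Read /proc/<pid>/cgroup on the host; empty string if the process is gone."""
    try:
        with open(f"/proc/{pid}/cgroup") as handle:
            return handle.read()
    except OSError:
        return ""


def gpu_memory_by_container(apps, read_cgroup=read_proc_cgroup):
    """Sum GPU memory (MiB) per container; unmatched PIDs are counted as 'host'."""
    totals = defaultdict(int)
    for pid, mem in apps:
        name = container_from_cgroup(read_cgroup(pid)) or "host"
        totals[name] += mem
    return dict(totals)


if __name__ == "__main__":
    # Run on the host, where nvidia-smi reports host PIDs.
    output = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name, mib in sorted(gpu_memory_by_container(parse_compute_apps(output)).items()):
        print(f"{name}: {mib} MiB")
```

This only aggregates GPU memory per container. Per-process utilization would need a different source, for example `nvidia-smi pmon`, and the same cgroup lookup could be applied to the PIDs it reports.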
Any suggestions? Thanks in advance.