Hi,
I have an issue with a container and its assigned GPU devices.
The server has several GPUs available and I want to assign just one of them to the container. I have done this before and it works just fine, but I'm getting issues with this one.
The NVIDIA drivers are fine.
Here is how I do it:
lxc config device add <container> gpu2 gpu id=2
lxc config show <container>
architecture: x86_64
config:
image.architecture: amd64
image.description: ubuntu 20.04 LTS amd64 (release) (20211021)
image.label: release
image.os: ubuntu
image.release: focal
image.serial: "20211021"
image.type: squashfs
image.version: "20.04"
volatile.base_image: 5fc94479f588171282beb094da96bb83eb51420d6cf13b223c737d1fda9169cd
volatile.eth0.host_name: veth1d232190
volatile.eth0.hwaddr: 00:16:3e:0a:17:fb
volatile.idmap.base: "0"
volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
volatile.last_state.power: RUNNING
volatile.uuid: 1a283105-f789-4c6c-bd68-3f3514495238
devices:
gpu2:
id: "2"
type: gpu
ephemeral: false
profiles:
-
stateful: false
description: ""
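For what it's worth, the LXD gpu device can also be pinned by PCI address rather than the numeric id (a sketch based on the gpu device options; 0000:8a:00.0 is the Bus-Id that nvidia-smi reports for GPU 2 below, and <container> is a placeholder):

```shell
# Replace the index-based device with one pinned by PCI address
lxc config device remove <container> gpu2
lxc config device add <container> gpu2 gpu pci=0000:8a:00.0

# Confirm the device entry on the container
lxc config device show <container>
```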
When I run nvidia-smi on the server, I get the output below. The container gives the same output, even though it should see just one GPU:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00 Driver Version: 450.142.00 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla M10 Off | 00000000:88:00.0 Off | N/A |
| N/A 36C P0 16W / 53W | 0MiB / 8129MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M10 Off | 00000000:89:00.0 Off | N/A |
| N/A 36C P0 16W / 53W | 0MiB / 8129MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M10 Off | 00000000:8A:00.0 Off | N/A |
| N/A 25C P0 16W / 53W | 0MiB / 8129MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M10 Off | 00000000:8B:00.0 Off | N/A |
| N/A 28C P0 16W / 53W | 0MiB / 8129MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Tesla M10 Off | 00000000:B1:00.0 Off | N/A |
| N/A 36C P0 16W / 53W | 0MiB / 8129MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 Tesla M10 Off | 00000000:B2:00.0 Off | N/A |
| N/A 36C P0 16W / 53W | 0MiB / 8129MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 Tesla M10 Off | 00000000:B3:00.0 Off | N/A |
| N/A 24C P0 16W / 53W | 0MiB / 8129MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 Tesla M10 Off | 00000000:B4:00.0 Off | N/A |
| N/A 30C P0 16W / 53W | 0MiB / 8129MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
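To compare what the host and the container actually expose, these checks should show the difference (a sketch; <container> is a placeholder):

```shell
# On the host: all eight GPUs appear as /dev/nvidia0 .. /dev/nvidia7
ls -l /dev/nvidia*

# Inside the container: with the id=2 device attached, I would expect
# only /dev/nvidia2 plus the shared control nodes (nvidiactl, nvidia-uvm)
lxc exec <container> -- ls -l /dev/nvidia*
lxc exec <container> -- nvidia-smi -L
```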
When I list /dev inside the container, I also see all the GPUs. Has anyone encountered this before, or where am I making a mistake?
I have restarted the container; the only thing left to restart is the server itself.