Hey, it seems that when we use lxc profile device add default gpu gpu id=1 to add one GPU to the default profile, the id counts from 1, not 0. However, for CPU, we use lxc profile set limits.cpu 0-16, which counts from 0. It's not consistent.
I tried adding a GPU with id=0. No error or warning was reported, but the GPU did not show up inside the container. Later, I tried id=1 and found that it works.
That id is the kernel DRI id. Normally 0 is the first GPU.
What does lxc info --resources get you?
$ lxc info my-container --resources
Name: my-container
Remote: unix://
Architecture: x86_64
Created: 2019/09/28 11:53 UTC
Status: Running
Type: persistent
Profiles: 515
Pid: 17232
Ips:
  eth0: inet 172.21.18.13
  eth0: inet6 2001:***:adc7
  eth0: inet6 fe80::216:3eff:fece:adc7
  lo: inet 127.0.0.1
  lo: inet6 ::1
Resources:
  Processes: 60
  CPU usage:
    CPU usage (in seconds): 8
  Memory usage:
    Memory (current): 168.07MB
    Memory (peak): 176.30MB
  Network usage:
    eth0:
      Bytes received: 376.94kB
      Bytes sent: 47.02kB
      Packets received: 3663
      Packets sent: 279
    lo:
      Bytes received: 1.44kB
      Bytes sent: 1.44kB
      Packets received: 16
      Packets sent: 16
I’m using LXD 3.0.3 on Ubuntu 18.04.
Ah, okay, 3.0.3 doesn’t have the fancy GPU reporting we’ve had since 3.16, which would have made it clear which id points to which card.
So you’d need to manually dig through /dev/dri and /sys/class/drm to see the different cards on the system and get an idea of what they are.
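If it helps, here is a rough sketch of that digging as a small shell function. It assumes the usual sysfs layout where each /sys/class/drm/cardN has a device/ link exposing the PCI vendor and device IDs; the function name list_drm_cards and its optional directory argument are made up for illustration and are not part of LXD.

```shell
#!/bin/sh
# Sketch: print one line per DRM card with its PCI vendor/device ID and driver.
# list_drm_cards and its optional root argument are hypothetical helpers.
list_drm_cards() {
    drm_root="${1:-/sys/class/drm}"
    for card in "$drm_root"/card[0-9]*; do
        # Skip connector entries such as card0-VGA-1; only cardN is a GPU.
        case "${card##*/}" in
            *-*) continue ;;
        esac
        # Also skips the unexpanded glob when no card exists at all.
        [ -r "$card/device/vendor" ] || continue
        id="${card##*/card}"
        vendor=$(cat "$card/device/vendor")
        device=$(cat "$card/device/device")
        if [ -e "$card/device/driver" ]; then
            driver=$(basename "$(readlink -f "$card/device/driver")")
        else
            driver=unknown
        fi
        printf 'id=%s vendor=%s device=%s driver=%s\n' \
            "$id" "$vendor" "$device" "$driver"
    done
}

list_drm_cards "$@"
```

The id= value printed on each line is the N of cardN. A vendor of 0x10de is NVIDIA and 0x1af4 is Red Hat (virtio), matching the vendor IDs shown in the newer lxc info --resources output.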
On 3.17, it would look something like this:
CPUs (x86_64):
  Socket 0:
    Vendor: GenuineIntel
    Name: Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz
    Caches:
      - Level 1 (type: Data): 33kB
      - Level 1 (type: Instruction): 33kB
      - Level 2 (type: Unified): 262kB
      - Level 3 (type: Unified): 31MB
    Cores:
      - Core 0
        Frequency: 0Mhz
        NUMA node: 0
        Threads:
          - 0 (id: 0, online: true)
      - Core 1
        Frequency: 0Mhz
        NUMA node: 0
        Threads:
          - 0 (id: 1, online: true)
  Socket 1:
    Vendor: GenuineIntel
    Name: Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz
    Caches:
      - Level 1 (type: Data): 33kB
      - Level 1 (type: Instruction): 33kB
      - Level 2 (type: Unified): 262kB
      - Level 3 (type: Unified): 31MB
    Cores:
      - Core 0
        Frequency: 0Mhz
        NUMA node: 1
        Threads:
          - 0 (id: 2, online: true)
      - Core 1
        Frequency: 0Mhz
        NUMA node: 1
        Threads:
          - 0 (id: 3, online: true)
Memory:
  NUMA nodes:
    Node 0:
      Free: 3.25GB
      Used: 890.56MB
      Total: 4.14GB
    Node 1:
      Free: 3.52GB
      Used: 703.33MB
      Total: 4.23GB
  Free: 7.93GB
  Used: 434.72MB
  Total: 8.36GB
GPUs:
  Card 0:
    NUMA node: 0
    Vendor: Red Hat, Inc. (1af4)
    PCI address: 0000:00:02.0
    Driver: virtio_gpu (4.15.0-64-generic)
    DRM:
      ID: 0
      Card: card0 (226:0)
      Control: controlD64 (226:0)
      Render: renderD128 (226:128)
  Card 1:
    NUMA node: 1
    Vendor: NVIDIA Corporation (10de)
    Product: GK208B [GeForce GT 730] (1287)
    PCI address: 0000:fd:07.0
    Driver: nvidia (390.116)
    DRM:
      ID: 1
      Card: card1 (226:1)
      Render: renderD129 (226:129)
    NVIDIA information:
      Architecture: 3.5
      Brand: GeForce
      Model: GeForce GT 730
      CUDA Version: 9.1
      NVRM Version: 390.116
      UUID: GPU-6ddadebd-dafe-2db9-f10f-125719770fd3
  Card 2:
    NUMA node: 0
    Vendor: NVIDIA Corporation (10de)
    Product: GK208B [GeForce GT 730] (1287)
    PCI address: 0000:ff:09.0
    Driver: nvidia (390.116)
    DRM:
      ID: 2
      Card: card2 (226:2)
      Render: renderD130 (226:130)
    NVIDIA information:
      Architecture: 3.5
      Brand: GeForce
      Model: GeForce GT 730
      CUDA Version: 9.1
      NVRM Version: 390.116
      UUID: GPU-253db1df-f725-a174-99d4-a8933288c39e
NIC:
  NUMA node: 0
  Vendor: Red Hat, Inc. (1af4)
  PCI address: 0000:00:03.0
  Driver: virtio_net (4.15.0-64-generic)
  Ports:
    - Port 0 (ethernet)
      ID: ens3
      Address: 52:54:00:f7:c1:10
      Port type: other
      Transceiver type: internal
      Auto negotiation: false
      Link detected: true
Disk:
  NUMA node: 0
  ID: sda
  Device: 8:0
  Model: QEMU HARDDISK
  Type: scsi
  Size: 53.69GB
  Read-Only: false
  Removable: false
  Partitions:
    - Partition 1
      ID: sda1
      Device: 8:1
      Read-Only: false
      Size: 53.69GB
So I need to upgrade LXD to 3.16? Is LXD 3.26 available only as a snap? Hmm, I can’t install it via snap because of a poor network connection, and there are no mirrors for snap.
Recent LXD won’t actually change the behavior of the gpu device, just make it easier to figure out what id points to what, so I don’t think there’s any reason for you to switch away from 3.0.x for that.
The id= field directly refers to the cardX entry in /dev/dri, so if you know you want card1, then you need id=1.
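As a rough illustration of that mapping, the sketch below prints (but does not run) one lxc profile device add command per card from a given PCI vendor. It assumes the same /sys/class/drm/cardN/device/vendor layout as usual sysfs; the function name gpu_add_commands, the gpuN device names, and the optional arguments are hypothetical choices for this example.

```shell
#!/bin/sh
# Sketch: print an "lxc profile device add" command for every DRM card whose
# PCI vendor matches. Commands are only printed so they can be reviewed first.
# gpu_add_commands and the gpuN device names are hypothetical.
gpu_add_commands() {
    vendor_id="${1:-0x10de}"         # 0x10de is NVIDIA
    drm_root="${2:-/sys/class/drm}"  # override is mainly useful for testing
    profile="${3:-default}"
    for card in "$drm_root"/card[0-9]*; do
        # Skip connector entries such as card1-HDMI-A-1.
        case "${card##*/}" in
            *-*) continue ;;
        esac
        [ -r "$card/device/vendor" ] || continue
        [ "$(cat "$card/device/vendor")" = "$vendor_id" ] || continue
        # The N in cardN is the value the gpu device expects in id=.
        id="${card##*/card}"
        printf 'lxc profile device add %s gpu%s gpu id=%s\n' \
            "$profile" "$id" "$id"
    done
}

gpu_add_commands "$@"
```

On a machine where card0 is an onboard adapter and the NVIDIA cards start at card1, the first printed command would use id=1, which matches what was observed above.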
OK, I found that card0 is ASPEED Technology, Inc. ASPEED Graphics Family (rev 30), and my NVIDIA cards are card1 to card8.