GPU id starts from 1, not 0?

Hey, it seems that when we use lxc profile device add default gpu gpu id=1 to add one GPU to the default profile, the id counts from 1, not 0. However, for CPUs we use lxc profile set limits.cpu 0-16, which counts from 0. That’s inconsistent.
I tried adding the GPU with id=0; no error or warning was reported, but the GPU just didn’t show up inside the container. Later I tried id=1 and found that it works.

That id is the kernel DRI id. Normally 0 is the first GPU.

What does lxc info --resources get you?

$ lxc info my-container --resources
Name: my-container
Remote: unix://
Architecture: x86_64
Created: 2019/09/28 11:53 UTC
Status: Running
Type: persistent
Profiles: 515
Pid: 17232
Ips:
  eth0:	inet	172.21.18.13
  eth0:	inet6	2001:***:adc7
  eth0:	inet6	fe80::216:3eff:fece:adc7
  lo:	inet	127.0.0.1
  lo:	inet6	::1
Resources:
  Processes: 60
  CPU usage:
    CPU usage (in seconds): 8
  Memory usage:
    Memory (current): 168.07MB
    Memory (peak): 176.30MB
  Network usage:
    eth0:
      Bytes received: 376.94kB
      Bytes sent: 47.02kB
      Packets received: 3663
      Packets sent: 279
    lo:
      Bytes received: 1.44kB
      Bytes sent: 1.44kB
      Packets received: 16
      Packets sent: 16

I’m using LXD 3.0.3 on Ubuntu 18.04.

Ah, okay, 3.0.3 doesn’t have the fancy GPU reporting we’ve had since 3.16, which would have made it clear which id points to which card.

So you’d need to manually dig through /dev/dri and /sys/class/drm to see the different cards on the system and get an idea of what they are.
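For anyone who would rather script that digging, here is a minimal Python sketch. It assumes the usual Linux sysfs layout, where each /sys/class/drm/cardN has a device/vendor file holding the PCI vendor ID (for example 0x10de for NVIDIA) and a device/driver symlink. The function name list_drm_cards is made up for this example and is not part of LXD.

```python
import re
from pathlib import Path


def list_drm_cards(sys_drm="/sys/class/drm"):
    """Return a sorted list of (card_id, pci_vendor, driver) tuples."""
    root = Path(sys_drm)
    if not root.is_dir():
        # No DRM devices (or not a Linux host): nothing to report.
        return []
    cards = []
    for entry in root.iterdir():
        # Keep only cardN; skip connectors (card0-VGA-1) and render nodes.
        match = re.fullmatch(r"card(\d+)", entry.name)
        if match is None:
            continue
        device = entry / "device"
        vendor_file = device / "vendor"
        vendor = vendor_file.read_text().strip() if vendor_file.is_file() else "unknown"
        driver_link = device / "driver"
        # The driver entry is a symlink whose target's last component is the driver name.
        driver = driver_link.resolve().name if driver_link.exists() else "unknown"
        cards.append((int(match.group(1)), vendor, driver))
    return sorted(cards)


if __name__ == "__main__":
    for card_id, vendor, driver in list_drm_cards():
        print(f"card{card_id}: vendor={vendor} driver={driver}")
```

The number after "card" in each printed line is the value the gpu device expects in id=.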

On 3.17, it would look something like this:

CPUs (x86_64):
  Socket 0:
    Vendor: GenuineIntel
    Name: Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz
    Caches:
      - Level 1 (type: Data): 33kB
      - Level 1 (type: Instruction): 33kB
      - Level 2 (type: Unified): 262kB
      - Level 3 (type: Unified): 31MB
    Cores:
      - Core 0
        Frequency: 0Mhz
        NUMA node: 0
        Threads:
          - 0 (id: 0, online: true)
      - Core 1
        Frequency: 0Mhz
        NUMA node: 0
        Threads:
          - 0 (id: 1, online: true)
  Socket 1:
    Vendor: GenuineIntel
    Name: Intel(R) Xeon(R) CPU E5-2695 v2 @ 2.40GHz
    Caches:
      - Level 1 (type: Data): 33kB
      - Level 1 (type: Instruction): 33kB
      - Level 2 (type: Unified): 262kB
      - Level 3 (type: Unified): 31MB
    Cores:
      - Core 0
        Frequency: 0Mhz
        NUMA node: 1
        Threads:
          - 0 (id: 2, online: true)
      - Core 1
        Frequency: 0Mhz
        NUMA node: 1
        Threads:
          - 0 (id: 3, online: true)

Memory:
  NUMA nodes:
    Node 0:
      Free: 3.25GB
      Used: 890.56MB
      Total: 4.14GB
    Node 1:
      Free: 3.52GB
      Used: 703.33MB
      Total: 4.23GB
  Free: 7.93GB
  Used: 434.72MB
  Total: 8.36GB

GPUs:
  Card 0:
    NUMA node: 0
    Vendor: Red Hat, Inc. (1af4)
    PCI address: 0000:00:02.0
    Driver: virtio_gpu (4.15.0-64-generic)
    DRM:
      ID: 0
      Card: card0 (226:0)
      Control: controlD64 (226:0)
      Render: renderD128 (226:128)
  Card 1:
    NUMA node: 1
    Vendor: NVIDIA Corporation (10de)
    Product: GK208B [GeForce GT 730] (1287)
    PCI address: 0000:fd:07.0
    Driver: nvidia (390.116)
    DRM:
      ID: 1
      Card: card1 (226:1)
      Render: renderD129 (226:129)
    NVIDIA information:
      Architecture: 3.5
      Brand: GeForce
      Model: GeForce GT 730
      CUDA Version: 9.1
      NVRM Version: 390.116
      UUID: GPU-6ddadebd-dafe-2db9-f10f-125719770fd3
  Card 2:
    NUMA node: 0
    Vendor: NVIDIA Corporation (10de)
    Product: GK208B [GeForce GT 730] (1287)
    PCI address: 0000:ff:09.0
    Driver: nvidia (390.116)
    DRM:
      ID: 2
      Card: card2 (226:2)
      Render: renderD130 (226:130)
    NVIDIA information:
      Architecture: 3.5
      Brand: GeForce
      Model: GeForce GT 730
      CUDA Version: 9.1
      NVRM Version: 390.116
      UUID: GPU-253db1df-f725-a174-99d4-a8933288c39e

NIC:
  NUMA node: 0
  Vendor: Red Hat, Inc. (1af4)
  PCI address: 0000:00:03.0
  Driver: virtio_net (4.15.0-64-generic)
  Ports:
    - Port 0 (ethernet)
      ID: ens3
      Address: 52:54:00:f7:c1:10
      Port type: other
      Transceiver type: internal
      Auto negotiation: false
      Link detected: true

Disk:
  NUMA node: 0
  ID: sda
  Device: 8:0
  Model: QEMU HARDDISK
  Type: scsi
  Size: 53.69GB
  Read-Only: false
  Removable: false
  Partitions:
    - Partition 1
      ID: sda1
      Device: 8:1
      Read-Only: false
      Size: 53.69GB

So I need to upgrade LXD to 3.16? Is LXD 3.16 available only as a snap? Hmm, I can’t install it via snap due to a poor network connection, and there are no mirrors for snap.

A more recent LXD won’t actually change the behavior of the gpu device; it just makes it easier to figure out which id points to which card, so I don’t think there’s any reason for you to switch away from 3.0.x for that.

The id= field directly refers to the cardX entry in /dev/dri, so if you know you want card1, then you need id=1.
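To make that mapping concrete, here is a small Python sketch that turns a card name into the matching id= value and builds the same lxc profile device add command used earlier in this thread. The helper names gpu_id_from_card and add_gpu_command are invented for illustration; the script only builds the command and does not run it.

```python
import re


def gpu_id_from_card(card_name):
    """Return N for a name like 'card1' or '/dev/dri/card1'."""
    match = re.fullmatch(r"(?:/dev/dri/)?card(\d+)", card_name)
    if match is None:
        raise ValueError(f"not a DRM card name: {card_name!r}")
    return int(match.group(1))


def add_gpu_command(card_name, profile="default", device="gpu"):
    """Build the argument list for adding that card to an LXD profile."""
    gpu_id = gpu_id_from_card(card_name)
    return ["lxc", "profile", "device", "add", profile, device, "gpu", f"id={gpu_id}"]


if __name__ == "__main__":
    # Prints the command for card1 so it can be reviewed before running it.
    print(" ".join(add_gpu_command("card1")))
```

For card1 and the default profile this produces lxc profile device add default gpu gpu id=1, the command that worked above.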

OK, I found that card0 is ASPEED Technology, Inc. ASPEED Graphics Family (rev 30), and my NVIDIA cards are card1 through card8.