Using GUI containers with no window manager on the host - problem with `nvidia.runtime=true`

Host: Ubuntu Server 20.04
LXD 4.20 (snap)

I am trying to configure a setup where GUI desktops and apps run in containers, with the GUI displayed on a specified video output.
I have followed this tutorial, and it works for me when I am already on a GUI desktop. However, I'd rather not run a window manager on my host and instead just use Xorg to target the desired displays.
I am also able to get xeyes running on the host video output using commands over ssh.
If I install a desktop on the host and then create a container with nvidia.runtime=false, I can ssh into the host, start the container, and run glxgears, which pops up in a desktop window on the host.

It seems that the sticking point is the use of nvidia.runtime=true. That option seems to be needed for the GUI, and I am unable to start containers that use it.
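
To confirm it really is nvidia.runtime and not the proxy devices or the gpu device in the x11 profile, a quick isolation test is possible; this is only a sketch, and "testct" is a throwaway name:

lxc launch ubuntu:18.04 testct --profile default   # default profile only, should start fine
lxc config set testct nvidia.runtime true          # add only the NVIDIA runtime option
lxc restart testct                                 # if this fails the same way, nvidia.runtime is isolated as the culprit
lxc delete -f testct                               # clean up the test container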

  • Am I correct that if I can create the container with nvidia.runtime=true, I should be able to target glxgears (or any other X11 app, or an entire desktop) to a display without recourse to a window manager on the host?

This is what I get when I try to create containers with nvidia.runtime=true:

~$ lxc launch ubuntu:18.04 --profile default --profile x11 mycontainer                                                                                                                                                        
Creating mycontainer                                                                                                                                                                                                                        
Starting mycontainer                                                                                                                                                                                                                        
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart mycontainer /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/mycontainer/lxc.conf:                                                                               
Try `lxc info --show-log local:mycontainer` for more info                                                                                                                                                                                   
~$ lxc info --show-log local:mycontainer
Name: mycontainer                                                                                                                                                                                                                           
Status: STOPPED                                                                                                                                                                                                                             
Type: container                                                                                                                                                                                                                             
Architecture: x86_64                                                                                                                                                                                                                        
Created: 2021/11/13 19:43 EST                                                                                                                                                                                                               
Last Used: 2021/11/13 19:43 EST                                                                                                                                                                                                             
                                                                                                                                                                                                                                            
Log:                                                       

lxc mycontainer 20211114004327.995 WARN     conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc mycontainer 20211114004327.996 WARN     conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc mycontainer 20211114004327.997 WARN     conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc mycontainer 20211114004327.997 WARN     conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc mycontainer 20211114004327.997 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1251 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc mycontainer 20211114004328.178 ERROR    conf - conf.c:run_buffer:321 - Script exited with status 1
lxc mycontainer 20211114004328.178 ERROR    conf - conf.c:lxc_setup:4386 - Failed to run mount hooks
lxc mycontainer 20211114004328.178 ERROR    start - start.c:do_start:1275 - Failed to setup container "mycontainer"
lxc mycontainer 20211114004328.178 ERROR    sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc mycontainer 20211114004328.183 WARN     network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eth0" to its initial name "veth3a4c754a"
lxc mycontainer 20211114004328.183 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:867 - Received container state "ABORTING" instead of "RUNNING"
lxc mycontainer 20211114004328.184 ERROR    start - start.c:__lxc_start:2074 - Failed to spawn container "mycontainer"
lxc mycontainer 20211114004328.184 WARN     start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 41 for process 88802
lxc mycontainer 20211114004333.266 WARN     conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc mycontainer 20211114004333.266 WARN     conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc 20211114004333.300 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response

Can you show lxc config show --expanded mycontainer?

~$ lxc config show --expanded mycontainer
architecture: x86_64
config:
  environment.DISPLAY: :0
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20211109)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20211109"
  image.type: squashfs
  image.version: "18.04"
  nvidia.driver.capabilities: graphics, compute, display, utility, video
  nvidia.runtime: "true"
  raw.idmap: both 1000 1000
  user.user-data: |
    #cloud-config
    runcmd:
      - 'sed -i "s/; enable-shm = yes/enable-shm = no/g" /etc/pulse/client.conf'
      - 'echo export PULSE_SERVER=unix:/tmp/.pulse-native | tee --append /home/ubuntu/.profile'
    packages:
      - x11-apps
      - x11-utils
      - mesa-utils
      - pulseaudio
  volatile.base_image: d1b447d815ffaba341a8e3018f031bf3e5e2c1ed66f095e9f34318fb6f6fbf8c
  volatile.eth0.host_name: veth5c792fd2
  volatile.eth0.hwaddr: 00:16:3e:dd:bb:4c
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":true,"Isgid":false,"Hostid":1001001,"Nsid":1001,"Maprange":999998999},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000},{"Isuid":true,"Isgid":true,"Hostid":1000,"Nsid":1000,"Maprange":1},{"Isuid":false,"Isgid":true,"Hostid":1001001,"Nsid":1001,"Maprange":999998999}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.power: RUNNING
  volatile.uuid: b8010bca-8d8f-413a-8220-2194469e1d59
devices:
  PASocket1:
    bind: container
    connect: unix:/run/user/1000/pulse/native
    gid: "1000"
    listen: unix:/home/ubuntu/pulse-native
    mode: "0777"
    security.gid: "1000"
    security.uid: "1000"
    type: proxy
    uid: "1000"
  X0:
    bind: container
    connect: unix:/tmp/.X11-unix/X1
    gid: "1000"
    listen: unix:/tmp/.X11-unix/X0
    mode: "0777"
    security.gid: "1000"
    security.uid: "1000"
    type: proxy
    uid: "1000"
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  mygpu:
    type: gpu
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
- x11
stateful: false
description: ""

That looks correct. On the host, does nvidia-smi work properly and do you have libnvidia-compute-VERSION installed (where VERSION matches your driver)?

Hmm no, the only thing I installed on the host was nvidia-tools-495.

OK, from what I remember, the NVIDIA container tooling requires the CUDA library on the host, which I believe comes from the compute lib package.

In all our NVIDIA tests we installed nvidia-utils-VERSION, linux-modules-nvidia-VERSION-generic and libnvidia-compute-VERSION.
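
For example, for the 495 series that would be something like this (a sketch; substitute whatever version matches your driver):

sudo apt install nvidia-utils-495 linux-modules-nvidia-495-generic libnvidia-compute-495
nvidia-smi                          # should list the GPUs once the kernel module is loaded
ldconfig -p | grep libnvidia-ml     # NVML library that nvidia-container-cli relies on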

@stgraber is there something we can add to LXD to check for the minimum requirements to get nvidia.runtime working?

OK, I installed all those (and also nvidia-compute-utils-495) on the host, but unfortunately the errors are the same as before.

Not really, it boils down to a set of libraries and kernel APIs which nvidia-container-cli accesses, but the exact set and paths depend on the driver version used on the system.

LXD validates that we do have nvidia-container-cli and calls it as part of the resources API, but after that it's really a black box as to what the proprietary libraries from NVIDIA may need present.
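
One way to peek into that black box is to ask nvidia-container-cli, from inside the snap's mount namespace, what it would actually inject into the container. This is only a sketch, using the same wrapper invocation that appears further down in this thread but with the list subcommand instead of info:

sudo nsenter --mount=/run/snapd/ns/lxd.mnt env PATH=/snap/lxd/current/bin:${PATH} LD_LIBRARY_PATH=/snap/lxd/current/lib nvidia-container-cli list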

Odd, can you show lxc info --resources and the nvidia-smi output?

It does look like nvidia-smi is the issue. I’ll try to get that working and report back when I do.

boss@virtland:~$ lxc info --resources 
CPU (x86_64):
  Vendor: AuthenticAMD
  Name: AMD Ryzen Threadripper 1950X 16-Core Processor
  Caches:
    - Level 1 (type: Data): 33kB
    - Level 1 (type: Instruction): 66kB
    - Level 2 (type: Unified): 524kB
    - Level 3 (type: Unified): 8MB
  Cores:
    - Core 0
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 0, online: true, NUMA node: 0)
        - 1 (id: 16, online: true, NUMA node: 0)
    - Core 1
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 1, online: true, NUMA node: 0)
        - 1 (id: 17, online: true, NUMA node: 0)
    - Core 2
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 18, online: true, NUMA node: 0)
        - 1 (id: 2, online: true, NUMA node: 0)
    - Core 3
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 19, online: true, NUMA node: 0)
        - 1 (id: 3, online: true, NUMA node: 0)
    - Core 4
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 20, online: true, NUMA node: 0)
        - 1 (id: 4, online: true, NUMA node: 0)
    - Core 5
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 21, online: true, NUMA node: 0)
        - 1 (id: 5, online: true, NUMA node: 0)
    - Core 6
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 22, online: true, NUMA node: 0)
        - 1 (id: 6, online: true, NUMA node: 0)
    - Core 7
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 23, online: true, NUMA node: 0)
        - 1 (id: 7, online: true, NUMA node: 0)
    - Core 8
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 24, online: true, NUMA node: 1)
        - 1 (id: 8, online: true, NUMA node: 1)
    - Core 9
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 25, online: true, NUMA node: 1)
        - 1 (id: 9, online: true, NUMA node: 1)
    - Core 10
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 10, online: true, NUMA node: 1)
        - 1 (id: 26, online: true, NUMA node: 1)
    - Core 11
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 11, online: true, NUMA node: 1)
        - 1 (id: 27, online: true, NUMA node: 1)
    - Core 12
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 12, online: true, NUMA node: 1)
        - 1 (id: 28, online: true, NUMA node: 1)
    - Core 13
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 13, online: true, NUMA node: 1)
        - 1 (id: 29, online: true, NUMA node: 1)
    - Core 14
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 14, online: true, NUMA node: 1)
        - 1 (id: 30, online: true, NUMA node: 1)
    - Core 15
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 15, online: true, NUMA node: 1)
        - 1 (id: 31, online: true, NUMA node: 1)
  Frequency: 2195Mhz (min: 2200Mhz, max: 3400Mhz)

Memory:
  NUMA nodes:
    Node 0:
      Free: 67.37GB
      Used: 1.35GB
      Total: 68.72GB
    Node 1:
      Free: 67.29GB
      Used: 1.43GB
      Total: 68.72GB
  Free: 132.41GB
  Used: 5.03GB
  Total: 137.44GB

GPUs:
  Card 0:
    NUMA node: 0
    Vendor: Advanced Micro Devices, Inc. [AMD/ATI] (1002)
    Product: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (67df)
    PCI address: 0000:09:00.0
    Driver: amdgpu (5.11.0-40-generic)
    DRM:
      ID: 0
      Card: card0 (226:0)
      Control: controlD64 (226:0)
      Render: renderD128 (226:128)
  Card 1:
    NUMA node: 1
    Vendor: NVIDIA Corporation (10de)
    Product: GP104 [GeForce GTX 1070] (1b81)
    PCI address: 0000:43:00.0
  Card 2:
    NUMA node: 1
    Vendor: NVIDIA Corporation (10de)
    Product: GK208B [GeForce GT 710] (128b)
    PCI address: 0000:44:00.0

NICs:
  Card 0:
    NUMA node: 0
    Vendor: Intel Corporation (8086)
    Product: I211 Gigabit Network Connection (1539)
    PCI address: 0000:04:00.0
    Driver: igb (5.11.0-40-generic)
    Ports:
      - Port 0 (ethernet)
        ID: enp4s0
        Address: 70:85:c2:65:9f:1b
        Supported modes: 10baseT/Half, 10baseT/Full, 100baseT/Half, 100baseT/Full, 1000baseT/Full
        Supported ports: twisted pair
        Port type: twisted pair
        Transceiver type: internal
        Auto negotiation: true
        Link detected: true
        Link speed: 100Mbit/s (full duplex)
  Card 1:
    NUMA node: 0
    Vendor: Intel Corporation (8086)
    Product: I211 Gigabit Network Connection (1539)
    PCI address: 0000:06:00.0
    Driver: igb (5.11.0-40-generic)
    Ports:
      - Port 0 (ethernet)
        ID: enp6s0
        Address: 70:85:c2:65:9f:19
        Supported modes: 10baseT/Half, 10baseT/Full, 100baseT/Half, 100baseT/Full, 1000baseT/Full
        Supported ports: twisted pair
        Port type: twisted pair
        Transceiver type: internal
        Auto negotiation: true
        Link detected: false
  Card 2:
    NUMA node: 0
    Vendor: Intel Corporation (8086)
    Product: Dual Band Wireless-AC 3168NGW [Stone Peak] (24fb)
    PCI address: 0000:05:00.0
    Driver: iwlwifi (5.11.0-40-generic)
    Ports:
      - Port 0 (ethernet)
        ID: wlp5s0
        Address: 28:c6:3f:15:64:27
        Auto negotiation: false
        Link detected: false

Disks:
  Disk 0:
    NUMA node: 0
    ID: nvme0n1
    Device: 259:1
    Model: Samsung SSD 970 EVO 1TB
    Type: nvme
    Size: 1.00TB
    WWN: eui.0025385581b40e03
    Read-Only: false
    Removable: false
  Disk 1:
    NUMA node: 1
    ID: nvme1n1
    Device: 259:0
    Model: Samsung SSD 960 EVO 1TB
    Type: nvme
    Size: 1.00TB
    WWN: eui.0025385481b1ea19
    Read-Only: false
    Removable: false
  Disk 2:
    NUMA node: 1
    ID: nvme2n1
    Device: 259:2
    Model: Samsung SSD 970 EVO 1TB
    Type: nvme
    Size: 1.00TB
    WWN: eui.0025385581b40c9c
    Read-Only: false
    Removable: false
  Disk 3:
    NUMA node: 0
    ID: sda
    Device: 8:0
    Model: Samsung SSD 860 EVO 250GB
    Type: sata
    Size: 250.06GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sda1
        Device: 8:1
        Read-Only: false
        Size: 68.72GB
      - Partition 2
        ID: sda2
        Device: 8:2
        Read-Only: false
        Size: 181.34GB
  Disk 4:
    NUMA node: 0
    ID: sdb
    Device: 8:16
    Model: Samsung SSD 860 EVO 250GB
    Type: sata
    Size: 250.06GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sdb1
        Device: 8:17
        Read-Only: false
        Size: 68.72GB
      - Partition 2
        ID: sdb2
        Device: 8:18
        Read-Only: false
        Size: 181.34GB
  Disk 5:
    NUMA node: 0
    ID: sdc
    Device: 8:32
    Model: Samsung SSD 850 EVO 500GB
    Type: sata
    Size: 500.11GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sdc1
        Device: 8:33
        Read-Only: false
        Size: 500.11GB
  Disk 6:
    NUMA node: 0
    ID: sdd
    Device: 8:48
    Model: Samsung SSD 850 PRO 256GB
    Type: sata
    Size: 256.06GB
    Read-Only: false
    Removable: false
  Disk 7:
    NUMA node: 0
    ID: sde
    Device: 8:64
    Model: Samsung SSD 850 EVO 120GB
    Type: sata
    Size: 120.03GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sde1
        Device: 8:65
        Read-Only: false
        Size: 1.07GB
      - Partition 2
        ID: sde2
        Device: 8:66
        Read-Only: false
        Size: 4.29GB
      - Partition 3
        ID: sde3
        Device: 8:67
        Read-Only: false
        Size: 114.66GB
  Disk 8:
    NUMA node: 0
    ID: sdf
    Device: 8:80
    Model: Samsung SSD 850 EVO 120GB
    Type: sata
    Size: 120.03GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sdf1
        Device: 8:81
        Read-Only: false
        Size: 1.07GB
      - Partition 2
        ID: sdf2
        Device: 8:82
        Read-Only: false
        Size: 4.29GB
      - Partition 3
        ID: sdf3
        Device: 8:83
        Read-Only: false
        Size: 114.66GB
  Disk 9:
    NUMA node: 0
    ID: sdg
    Device: 8:96
    Model: Cruzer Glide
    Type: usb
    Size: 15.79GB
    Read-Only: false
    Removable: true
    Partitions:
      - Partition 1
        ID: sdg1
        Device: 8:97
        Read-Only: false
        Size: 3.07GB
      - Partition 2
        ID: sdg2
        Device: 8:98
        Read-Only: false
        Size: 4.10MB
      - Partition 3
        ID: sdg3
        Device: 8:99
        Read-Only: false
        Size: 12.72GB
boss@virtland:~$ 
boss@virtland:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Okay, yeah, getting the NVIDIA driver working first (including nvidia-smi) should get you the NVIDIA-specific information in lxc info --resources and will make nvidia.runtime work as expected.
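
One common way to do that on Ubuntu Server is to use the packaged drivers; a minimal sketch (470 is just an example series, and ubuntu-drivers can pick the recommended one for you):

sudo ubuntu-drivers autoinstall     # or: sudo apt install nvidia-driver-470
sudo reboot
nvidia-smi                          # should now report the driver version and the GPUs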

OK, so I got the NVIDIA drivers working (I had to use the ones from the NVIDIA website; I was not able to make the proprietary drivers from the repos work). Unfortunately, I still get the same errors when I create the container.

~$  nvidia-smi 
Mon Nov 15 14:33:39 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:43:00.0 Off |                  N/A |
|  0%   41C    P5     9W / 151W |      0MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:44:00.0 N/A |                  N/A |
| 40%   36C    P0    N/A /  N/A |      0MiB /   973MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
~$ lxc info --resources
CPU (x86_64):
  Vendor: AuthenticAMD
  Name: AMD Ryzen Threadripper 1950X 16-Core Processor
  Caches:
    - Level 1 (type: Data): 33kB
    - Level 1 (type: Instruction): 66kB
    - Level 2 (type: Unified): 524kB
    - Level 3 (type: Unified): 8MB
  Cores:
    - Core 0
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 0, online: true, NUMA node: 0)
        - 1 (id: 16, online: true, NUMA node: 0)
    - Core 1
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 1, online: true, NUMA node: 0)
        - 1 (id: 17, online: true, NUMA node: 0)
    - Core 2
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 18, online: true, NUMA node: 0)
        - 1 (id: 2, online: true, NUMA node: 0)
    - Core 3
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 19, online: true, NUMA node: 0)
        - 1 (id: 3, online: true, NUMA node: 0)
    - Core 4
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 20, online: true, NUMA node: 0)
        - 1 (id: 4, online: true, NUMA node: 0)
    - Core 5
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 21, online: true, NUMA node: 0)
        - 1 (id: 5, online: true, NUMA node: 0)
    - Core 6
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 22, online: true, NUMA node: 0)
        - 1 (id: 6, online: true, NUMA node: 0)
    - Core 7
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 23, online: true, NUMA node: 0)
        - 1 (id: 7, online: true, NUMA node: 0)
    - Core 8
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 24, online: true, NUMA node: 1)
        - 1 (id: 8, online: true, NUMA node: 1)
    - Core 9
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 25, online: true, NUMA node: 1)
        - 1 (id: 9, online: true, NUMA node: 1)
    - Core 10
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 10, online: true, NUMA node: 1)
        - 1 (id: 26, online: true, NUMA node: 1)
    - Core 11
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 11, online: true, NUMA node: 1)
        - 1 (id: 27, online: true, NUMA node: 1)
    - Core 12
      Frequency: 2194Mhz
      Threads:
        - 0 (id: 12, online: true, NUMA node: 1)
        - 1 (id: 28, online: true, NUMA node: 1)
    - Core 13
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 13, online: true, NUMA node: 1)
        - 1 (id: 29, online: true, NUMA node: 1)
    - Core 14
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 14, online: true, NUMA node: 1)
        - 1 (id: 30, online: true, NUMA node: 1)
    - Core 15
      Frequency: 2195Mhz
      Threads:
        - 0 (id: 15, online: true, NUMA node: 1)
        - 1 (id: 31, online: true, NUMA node: 1)
  Frequency: 2194Mhz (min: 2200Mhz, max: 3400Mhz)

Memory:
  NUMA nodes:
    Node 0:
      Free: 66.86GB
      Used: 1.86GB
      Total: 68.72GB
    Node 1:
      Free: 67.62GB
      Used: 1.10GB
      Total: 68.72GB
  Free: 132.28GB
  Used: 5.16GB
  Total: 137.44GB

GPUs:
  Card 0:
    NUMA node: 0
    Vendor: Advanced Micro Devices, Inc. [AMD/ATI] (1002)
    Product: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (67df)
    PCI address: 0000:09:00.0
    Driver: amdgpu (5.11.0-40-generic)
    DRM:
      ID: 0
      Card: card0 (226:0)
      Control: controlD64 (226:0)
      Render: renderD128 (226:128)
  Card 1:
    NUMA node: 1
    Vendor: NVIDIA Corporation (10de)
    Product: GP104 [GeForce GTX 1070] (1b81)
    PCI address: 0000:43:00.0
    Driver: nvidia (470.86)
    DRM:
      ID: 1
      Card: card1 (226:1)
      Render: renderD129 (226:129)
    NVIDIA information:
      Architecture: 6.1
      Brand: GeForce
      Model: NVIDIA GeForce GTX 1070
      CUDA Version: 11.4
      NVRM Version: 470.86
      UUID: GPU-0060e8a6-e62c-d88b-12a9-eaeaf2959694
  Card 2:
    NUMA node: 1
    Vendor: NVIDIA Corporation (10de)
    Product: GK208B [GeForce GT 710] (128b)
    PCI address: 0000:44:00.0
    Driver: nvidia (470.86)
    DRM:
      ID: 2
      Card: card2 (226:2)
      Render: renderD130 (226:130)
    NVIDIA information:
      Architecture: 3.5
      Brand: GeForce
      Model: NVIDIA GeForce GT 710
      CUDA Version: 11.4
      NVRM Version: 470.86
      UUID: GPU-ce01ce48-a2bd-a7dc-4ca7-9848a9e05b75

NICs:
  Card 0:
    NUMA node: 0
    Vendor: Intel Corporation (8086)
    Product: I211 Gigabit Network Connection (1539)
    PCI address: 0000:04:00.0
    Driver: igb (5.11.0-40-generic)
    Ports:
      - Port 0 (ethernet)
        ID: enp4s0
        Address: 70:85:c2:65:9f:1b
        Supported modes: 10baseT/Half, 10baseT/Full, 100baseT/Half, 100baseT/Full, 1000baseT/Full
        Supported ports: twisted pair
        Port type: twisted pair
        Transceiver type: internal
        Auto negotiation: true
        Link detected: true
        Link speed: 100Mbit/s (full duplex)
  Card 1:
    NUMA node: 0
    Vendor: Intel Corporation (8086)
    Product: I211 Gigabit Network Connection (1539)
    PCI address: 0000:06:00.0
    Driver: igb (5.11.0-40-generic)
    Ports:
      - Port 0 (ethernet)
        ID: enp6s0
        Address: 70:85:c2:65:9f:19
        Supported modes: 10baseT/Half, 10baseT/Full, 100baseT/Half, 100baseT/Full, 1000baseT/Full
        Supported ports: twisted pair
        Port type: twisted pair
        Transceiver type: internal
        Auto negotiation: true
        Link detected: false
  Card 2:
    NUMA node: 0
    Vendor: Intel Corporation (8086)
    Product: Dual Band Wireless-AC 3168NGW [Stone Peak] (24fb)
    PCI address: 0000:05:00.0
    Driver: iwlwifi (5.11.0-40-generic)
    Ports:
      - Port 0 (ethernet)
        ID: wlp5s0
        Address: 28:c6:3f:15:64:27
        Auto negotiation: false
        Link detected: false

Disks:
  Disk 0:
    NUMA node: 0
    ID: nvme0n1
    Device: 259:2
    Model: Samsung SSD 970 EVO 1TB
    Type: nvme
    Size: 1.00TB
    WWN: eui.0025385581b40e03
    Read-Only: false
    Removable: false
  Disk 1:
    NUMA node: 1
    ID: nvme1n1
    Device: 259:0
    Model: Samsung SSD 960 EVO 1TB
    Type: nvme
    Size: 1.00TB
    WWN: eui.0025385481b1ea19
    Read-Only: false
    Removable: false
  Disk 2:
    NUMA node: 1
    ID: nvme2n1
    Device: 259:1
    Model: Samsung SSD 970 EVO 1TB
    Type: nvme
    Size: 1.00TB
    WWN: eui.0025385581b40c9c
    Read-Only: false
    Removable: false
  Disk 3:
    NUMA node: 0
    ID: sda
    Device: 8:0
    Model: Samsung SSD 860 EVO 250GB
    Type: sata
    Size: 250.06GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sda1
        Device: 8:1
        Read-Only: false
        Size: 68.72GB
      - Partition 2
        ID: sda2
        Device: 8:2
        Read-Only: false
        Size: 181.34GB
  Disk 4:
    NUMA node: 0
    ID: sdb
    Device: 8:16
    Model: Samsung SSD 860 EVO 250GB
    Type: sata
    Size: 250.06GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sdb1
        Device: 8:17
        Read-Only: false
        Size: 68.72GB
      - Partition 2
        ID: sdb2
        Device: 8:18
        Read-Only: false
        Size: 181.34GB
  Disk 5:
    NUMA node: 0
    ID: sdc
    Device: 8:32
    Model: Samsung SSD 850 EVO 500GB
    Type: sata
    Size: 500.11GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sdc1
        Device: 8:33
        Read-Only: false
        Size: 500.11GB
  Disk 6:
    NUMA node: 0
    ID: sdd
    Device: 8:48
    Model: Samsung SSD 850 PRO 256GB
    Type: sata
    Size: 256.06GB
    Read-Only: false
    Removable: false
  Disk 7:
    NUMA node: 0
    ID: sde
    Device: 8:64
    Model: Samsung SSD 850 EVO 120GB
    Type: sata
    Size: 120.03GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sde1
        Device: 8:65
        Read-Only: false
        Size: 1.07GB
      - Partition 2
        ID: sde2
        Device: 8:66
        Read-Only: false
        Size: 4.29GB
      - Partition 3
        ID: sde3
        Device: 8:67
        Read-Only: false
        Size: 114.66GB
  Disk 8:
    NUMA node: 0
    ID: sdf
    Device: 8:80
    Model: Cruzer Glide
    Type: usb
    Size: 15.79GB
    Read-Only: false
    Removable: true
    Partitions:
      - Partition 1
        ID: sdf1
        Device: 8:81
        Read-Only: false
        Size: 3.07GB
      - Partition 2
        ID: sdf2
        Device: 8:82
        Read-Only: false
        Size: 4.10MB
      - Partition 3
        ID: sdf3
        Device: 8:83
        Read-Only: false
        Size: 12.72GB
  Disk 9:
    NUMA node: 0
    ID: sdg
    Device: 8:96
    Model: Samsung SSD 850 EVO 120GB
    Type: sata
    Size: 120.03GB
    Read-Only: false
    Removable: false
    Partitions:
      - Partition 1
        ID: sdg1
        Device: 8:97
        Read-Only: false
        Size: 1.07GB
      - Partition 2
        ID: sdg2
        Device: 8:98
        Read-Only: false
        Size: 4.29GB
      - Partition 3
        ID: sdg3
        Device: 8:99
        Read-Only: false
        Size: 114.66GB

~$ 
~$ lxc launch ubuntu:18.04 --profile default --profile x11 mycontainer
Creating mycontainer
Starting mycontainer
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart mycontainer /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/mycontainer/lxc.conf: 
Try `lxc info --show-log local:mycontainer` for more info


~$ lxc info --show-log local:mycontainer
Name: mycontainer
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2021/11/15 14:34 EST
Last Used: 2021/11/15 14:34 EST

Log:

lxc mycontainer 20211115193405.407 WARN     conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc mycontainer 20211115193405.408 WARN     conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc mycontainer 20211115193405.409 WARN     conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc mycontainer 20211115193405.409 WARN     conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc mycontainer 20211115193405.409 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1251 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc mycontainer 20211115193406.178 ERROR    conf - conf.c:run_buffer:321 - Script exited with status 1
lxc mycontainer 20211115193406.178 ERROR    conf - conf.c:lxc_setup:4386 - Failed to run mount hooks
lxc mycontainer 20211115193406.178 ERROR    start - start.c:do_start:1275 - Failed to setup container "mycontainer"
lxc mycontainer 20211115193406.178 ERROR    sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc mycontainer 20211115193406.185 WARN     network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eth0" to its initial name "veth9112102f"
lxc mycontainer 20211115193406.185 ERROR    start - start.c:__lxc_start:2074 - Failed to spawn container "mycontainer"
lxc mycontainer 20211115193406.185 WARN     start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 41 for process 207028
lxc mycontainer 20211115193406.185 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:867 - Received container state "ABORTING" instead of "RUNNING"
lxc mycontainer 20211115193411.319 WARN     conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc mycontainer 20211115193411.319 WARN     conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc 20211115193411.361 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20211115193411.361 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors

~$ 

I just rechecked on our test server with current LXD (4.20) and the 470.82.00 driver from Ubuntu.

root@vm12:~# lxc launch images:ubuntu/20.04 u1 -c nvidia.runtime=true
Creating u1
Starting u1                                   
root@vm12:~# lxc exec u1 nvidia-smi
No devices were found
root@vm12:~# nvidia-smi
Tue Nov 16 05:09:02 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.00    Driver Version: 470.82.00    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:07:00.0 N/A |                  N/A |
| 40%   61C    P0    N/A /  N/A |      0MiB /  2002MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:08:00.0 N/A |                  N/A |
| 39%   58C    P0    N/A /  N/A |      0MiB /  2002MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Can you run:

nsenter --mount=/run/snapd/ns/lxd.mnt env PATH=/snap/lxd/current/bin:${PATH} LD_LIBRARY_PATH=/snap/lxd/current/lib nvidia-container-cli info

Here this gets me:

NVRM version:   470.82.00
CUDA version:   11.4

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce GT 730
Brand:          GeForce
GPU UUID:       GPU-6ddadebd-dafe-2db9-f10f-125719770fd3
Bus Location:   00000000:07:00.0
Architecture:   3.5

Device Index:   1
Device Minor:   1
Model:          NVIDIA GeForce GT 730
Brand:          GeForce
GPU UUID:       GPU-253db1df-f725-a174-99d4-a8933288c39e
Bus Location:   00000000:08:00.0
Architecture:   3.5

Strange, according to that the driver is still not loaded.

$ sudo nsenter --mount=/run/snapd/ns/lxd.mnt env PATH=/snap/lxd/current/bin:${PATH} LD_LIBRARY_PATH=/snap/lxd/current/lib nvidia-container-cli info
nvidia-container-cli.real: initialization error: nvml error: driver not loaded

According to lspci -knn the driver is loaded:

44:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 710] [10de:128b] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. GK208B [GeForce GT 710] [19da:5360]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Edit: Also, the output of nvidia-smi is different now, as I have configured a different GPU for passthrough:

~$ nvidia-smi
Tue Nov 16 05:27:06 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:44:00.0 N/A |                  N/A |
| 40%   38C    P0    N/A /  N/A |      0MiB /   973MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
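
Since there is no display manager on this host to initialize things at boot, one thing worth checking at this point (purely a guess, not something established in this thread) is whether the /dev/nvidia* device nodes and the nvidia-uvm module that the NVML-based tooling relies on actually exist:

lsmod | grep -E '^nvidia'
ls -l /dev/nvidia*
sudo nvidia-modprobe -u -c 0    # ships with the driver; creates /dev/nvidia-uvm and /dev/nvidia0 without starting X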

Update:

Some progress! I was able to create the container with:

DISPLAY=:0 lxc launch ubuntu:18.04 --profile default --profile x11 mycontainer

However it still isn’t working the way I want it to.

ubuntu@mycontainer:~$ xeyes
Error: Can't open display: :0

However, if I start the desktop on the host, running xeyes from within the container does put xeyes in a window on the desktop.

I thought that perhaps creating an .xinitrc containing lxc exec mycontainer xeyes might do the trick, but all that got me was an xterm console on the display.

I am still struggling with this. Although I have managed to get xeyes from a container running in a window on a desktop environment, this has not been consistent and I have not figured out why. In my current round of installation, I am again unable to create the container, although my GPU is now recognized.

:~$ lspci -knn | grep -A 4 VGA
09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev cf)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1462:3414]
        Kernel driver in use: vfio-pci
        Kernel modules: amdgpu
09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
--
43:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070] [10de:1b81] (rev a1)
        Subsystem: eVga.com. Corp. GP104 [GeForce GTX 1070] [3842:5173]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
43:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
--
44:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 710] [10de:128b] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. GK208B [GeForce GT 710] [19da:5360]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
44:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
~$ nvidia-smi
Thu Nov 18 09:34:06 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:44:00.0 N/A |                  N/A |
| 40%   40C    P0    N/A /  N/A |      0MiB /   973MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
~$ lxc launch ubuntu:18.04 --profile default --profile x11 mycontainer
Creating mycontainer
Starting mycontainer
Error: Failed to run: /snap/lxd/current/bin/lxd forkstart mycontainer /var/snap/lxd/common/lxd/containers /var/snap/lxd/common/lxd/logs/mycontainer/lxc.conf: 
Try `lxc info --show-log local:mycontainer` for more info
~$ lxc info --show-log local:mycontainer
Name: mycontainer
Status: STOPPED
Type: container
Architecture: x86_64
Created: 2021/11/18 09:34 EST
Last Used: 2021/11/18 09:34 EST

Log:

lxc mycontainer 20211118143446.664 WARN     conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc mycontainer 20211118143446.664 WARN     conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc mycontainer 20211118143446.665 WARN     conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc mycontainer 20211118143446.665 WARN     conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc mycontainer 20211118143446.665 WARN     cgfsng - cgroups/cgfsng.c:fchowmodat:1251 - No such file or directory - Failed to fchownat(40, memory.oom.group, 1000000000, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW )
lxc mycontainer 20211118143447.160 ERROR    conf - conf.c:run_buffer:321 - Script exited with status 1
lxc mycontainer 20211118143447.160 ERROR    conf - conf.c:lxc_setup:4386 - Failed to run mount hooks
lxc mycontainer 20211118143447.160 ERROR    start - start.c:do_start:1275 - Failed to setup container "mycontainer"
lxc mycontainer 20211118143447.160 ERROR    sync - sync.c:sync_wait:34 - An error occurred in another process (expected sequence number 4)
lxc mycontainer 20211118143447.165 WARN     network - network.c:lxc_delete_network_priv:3617 - Failed to rename interface with index 0 from "eth0" to its initial name "vethf4a81b28"
lxc mycontainer 20211118143447.166 ERROR    start - start.c:__lxc_start:2074 - Failed to spawn container "mycontainer"
lxc mycontainer 20211118143447.166 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:867 - Received container state "ABORTING" instead of "RUNNING"
lxc mycontainer 20211118143447.166 WARN     start - start.c:lxc_abort:1039 - No such process - Failed to send SIGKILL via pidfd 41 for process 159006
lxc mycontainer 20211118143452.316 WARN     conf - conf.c:lxc_map_ids:3579 - newuidmap binary is missing
lxc mycontainer 20211118143452.316 WARN     conf - conf.c:lxc_map_ids:3585 - newgidmap binary is missing
lxc 20211118143452.336 ERROR    af_unix - af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20211118143452.336 ERROR    commands - commands.c:lxc_cmd_rsp_recv_fds:127 - Failed to receive file descriptors

I suspect it may have something to do with the order in which I install things (desktop, drivers, lxd).

However, my goal is still to run GUI containers with just X and no window manager.
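
For that goal, a bare X server with no window manager can be started from a console or over ssh; a sketch, assuming Xorg and xinit are installed on the host (the x11 profile shown earlier proxies to /tmp/.X11-unix/X1, i.e. host display :1):

xinit /usr/bin/xterm -- :1 vt2 &    # bare Xorg on display :1, VT 2, with a single xterm as the only client
DISPLAY=:1 xeyes                    # host-side check that the display is reachable

Depending on the Xorg wrapper configuration, this may need to be run as root or from a real console.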

OK I got it sorted. The missing link was this:


sudo -i
xhost si:localuser:MYUSERID
exit
lxc exec mycontainer -- sudo --user ubuntu --login   # and now I can run xeyes and glxgears

I had previously tried (as a regular user)


sudo xhost +localhost

but that did not work for me.
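
A possible way to make that grant automatic in the no-window-manager setup (hypothetical, not something done in this thread) is to put it in the host's ~/.xinitrc so it runs every time the bare X server comes up:

# ~/.xinitrc on the host; MYUSERID is the host user whose UID the container maps via raw.idmap
xhost +si:localuser:MYUSERID
exec sleep infinity    # keep the X server running with no window manager

and then start the server with xinit -- :1 vt2 as in the earlier sketch.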