Status of amdgpu support

The status of amdgpu support is unclear to me.
I’m running 202601172317.
If I’m correct, this version includes the commits for amdgpu support.
However, 'incus admin os system resources show' shows:

gpu:
  cards:
  - driver: vfio-pci
    driver_version: 6.18.5-zabbly+
    numa_node: 0
    pci_address: 0000:c1:00.0
    product: Strix Halo [Radeon Graphics / Radeon 8050S Graphics / Radeon 8060S Graphics]
    product_id: "1586"
    vendor: Advanced Micro Devices, Inc. [AMD/ATI]
    vendor_id: "1002"

It shows that the vfio-pci driver is in use, whereas I expected the amdgpu driver here.

Do I need to do a new install of incus-os?

Or are there still more commits/configuration needed to get this working?

(I want to get amdgpu/rocm working in a VM for running LLMs)

Thanks

vfio-pci is correct if you want to pass it through to a VM.
amdgpu is what you’d want to have it be used by a container.
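
To confirm which driver a card is currently bound to, the sysfs `driver` symlink of the PCI device can be read. The following is a minimal sketch, assuming the standard Linux sysfs layout under /sys/bus/pci/devices; the function name `bound_driver` is made up for illustration, and the address is the one from the output above.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    Assumes the usual sysfs layout, where <device>/driver is a symlink
    pointing at the driver directory (for example .../drivers/vfio-pci).
    """
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        # No driver bound, or the device does not exist.
        return None
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Host-side PCI address taken from the resources output above.
    print(bound_driver("0000:c1:00.0"))
```

For VM passthrough the expected result is vfio-pci; for container use it would be amdgpu.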

OK, if that is correct, let me give some details about what fails.

In my Ubuntu VM,

dmesg | grep amdgpu

returns

[ 5.135315] [drm] amdgpu kernel modesetting enabled.
[ 5.135493] amdgpu: Virtual CRAT table created for CPU
[ 5.135506] amdgpu: Topology: Add CPU node
[ 5.139256] amdgpu 0000:06:00.0: amdgpu: detected ip block number 0 <soc21_common>
[ 5.139259] amdgpu 0000:06:00.0: amdgpu: detected ip block number 1 <gmc_v11_0>
[ 5.139261] amdgpu 0000:06:00.0: amdgpu: detected ip block number 2 <ih_v6_1>
[ 5.139262] amdgpu 0000:06:00.0: amdgpu: detected ip block number 3
[ 5.139264] amdgpu 0000:06:00.0: amdgpu: detected ip block number 4
[ 5.139265] amdgpu 0000:06:00.0: amdgpu: detected ip block number 5
[ 5.139266] amdgpu 0000:06:00.0: amdgpu: detected ip block number 6 <gfx_v11_0>
[ 5.139268] amdgpu 0000:06:00.0: amdgpu: detected ip block number 7 <sdma_v6_0>
[ 5.139269] amdgpu 0000:06:00.0: amdgpu: detected ip block number 8 <vcn_v4_0_5>
[ 5.139270] amdgpu 0000:06:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0_5>
[ 5.139271] amdgpu 0000:06:00.0: amdgpu: detected ip block number 10 <mes_v11_0>
[ 5.139272] amdgpu 0000:06:00.0: amdgpu: detected ip block number 11 <vpe_v6_1>
[ 5.139273] amdgpu 0000:06:00.0: amdgpu: detected ip block number 12 <isp_ip>
[ 5.363657] amdgpu 0000:06:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 5.363669] amdgpu: ATOM BIOS: 113-STRXLGEN-001
[ 5.378668] amdgpu 0000:06:00.0: amdgpu: VPE: collaborate mode true
[ 5.388876] amdgpu 0000:06:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 5.388959] amdgpu 0000:06:00.0: amdgpu: VRAM: 512M 0x0000008000000000 - 0x000000801FFFFFFF (512M used)
[ 5.388962] amdgpu 0000:06:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[ 5.389189] [drm] amdgpu: 512M of VRAM memory ready
[ 5.389193] [drm] amdgpu: 40960M of GTT memory ready.
[ 29.332666] amdgpu 0000:06:00.0: amdgpu: psp gfx command UNKNOWN CMD(0xFFFFFFFF) failed and response status is (0xFFFFFFFF)
[ 29.332674] amdgpu 0000:06:00.0: amdgpu: Failed to load toc
[ 29.333171] amdgpu 0000:06:00.0: amdgpu: PSP tmr init failed!
[ 29.353821] amdgpu 0000:06:00.0: amdgpu: PSP firmware loading failed
[ 29.354891] [drm:amdgpu_device_fw_loading [amdgpu]] ERROR hw_init of IP block failed -22
[ 29.355882] amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
[ 29.356441] amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
[ 29.357010] amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.

So the initialization of the GPU fails.
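
To pull only the failing lines out of a long log like the one above, a small filter can help. This is a rough sketch, not an official tool: the function name `amdgpu_failures` and the keyword list (fail, error, fatal) are my own assumptions about what counts as a failure line.

```python
import re

# Keywords assumed to mark a failure; adjust as needed.
FAILURE_PATTERN = re.compile(r"fail|error|fatal", re.IGNORECASE)


def amdgpu_failures(dmesg_text):
    """Return the amdgpu lines in dmesg_text that look like failures."""
    return [
        line
        for line in dmesg_text.splitlines()
        if "amdgpu" in line and FAILURE_PATTERN.search(line)
    ]


if __name__ == "__main__":
    import sys

    # Usage: sudo dmesg | python3 this_script.py
    for line in amdgpu_failures(sys.stdin.read()):
        print(line)
```

On the log above this would keep the lines from 29.332666 through 29.356441 and drop the informational ones.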

Also, rocm-smi does not detect any AMD GPUs:
WARNING: No AMD GPUs specified
WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status

I’m running with
Linux 6.14.0-1017-oem #17-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 24 08:52:02 UTC 2025
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=-on iommu=pt"
and
devices:
  eth0:
    network: enp191s0
    type: nic
  gpu0:
    pci: 0000:c1:00.0
    type: gpu

Any tips on what could be wrong?
As far as I can see, the error is the same as with an older version of IncusOS without GPU support.
(Is 0000:06:00.0 correct?)

GPU support is there for containers; it doesn’t do anything for VMs, so it’s normal that you wouldn’t see a difference.

The error appears to be related to firmware loading. Does the guest OS have a recent enough version of linux-firmware?
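
One quick way to see which PSP firmware files the guest actually has is to list them. This is only a sketch under assumptions: that the guest keeps amdgpu firmware in /lib/firmware/amdgpu (the usual location on Ubuntu) and that PSP files start with "psp_". The function name `list_firmware` is hypothetical, and the listing does not tell you which files this particular GPU needs.

```python
from pathlib import Path


def list_firmware(prefix="psp_", firmware_dir="/lib/firmware/amdgpu"):
    """Return sorted firmware file names in firmware_dir starting with prefix.

    Both defaults are assumptions about the guest; an empty list means
    the directory is missing or nothing matched.
    """
    directory = Path(firmware_dir)
    if not directory.is_dir():
        return []
    return sorted(
        entry.name for entry in directory.iterdir() if entry.name.startswith(prefix)
    )


if __name__ == "__main__":
    for name in list_firmware():
        print(name)
```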

Thanks for your fast reply.
I’ve installed a newer version of ROCm (7.2). The install procedure for Ubuntu specifies Ubuntu 24.04.3 and says to install linux-oem-24.04c, which installs Linux 6.14.0-1019-oem.
In the Ubuntu VM I’ve set
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=-on iommu=pt"
(IncusOS is running with Secure Boot; the VM is not.)

The result and errors are the same as given earlier.

Is any other configuration needed in IncusOS to get this working? (For Proxmox, I see they are blacklisting the GPU, etc.)