AMD ROCm is officially supported only on a few consumer-grade GPUs, mainly Radeon RX 7900 GRE and above. But ROCm consists of many components: compilers, runtime libraries, AI-related libraries, etc. Often we only need a subset of them for our purposes. Fortunately, we don’t even need the DKMS module to use LLMs, which means we can install ROCm in a container and run any model using llama.cpp or Ollama on almost any AMD GPU, including APUs.
This guide is the basis for subsequent tutorials on how to run highly dangerous, potentially world-ending AI in 100% secure and guaranteed AI-proof Incus containers:
- AI tutorial: llama.cpp and Ollama servers + plugins for VS Code / VS Codium and IntelliJ
- AI tutorial: Stable Diffusion SDXL with Fooocus
- AI tutorial: LLMs in LM Studio
Note: I’m using an AMD 5600G APU, but most of what you see here also applies to discrete GPUs. Whenever something is APU-specific, I will mark it as such.
Table of Contents
- Preparing container
- ROCm
- Environment variables
- VRAM (for APUs only)
- PyTorch (optional)
Preparing container
On the host I’m using vanilla Ubuntu 22.04 with the HWE kernel, without the additional amdgpu driver or the DKMS module. The containers are also Ubuntu 22.04 and require access to the GPU. If you intend to run GUI applications in them, use a GUI profile; otherwise pass only your GPU. The value 44 in gid=44 is the GID of the video group in the container:
incus launch images:ubuntu/jammy/cloud <container_name>
incus config device add <container_name> gpu gpu gid=44
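To double-check that the device entry was added the way you intended (this simply prints the devices configured on the container):
incus config device show <container_name>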
If you have two GPUs, pass only one to avoid confusion. To do this, first check the PCI addresses of the available GPUs and use the pci= option:
incus info --resources | grep "PCI address: " -B 4
incus config device add <container_name> gpu gpu gid=44 pci="0000:XX:XX.X"
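If you want to cross-check which PCI address belongs to which card, plain lspci on the host shows the same addresses; the grep pattern below is just one way to filter for display devices:
lspci -nn | grep -Ei "vga|display"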
We also need access to /dev/kfd with the GID of the render group:
incus config device add <container_name> dev_kfd unix-char source=/dev/kfd path=/dev/kfd gid=110
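The GIDs 44 and 110 are simply the video and render groups of a stock Ubuntu 22.04 container; if your image uses different values, you can look them up before adding the devices:
incus exec <container_name> -- getent group video render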
Now let’s log in to the container using the default ubuntu user:
incus exec <container_name> -- sudo --login --user ubuntu
Make sure the ubuntu user is in the video and render groups:
groups
sudo usermod -a -G render,video $LOGNAME
This may require a restart for it to take effect.
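A quick sanity check from inside the container: the GPU nodes should be visible and owned by the groups we just joined (device names such as renderD128 may differ on your system):
ls -l /dev/kfd /dev/dri
id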
ROCm
We need to decide which version of ROCm we’re going to install. Even though ROCm 6.0+ has discontinued support for GCN-based cards like the one in my 5600G APU, I still use the latest version. I just had to set two environment variables instead of one, which I describe in the next section.
To run ROCm, we need to download and install the AMD Linux driver stack in the container. The full installation takes around 30 GB of disk space, so don’t be surprised. The most up-to-date link can be found on the official website (also look here). At the time of writing, it was 6.1.60100-1:
sudo apt install wget
wget https://repo.radeon.com/amdgpu-install/6.1/ubuntu/jammy/amdgpu-install_6.1.60100-1_all.deb
sudo apt install ./amdgpu-install_6.1.60100-1_all.deb
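If you are curious what the installer can set up besides the rocm use case, it can list the available options (flag as documented by AMD; check --help if your version differs):
sudo amdgpu-install --list-usecase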
Now let’s install ROCm, but without the DKMS module:
sudo amdgpu-install --usecase=rocm --no-dkms
If ROCm 6.1 doesn’t work for you, you can try the last 5.7 release:
wget https://repo.radeon.com/amdgpu-install/5.7.3/ubuntu/jammy/amdgpu-install_5.7.50703-1_all.deb
sudo apt install ./amdgpu-install_5.7.50703-1_all.deb
sudo amdgpu-install --usecase=rocm --no-dkms
The update procedure via the installation script is exactly the same as when installing ROCm for the first time.
After installation, we have access to the rocm-smi, rocminfo and clinfo commands, which should detect our GPU. To check which version of ROCm you have, use the command apt show rocm-libs -a.
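To see which gfx target ROCm detects for your GPU (useful when picking the override in the next section), filtering the rocminfo output is enough; my 5600G shows up as gfx90c:
rocminfo | grep -i gfx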
Environment variables
Before we can run an application that depends on ROCm, we need to present our GPU as supported. This requires setting the HSA_OVERRIDE_GFX_VERSION environment variable (values are taken from here):
- for GCN 5th gen based GPUs and APUs: HSA_OVERRIDE_GFX_VERSION=9.0.0
- for RDNA 1 based GPUs and APUs: HSA_OVERRIDE_GFX_VERSION=10.1.0
- for RDNA 2 based GPUs and APUs: HSA_OVERRIDE_GFX_VERSION=10.3.0
- for RDNA 3 based GPUs and APUs: HSA_OVERRIDE_GFX_VERSION=11.0.0
I’m not entirely sure if this also applies to discrete GPUs, but on ROCm 6.1 my APU requires an additional environment variable, HSA_ENABLE_SDMA=0 (skip it when using ROCm 5.7). Both can be added to the .profile file:
echo "export HSA_OVERRIDE_GFX_VERSION=9.0.0" >> .profile
echo "export HSA_ENABLE_SDMA=0" >> .profile
VRAM (for APUs only)
The most foolproof method for running LLMs is to assign a fixed amount of VRAM to your APU. This amount will be subtracted from your RAM. On my ASRock board, the VRAM value can be set in the UEFI/BIOS, and it should be at least about 0.5 GiB more than the size of the downloaded model you will use.
UEFI/BIOS -> Advanced -> AMD CBS -> NBIO -> GFX -> iGPU -> UMA_SPECIFIED
Some laptops do not expose this option. In that case, you can try UniversalAMDFormBrowser. With this tool, you can access and modify the AMD PBS/AMD CBS menu. Simply extract UniversalAMDFormBrowser.zip to a FAT32-formatted USB stick and boot from it (disable Secure Boot first).
Ollama (unless compiled by hand) and LM Studio will not work without a sufficiently large VRAM allocation, but reserving a fixed amount of VRAM is a bit of a waste, because that RAM is permanently taken away even when no model is loaded.
Fortunately, other apps such as llama.cpp and Stable Diffusion GUIs like Fooocus can use the unified memory architecture (UMA) to share main memory between the CPU and the integrated GPU. Ollama can also be compiled with just two changed lines of code to support UMA, so we don’t have to assign a fixed amount of VRAM to the APU. In that case, set iGPU to UMA_AUTO in the UEFI/BIOS:
UEFI/BIOS -> Advanced -> AMD CBS -> NBIO -> GFX -> iGPU -> UMA_AUTO
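With UMA_AUTO, most of the memory available to the iGPU is exposed as GTT rather than dedicated VRAM; rocm-smi can show both pools (option names as in recent ROCm releases, so verify against rocm-smi --help on your version):
rocm-smi --showmeminfo vram
rocm-smi --showmeminfo gtt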
Llama.cpp supports UMA natively, and you can read about it in this tutorial. But for Fooocus and other PyTorch GUIs for Stable Diffusion (more about Fooocus in this tutorial), we have to use force-host-alloction-APU. It works by using LD_PRELOAD to load the functions hipMalloc and hipFree before the ROCm runtime, which lets it intercept those function calls and forward them to hipHostMalloc and hipHostFree.
You can compile force-host-alloction-APU once in a container with ROCm and copy it to other containers, but it has to be recompiled for every new ROCm version. First, check if hipcc is installed:
apt list --installed hipcc
sudo apt install git
git clone https://github.com/segurac/force-host-alloction-APU
cd force-host-alloction-APU
CUDA_PATH=/usr/ HIP_PLATFORM="amd" hipcc forcegttalloc.c -o libforcegttalloc.so -shared -fPIC
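To confirm the build produced what we expect, you can list the dynamic symbols of the resulting library; the hooked hip* functions should appear there (nm comes from binutils, which may need to be installed first):
nm -D libforcegttalloc.so | grep -i hip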
Now we can start Fooocus or other PyTorch apps with LD_PRELOAD. Notice the ./ before libforcegttalloc.so:
LD_PRELOAD=~/force-host-alloction-APU/./libforcegttalloc.so python3 ~/Fooocus/entry_with_update.py --listen --port 8888 --always-high-vram
PyTorch (optional)
You should only install PyTorch if your application explicitly requires it. You don’t need it for llama.cpp, Ollama or LM Studio. Links to the latest versions are on the official website. Generally, you want the PyTorch build that matches the version of ROCm you have installed. At the time of writing, for ROCm 6.1 that was the nightly build for 6.0 (soon it will be a nightly for 6.1), but we need to install pip first:
sudo apt install python3 python3-pip
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0
For ROCm 5.7:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
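As a quick sanity check that PyTorch actually sees the GPU through ROCm (the ROCm build reuses the torch.cuda API, and the environment variables from the earlier section must be set for this to report True):
python3 -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"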
That’s all. As I said, this tutorial is the basis for subsequent tutorials on running generative models in Incus. You will find the links at the beginning of this post. If you have any questions, feel free to ask. Feedback, corrections and tips are greatly appreciated.
P.S.
Incus is such a wonderful tool for doing experiments like this. Without it, I would have messed up my system many times and potentially unleashed rogue AI. So thank you for developing Incus and saving the world.