My initial, brief experiments with incus and cloud-init were unsuccessful. After that, I wanted to see how much of the work of creating new containers on a single incus server I could automate with a single shell script, without using any other automation tools. Adding NVIDIA MIG GPUs to containers involved several steps, so I at least wanted to get that part automated, which I have, with some restrictions.
From the README:
"incus-mig.sh is a script to automate the creation of incus Ubuntu containers using NVIDIA GPUs, optionally attaching a NVIDIA MIG or a full NVIDIA PCI express attached GPU.
This script requires that you have incus installed and configured. You must have the CUDA toolkit installed on the incus server and you need to have configured at least one MIG device (using mig-manager) before running the script if you wish to use the -g option."
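For anyone unfamiliar with the MIG part, here is a minimal sketch of the general idea, not the code from incus-mig.sh. It assumes a recent NVIDIA driver where `nvidia-smi -L` prints MIG devices with a `MIG-` prefixed UUID, and it uses the incus `gpu` device type with `gputype=mig` and `mig.uuid`. The function names, the device name `mig0` and the container name are made up for illustration, and the second function only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: find MIG UUIDs on the host and show how one could be attached.

# Read `nvidia-smi -L` style output on stdin and print one MIG UUID per line.
# Assumes the newer driver format, for example:
#   MIG 1g.5gb      Device  0: (UUID: MIG-xxxxxxxx-...)
list_mig_uuids() {
    sed -n 's/^[[:space:]]*MIG .*(UUID: \(MIG-[^)]*\)).*/\1/p'
}

# Print (do not run) the incus commands that would attach a MIG device.
# $1 = container name (placeholder), $2 = MIG UUID.
# "mig0" is an arbitrary device name chosen for this example.
print_attach_cmds() {
    container=$1
    uuid=$2
    printf 'incus config set %s nvidia.runtime=true\n' "$container"
    printf 'incus config device add %s mig0 gpu gputype=mig mig.uuid=%s\n' \
        "$container" "$uuid"
}

# Example, on a host that already has MIG devices configured:
#   uuid=$(nvidia-smi -L | list_mig_uuids | head -n 1)
#   print_attach_cmds mycontainer "$uuid"
```

Printing the commands first makes it easy to check which UUID would be used before touching a container.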
This script is pretty bespoke to my work’s GPU server and I don’t intend to make it much more general-purpose than it already is. Anyone wanting to use this script will need to modify the IP calculation code to suit their network, at least.
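To give a rough idea of what an IP calculation for your own network might look like, here is a hypothetical sketch. It is not the logic used in incus-mig.sh. It assumes a /24 network where containers get static addresses from a fixed range, and it picks the lowest address not already in use. The function name, the prefix and the range are all placeholders.

```shell
#!/bin/sh
# Hypothetical sketch: pick the first unused IPv4 address in a /24 range.
# Addresses already in use are read from stdin, one per line.
# $1 = first three octets (e.g. 10.0.0), $2 = first host number, $3 = last.
next_free_ip() {
    prefix=$1
    first=$2
    last=$3
    used=$(cat)
    i=$first
    while [ "$i" -le "$last" ]; do
        candidate="$prefix.$i"
        # -x matches the whole line, so 10.0.0.1 does not match 10.0.0.10
        if ! printf '%s\n' "$used" | grep -qxF "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
        i=$((i + 1))
    done
    return 1    # range exhausted
}

# Example with a hand-written list of used addresses:
#   printf '10.0.0.10\n10.0.0.11\n' | next_free_ip 10.0.0 10 250
```

How you gather the list of used addresses (from incus, DHCP leases, or a static file) depends on your setup, which is why that part is left out here.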
The only feature I currently plan to add is to replace the “Full CUDA install” code with code that fetches and installs the latest CUDA toolkit from NVIDIA’s site rather than installing the version from the Ubuntu repos. This feature is only needed for those wanting to run nvcc and build CUDA software within containers.
I’m not taking any feature requests, as I’d prefer to keep it small and simple, but I’ll probably accept sensible PRs that don’t risk breaking anything.
I’ve also added my nvitop GPU logging script (which has been submitted to the nvitop repo) and the cron command I use to log incus stats for those who think Prometheus is overkill.
Has anyone else written something like this already?