Is it possible to have different CUDA versions in different containers?
I need one container with CUDA 11.8 and another with a current 12.x, working together on one host.
If it's possible, how do I do it? Do I need to install any CUDA on the host? Can the containers be unprivileged?
Based on my experiments with CUDA so far, for GPU processing it seems better to use the same CUDA version on the host and in the containers, with GPU passthrough, to get the best performance.
When the CUDA versions on the host and in the container mismatch, I get errors from nvcc and from some Python libraries that use it, especially when running Triton or Ray for a large language model inference server. I haven't looked into the details of running different CUDA versions on host and container yet.
You can look at GPU data processing inside LXD; it's a bit old, but it should give you the general idea.
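To answer the original question: the kernel driver lives on the host, while the CUDA toolkit/userspace can differ per container, so each container can install its own CUDA version. A minimal LXD sketch of the setup (container names, image, and device name here are my assumptions, not from the post above):

```shell
# Launch two unprivileged containers (unprivileged is the LXD default).
lxc launch ubuntu:22.04 cuda118
lxc launch ubuntu:22.04 cuda12x

# Pass the host GPU into each container as a device named "gpu0".
lxc config device add cuda118 gpu0 gpu
lxc config device add cuda12x gpu0 gpu

# Then, inside each container, install only the CUDA toolkit you need
# (11.8 in one, 12.x in the other); the NVIDIA kernel driver stays on the host.
```

The usual caveat is that each container's CUDA toolkit must be supported by the host's driver version (newer drivers support older toolkits).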
I think I solved the problem with a pretty nasty but efficient hack.
I installed the newest CUDA on the host and passed it through to the containers in the usual way. In the containers that need an older CUDA 11.x, I just brutally symlinked the 12.x libraries to the 11.x names, and voila! The older software seems to find everything it needs in the newer libraries. PyTorch built for CUDA 11.x works as it should.
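For illustration, the symlink hack looks roughly like this. The sketch below runs in a scratch directory with stand-in files; in a real container you would create the links in the actual CUDA library directory (e.g. /usr/local/cuda/lib64), and the exact library names and soname versions your software loads are assumptions you must check yourself:

```shell
# Demo in a scratch directory; stand-ins for the real CUDA 12 libraries.
mkdir -p /tmp/cuda-demo && cd /tmp/cuda-demo
touch libcudart.so.12 libcublas.so.12

# Point the old CUDA 11 sonames at the CUDA 12 libraries.
for lib in libcudart libcublas; do
    ln -sf "${lib}.so.12" "${lib}.so.11"
done

ls -l    # the .so.11 names now resolve to the .so.12 files
```

Note this only works by accident of ABI compatibility; there's no guarantee for every library or every call, so test your workload carefully.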