Description
Currently, we store a small file to keep track of the CUDA driver version that was symlinked:
$ cat /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/cuda_version.txt
9.0
The disadvantage is that if a new driver is installed, the host site needs to rerun link_nvidia_host_libraries.sh, even if the driver location hasn't changed. That is, the symlinks don't need any update, only this hardcoded driver version does. To me, this feels like an unnecessary burden on the host site, which has to remember to rerun the script after every driver update.
If we could somehow get the driver version dynamically, the only reason a host site would need to rerun the script is if the driver location on the host ever changed - and that's something I'd consider rare (much rarer than a driver update).
Option 1
This prints the version:
strings /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so | grep "CUDA version"
{"format":1,"CUDA version":"13.1","driver":[535,550,570,575,580,590],"device":[1,2,7,8,9,10,11,12,13,14]}
CUDA forward compatibility is disabled, but CUDA version [%d.%d] does not match RM version [%d.%d].
Advantages:
- Easy
Disadvantages:
- It takes about a second to execute (that's quite slow if you're just trying to load a module).
- You'd need to create a regex, and it's quite sensitive to changes. There is no guarantee that this pattern is consistent across versions, or even that this string will always be there.
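To make the regex concern concrete, here is a minimal sketch of what the extraction could look like. It assumes the JSON-like line shown above keeps its `"CUDA version":"X.Y"` shape; the function name `extract_cuda_version` is hypothetical and not part of any existing EESSI script.

```shell
# Hypothetical helper: read `strings libcuda.so` output on stdin and print
# the first value that looks like "CUDA version":"X.Y".
# Assumption: the embedded JSON-like line keeps this exact key and quoting.
extract_cuda_version() {
    sed -n 's/.*"CUDA version":"\([0-9][0-9]*\.[0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Possible usage (path taken from the example above):
#   strings /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so \
#       | extract_cuda_version
```

Matching on the quoted key rather than on the bare words "CUDA version" skips the second line in the output above (the format string with `[%d.%d]`), but the sketch still prints nothing if NVIDIA ever changes or drops the string.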
Option 2
We can use a tiny (CPU-only) program:
$ cat driver_version.c
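#include <cuda.h>
#include <stdio.h>

int main(void) {
    int driverVersion = 0;
    /* cuDriverGetVersion reports the version as 1000 * major + 10 * minor. */
    if (cuDriverGetVersion(&driverVersion) != CUDA_SUCCESS) {
        fprintf(stderr, "cuDriverGetVersion failed\n");
        return 1;
    }
    printf("CUDA Driver Version: %d\n", driverVersion);
    return 0;
}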
compiled with
gcc -o driver_version driver_version.c -lcuda
to dump the driver version. The advantage is that it relies on the runtime linker to find CUDA, and thus prints the version of whatever driver the runtime linker actually finds in the compatibility layer. That's good, because that is the version we are interested in. Running it is very fast (on the order of milliseconds).
One question is how/when we compile this. At compile time, it needs a CUDA module to be loaded (because it needs the CUDA header). At runtime, it doesn't.
Advantages:
- Always reports the version of the driver actually found by the runtime linker.
- Relies on the stability of the API, but cuDriverGetVersion is a very basic API function that is unlikely to ever change.
Disadvantages:
- Compiling on the fly is difficult, because it needs the CUDA module to be already loaded (and this check runs prior to that).
- It may theoretically create a chicken-and-egg situation, because you need some degree of compatibility (the call signature of cuDriverGetVersion in the header needs to match the one in the driver) to do what is essentially a compatibility check.
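Note that the program above prints the raw integer returned by cuDriverGetVersion, which CUDA encodes as 1000 * major + 10 * minor (so 13010 means 13.1). A small sketch of decoding that value on the shell side, with `decode_driver_version` being a hypothetical helper name:

```shell
# Hypothetical helper: turn the integer printed by driver_version
# (1000 * major + 10 * minor) into a "major.minor" string.
decode_driver_version() {
    raw="$1"
    # Reject empty or non-numeric input instead of printing garbage.
    case "$raw" in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo "$((raw / 1000)).$(((raw % 1000) / 10))"
}

# Possible usage, assuming the binary was built as shown above:
#   decode_driver_version "$(./driver_version | awk '{print $NF}')"
```

This keeps the C program trivial and leaves the formatting to whatever script consumes its output.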
Option 3
Just run nvidia-smi --version:
$ time nvidia-smi --version
NVIDIA-SMI version : 590.48.01
NVML version : 590.48
DRIVER version : 590.48.01
CUDA Version : 13.1
It takes ~15 ms on my current system. Fast, but maybe you still want to cache the result.
Advantage:
- No questions about compatibility, as this comes packaged with the driver (as far as I know).
Disadvantage:
- Slightly slower than Option 2, so maybe we need to cache the result so that it never gets run more than once.
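As a rough sketch of what parsing plus caching could look like: the function names, the `EESSI_CUDA_VERSION_CACHE` variable and the default cache location are all hypothetical, and the parsing assumes the `CUDA Version : X.Y` line format shown above.

```shell
# Hypothetical helper: read `nvidia-smi --version` output on stdin and
# print the value of the "CUDA Version" line.
parse_smi_cuda_version() {
    sed -n 's/^CUDA Version[[:space:]]*:[[:space:]]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: return the cached version if present, otherwise
# query nvidia-smi once and store the result.
# Assumption: a per-user file in the temp directory is an acceptable cache.
get_cuda_version_cached() {
    cache_file="${EESSI_CUDA_VERSION_CACHE:-${TMPDIR:-/tmp}/eessi_cuda_version.$(id -u)}"
    if [ -s "$cache_file" ]; then
        cat "$cache_file"
        return 0
    fi
    version="$(nvidia-smi --version 2>/dev/null | parse_smi_cuda_version)"
    [ -n "$version" ] || return 1
    # A failed cache write is not fatal; the version is still reported.
    printf '%s\n' "$version" > "$cache_file" 2>/dev/null
    printf '%s\n' "$version"
}
```

One caveat with any cache that outlives a driver update is that it brings back the staleness problem of cuda_version.txt, so where the cache lives and when it is invalidated would need some thought.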