Get CUDA driver version dynamically

Currently, we store a small file to keep track of the CUDA driver version that was symlinked:

```
$ cat /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/cuda_version.txt
9.0
```

The disadvantage is that if a new driver is installed, the host site needs to rerun `link_nvidia_host_libraries.sh`, even if the driver _location_ hasn't changed. That is, the symlinks don't need any update, just this hardcoded driver version. To me, this feels like an unnecessary burden on the host site, which has to remember to rerun the script after every driver update.

If we could somehow get the driver version dynamically, the _only_ reason a host site would need to rerun the script is if the driver location on the host ever changed - and that's something I'd consider rare (much rarer than a driver update).

**Option 1**
This prints the version:
```
$ strings /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so | grep "CUDA version"
{"format":1,"CUDA version":"13.1","driver":[535,550,570,575,580,590],"device":[1,2,7,8,9,10,11,12,13,14]}
CUDA forward compatibility is disabled, but CUDA version [%d.%d] does not match RM version [%d.%d].
```

Advantages:
- Easy

Disadvantages:

- It takes about a second to execute (that's quite slow if you're just trying to load a module).
- You'd need to create a regex, and it's quite sensitive to changes. There is no guarantee that this pattern is consistent across versions, or even that this string will _always_ be there.
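
To make the regex concern concrete, here is a minimal sketch of what the extraction could look like. The function name `extract_cuda_version` is hypothetical, and the pattern assumes the JSON-like line keeps exactly the `"CUDA version":"X.Y"` form shown in the output above, which is the fragile part.

```shell
# Hypothetical helper: read `strings libcuda.so` output on stdin and print
# the value of the "CUDA version" JSON field, if such a line is present.
extract_cuda_version() {
    # Print only lines where the substitution matched, then keep the first.
    sed -n 's/.*"CUDA version":"\([0-9][0-9]*\.[0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Sample input copied from the output shown above (not a live query).
sample='{"format":1,"CUDA version":"13.1","driver":[535,550,570,575,580,590],"device":[1,2,7,8,9,10,11,12,13,14]}
CUDA forward compatibility is disabled, but CUDA version [%d.%d] does not match RM version [%d.%d].'

printf '%s\n' "$sample" | extract_cuda_version
```

In real use the input would come from `strings .../libcuda.so | extract_cuda_version`. An empty result means the pattern was not found, and the caller would have to treat that as a failure rather than as a version.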

**Option 2**
We can use a tiny (CPU-only) program:
```
$ cat driver_version.c
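#include <cuda.h>
#include <stdio.h>

int main(void) {
    int driverVersion = 0;
    /* Query the version of the libcuda found by the runtime linker. */
    if (cuDriverGetVersion(&driverVersion) != CUDA_SUCCESS) {
        fprintf(stderr, "cuDriverGetVersion failed\n");
        return 1;
    }
    /* Encoded as 1000 * major + 10 * minor, e.g. 13010 for CUDA 13.1. */
    printf("CUDA Driver Version: %d\n", driverVersion);
    return 0;
}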
```
compiled with
```
gcc -o driver_version driver_version.c -lcuda
```
to dump the driver version. The advantage is that it relies on the runtime linker to find CUDA, and thus will print the version of whatever driver is _actually_ found by the runtime linker in the compatibility layer. That's good, because _that_ is the version we are interested in. Running it is very fast (O(ms)).
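
Note that `cuDriverGetVersion` reports the version as a single integer, encoded as `1000 * major + 10 * minor`, so the output needs decoding before it can be compared to a `major.minor` string. A minimal sketch of that step, where `decode_cuda_version` is a hypothetical helper name and the sample value is illustrative rather than measured:

```shell
# Hypothetical helper: convert the integer reported by cuDriverGetVersion
# (encoded as 1000 * major + 10 * minor) into "major.minor".
decode_cuda_version() {
    encoded=$1
    major=$(( encoded / 1000 ))
    minor=$(( (encoded % 1000) / 10 ))
    printf '%d.%d\n' "$major" "$minor"
}

# Example line in the format printed by driver_version (value is illustrative).
line='CUDA Driver Version: 13010'

# Strip everything up to and including ": " and decode the remaining integer.
decode_cuda_version "${line##*: }"
```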

One question is how/when we compile this. At compile time, it needs a CUDA module to be loaded (because it needs the CUDA header). At runtime, it doesn't. 

Advantages:

- Always reports the version of the driver _actually_ found by the runtime linker.
- Relies on API stability, but `cuDriverGetVersion` is a very basic API function that is unlikely to ever change.

Disadvantages:

- Compiling on the fly is difficult, because it needs the CUDA module to be already loaded (and this check runs prior to that).
- It may theoretically create a chicken-and-egg situation, because you need some degree of compatibility (the call signature of `cuDriverGetVersion` in the header needs to match the one in the driver) to do what is essentially a compatibility check.

**Option 3**

Just run `nvidia-smi --version`:

```
$ time nvidia-smi --version
NVIDIA-SMI version  : 590.48.01
NVML version        : 590.48
DRIVER version      : 590.48.01
CUDA Version        : 13.1
```
It takes ~15 ms on my current system. Fast, but maybe you still want to cache the result.

Advantage:

- No questions about compatibility, as this comes packaged with the driver (as far as I know)

Disadvantage:

- Slightly slower, so maybe we need to cache the result so it never gets run more than once.
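
A minimal sketch of how parsing plus caching could look, assuming the output format shown above. The names `parse_smi_cuda_version`, `cached_cuda_version` and `fake_smi` are hypothetical, and the cache location is left to the caller. `fake_smi` only stands in for `nvidia-smi --version` so that the sketch runs on a machine without a GPU.

```shell
# Hypothetical helper: print the "CUDA Version" field from
# `nvidia-smi --version` output supplied on stdin.
parse_smi_cuda_version() {
    sed -n 's/^CUDA Version[[:space:]]*:[[:space:]]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical caching wrapper: $1 is a cache file chosen by the caller,
# the remaining arguments are the command to run on a cache miss.
cached_cuda_version() {
    cache_file=$1
    shift
    if [ ! -s "$cache_file" ]; then
        version=$("$@" | parse_smi_cuda_version)
        # Only write the cache when parsing produced a value.
        [ -n "$version" ] || return 1
        printf '%s\n' "$version" > "$cache_file"
    fi
    cat "$cache_file"
}

# Stand-in for nvidia-smi; the output is copied from the example above.
fake_smi() {
    printf '%s\n' \
        'NVIDIA-SMI version  : 590.48.01' \
        'NVML version        : 590.48' \
        'DRIVER version      : 590.48.01' \
        'CUDA Version        : 13.1'
}

cache=$(mktemp)
cached_cuda_version "$cache" fake_smi
rm -f "$cache"
```

In real use the call would be `cached_cuda_version "$cache" nvidia-smi --version`. The cache scope matters: a cache file that outlives a driver update would reintroduce the stale-version problem this issue is trying to remove, so a per-session location is probably the safer assumption.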
