
Get CUDA driver version dynamically #189

@casparvl

Currently, we store a small file to keep track of the CUDA driver version that was symlinked:

$ cat /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/cuda_version.txt
9.0

The disadvantage is that if a new driver is installed, the host site needs to rerun link_nvidia_host_libraries.sh, even if the driver location hasn't changed. I.e. the symlinks don't need any update - just this hardcoded driver version. To me, this feels like an unnecessary burden on the host site: they have to remember to rerun the script after every driver update.

If we could somehow get the driver version dynamically, the only reason a host site would need to rerun the script is if the driver location on the host ever changed - and that's something I'd consider rare (much rarer than a driver update).

Option 1
This prints the version:

strings /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so | grep "CUDA version"
{"format":1,"CUDA version":"13.1","driver":[535,550,570,575,580,590],"device":[1,2,7,8,9,10,11,12,13,14]}
CUDA forward compatibility is disabled, but CUDA version [%d.%d] does not match RM version [%d.%d].

Advantages:

  • Easy

Disadvantages:

  • It takes about a second to execute (that's quite slow if you're just trying to load a module).
  • You'd need to craft a regex, and it's quite sensitive to changes. Of course there is no guarantee that this pattern is consistent across versions, or even that this string will always be there.
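If we went this route, the extraction could look something like this. Note that the regex is an assumption based on the sample output above, and may well break for other driver versions:

```shell
# Hypothetical extraction of the CUDA version from the JSON-like string
# embedded in libcuda.so; the line below is the sample output shown above.
line='{"format":1,"CUDA version":"13.1","driver":[535,550,570,575,580,590],"device":[1,2,7,8,9,10,11,12,13,14]}'
printf '%s\n' "$line" | sed -n 's/.*"CUDA version":"\([0-9.]*\)".*/\1/p'
```

In actual use, the input would come from `strings .../libcuda.so | grep '"CUDA version"'` instead of a hardcoded string.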

Option 2
We can use a tiny (CPU-only) program

$ cat driver_version.c
#include <cuda.h>
#include <stdio.h>

int main(void) {
    int driverVersion = 0;
    // cuDriverGetVersion does not require cuInit() and returns a CUresult
    CUresult res = cuDriverGetVersion(&driverVersion);
    if (res != CUDA_SUCCESS) {
        fprintf(stderr, "cuDriverGetVersion failed with error %d\n", res);
        return 1;
    }
    // driverVersion is encoded as 1000 * major + 10 * minor, e.g. 13010 for 13.1
    printf("CUDA Driver Version: %d\n", driverVersion);
    return 0;
}

compiled with

gcc -o driver_version driver_version.c -lcuda

to dump the driver version. The advantage is that it relies on the runtime linker to find CUDA, and thus will print the version of whatever driver is actually found by the runtime linker in the compatibility layer. That's good, because that is the version we are interested in. Running it is very fast (O(ms)).
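Note that cuDriverGetVersion reports the version as a single integer, encoded as 1000 * major + 10 * minor, so a small decoding step is needed to get the usual major.minor form. A minimal sketch:

```shell
# Decode the integer returned by cuDriverGetVersion;
# CUDA encodes it as 1000 * major + 10 * minor (so 13010 means 13.1)
ver=13010
echo "$((ver / 1000)).$(( (ver % 1000) / 10 ))"
```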

One question is how/when we compile this. At compile time, it needs a CUDA module to be loaded (because it needs the CUDA header). At runtime, it doesn't.

Advantages:

  • Always gets the driver version of the library actually found by the runtime linker
  • Only relies on API stability - and cuDriverGetVersion is such a basic API function that it is unlikely to ever change

Disadvantages:

  • Compiling on the fly is difficult, because it needs the CUDA module to be already loaded (and this check runs prior to that).
  • It may theoretically create a chicken-and-egg situation: you need some degree of compatibility (the call signature of cuDriverGetVersion in the header needs to match the one in the driver) to do what is essentially a compatibility check

Option 3

Just run nvidia-smi --version:

$ time nvidia-smi --version
NVIDIA-SMI version  : 590.48.01
NVML version        : 590.48
DRIVER version      : 590.48.01
CUDA Version        : 13.1

It takes ~15 ms on my current system. Fast, but maybe you still want to cache the result.

Advantage:

  • No questions about compatibility, as nvidia-smi comes packaged with the driver (as far as I know)

Disadvantage:

  • Slightly slower, so we may want to cache the result so it never runs more than once.
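A minimal sketch of such a caching wrapper. The function names, cache location, and parsing are assumptions, not part of any existing script; the parsing is split out so it can be exercised without a GPU present:

```shell
# Hypothetical caching wrapper around `nvidia-smi --version`
parse_driver_version() {
    # extract the value of the "DRIVER version" line from stdin
    sed -n 's/^DRIVER version[[:space:]]*:[[:space:]]*//p'
}

get_driver_version() {
    local cache="${TMPDIR:-/tmp}/eessi_driver_version"  # hypothetical location
    if [ ! -s "$cache" ]; then
        nvidia-smi --version | parse_driver_version > "$cache"
    fi
    cat "$cache"
}
```

Fed the sample output above, parse_driver_version would print 590.48.01; subsequent calls to get_driver_version would just read the cache file.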
