feat: default GPU endpoints to minCudaVersion 12.8 by KAJdev · Pull Request #277 · runpod/flash

KAJdev · 2026-03-17T21:36:19Z

GPU endpoints default to minCudaVersion = "12.8" to ensure workers only run on hosts with a recent CUDA driver. The value can be overridden per-endpoint via Endpoint(min_cuda_version=...) or directly on resource classes. CPU endpoints always have minCudaVersion cleared and excluded from their API payload.

Validation

minCudaVersion is validated against the CudaVersion enum. Invalid values raise a ValueError listing the accepted versions.
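
The default, the CPU clearing, and the validation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `CudaVersion` members listed here, the helper names `validate_min_cuda_version` and `build_payload`, and the payload shape are all assumptions made for the example.

```python
from enum import Enum
from typing import Any, Dict, Optional


class CudaVersion(str, Enum):
    # Hypothetical members; the real enum may list different versions.
    V12_4 = "12.4"
    V12_6 = "12.6"
    V12_8 = "12.8"


def validate_min_cuda_version(value: Optional[str]) -> Optional[str]:
    """Return the value unchanged if it names a known version, else raise."""
    if value is None:
        return None
    accepted = [member.value for member in CudaVersion]
    if value not in accepted:
        raise ValueError(
            f"Invalid minCudaVersion {value!r}; accepted versions: "
            + ", ".join(accepted)
        )
    return value


def build_payload(is_cpu: bool, min_cuda_version: Optional[str] = "12.8") -> Dict[str, Any]:
    """Build a hypothetical API payload for an endpoint."""
    payload: Dict[str, Any] = {"name": "example"}
    if is_cpu:
        # CPU endpoints: the field is cleared and left out of the payload.
        return payload
    validated = validate_min_cuda_version(min_cuda_version)
    if validated is not None:
        payload["minCudaVersion"] = validated
    return payload
```

Under these assumptions, a GPU payload carries "12.8" unless overridden, a CPU payload never carries the field, and an unknown version fails early with a message that lists the accepted values.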

Closes AE-2408

runpod-Henrik

Question: Two test gaps in the plumbing

endpoint.py and resource_provisioner.py both changed to carry minCudaVersion through, but neither has a test:

No test for Endpoint(min_cuda_version="12.4") → _build_resource_config() → resource gets minCudaVersion="12.4"
No test for create_resource_from_manifest with {"minCudaVersion": "12.4"} in the manifest data → resource gets the value

The field-level tests on ServerlessResource are solid, but the plumbing from Endpoint decorator to provisioner is untested end-to-end.
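
The two missing tests might take roughly this shape. The sketch uses stand-in classes so that it runs on its own: `StubResource`, `StubEndpoint`, and `stub_create_resource_from_manifest` are hypothetical substitutes for the real `ServerlessResource`, `Endpoint`, and `create_resource_from_manifest`, whose actual signatures are not shown in this PR discussion.

```python
from typing import Any, Dict, Optional


class StubResource:
    """Stand-in for the resource class; defaults to "12.8" as in the PR."""

    def __init__(self, minCudaVersion: str = "12.8") -> None:
        self.minCudaVersion = minCudaVersion


class StubEndpoint:
    """Stand-in for Endpoint; forwards min_cuda_version when it is set."""

    def __init__(self, min_cuda_version: Optional[str] = None) -> None:
        self.min_cuda_version = min_cuda_version

    def _build_resource_config(self) -> StubResource:
        kwargs: Dict[str, Any] = {}
        if self.min_cuda_version is not None:
            kwargs["minCudaVersion"] = self.min_cuda_version
        return StubResource(**kwargs)


def stub_create_resource_from_manifest(data: Dict[str, Any]) -> StubResource:
    """Stand-in for the provisioner path: copy the field if the manifest has it."""
    kwargs: Dict[str, Any] = {}
    if "minCudaVersion" in data:
        kwargs["minCudaVersion"] = data["minCudaVersion"]
    return StubResource(**kwargs)


def test_endpoint_forwards_min_cuda_version() -> None:
    resource = StubEndpoint(min_cuda_version="12.4")._build_resource_config()
    assert resource.minCudaVersion == "12.4"


def test_manifest_forwards_min_cuda_version() -> None:
    resource = stub_create_resource_from_manifest({"minCudaVersion": "12.4"})
    assert resource.minCudaVersion == "12.4"
```

In the real suite the stubs would be replaced by the actual classes; the assertions are the part that matters.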

Verdict

Clean, well-structured change with good test coverage on the field itself. The two plumbing tests above would close the remaining coverage gap.

runpod-Henrik

1. Core change — clean

minCudaVersion added to ServerlessResource (default "12.8"), exposed as min_cuda_version on Endpoint, cleared for CPU, validated against the CudaVersion enum, plumbed through manifest → provisioner → GraphQL query, and included in _hashed_fields / _has_structural_changes. Test coverage is solid — 10 tests in TestMinCudaVersion covering defaults, overrides, validation, CPU clearing, hash and structural-change behaviour.

2. Issue: Existing GPU endpoints get silently re-provisioned on next deploy

minCudaVersion is in _hashed_fields. Existing deployed endpoints have no minCudaVersion in their stored config. After upgrading, the first flash deploy sees the new "12.8" default as a structural change and triggers re-provisioning for every GPU endpoint — even if nothing else changed. For busy production endpoints that's an unexpected rolling restart with no warning.
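
The mechanism can be illustrated with a toy hash over a field list. This is a sketch under assumptions: the `config_hash` helper, the SHA-256 over JSON scheme, and the other field names are invented for the example and may not match how `_hashed_fields` is actually consumed.

```python
import hashlib
import json
from typing import Any, Dict, List


def config_hash(config: Dict[str, Any], hashed_fields: List[str]) -> str:
    """Hash only the listed fields; a missing field hashes as None."""
    subset = {name: config.get(name) for name in hashed_fields}
    encoded = json.dumps(subset, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Config stored at the last deploy, before the upgrade: no minCudaVersion.
stored = {"gpuCount": 1, "imageName": "example:latest"}

# Config built after the upgrade: identical, plus the new default.
desired = {**stored, "minCudaVersion": "12.8"}

old_fields = ["gpuCount", "imageName"]
new_fields = old_fields + ["minCudaVersion"]

# With the old field list the two configs hash identically.
same_before = config_hash(stored, old_fields) == config_hash(desired, old_fields)

# Once minCudaVersion is hashed, the stored config (None) differs from
# the desired config ("12.8"), so the endpoint looks structurally changed.
drift_after = config_hash(stored, new_fields) != config_hash(desired, new_fields)
```

Here `same_before` and `drift_after` both come out true: nothing the user wrote has changed, yet the comparison reports drift.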

3. Issue: No way to opt out of the `"12.8"` floor

In _build_resource_config:

if self.min_cuda_version is not None:
    kwargs["minCudaVersion"] = self.min_cuda_version

None means "don't include in kwargs", which causes ServerlessResource to fall back to its "12.8" default. A user who passes Endpoint(min_cuda_version=None) expecting to remove the constraint gets "12.8" silently. If there are workloads that need to run on older drivers, they have no path to opt out.
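
One common way to separate "argument not passed" from "explicitly None" is a sentinel default. The sketch below contrasts the behaviour described above with that alternative. It is only an illustration of the pattern, not something the PR implements, and the function names and `_UNSET` sentinel are hypothetical.

```python
from typing import Any, Dict, Optional

DEFAULT_MIN_CUDA = "12.8"

# Sentinel meaning "the caller did not pass the argument at all".
_UNSET = object()


def resource_kwargs_current(min_cuda_version: Optional[str] = None) -> Dict[str, Any]:
    """Mirror of the reviewed behaviour: None is dropped, so the default wins."""
    # The starting value stands in for the resource class's own default.
    kwargs: Dict[str, Any] = {"minCudaVersion": DEFAULT_MIN_CUDA}
    if min_cuda_version is not None:
        kwargs["minCudaVersion"] = min_cuda_version
    return kwargs


def resource_kwargs_with_sentinel(min_cuda_version: Any = _UNSET) -> Dict[str, Any]:
    """Alternative: only an omitted argument falls back to the default."""
    if min_cuda_version is _UNSET:
        return {"minCudaVersion": DEFAULT_MIN_CUDA}
    # An explicit None is passed through and removes the floor.
    return {"minCudaVersion": min_cuda_version}
```

With the sentinel, omitting the argument still yields "12.8", while passing `None` explicitly yields no floor. Whether the resource class and the API accept a missing value is a separate question that this sketch does not answer.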

Nit: SDK Reference shows `None` as the constructor default

min_cuda_version: Optional[str] = None

But the table note says "GPU endpoints default to "12.8" when not set." The effective default for GPU is "12.8"; the `None` in the Endpoint signature is an implementation detail. Users who see `= None` and pass it explicitly to clear the constraint will be confused when it doesn't work. Consider either documenting the signature as `min_cuda_version: Optional[str] = None  # GPU endpoints default to "12.8"`, or updating the table's default column to show "12.8" for GPU.

Verdict: PASS WITH NITS

The implementation is correct. Items 2 and 3 are worth a quick look before merge — particularly whether the silent re-provision on upgrade is acceptable or needs a migration note in the changelog.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

KAJdev added 4 commits March 17, 2026 14:41

feat: default GPU endpoints to minCudaVersion 12.8

09dc76b

feat: plumb minCudaVersion through Endpoint decorator, manifest, and provisioner

0b99052

docs: add minCudaVersion to SDK reference, GPU provisioning, and drift detection docs

682eda9

fix: validate minCudaVersion against known CudaVersion values

8b625a6

KAJdev force-pushed the zeke/ae-2408-flash-default-gpu-endpoints-to-mincuda-128 branch from 6b0514a to 8b625a6 Compare March 17, 2026 21:49

runpod-Henrik reviewed Mar 18, 2026

KAJdev wants to merge 4 commits into main from zeke/ae-2408-flash-default-gpu-endpoints-to-mincuda-128
