RunPod Bug Report: Pods show "RUNNING" but never actually start
Summary
Pods created via REST API show desiredStatus: "RUNNING" but never actually boot. uptimeInSeconds stays at 0, no ports are assigned, no SSH access is possible. The pod consumes credits but does nothing.
Tested Configurations
Test 1: RTX 5090, Community, pytorch image (2026-03-18 15:40 UTC)
Request:

```shell
curl -X POST "https://rest.runpod.io/v1/pods" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "davaz-test-5090",
    "gpuTypeIds": ["NVIDIA GeForce RTX 5090"],
    "gpuCount": 1,
    "imageName": "pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime",
    "containerDiskInGb": 50,
    "volumeInGb": 200,
    "ports": ["22/tcp", "8080/http"],
    "cloudType": "COMMUNITY",
    "dockerStartCmd": ["bash", "-c", "sleep infinity"]
  }'
```

Create response (success):
```json
{
  "id": "xbwp04efggmm9i",
  "desiredStatus": "RUNNING",
  "costPerHr": 0.69,
  "gpuCount": 1,
  "imageName": "pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime",
  "machine": {
    "gpuTypeId": "NVIDIA GeForce RTX 5090",
    "location": "CA",
    "diskThroughputMBps": 7300,
    "maxDownloadSpeedMbps": 7988
  },
  "machineId": "8ddg10q4fe7n",
  "memoryInGb": 125,
  "vcpuCount": 64,
  "volumeInGb": 200
}
```

Status after 30 seconds:
```json
{
  "desiredStatus": "RUNNING",
  "runtime": {
    "uptimeInSeconds": 0,
    "ports": []
  },
  "publicIp": ""
}
```

Status after 90 seconds: identical; uptime still 0, no ports assigned.
Pod was manually destroyed to stop billing.
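For anyone reproducing this, a small check like the following (a sketch; field names are taken from the status payloads above) distinguishes "scheduled" from "actually booted":

```python
# Sketch: decide whether a RunPod status payload represents a pod that has
# actually booted, based on the fields shown in the responses above.
def pod_is_booted(status: dict) -> bool:
    runtime = status.get("runtime") or {}
    return (
        status.get("desiredStatus") == "RUNNING"
        and runtime.get("uptimeInSeconds", 0) > 0
        and len(runtime.get("ports", [])) > 0
    )

# The payload returned both 30 and 90 seconds after creation:
stuck = {
    "desiredStatus": "RUNNING",
    "runtime": {"uptimeInSeconds": 0, "ports": []},
    "publicIp": "",
}
print(pod_is_booted(stuck))  # → False: scheduled, never booted
```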
Test 2: RTX 5090, ubuntu:22.04 image (2026-03-18 09:50 UTC)
Request:

```shell
curl -X POST "https://rest.runpod.io/v1/pods" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "davaz-5090-bare",
    "gpuTypeIds": ["NVIDIA GeForce RTX 5090"],
    "gpuCount": 1,
    "imageName": "ubuntu:22.04",
    "containerDiskInGb": 20,
    "ports": ["22/tcp"]
  }'
```

Result: same issue: desiredStatus: "RUNNING", uptimeInSeconds: 0, no ports assigned. The pod ran for hours doing nothing while consuming $0.89/hr.
Test 3: RTX PRO 6000, multiple datacenters (2026-03-18, earlier tests)
Same issue with:
- GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition
- Datacenters: EU-RO-1, US-KS-2
- Images: pytorch, nvidia/cuda, ubuntu:22.04
- With and without network volumes

All pods stuck at uptime 0.
Network Volume Issues
Network volumes create successfully but pods using them also fail to start:
```
POST /v1/pods with "networkVolumeId": "fh1hx9ujcu"
→ {"error": "create pod: There are no instances currently available"}

POST /v1/pods with "networkVolumeId": "3vdttbgap9" (US-KS-2)
→ {"error": "create pod: could not find any pods with required specifications"}
```
When pods DO get assigned (without network volume, community cloud), they show RUNNING but never boot.
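The reproduction loop used for these tests looks roughly like the sketch below. The GET and DELETE `/v1/pods/{podId}` endpoints are assumptions inferred from the API calls above (the pod was destroyed manually via the same REST API); adjust to the actual RunPod REST API if the paths differ.

```python
# Sketch: poll a newly created pod and destroy it if it never boots,
# to stop billing. GET/DELETE /v1/pods/{podId} are assumed endpoints.
import json
import os
import time
import urllib.request

BASE = "https://rest.runpod.io/v1"

def _request(method: str, path: str) -> dict:
    req = urllib.request.Request(
        f"{BASE}{path}",
        method=method,
        headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read() or b"{}")

def should_destroy(status: dict, elapsed: float, grace: float = 120.0) -> bool:
    """True once a pod has been 'RUNNING' for `grace` seconds with no uptime."""
    runtime = status.get("runtime") or {}
    return (
        elapsed >= grace
        and status.get("desiredStatus") == "RUNNING"
        and runtime.get("uptimeInSeconds", 0) == 0
    )

def watch(pod_id: str, grace: float = 120.0, interval: float = 15.0) -> bool:
    """Poll until the pod boots or the grace period expires; destroy if stuck."""
    start = time.monotonic()
    while True:
        status = _request("GET", f"/pods/{pod_id}")
        runtime = status.get("runtime") or {}
        if runtime.get("uptimeInSeconds", 0) > 0:
            return True  # actually booted
        if should_destroy(status, time.monotonic() - start, grace):
            _request("DELETE", f"/pods/{pod_id}")  # stop billing
            return False
        time.sleep(interval)

# Usage (live API call): watch("xbwp04efggmm9i")
```

Every pod in these tests hit the `should_destroy` branch; none ever reported uptime.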
Environment
- API Key: Works correctly (can list pods, create/delete volumes, create pods)
- Account: Has sufficient credit ($5+ balance confirmed)
- SSH Key: Added to account settings at https://console.runpod.io/user/settings
- Date: 2026-03-18
- All tests via REST API v1 (https://rest.runpod.io/v1/)
Expected Behavior
The pod should transition from desiredStatus: "RUNNING" to actually running, with:
- uptimeInSeconds > 0
- SSH port assigned in runtime.ports
- Container accessible via SSH
Actual Behavior
- Pod shows desiredStatus: "RUNNING" immediately after creation
- runtime.uptimeInSeconds stays at 0 indefinitely
- runtime.ports stays empty ([])
- publicIp stays empty
- No SSH access is possible
- Pod consumes credits despite not running
- Destroying and recreating does not help
- Different GPUs, images, datacenters, and configurations all fail the same way
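Since every affected pod keeps billing while stuck, a periodic sweep over the pod list (sketch below; assumes listing pods returns payloads shaped like the status responses shown earlier in this report) can at least flag the ones burning credits:

```python
# Sketch: flag pods that report RUNNING with zero uptime, so they can be
# destroyed before they burn more credits. The dict shape mirrors the
# status payloads shown earlier in this report.
def find_stuck(pods: list[dict]) -> list[str]:
    stuck = []
    for pod in pods:
        runtime = pod.get("runtime") or {}
        if (pod.get("desiredStatus") == "RUNNING"
                and runtime.get("uptimeInSeconds", 0) == 0):
            stuck.append(pod["id"])
    return stuck

pods = [
    {"id": "xbwp04efggmm9i", "desiredStatus": "RUNNING",
     "runtime": {"uptimeInSeconds": 0, "ports": []}},
]
print(find_stuck(pods))  # → ['xbwp04efggmm9i']
```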
Impact
- Cannot use RunPod for any GPU workload
- Credits consumed by non-functional pods (~$0.89/hr per pod)
- Multiple pods created and destroyed over the day, all with same result
- Competing platforms (vast.ai, TensorDock) work correctly with identical workloads