Conductor is a platform for managing models, model runners, and model configurations, and for combining them into virtual model runners exposed to the network through OpenAI, vLLM, Gemini, and Ollama APIs.
- Multi-tenant Architecture: Full tenant isolation with tenant-scoped data access
- Model Runner Endpoints: Define and manage first-class endpoint types for OpenAI, vLLM, Gemini, and Ollama model runners
- Model Definitions: Catalog your models with metadata like family, parameter size, and quantization
- Model Configurations: Create reusable configurations with pinned properties for embeddings and completions
- Virtual Model Runners: Combine endpoints and configurations into virtual endpoints with load balancing
- Configuration Pinning: Automatically inject model parameters into requests (like OllamaFlow)
- Session Affinity: Pin clients to specific backend endpoints based on IP address, API key, or custom headers to minimize context drops and model swapping
- Load Balancing: Round-robin, random, or first-available endpoint selection with weighted distribution and optional session affinity
- Health Checking: Automatic background health monitoring of endpoints with configurable thresholds
- Rate Limiting: Per-endpoint maximum parallel request limits with automatic capacity management
- Request History: Optional per-VMR request/response capture for debugging and auditing with configurable retention
- React Dashboard: Full-featured UI for managing all entities including real-time health status
```
cd docker
docker compose up -d
```

The server will be available at http://localhost:9000 and the dashboard at http://localhost:9100.
- .NET 10 SDK
- Node.js 20+
```
cd src/Conductor.Server
dotnet run
```

```
cd dashboard
npm install
npm run dev
```

Conductor currently supports four model runner provider types in both the backend proxy and the dashboard:
| Provider Type | Runner Type in UI | Proxied API Shape | Notes |
|---|---|---|---|
| OpenAI | OpenAI | OpenAI REST API | Supports OpenAI-style chat, embeddings, and model listing |
| vLLM | vLLM | OpenAI-compatible REST API | First-class runner type in the UI; uses the OpenAI-compatible API surface |
| Gemini | Gemini | Gemini REST API | Supports Gemini-style `models/{model}:generateContent`, streaming, embeddings, and model listing |
| Ollama | Ollama | Ollama REST API | Supports Ollama-style `/api/generate`, `/api/chat`, and embeddings flows |
Conductor supports two authentication methods:
- Header-based: Include `x-tenant-id`, `x-email`, and `x-password` headers
- Bearer Token: Include an `Authorization: Bearer {token}` header
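As a sketch, the two authentication styles can be expressed as header dictionaries to attach to any HTTP client; the tenant ID, email, password, and token values below are placeholders for illustration.

```python
# Build Conductor request headers for each authentication method.
# All credential values here are illustrative placeholders.

def header_auth(tenant_id: str, email: str, password: str) -> dict:
    """Headers for Conductor's header-based authentication."""
    return {"x-tenant-id": tenant_id, "x-email": email, "x-password": password}

def bearer_auth(token: str) -> dict:
    """Headers for Conductor's bearer-token authentication."""
    return {"Authorization": f"Bearer {token}"}

print(header_auth("ten_abc123", "admin@example.com", "secret"))
print(bearer_auth("mytoken"))
```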
Users have three permission levels:
| Permission | Description |
|---|---|
| Global Admin (`IsAdmin=true`) | Full cross-tenant access to all resources |
| Tenant Admin (`IsTenantAdmin=true`) | Can manage users and credentials within their own tenant |
| Standard User | Can only access model configurations, endpoints, runners, and virtual runners in their tenant |
- Global Admins can operate on any tenant by specifying `TenantId` in their requests
- Tenant Admins have elevated privileges within their assigned tenant
- Standard Users have read/write access to non-administrative resources
| Entity | Prefix | API Endpoint |
|---|---|---|
| Administrator | `admin_` | `/v1.0/administrators` |
| Tenant | `ten_` | `/v1.0/tenants` |
| User | `usr_` | `/v1.0/users` |
| Credential | `cred_` | `/v1.0/credentials` |
| Model Runner Endpoint | `mre_` | `/v1.0/modelrunnerendpoints` |
| Model Definition | `md_` | `/v1.0/modeldefinitions` |
| Model Configuration | `mc_` | `/v1.0/modelconfigurations` |
| Virtual Model Runner | `vmr_` | `/v1.0/virtualmodelrunners` |
| Request History | `req_` | `/v1.0/requesthistory` |
| Request History Summary | - | `/v1.0/requesthistory/summary` |
Virtual model runners expose an API at their configured base path. For example, a VMR with base path /v1.0/api/my-vmr/ would expose:
- OpenAI API: `/v1.0/api/my-vmr/v1/chat/completions`, `/v1.0/api/my-vmr/v1/embeddings`
- vLLM API: `/v1.0/api/my-vmr/v1/chat/completions`, `/v1.0/api/my-vmr/v1/embeddings`
- Gemini API: `/v1.0/api/my-vmr/v1beta/models/gemini-2.5-flash:generateContent`, `/v1.0/api/my-vmr/v1beta/models/text-embedding-004:embedContent`
- Ollama API: `/v1.0/api/my-vmr/api/generate`, `/v1.0/api/my-vmr/api/chat`
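To illustrate how these proxied paths compose, here is a small sketch that joins the example base path `/v1.0/api/my-vmr/` from above with a provider-specific API path; the server URL comes from the quick start and the VMR base path is whatever you configure on the VMR itself.

```python
# Compose a full proxied URL for a virtual model runner.
# BASE and VMR_BASE_PATH are example values, not fixed constants.
BASE = "http://localhost:9000"        # Conductor server from the quick start
VMR_BASE_PATH = "/v1.0/api/my-vmr/"   # example VMR base path from the docs

def vmr_url(api_path: str) -> str:
    """Join the VMR base path with a provider-specific API path."""
    return BASE + VMR_BASE_PATH.rstrip("/") + "/" + api_path.lstrip("/")

print(vmr_url("v1/chat/completions"))
# -> http://localhost:9000/v1.0/api/my-vmr/v1/chat/completions
```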
```json
{
  "Webserver": {
    "Hostname": "localhost",
    "Port": 9000,
    "Ssl": false,
    "Cors": {
      "Enabled": false,
      "AllowedOrigins": [],
      "AllowedMethods": ["GET", "POST", "PUT", "DELETE", "OPTIONS"],
      "AllowedHeaders": ["Content-Type", "Authorization"],
      "ExposedHeaders": [],
      "AllowCredentials": false,
      "MaxAgeSeconds": 86400
    }
  },
  "Database": {
    "Type": "Sqlite",
    "Filename": "./conductor.db"
  },
  "Logging": {
    "Servers": [],
    "LogDirectory": "./logs/",
    "LogFilename": "conductor.log",
    "ConsoleLogging": true,
    "MinimumSeverity": 0
  },
  "RequestHistory": {
    "Enabled": true,
    "Directory": "./request-history/",
    "RetentionDays": 7,
    "CleanupIntervalMinutes": 60,
    "MaxRequestBodyBytes": 65536,
    "MaxResponseBodyBytes": 65536
  }
}
```

- SQLite (default): `"Type": "Sqlite", "Filename": "./conductor.db"`
- PostgreSQL: `"Type": "PostgreSql", "ConnectionString": "Host=..."`
- SQL Server: `"Type": "SqlServer", "ConnectionString": "Server=..."`
- MySQL: `"Type": "MySql", "ConnectionString": "Server=..."`
Cross-Origin Resource Sharing (CORS) can be enabled to allow browser-based applications to access the Conductor API.
| Property | Type | Default | Description |
|---|---|---|---|
| `Enabled` | `bool` | `false` | Enable or disable CORS support |
| `AllowedOrigins` | `string[]` | `[]` | List of allowed origins. Use `["*"]` for all origins |
| `AllowedMethods` | `string[]` | `["GET", "POST", "PUT", "DELETE", "OPTIONS"]` | Allowed HTTP methods |
| `AllowedHeaders` | `string[]` | `["Content-Type", "Authorization", ...]` | Allowed request headers |
| `ExposedHeaders` | `string[]` | `[]` | Headers exposed to the browser |
| `AllowCredentials` | `bool` | `false` | Allow credentials (cookies, auth headers). Cannot be used with `AllowedOrigins: ["*"]` |
| `MaxAgeSeconds` | `int` | `86400` | Preflight cache duration (0-86400 seconds) |
Example: Allow all origins (development)
```json
{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["*"]
    }
  }
}
```

Example: Restrict to specific origins (production)
```json
{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["https://app.example.com", "https://admin.example.com"],
      "AllowCredentials": true
    }
  }
}
```

Request history captures request/response data for Virtual Model Runners with `RequestHistoryEnabled` set to `true`. This is useful for debugging, auditing, and troubleshooting.
| Property | Type | Default | Description |
|---|---|---|---|
| `Enabled` | `bool` | `true` | Enable or disable request history globally |
| `Directory` | `string` | `"./request-history/"` | Directory for storing request detail JSON files |
| `RetentionDays` | `int` | `30` | Number of days to retain entries before cleanup (1-365) |
| `CleanupIntervalMinutes` | `int` | `60` | Interval between cleanup runs in minutes (1-1440) |
| `MaxRequestBodyBytes` | `int` | `65536` | Maximum request body bytes to capture (1-10485760) |
| `MaxResponseBodyBytes` | `int` | `65536` | Maximum response body bytes to capture (1-10485760) |
Note: Request history must be enabled both globally (in `conductor.json`) and per-VMR (via the `RequestHistoryEnabled` property).
The summary endpoint returns aggregated request counts grouped by time buckets, useful for charting request volume and success/failure rates over time.
```
GET /v1.0/requesthistory/summary?startUtc={ISO8601}&endUtc={ISO8601}&interval={minute|15minute|hour|6hour|day}&vmrGuid={guid}
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `startUtc` | `string` | No | Start of time range (UTC, ISO 8601). Default: 1 hour ago |
| `endUtc` | `string` | No | End of time range (UTC, ISO 8601). Default: now |
| `interval` | `string` | No | Bucket interval: `minute`, `15minute`, `hour`, `6hour`, or `day`. Default: `hour` |
| `vmrGuid` | `string` | No | Filter by Virtual Model Runner GUID |
Response:
```json
{
  "Data": [
    {
      "TimestampUtc": "2026-03-20T10:00:00Z",
      "SuccessCount": 42,
      "FailureCount": 3,
      "TotalCount": 45
    }
  ],
  "StartUtc": "2026-03-20T10:00:00Z",
  "EndUtc": "2026-03-20T11:00:00Z",
  "Interval": "hour",
  "TotalSuccess": 42,
  "TotalFailure": 3,
  "TotalRequests": 45
}
```

Success is defined as HTTP status 100-399; failure is HTTP status 400-599 or a null status (incomplete requests).
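As a sketch of consuming this payload, the following computes a per-bucket success rate; the field names are taken from the example response above.

```python
# Compute a success rate for each time bucket in a summary response.
# The `summary` dict mirrors the example payload in the docs.
summary = {
    "Data": [
        {"TimestampUtc": "2026-03-20T10:00:00Z",
         "SuccessCount": 42, "FailureCount": 3, "TotalCount": 45}
    ],
    "TotalSuccess": 42, "TotalFailure": 3, "TotalRequests": 45,
}

def success_rate(bucket: dict) -> float:
    """Fraction of requests in a bucket that succeeded (HTTP 100-399)."""
    total = bucket["TotalCount"]
    return bucket["SuccessCount"] / total if total else 0.0

for bucket in summary["Data"]:
    print(bucket["TimestampUtc"], f"{success_rate(bucket):.1%}")
```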
Model configurations can define pinned properties that are automatically merged into incoming requests:
```json
{
  "Name": "Low Temperature Config",
  "PinnedCompletionsProperties": {
    "temperature": 0.3,
    "top_p": 0.9,
    "max_tokens": 2048
  },
  "PinnedEmbeddingsProperties": {
    "model": "text-embedding-ada-002"
  }
}
```

When a request comes through a virtual model runner, the pinned properties are merged with the request body, allowing you to enforce specific model parameters.
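A minimal sketch of this merge, assuming pinned values override whatever the client sent (consistent with pinning being described as enforcing parameters, though the exact precedence rules are Conductor's to define):

```python
# Sketch of configuration pinning: pinned properties merged into an
# incoming request body. Assumption: pinned values win over client values.
pinned = {"temperature": 0.3, "top_p": 0.9, "max_tokens": 2048}

client_request = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 1.0,   # overridden by the pinned value below
}

merged = {**client_request, **pinned}
print(merged["temperature"])  # 0.3 -- the pinned value wins
```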
Model Runner Endpoints support comprehensive health checking with the following properties:
| Property | Type | Default | Description |
|---|---|---|---|
| `HealthCheckUrl` | `string` | `/` | URL path appended to endpoint base URL for health checks |
| `HealthCheckMethod` | `enum` | `GET` | HTTP method (GET or HEAD) |
| `HealthCheckIntervalMs` | `int` | `5000` | Milliseconds between health checks |
| `HealthCheckTimeoutMs` | `int` | `5000` | Timeout for health check requests |
| `HealthCheckExpectedStatusCode` | `int` | `200` | Expected HTTP status code for healthy |
| `UnhealthyThreshold` | `int` | `2` | Consecutive failures before marking unhealthy |
| `HealthyThreshold` | `int` | `2` | Consecutive successes before marking healthy |
| `HealthCheckUseAuth` | `bool` | `false` | Include API key (Bearer token) in health check requests |
| `MaxParallelRequests` | `int` | `4` | Maximum concurrent requests (0 = unlimited) |
| `Weight` | `int` | `1` | Relative weight for load balancing (1-1000) |
Note for OpenAI and vLLM APIs: When using `api.openai.com` or another OpenAI-compatible backend that requires authentication for model listing, set `HealthCheckUseAuth` to `true` and `HealthCheckUrl` to `/v1/models`.
Note for Gemini API: When using `generativelanguage.googleapis.com`, set `HealthCheckUseAuth` to `true` and `HealthCheckUrl` to `/v1beta/models`. Gemini uses the `x-goog-api-key` header rather than bearer token authentication.
- Endpoints start in an unhealthy state and transition to healthy after meeting the `HealthyThreshold`
- Background tasks continuously monitor each active endpoint at the configured interval
- The proxy automatically excludes unhealthy endpoints from request routing
- When all endpoints are unhealthy, requests return `502 Bad Gateway`
- When all endpoints are at capacity, requests return `429 Too Many Requests`
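The threshold behavior above can be sketched as a small state machine; this is an illustration of the described logic, not Conductor's actual implementation, and the class name is invented for the example.

```python
# Illustrative sketch: an endpoint flips health state only after N
# consecutive check results in the opposite direction of its current state.
class EndpointHealth:
    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 2):
        self.healthy = False            # endpoints start unhealthy
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self._streak = 0                # consecutive opposite-direction results

    def record(self, check_passed: bool) -> bool:
        """Record one health-check result; return the current state."""
        if check_passed == self.healthy:
            self._streak = 0            # result agrees with current state
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy

ep = EndpointHealth()
print(ep.record(True))   # False -- one success, threshold not yet met
print(ep.record(True))   # True  -- second consecutive success flips the state
```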
- Each endpoint tracks in-flight requests in real-time
- The `MaxParallelRequests` property enforces a per-endpoint concurrency limit
- Set to `0` for unlimited concurrent requests
- Requests are counted from start until the response completes (including streaming)
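A minimal sketch of this capacity accounting, under the stated semantics (`0` means unlimited, slots held until the response completes); the class is invented for illustration and is not Conductor's code.

```python
import threading

# Sketch of a per-endpoint concurrency cap in the spirit of
# MaxParallelRequests (0 = unlimited).
class EndpointCapacity:
    def __init__(self, max_parallel: int):
        self.max_parallel = max_parallel
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot; False means the endpoint is at capacity."""
        with self._lock:
            if self.max_parallel != 0 and self.in_flight >= self.max_parallel:
                return False
            self.in_flight += 1
            return True

    def release(self) -> None:
        """Call once the response (including streaming) has completed."""
        with self._lock:
            self.in_flight -= 1

cap = EndpointCapacity(max_parallel=1)
print(cap.try_acquire())  # True
print(cap.try_acquire())  # False -- at capacity until release()
```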
- The `Weight` property influences endpoint selection in round-robin and random modes
- Higher weight = more traffic directed to that endpoint
- Example: Endpoint A (weight=3) receives 3x more traffic than Endpoint B (weight=1)
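For the random mode, weighted selection can be sketched with the standard library; endpoint names are illustrative, and this only demonstrates the 3:1 distribution described above, not Conductor's selection code.

```python
import random

# Weighted random endpoint selection: weight 3 is chosen about 3x as
# often as weight 1 over many draws.
endpoints = [("endpoint-a", 3), ("endpoint-b", 1)]

def pick(rng: random.Random) -> str:
    names = [name for name, _ in endpoints]
    weights = [weight for _, weight in endpoints]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)                      # seeded for reproducibility
counts = {"endpoint-a": 0, "endpoint-b": 0}
for _ in range(10000):
    counts[pick(rng)] += 1
print(counts)  # endpoint-a receives roughly 75% of selections
```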
Monitor endpoint health via the REST API:
```
# Health of all endpoints in tenant
GET /v1.0/modelrunnerendpoints/health

# Health of endpoints for a specific VMR
GET /v1.0/virtualmodelrunners/{id}/health
```

Response includes:
- Current health state (healthy/unhealthy)
- In-flight request count
- Total uptime/downtime
- Uptime percentage
- Last check timestamp
- Last error message (if any)
- Server: `jchristn77/conductor:latest`
- Dashboard: `jchristn77/conductor-ui:latest`
```
# Build server
./build-server.sh       # or build-server.bat on Windows

# Build dashboard
./build-dashboard.sh    # or build-dashboard.bat on Windows
```

MIT License - see LICENSE.md for details.