
Conductor

Conductor is a platform for managing models, model runners, model configurations, and virtualizing combinations into virtual model runners exposed to the network through OpenAI, vLLM, Gemini, and Ollama APIs.

Features

  • Multi-tenant Architecture: Full tenant isolation with tenant-scoped data access
  • Model Runner Endpoints: Define and manage first-class endpoint types for OpenAI, vLLM, Gemini, and Ollama model runners
  • Model Definitions: Catalog your models with metadata like family, parameter size, and quantization
  • Model Configurations: Create reusable configurations with pinned properties for embeddings and completions
  • Virtual Model Runners: Combine endpoints and configurations into virtual endpoints with load balancing
  • Configuration Pinning: Automatically inject model parameters into requests (like OllamaFlow)
  • Session Affinity: Pin clients to specific backend endpoints based on IP address, API key, or custom headers to minimize context drops and model swapping
  • Load Balancing: Round-robin, random, or first-available endpoint selection with weighted distribution and optional session affinity
  • Health Checking: Automatic background health monitoring of endpoints with configurable thresholds
  • Rate Limiting: Per-endpoint maximum parallel request limits with automatic capacity management
  • Request History: Optional per-VMR request/response capture for debugging and auditing with configurable retention
  • React Dashboard: Full-featured UI for managing all entities including real-time health status

Quick Start

Using Docker Compose

cd docker
docker compose up -d

The server will be available at http://localhost:9000 and the dashboard at http://localhost:9100.

Building from Source

Prerequisites

  • .NET 10 SDK
  • Node.js 20+

Build and Run Server

cd src/Conductor.Server
dotnet run

Build and Run Dashboard

cd dashboard
npm install
npm run dev

API Overview

Supported Provider Types

Conductor currently supports four model runner provider types in both the backend proxy and the dashboard:

| Provider Type | Runner Type in UI | Proxied API Shape | Notes |
|---|---|---|---|
| OpenAI | OpenAI | OpenAI REST API | Supports OpenAI-style chat, embeddings, and model listing |
| vLLM | vLLM | OpenAI-compatible REST API | First-class runner type in the UI; uses the OpenAI-compatible API surface |
| Gemini | Gemini | Gemini REST API | Supports Gemini-style models/{model}:generateContent, streaming, embeddings, and model listing |
| Ollama | Ollama | Ollama REST API | Supports Ollama-style /api/generate, /api/chat, and embeddings flows |

Authentication

Conductor supports two authentication methods:

  1. Header-based: Include x-tenant-id, x-email, and x-password headers
  2. Bearer Token: Include Authorization: Bearer {token} header
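As a sketch, a client could build the headers for either method like this (the tenant, email, and token values below are hypothetical placeholders):

```python
def auth_headers(tenant_id=None, email=None, password=None, token=None):
    """Build Conductor auth headers for one of the two supported methods."""
    if token is not None:
        # Method 2: bearer token
        return {"Authorization": f"Bearer {token}"}
    # Method 1: header-based credentials
    return {"x-tenant-id": tenant_id, "x-email": email, "x-password": password}

print(auth_headers(token="abc123"))
print(auth_headers(tenant_id="default", email="admin@example.com", password="secret"))
```

The same dictionary can then be passed to any HTTP client when calling the /v1.0/ endpoints listed below.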

User Permission Model

Users have three permission levels:

| Permission | Description |
|---|---|
| Global Admin (IsAdmin=true) | Full cross-tenant access to all resources |
| Tenant Admin (IsTenantAdmin=true) | Can manage users and credentials within their own tenant |
| Standard User | Can only access model configurations, endpoints, runners, and virtual runners in their tenant |

  • Global Admins can operate on any tenant by specifying TenantId in their requests
  • Tenant Admins have elevated privileges within their assigned tenant
  • Standard Users have read/write access to non-administrative resources

Endpoints

| Entity | Prefix | API Endpoint |
|---|---|---|
| Administrator | admin_ | /v1.0/administrators |
| Tenant | ten_ | /v1.0/tenants |
| User | usr_ | /v1.0/users |
| Credential | cred_ | /v1.0/credentials |
| Model Runner Endpoint | mre_ | /v1.0/modelrunnerendpoints |
| Model Definition | md_ | /v1.0/modeldefinitions |
| Model Configuration | mc_ | /v1.0/modelconfigurations |
| Virtual Model Runner | vmr_ | /v1.0/virtualmodelrunners |
| Request History | req_ | /v1.0/requesthistory |
| Request History Summary | - | /v1.0/requesthistory/summary |

Virtual Model Runner Proxy

Virtual model runners expose an API at their configured base path. For example, a VMR with base path /v1.0/api/my-vmr/ would expose:

  • OpenAI API: /v1.0/api/my-vmr/v1/chat/completions, /v1.0/api/my-vmr/v1/embeddings
  • vLLM API: /v1.0/api/my-vmr/v1/chat/completions, /v1.0/api/my-vmr/v1/embeddings
  • Gemini API: /v1.0/api/my-vmr/v1beta/models/gemini-2.5-flash:generateContent, /v1.0/api/my-vmr/v1beta/models/text-embedding-004:embedContent
  • Ollama API: /v1.0/api/my-vmr/api/generate, /v1.0/api/my-vmr/api/chat
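The path scheme above can be sketched as a mapping from provider type to path suffix (the host, port, and the Gemini model name are assumptions, not fixed by Conductor):

```python
# Representative path suffix each provider API appends to a VMR base path
# (one endpoint per provider, taken from the list above).
API_SUFFIXES = {
    "openai": "/v1/chat/completions",
    "vllm": "/v1/chat/completions",
    "gemini": "/v1beta/models/gemini-2.5-flash:generateContent",
    "ollama": "/api/chat",
}

def vmr_url(host, base_path, provider):
    """Compose the full proxy URL for a VMR with the given base path."""
    return host + base_path.rstrip("/") + API_SUFFIXES[provider]

print(vmr_url("http://localhost:9000", "/v1.0/api/my-vmr/", "ollama"))
```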

Configuration

conductor.json

{
  "Webserver": {
    "Hostname": "localhost",
    "Port": 9000,
    "Ssl": false,
    "Cors": {
      "Enabled": false,
      "AllowedOrigins": [],
      "AllowedMethods": ["GET", "POST", "PUT", "DELETE", "OPTIONS"],
      "AllowedHeaders": ["Content-Type", "Authorization"],
      "ExposedHeaders": [],
      "AllowCredentials": false,
      "MaxAgeSeconds": 86400
    }
  },
  "Database": {
    "Type": "Sqlite",
    "Filename": "./conductor.db"
  },
  "Logging": {
    "Servers": [],
    "LogDirectory": "./logs/",
    "LogFilename": "conductor.log",
    "ConsoleLogging": true,
    "MinimumSeverity": 0
  },
  "RequestHistory": {
    "Enabled": true,
    "Directory": "./request-history/",
    "RetentionDays": 7,
    "CleanupIntervalMinutes": 60,
    "MaxRequestBodyBytes": 65536,
    "MaxResponseBodyBytes": 65536
  }
}

Supported Databases

  • SQLite (default): "Type": "Sqlite", "Filename": "./conductor.db"
  • PostgreSQL: "Type": "PostgreSql", "ConnectionString": "Host=..."
  • SQL Server: "Type": "SqlServer", "ConnectionString": "Server=..."
  • MySQL: "Type": "MySql", "ConnectionString": "Server=..."
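For example, a PostgreSQL configuration might look like the following fragment (all connection-string values are placeholders to substitute for your environment):

```json
{
  "Database": {
    "Type": "PostgreSql",
    "ConnectionString": "Host=localhost;Port=5432;Database=conductor;Username=conductor;Password=changeme"
  }
}
```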

CORS Configuration

Cross-Origin Resource Sharing (CORS) can be enabled to allow browser-based applications to access the Conductor API.

| Property | Type | Default | Description |
|---|---|---|---|
| Enabled | bool | false | Enable or disable CORS support |
| AllowedOrigins | string[] | [] | List of allowed origins. Use ["*"] for all origins |
| AllowedMethods | string[] | ["GET", "POST", "PUT", "DELETE", "OPTIONS"] | Allowed HTTP methods |
| AllowedHeaders | string[] | ["Content-Type", "Authorization", ...] | Allowed request headers |
| ExposedHeaders | string[] | [] | Headers exposed to the browser |
| AllowCredentials | bool | false | Allow credentials (cookies, auth headers). Cannot be used with AllowedOrigins: ["*"] |
| MaxAgeSeconds | int | 86400 | Preflight cache duration (0-86400 seconds) |

Example: Allow all origins (development)

{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["*"]
    }
  }
}

Example: Restrict to specific origins (production)

{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["https://app.example.com", "https://admin.example.com"],
      "AllowCredentials": true
    }
  }
}

Request History Configuration

Request history captures request/response data for Virtual Model Runners that have RequestHistoryEnabled set to true. It is useful for debugging, auditing, and troubleshooting.

| Property | Type | Default | Description |
|---|---|---|---|
| Enabled | bool | true | Enable or disable request history globally |
| Directory | string | "./request-history/" | Directory for storing request detail JSON files |
| RetentionDays | int | 30 | Number of days to retain entries before cleanup (1-365) |
| CleanupIntervalMinutes | int | 60 | Interval between cleanup runs in minutes (1-1440) |
| MaxRequestBodyBytes | int | 65536 | Maximum request body bytes to capture (1-10485760) |
| MaxResponseBodyBytes | int | 65536 | Maximum response body bytes to capture (1-10485760) |

Note: Request history must be enabled both globally (in conductor.json) and per-VMR (via the RequestHistoryEnabled property).
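For the per-VMR side, the VMR record carries the RequestHistoryEnabled flag; a minimal fragment might look like this (the Name value is hypothetical, and other VMR properties are omitted):

```json
{
  "Name": "my-vmr",
  "RequestHistoryEnabled": true
}
```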

Request History Summary API

The summary endpoint returns aggregated request counts grouped by time buckets, useful for charting request volume and success/failure rates over time.

GET /v1.0/requesthistory/summary?startUtc={ISO8601}&endUtc={ISO8601}&interval={minute|15minute|hour|6hour|day}&vmrGuid={guid}
| Parameter | Type | Required | Description |
|---|---|---|---|
| startUtc | string | No | Start of time range (UTC, ISO 8601). Default: 1 hour ago |
| endUtc | string | No | End of time range (UTC, ISO 8601). Default: now |
| interval | string | No | Bucket interval: minute, 15minute, hour, 6hour, or day. Default: hour |
| vmrGuid | string | No | Filter by Virtual Model Runner GUID |

Response:

{
  "Data": [
    {
      "TimestampUtc": "2026-03-20T10:00:00Z",
      "SuccessCount": 42,
      "FailureCount": 3,
      "TotalCount": 45
    }
  ],
  "StartUtc": "2026-03-20T10:00:00Z",
  "EndUtc": "2026-03-20T11:00:00Z",
  "Interval": "hour",
  "TotalSuccess": 42,
  "TotalFailure": 3,
  "TotalRequests": 45
}

Success is defined as HTTP status 100-399; failure is HTTP status 400-599 or null (incomplete requests).
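A minimal sketch of the bucketing and success/failure classification (pure Python, not the server's implementation):

```python
from datetime import datetime

def bucket_start(ts, interval):
    """Truncate a UTC timestamp to the start of its bucket ('hour' or 'day' shown)."""
    if interval == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if interval == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported interval: {interval}")

def summarize(entries, interval="hour"):
    """entries: (timestamp, http_status) pairs. Status 100-399 counts as success;
    400-599 or None (incomplete request) counts as failure, per the definition above."""
    buckets = {}
    for ts, status in entries:
        b = buckets.setdefault(bucket_start(ts, interval),
                               {"SuccessCount": 0, "FailureCount": 0})
        ok = status is not None and 100 <= status < 400
        b["SuccessCount" if ok else "FailureCount"] += 1
    return buckets
```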

Configuration Pinning

Model configurations can define pinned properties that are automatically merged into incoming requests:

{
  "Name": "Low Temperature Config",
  "PinnedCompletionsProperties": {
    "temperature": 0.3,
    "top_p": 0.9,
    "max_tokens": 2048
  },
  "PinnedEmbeddingsProperties": {
    "model": "text-embedding-ada-002"
  }
}

When a request comes through a virtual model runner, the pinned properties are merged with the request body, allowing you to enforce specific model parameters.
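A shallow merge where pinned values win is one plausible way to picture this; the server's exact precedence rules are not documented here:

```python
def apply_pinned(request_body, pinned):
    """Merge pinned configuration properties into an incoming request body.
    Pinned values overwrite any client-supplied values for the same key;
    all other client fields pass through untouched."""
    merged = dict(request_body)
    merged.update(pinned)
    return merged

body = {"model": "gpt-4", "temperature": 1.0, "messages": []}
pinned = {"temperature": 0.3, "top_p": 0.9, "max_tokens": 2048}
print(apply_pinned(body, pinned))
```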

Health Checking & Rate Limiting

Endpoint Health Configuration

Model Runner Endpoints support comprehensive health checking with the following properties:

| Property | Type | Default | Description |
|---|---|---|---|
| HealthCheckUrl | string | / | URL path appended to the endpoint base URL for health checks |
| HealthCheckMethod | enum | GET | HTTP method (GET or HEAD) |
| HealthCheckIntervalMs | int | 5000 | Milliseconds between health checks |
| HealthCheckTimeoutMs | int | 5000 | Timeout for health check requests |
| HealthCheckExpectedStatusCode | int | 200 | Expected HTTP status code for healthy |
| UnhealthyThreshold | int | 2 | Consecutive failures before marking unhealthy |
| HealthyThreshold | int | 2 | Consecutive successes before marking healthy |
| HealthCheckUseAuth | bool | false | Include API key (Bearer token) in health check requests |
| MaxParallelRequests | int | 4 | Maximum concurrent requests (0 = unlimited) |
| Weight | int | 1 | Relative weight for load balancing (1-1000) |

Note for OpenAI and vLLM APIs: When using api.openai.com or another OpenAI-compatible backend that requires authentication for model listing, set HealthCheckUseAuth to true and HealthCheckUrl to /v1/models.

Note for Gemini API: When using generativelanguage.googleapis.com, set HealthCheckUseAuth to true and HealthCheckUrl to /v1beta/models. Gemini uses the x-goog-api-key header rather than bearer token authentication.
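Putting the notes together, a health-check configuration for an OpenAI-compatible backend might look like the following fragment (property names from the table above; values are illustrative):

```json
{
  "HealthCheckUrl": "/v1/models",
  "HealthCheckMethod": "GET",
  "HealthCheckUseAuth": true,
  "HealthCheckIntervalMs": 5000,
  "HealthCheckTimeoutMs": 5000,
  "HealthCheckExpectedStatusCode": 200,
  "UnhealthyThreshold": 2,
  "HealthyThreshold": 2
}
```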

Health Check Behavior

  • Endpoints start in an unhealthy state and transition to healthy after meeting the HealthyThreshold
  • Background tasks continuously monitor each active endpoint at the configured interval
  • The proxy automatically excludes unhealthy endpoints from request routing
  • When all endpoints are unhealthy, requests return 502 Bad Gateway
  • When all endpoints are at capacity, requests return 429 Too Many Requests
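The threshold behavior above amounts to a small state machine over consecutive check results; a sketch (not the server's implementation):

```python
class EndpointHealth:
    """Endpoints start unhealthy, become healthy after HealthyThreshold consecutive
    successes, and become unhealthy again after UnhealthyThreshold consecutive failures."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=2):
        self.healthy = False  # endpoints start in an unhealthy state
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self._successes = 0  # current consecutive-success streak
        self._failures = 0   # current consecutive-failure streak

    def record(self, success):
        """Record one health check result and return the resulting health state."""
        if success:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```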

Rate Limiting

  • Each endpoint tracks in-flight requests in real-time
  • The MaxParallelRequests property enforces a per-endpoint concurrency limit
  • Set to 0 for unlimited concurrent requests
  • Requests are counted from start until the response completes (including streaming)

Weighted Load Balancing

  • The Weight property influences endpoint selection in round-robin and random modes
  • Higher weight = more traffic directed to that endpoint
  • Example: Endpoint A (weight=3) receives 3x more traffic than Endpoint B (weight=1)
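One simple way to realize such a weighted rotation is to expand each endpoint by its weight and cycle through the expanded list; this is a sketch of the idea, not necessarily the scheme Conductor uses:

```python
import itertools

def weighted_cycle(endpoints):
    """endpoints: list of (name, weight) pairs. Yields names in proportion to weight."""
    expanded = [name for name, weight in endpoints for _ in range(weight)]
    return itertools.cycle(expanded)

# Endpoint A (weight=3) appears 3x per pass; Endpoint B (weight=1) appears once.
picker = weighted_cycle([("A", 3), ("B", 1)])
one_pass = [next(picker) for _ in range(4)]
print(one_pass)
```

A production balancer would typically interleave selections (e.g. smooth weighted round-robin) rather than emit bursts, but the traffic ratio per pass is the same.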

Health Status API

Monitor endpoint health via the REST API:

# Health of all endpoints in tenant
GET /v1.0/modelrunnerendpoints/health

# Health of endpoints for a specific VMR
GET /v1.0/virtualmodelrunners/{id}/health

Response includes:

  • Current health state (healthy/unhealthy)
  • In-flight request count
  • Total uptime/downtime
  • Uptime percentage
  • Last check timestamp
  • Last error message (if any)

Docker Images

  • Server: jchristn77/conductor:latest
  • Dashboard: jchristn77/conductor-ui:latest

Building Docker Images

# Build server
./build-server.sh  # or build-server.bat on Windows

# Build dashboard
./build-dashboard.sh  # or build-dashboard.bat on Windows

License

MIT License - see LICENSE.md for details.

Attributions

Music icons created by Freepik - Flaticon
