AssistantHub is a self-hosted RAG (Retrieval-Augmented Generation) data and chatbot platform. It enables you to create AI assistants that can answer questions grounded in your uploaded documents, powered by vector embeddings, hybrid search, and large language models. Upload PDFs, text files, HTML, and more -- AssistantHub automatically extracts content, summarizes, chunks, generates embeddings, and makes it searchable. Your assistants retrieve relevant context at query time and generate accurate, citation-ready responses.
AssistantHub ships as a fully orchestrated Docker Compose stack -- one command brings up the entire platform, including the LLM inference engine, document processing pipeline, vector database, object storage, and a browser-based management dashboard.
Slack support was added in v0.9.0, allowing each assistant to connect directly to Slack and process threaded Slack conversations through the same AssistantHub chat pipeline.
- Slack integration per assistant -- Configure Slack connectivity directly in assistant settings: Enable Slack, app token, bot token, channel ID, start-of-message indicator, and draft connectivity verification.
- Shared chat execution rail -- Slack requests reuse the same retrieval, compaction, citation, inference, and history flow as AssistantHub chat instead of a separate inference path.
- Thread-aware Slack replies -- Incoming Slack messages map to deterministic AssistantHub threads and replies are posted back to the originating Slack thread.
- Slack verification API and dashboard flow -- Added `POST /v1.0/assistants/{assistantId}/settings/slack/verify` plus dashboard support for testing draft values before save.
- Chat history origin tracking -- `chat_history.origin` now records the request source, such as `web` or `slack`.
- Migration script: `migrations/007_upgrade_to_v0.9.0.sql`
AssistantHub supports per-assistant Slack connectivity through Assistant Settings.
- Enable Slack on an assistant and provide: App Token (`xapp-...`), Bot Token (`xoxb-...`), Channel ID, and Start-of-Message Indicator
- Use Verify Connectivity in the dashboard before saving
- AssistantHub maintains one Socket Mode connection per Slack-enabled assistant
- In configured channels, messages are processed when they start with the configured indicator or mention the bot
- Direct messages to the bot are also supported
- Slack conversations reuse the same non-streaming chat execution rail as AssistantHub chat, including retrieval, citations, compaction, and history persistence
- Slack responses are posted back into the originating Slack thread
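The eligibility rule (indicator prefix or bot mention) and the deterministic thread mapping can be sketched in a few lines. This is an illustrative Python sketch, not the actual C# implementation: the function names and the hash-based key derivation are assumptions; only the behavior they model (prefix/mention matching, stable thread identity per Slack thread) comes from the description above.

```python
import hashlib

def should_process(text: str, indicator: str, bot_user_id: str) -> bool:
    """A channel message is handled if it starts with the configured
    start-of-message indicator or mentions the bot (Slack encodes
    mentions as <@USERID>)."""
    return text.startswith(indicator) or f"<@{bot_user_id}>" in text

def thread_key(channel_id: str, thread_ts: str) -> str:
    """Derive a stable thread identifier from the Slack channel and
    thread timestamp, so every message in the same Slack thread maps
    to the same AssistantHub conversation (hashing scheme hypothetical)."""
    return hashlib.sha256(f"{channel_id}:{thread_ts}".encode()).hexdigest()[:32]
```

Because the key depends only on the channel and thread timestamp, reconnects and restarts land replies in the same conversation without any extra state.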
Operational notes:
- Slack tokens are stored in the AssistantHub database in plaintext, so rely on your deployment's at-rest protections
- The Slack app must have Socket Mode enabled and be invited to any private channels it should service
- AssistantHub consumes the `EasySlack` NuGet package at version `1.0.1`
- Metadata filtering for chat completions -- Filter RAG retrieval to only return documents matching specified labels and/or tags. Labels are simple string lists (required/excluded). Tags are key-value conditions supporting the operators `Equals`, `NotEquals`, `Contains`, `StartsWith`, `EndsWith`, `GreaterThan`, `LessThan`, `IsNull`, and `IsNotNull`. Filters can be configured as defaults on an assistant (applied to every conversation) and/or supplied per-request via the `metadata_filter` field on the chat completion request body. When both are present, they are merged (required labels/tags unioned, excluded labels/tags unioned).
- Per-request `metadata_filter` on chat completions -- The `POST /v1.0/assistants/{id}/chat` endpoint accepts an optional `metadata_filter` object in the request body. This is an AssistantHub extension to the OpenAI-compatible chat schema; clients that omit it get standard unfiltered retrieval. Example:

  ```json
  {
    "messages": [{"role": "user", "content": "What were the Q4 results?"}],
    "metadata_filter": {
      "required_labels": ["finance", "quarterly-report"],
      "excluded_labels": ["draft"],
      "required_tags": [
        {"key": "department", "condition": "Equals", "value": "accounting"}
      ]
    }
  }
  ```

- Assistant-level default filters -- New `RetrievalLabelFilter` and `RetrievalTagFilter` settings on each assistant. Configure via the dashboard (Retrieval Filters section) or API. These defaults are applied to every chat retrieval for that assistant.
- Filter discovery endpoints -- Four new API endpoints to discover available filter values:
  - `GET /v1.0/collections/{collectionId}/labels/distinct` (admin)
  - `GET /v1.0/collections/{collectionId}/tags/distinct` (admin)
  - `GET /v1.0/assistants/{assistantId}/labels/distinct` (public)
  - `GET /v1.0/assistants/{assistantId}/tags/distinct` (public)
- Dashboard -- Retrieval Filters configuration in assistant settings, collapsible metadata filter panel in the chat UI for per-session filtering, and metadata filter display in the history detail view
- Auditing -- The effective merged filter is stored in `ChatHistory.MetadataFilter` and displayed in the History View modal
- Docker image tags updated to v0.7.0
- See CHANGELOG.md for full details
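The stated merge semantics (default and per-request filters combined by unioning required/excluded labels and tags) can be sketched as a small helper. This is an illustrative Python sketch, not the server's C# implementation; `merge_filters` and its dict shape are assumptions modeled on the `metadata_filter` JSON above.

```python
def merge_filters(default: dict, request: dict) -> dict:
    """Union the assistant-level default filter with the per-request
    metadata_filter: required/excluded labels and tags are each unioned."""
    merged = {}
    # Label lists: simple set union, sorted for determinism.
    for key in ("required_labels", "excluded_labels"):
        merged[key] = sorted(set(default.get(key, [])) | set(request.get(key, [])))
    # Tag conditions: deduplicate on (key, condition, value).
    for key in ("required_tags", "excluded_tags"):
        seen, tags = set(), []
        for tag in default.get(key, []) + request.get(key, []):
            ident = (tag["key"], tag["condition"], tag.get("value"))
            if ident not in seen:
                seen.add(ident)
                tags.append(tag)
        merged[key] = tags
    return merged
```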
- LLM-based re-ranking -- After initial retrieval, an LLM scores each chunk's relevance to the user's query and filters out low-quality results before context injection
- See CHANGELOG.md for full details
- Native web crawlers -- Built-in web crawling engine that automatically discovers, retrieves, and ingests website content. Configure a URL, schedule, and ingestion rule, and AssistantHub handles the rest
- Crawl plans and scheduling -- Persistent crawler configurations with automatic recurring execution on configurable intervals (one-time, minutes, hours, days, weeks)
- Delta-based crawling -- Subsequent crawls compare against the previous enumeration to process only new, changed, and deleted content
- Document traceability -- Every crawled document is linked back to its source crawler and operation. Filter the Documents view by crawler to see all ingested content
- On-demand controls -- Start, stop, test connectivity, and preview discovered content from the dashboard or API
- Full dashboard integration -- Crawlers management view, operations viewer with statistics, enumeration browser, and Documents view integration
- 16 new API endpoints -- Complete CRUD, lifecycle control, statistics, and enumeration access for crawl plans and operations
- See CHANGELOG.md for full details
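The delta model above boils down to diffing two enumerations of the site. A minimal sketch, assuming each enumeration is a mapping of URL to content hash (the function name and data shape are illustrative, not AssistantHub's on-disk enumeration format):

```python
def diff_enumerations(previous: dict, current: dict) -> dict:
    """Compare two crawl enumerations (URL -> content hash) and classify
    each URL as new, changed, or deleted, so only deltas are processed."""
    return {
        "new":     [u for u in current if u not in previous],
        "changed": [u for u in current if u in previous and current[u] != previous[u]],
        "deleted": [u for u in previous if u not in current],
    }
```

Unchanged URLs fall into none of the three buckets and are skipped entirely on subsequent crawls.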
- Query rewrite -- LLM-based query rewriting for improved retrieval recall
- Full multi-tenancy -- Row-level tenant isolation, three-tier authorization, auto-provisioning, tenant-scoped routes
- See CHANGELOG.md for full details
- Initial release with multi-assistant platform, automated document ingestion, flexible search modes, streaming chat, and browser-based dashboard
- See CHANGELOG.md for full details
- Assistants -- Create and manage multiple AI assistants, each with their own configuration, personality, and knowledge base.
- Documents -- Upload documents (PDF, text, HTML, and more) to build a knowledge base for each assistant. Documents are automatically chunked, embedded, and indexed.
- Crawlers -- Native web crawling engine that automatically discovers, retrieves, and ingests website content on a schedule. Supports delta-based crawling (only new/changed/deleted content is processed), configurable depth, parallelism, throttling, content filtering, and web authentication (Basic, API Key, Bearer Token). Each crawled document is traceable back to its source crawler and operation.
- Ingestion Rules -- Define reusable ingestion configurations that specify target S3 buckets, RecallDB collections, summarization, chunking strategies, and embedding settings. Documents reference an ingestion rule for processing.
- Summarization -- Optionally summarize document content before or after chunking using configurable completion endpoints, improving retrieval quality for long documents.
- Endpoint Management -- Manage embedding and completion (inference) endpoints on the Partio service directly from the dashboard or API.
- Search -- Leverages pgvector and RecallDB for vector, full-text, and hybrid search. Configure per-assistant search modes with tunable scoring weights for optimal retrieval from your document corpus.
- Retrieval Gate -- Optional LLM-based retrieval gate that intelligently decides whether each user message requires a new document search or can be answered from existing conversation context, reducing unnecessary retrieval calls.
- Chat -- Public-facing chat endpoint that retrieves relevant context from your documents and generates responses using configurable LLM providers (Ollama, OpenAI, Gemini). Supports real-time SSE streaming.
- Conversation Compaction -- Automatic summarization of older messages when the conversation approaches the context window limit, preserving continuity across long conversations.
- Feedback -- Collect thumbs-up/thumbs-down feedback and free-text comments on assistant responses to monitor quality and improve over time.
- Multi-Tenant -- Full row-level tenant isolation with three-tier authorization (Global Admin via API key or `IsAdmin` flag, Tenant Admin, User). Auto-provisioning of tenant resources, per-tenant S3 bucket isolation (`{tenantId}_` prefix), and tenant-scoped RecallDB mapping.
- Dashboard -- Browser-based management UI for configuring assistants, uploading documents, viewing feedback, managing endpoints, and testing chat.
- Query rewrite -- Optionally rewrite user queries into multiple semantically varied phrasings before retrieval to broaden recall and capture synonyms, alternate phrasing, and conceptual restatements
- LLM-based re-ranking -- Re-ranking scores each retrieved chunk for relevance using an LLM, filtering low-quality results before context injection.
- Metadata filtering -- Filter RAG retrieval by document labels (required/excluded string lists) and tags (key-value conditions with conditional operators). Configure default filters per assistant and/or override per-conversation via the `metadata_filter` field on chat completion requests.
- Source citations -- Optional per-assistant citation metadata that maps model claims to source documents with bracket notation, relevance scores, and text excerpts. Configurable document linking via presigned S3 URLs or authenticated download endpoints
- RAG evaluation -- Built-in evaluation framework for measuring retrieval and response quality. Define ground-truth facts (question/expected-facts pairs) per assistant, run automated evaluation passes with LLM-based judging, and review per-fact results with pass/fail verdicts. Supports custom judge prompts and real-time SSE progress streaming.
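The tag operators used by metadata filtering behave as their names suggest. A compact Python sketch of evaluating one tag condition against a document's tag value (illustrative only -- the actual evaluation happens server-side in C#, and the null/coercion rules shown here are assumptions):

```python
def tag_matches(doc_value, condition: str, filter_value=None) -> bool:
    """Evaluate a single tag condition against a document's tag value,
    covering the operator set: Equals, NotEquals, Contains, StartsWith,
    EndsWith, GreaterThan, LessThan, IsNull, IsNotNull."""
    if condition == "IsNull":
        return doc_value is None
    if condition == "IsNotNull":
        return doc_value is not None
    if doc_value is None:
        return False  # assumed: missing values fail all other operators
    if condition == "Equals":
        return doc_value == filter_value
    if condition == "NotEquals":
        return doc_value != filter_value
    if condition == "Contains":
        return str(filter_value) in str(doc_value)
    if condition == "StartsWith":
        return str(doc_value).startswith(str(filter_value))
    if condition == "EndsWith":
        return str(doc_value).endswith(str(filter_value))
    if condition == "GreaterThan":
        return float(doc_value) > float(filter_value)
    if condition == "LessThan":
        return float(doc_value) < float(filter_value)
    raise ValueError(f"Unknown condition: {condition}")
```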
The fastest way to run AssistantHub and all its dependencies is with Docker Compose. This is the recommended deployment method.
```
cd docker
docker compose up -d
```

Once all services are healthy, open http://localhost:8801 to access the dashboard.
On a fresh startup, assistanthub-server now waits for partio-server to become healthy before it starts. This avoids the transient partio-server:8400 DNS/startup race that could previously abort AssistantHub startup immediately after a factory reset.
Note: Deploying individual services outside of Docker is also possible, but requires manual configuration and deployment of each dependency (PostgreSQL with pgvector, Ollama, Less3, DocumentAtom, Partio, RecallDB). The Docker Compose stack handles all service wiring, health checks, and startup ordering automatically, which is why manual setup documentation is not provided.
The Docker Compose stack orchestrates the following services:
| Service | Port | Description |
|---|---|---|
| assistanthub-server | 8800 | The core AssistantHub REST API server (.NET 10). Handles all business logic: assistant management, document ingestion orchestration, chat with RAG, user authentication, and integration with all downstream services. |
| assistanthub-dashboard | 8801 | Browser-based management dashboard (React 19, served by nginx). Provides a full UI for configuring assistants, uploading documents, managing endpoints, viewing feedback/history, and live chat testing. Proxies API requests to the server. |
| ollama | 11434 | Local LLM inference engine. Runs language models (e.g., gemma3:4b) for chat completion, conversation compaction, retrieval gate classification, and title generation. Models are persisted in a Docker volume. |
| less3 | 8000 | S3-compatible object storage server. Stores uploaded document files. AssistantHub uses the S3 API to write, read, and delete document objects during ingestion and cleanup. |
| less3-ui | 8001 | Web-based management UI for Less3. Allows direct browsing and management of S3 buckets and objects. |
| documentatom-server | 8301 | Document processing service. Extracts text content from uploaded files (PDF, DOCX, HTML, text, and more), returning structured cells that represent the document's content. |
| documentatom-dashboard | 8302 | Web-based management UI for DocumentAtom. |
| partio-server | 8321 | Text chunking, embedding, and summarization service. Splits extracted text into chunks using configurable strategies, computes vector embeddings via configurable embedding endpoints, and optionally summarizes content using a completion endpoint. Also manages embedding and completion endpoint configurations. |
| partio-dashboard | 8322 | Web-based management UI for Partio. Allows direct management of embedding and completion endpoints. |
| pgvector | 5432 | PostgreSQL with the pgvector extension. Provides the underlying vector storage and full-text search capabilities used by RecallDB. Supports cosine similarity search over high-dimensional embedding vectors. |
| recalldb-server | 8401 | Vector and full-text search database. Wraps pgvector with a REST API for storing, searching, and managing document embeddings. Supports vector search (semantic similarity), full-text search (keyword matching), and hybrid search (weighted combination). |
| recalldb-dashboard | 8402 | Web-based management UI for RecallDB. Allows direct browsing of collections, records, and search testing. |
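Hybrid search, as described for recalldb-server above, combines the semantic and keyword score for each candidate chunk using the per-assistant scoring weights. A rough sketch of the weighted combination (the 0.7/0.3 defaults and the exact formula are illustrative assumptions, not RecallDB's documented scoring function):

```python
def rank_hybrid(results, vector_weight=0.7, fulltext_weight=0.3):
    """results: list of (chunk_id, vector_score, fulltext_score) tuples.
    Returns chunk IDs ordered by weighted hybrid score, best first."""
    scored = [(cid, vector_weight * v + fulltext_weight * f)
              for cid, v, f in results]
    return [cid for cid, _ in sorted(scored, key=lambda x: x[1], reverse=True)]
```

Shifting the weights toward full-text favors exact keyword matches; shifting toward vector favors semantic similarity, which is why the weights are tunable per assistant.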
If you already have Ollama running on your host machine or on another server, you can skip the containerized Ollama and point AssistantHub at your existing instance instead.
1. Comment out the Ollama service in docker/compose.yaml:
Comment out (or remove) the ollama service and its volume:
```yaml
services:

  # --- Infrastructure ---
  # ollama:
  #   image: ollama/ollama:latest
  #   container_name: ollama
  #   ports:
  #     - "11434:11434"
  #   environment:
  #     OLLAMA_NUM_PARALLEL: "4"
  #     OLLAMA_MAX_LOADED_MODELS: "4"
  #   volumes:
  #     - ollama-models:/root/.ollama
  #   restart: unless-stopped
```

Also comment out the ollama-models volume at the bottom of the file:

```yaml
volumes:
  pgvector-data:
  # ollama-models:
```

And remove `- ollama` from the partio-server service's depends_on list.
2. Update docker/assistanthub/assistanthub.json to point to your Ollama instance:
In the Inference section, change the Endpoint from the container hostname to your Ollama instance's address:
```json
"Inference": {
  "Provider": "Ollama",
  "Endpoint": "http://host.docker.internal:11434",
  "ApiKey": "default",
  "DefaultModel": "gemma3:4b"
}
```

- Ollama on the same machine (Docker Desktop): Use `http://host.docker.internal:11434`. The special hostname `host.docker.internal` resolves to your host machine from inside Docker containers. Do not use `localhost` -- inside a container, `localhost` refers to the container itself, not your host machine.
- Ollama on the same machine (Linux without Docker Desktop): Use `http://172.17.0.1:11434` (the default Docker bridge gateway), or run the compose stack with `network_mode: host`. You may also need to set `OLLAMA_HOST=0.0.0.0` in your Ollama configuration so it listens on all interfaces.
- Ollama on another machine: Use that machine's IP or hostname, e.g. `http://192.168.1.50:11434`. Ensure the Ollama port is accessible from the Docker network.
3. Update docker/partio/partio.json to point to your Ollama instance:
In the DefaultEmbeddingEndpoints section, change the Endpoint from the container hostname to match the address you used above:
```json
"DefaultEmbeddingEndpoints": [
  {
    "Model": "all-minilm",
    "Endpoint": "http://host.docker.internal:11434",
    "ApiFormat": "Ollama",
    "ApiKey": null
  }
]
```

4. Update embedding and completion endpoints in the Partio dashboard:
After startup, open the Partio dashboard at http://localhost:8322 and update both the embedding endpoints and completion endpoints to point to your Ollama instance:
- Change the Endpoint URL from `http://ollama:11434` to your instance's address (e.g. `http://host.docker.internal:11434`).
- Change the Health Check URL from a relative path (`/api/tags`) to a fully-qualified URL (e.g. `http://host.docker.internal:11434/api/tags`). Health checks using relative paths will fail with an "invalid request URI" error.
Without these changes, document ingestion (embeddings) and chat completions will fail.
5. Start the stack:
```
cd docker
docker compose up -d
```

| Dashboard | URL | Default Credentials |
|---|---|---|
| AssistantHub | http://localhost:8801 | Email: admin@assistanthub, Password: password |
| Less3 | http://localhost:8001 | Admin API Key: less3admin, Access Key: default, Secret Key: default |
| DocumentAtom | http://localhost:8302 | No authentication configured by default |
| Partio | http://localhost:8322 | Email: admin@partio, Password: password, Admin API Key: partioadmin |
| RecallDB | http://localhost:8402 | Email: admin@recall, Password: password, Admin API Key: recalldbadmin |
Important: Change all default passwords immediately after first login.
The server reads configuration from assistanthub.json in the working directory. For Docker deployments, this file is located at docker/assistanthub/assistanthub.json and is mounted into the container.
```json
{
  "Webserver": {
    "Hostname": "*",
    "Port": 8800,
    "Ssl": false
  },
  "Database": {
    "Type": "Sqlite",
    "Filename": "./data/assistanthub.db",
    "Hostname": "",
    "Port": 0,
    "DatabaseName": "",
    "Username": "",
    "Password": ""
  },
  "S3": {
    "Region": "USWest1",
    "BucketName": "default",
    "AccessKey": "default",
    "SecretKey": "default",
    "EndpointUrl": "http://less3:8000",
    "UseSsl": false,
    "BaseUrl": "http://less3:8000"
  },
  "DocumentAtom": {
    "Endpoint": "http://documentatom-server:8000",
    "AccessKey": "default"
  },
  "Chunking": {
    "Endpoint": "http://partio-server:8400",
    "AccessKey": "partioadmin",
    "EndpointId": "default"
  },
  "Embeddings": {
    "Endpoint": "http://partio-server:8400",
    "AccessKey": "partioadmin",
    "EndpointId": "default"
  },
  "Inference": {
    "Provider": "Ollama",
    "Endpoint": "http://ollama:11434",
    "ApiKey": "default",
    "DefaultModel": "gemma3:4b"
  },
  "RecallDb": {
    "Endpoint": "http://recalldb-server:8600",
    "AccessKey": "recalldbadmin"
  },
  "AdminApiKeys": [
    "changeme"
  ],
  "DefaultTenant": {
    "Id": "default",
    "Name": "Default"
  },
  "ProcessingLog": {
    "Directory": "./processing-logs/",
    "RetentionDays": 30
  },
  "ChatHistory": {
    "RetentionDays": 7
  },
  "Crawl": {
    "EnumerationDirectory": "./crawl-enumerations/"
  },
  "Logging": {
    "ConsoleLogging": true,
    "EnableColors": false,
    "FileLogging": true,
    "LogDirectory": "./logs/",
    "LogFilename": "assistanthub.log",
    "IncludeDateInFilename": true,
    "MinimumSeverity": 1,
    "Servers": []
  }
}
```

| Section | Description |
|---|---|
| Webserver | Hostname, port, and SSL toggle for the HTTP listener. |
| Database | Database type (Sqlite, Postgresql, SqlServer, Mysql) and connection details. |
| S3 | S3-compatible object storage (Less3) for uploaded documents. |
| DocumentAtom | Endpoint and access key for the DocumentAtom document-processing service. |
| Chunking | Endpoint, access key, and default endpoint ID for the Partio chunking service. |
| Embeddings | Endpoint, access key, and default endpoint ID for the Partio embeddings service. |
| Inference | LLM provider (Ollama, OpenAI, or Gemini), endpoint, API key, and default model. |
| RecallDb | Endpoint and access key for the RecallDB vector database service. |
| AdminApiKeys | List of API keys that grant global admin access (not tied to any tenant). Users with IsAdmin=true also receive global admin privileges. |
| DefaultTenant | ID and name for the default tenant, auto-created on first run. |
| ProcessingLog | Directory and retention for per-document processing logs (namespaced by tenant). |
| ChatHistory | Retention period in days for chat history records (0 = keep indefinitely). Background cleanup runs hourly. |
| Crawl | Directory for storing crawl enumeration files (delta snapshots used for change detection between crawl runs). |
| Logging | Console/file logging toggles, severity level, log directory, and optional syslog servers. |
To completely reset AssistantHub to a clean state, use the factory reset script:
```
cd docker
docker compose down
cd factory
./reset.sh   # Linux/macOS
reset.bat    # Windows
```

The script will prompt you to type RESET to confirm. This destroys all runtime data (databases, uploaded documents, logs, vector data) and restores factory-default databases. Configuration files are preserved. Downloaded Ollama models are kept by default; pass --include-models to remove them as well.
After the reset completes, start the environment again:
```
cd docker
docker compose up -d
```

Expected behavior after reset:

- `assistanthub-server` will not start until `partio-server` is healthy; this is intentional and prevents AssistantHub from failing early while validating chunking and embeddings connectivity
- If startup appears slower than before, wait for Partio to finish its health checks and model initialization
AssistantHub exposes a versioned REST API at /v1.0/. All authenticated endpoints require a bearer token in the Authorization header or as a token query parameter.
For complete endpoint documentation including request/response schemas and examples, see REST_API.md.
| Category | Endpoints | Description |
|---|---|---|
| Health | GET /, HEAD / | Server info and health check (unauthenticated) |
| Authentication | POST /v1.0/authenticate | Authenticate with email/password (+ optional TenantId) or bearer token |
| WhoAmI | GET /v1.0/whoami | Return current authentication context (tenant, role, user) |
| Tenants | PUT/GET /v1.0/tenants, GET/PUT/DELETE/HEAD /v1.0/tenants/{id} | Tenant management (global admin only) |
| Users | PUT/GET /v1.0/tenants/{tenantId}/users, GET/PUT/DELETE/HEAD .../users/{id} | Tenant-scoped user management |
| Credentials | PUT/GET /v1.0/tenants/{tenantId}/credentials, GET/PUT/DELETE/HEAD .../credentials/{id} | Tenant-scoped credential management |
| Buckets | PUT/GET /v1.0/buckets, GET/DELETE/HEAD /v1.0/buckets/{name} | S3 bucket management (tenant-scoped by {tenantId}_ prefix) |
| Bucket Objects | GET/PUT/POST/DELETE /v1.0/buckets/{name}/objects | S3 object management with upload, download, metadata, and directory creation (tenant-scoped) |
| Collections | PUT/GET /v1.0/collections, GET/PUT/DELETE/HEAD /v1.0/collections/{id} | RecallDB collection management (admin only) |
| Collection Records | PUT/GET /v1.0/collections/{id}/records, GET/DELETE .../records/{recordId} | Browse and manage records within collections (admin only) |
| Collection Metadata | GET /v1.0/collections/{id}/labels/distinct, GET .../tags/distinct | Discover distinct label values and tag keys in a collection (admin only) |
| Ingestion Rules | PUT/GET /v1.0/ingestion-rules, GET/PUT/DELETE/HEAD /v1.0/ingestion-rules/{id} | Document processing rule management |
| Embedding Endpoints | PUT /v1.0/endpoints/embedding, POST .../enumerate, GET/PUT/DELETE/HEAD .../{id}, GET .../health, POST .../test | Partio embedding endpoint management and smoke testing (admin only) |
| Completion Endpoints | PUT /v1.0/endpoints/completion, POST .../enumerate, GET/PUT/DELETE/HEAD .../{id}, GET .../health, POST .../test | Partio completion endpoint management and smoke testing (admin only) |
| Assistants | PUT/GET /v1.0/assistants, GET/PUT/DELETE/HEAD /v1.0/assistants/{id} | Assistant management (owner or admin) |
| Assistant Settings | GET/PUT /v1.0/assistants/{id}/settings, POST .../settings/slack/verify | Per-assistant endpoint, prompt, RAG, and Slack configuration, including draft Slack connectivity verification (owner or admin) |
| Crawl Plans | PUT/GET /v1.0/crawlplans, GET/PUT/DELETE/HEAD /v1.0/crawlplans/{id}, POST .../start, POST .../stop, POST .../connectivity, GET .../enumerate | Crawler management with schedule control, connectivity testing, and content preview |
| Crawl Operations | GET /v1.0/crawlplans/{id}/operations, GET .../statistics, GET/DELETE .../operations/{id}, GET .../statistics, GET .../enumeration | Crawl execution history, statistics, and enumeration file access |
| Documents | PUT/GET /v1.0/documents, GET/DELETE/HEAD /v1.0/documents/{id}, GET .../processing-log | Document upload, management, and processing log access |
| Feedback | GET /v1.0/feedback, GET/DELETE /v1.0/feedback/{id} | View and manage user feedback |
| History | GET /v1.0/history, GET/DELETE /v1.0/history/{id} | View and manage chat history with timing metrics |
| Threads | GET /v1.0/threads | List conversation threads |
| Models | GET /v1.0/models, POST /v1.0/models/pull, GET .../pull/status, DELETE /v1.0/models/{modelName} | List, pull, delete, and check pull status for inference models |
| Eval Facts | PUT/GET /v1.0/eval/facts, GET/PUT/DELETE /v1.0/eval/facts/{factId} | Ground-truth fact management for RAG evaluation |
| Eval Runs | POST/GET /v1.0/eval/runs, GET/DELETE /v1.0/eval/runs/{runId}, GET .../results, GET .../stream | Start, list, and stream evaluation runs with LLM-judged results |
| Eval Results | GET /v1.0/eval/results/{resultId} | Retrieve individual evaluation result details |
| Eval Judge Prompt | GET /v1.0/eval/judge-prompt/default | Retrieve the default judge prompt template |
| Configuration | GET/PUT /v1.0/configuration | View and update server configuration (admin only) |
| Public Chat | POST /v1.0/assistants/{id}/chat | Chat completion with RAG and optional metadata filtering (unauthenticated, SSE or JSON) |
| Public Generate | POST /v1.0/assistants/{id}/generate | Lightweight inference without RAG (unauthenticated) |
| Public Compact | POST /v1.0/assistants/{id}/compact | Force conversation compaction (unauthenticated) |
| Public Feedback | POST /v1.0/assistants/{id}/feedback | Submit feedback (unauthenticated) |
| Public Info | GET /v1.0/assistants/{id}/public | Get assistant public info and appearance (unauthenticated) |
| Public Metadata | GET /v1.0/assistants/{id}/labels/distinct, GET .../tags/distinct | Discover available label and tag filter values for an assistant's collection (unauthenticated) |
| Public Threads | POST /v1.0/assistants/{id}/threads | Create a conversation thread (unauthenticated) |
```
                    ┌──────────────────┐
                    │    Dashboard     │
                    │  (React / Vite)  │
                    │    Port 8801     │
                    └────────┬─────────┘
                             │
                             │ HTTP (nginx reverse proxy)
                             ▼
                    ┌──────────────────┐
                    │   AssistantHub   │
                    │ Server (.NET 10) │
                    │    Port 8800     │
                    └───┬────┬─────┬───┘
                        │    │     │
         ┌──────────────┘    │     └─────────────┐
         │                   │                   │
         ▼                   ▼                   ▼
┌──────────────────┐ ┌────────────────┐ ┌──────────────────┐
│   DocumentAtom   │ │    RecallDB    │ │      Less3       │
│ (Doc Processing) │ │(Vector Search) │ │   (S3 Storage)   │
│    Port 8301     │ │   Port 8401    │ │    Port 8000     │
└────────┬─────────┘ └───────┬────────┘ └──────────────────┘
         │                   │
         ▼                   ▼
┌──────────────────┐ ┌──────────────────┐
│      Partio      │ │     pgvector     │
│  (Chunk/Embed)   │ │   (PostgreSQL)   │
│    Port 8321     │ │    Port 5432     │
└────────┬─────────┘ └──────────────────┘
         │
         ▼
┌──────────────────┐
│      Ollama      │
│ (LLM Inference)  │
│    Port 11434    │
└──────────────────┘
```
```
┌─────────┐       ┌──────────────┐       ┌──────────────┐
│  User   │       │ AssistantHub │       │    Less3     │
│(Browser │──1───►│    Server    │──2───►│ (S3 Storage) │
│ or API) │       │              │       └──────────────┘
└─────────┘       └──────┬───────┘
                         │
                       3 │
                         ▼
                  ┌──────────────┐
                  │ DocumentAtom │  Extracts text cells
                  │              │  from PDF, DOCX, HTML, etc.
                  └──────┬───────┘
                         │
                       4 │  Text cells
                         ▼
                  ┌──────────────┐
                  │    Partio    │  Optionally summarizes cells,
                  │              │  chunks text, computes embeddings
                  └──────┬───────┘
                         │
                       5 │  Chunks + embeddings
                         ▼
                  ┌──────────────┐
                  │   RecallDB   │  Stores chunks and vectors
                  │  (pgvector)  │  for retrieval
                  └──────────────┘
```
1. User uploads a document via the API or dashboard, selecting an ingestion rule.
2. The document file is stored in the ingestion rule's S3 bucket via Less3.
3. DocumentAtom extracts text content from the document, returning structured cells.
4. Partio processes the cells: optionally summarizes (pre- or post-chunking per the rule), splits into chunks using the rule's chunking strategy, and computes vector embeddings via the configured embedding endpoint.
5. Chunks and embeddings are stored in the ingestion rule's RecallDB collection. Chunk record IDs are saved on the document for cleanup on deletion.
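Partio's chunking strategies are configurable per ingestion rule. One of the simplest strategies -- fixed-size chunks with a small overlap so that sentences split at a boundary still appear whole in one chunk -- can be sketched as follows. The function name, parameter names, and default sizes are hypothetical, not Partio's actual API:

```python
def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> list:
    """Split text into fixed-size chunks where each chunk overlaps the
    previous one by `overlap` characters, preserving boundary context."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk reached the end of the text
        start += max_chars - overlap
    return chunks
```

Each resulting chunk would then be embedded independently and stored as one RecallDB record.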
```
┌─────────┐       ┌──────────────┐       ┌──────────────┐
│  User   │       │ AssistantHub │       │   RecallDB   │
│(Browser │──1───►│    Server    │──2───►│  (pgvector)  │
│ or API) │       │              │◄──3───│              │
└─────────┘       └──────┬───────┘       └──────────────┘
     ▲                   │
     │                 4 │  Context + messages
     │                   ▼
     │            ┌──────────────┐
     └─────6──────│    Ollama    │  Generates response
                  │ (Inference)  │  (streaming or batch)
                  └──────────────┘
```
1. User sends a message to the chat endpoint with conversation history.
2. If RAG is enabled (and the retrieval gate permits), the server embeds the query and searches RecallDB using the assistant's configured search mode (vector, full-text, or hybrid).
3. RecallDB returns relevant document chunks ranked by similarity score.
4. The server assembles the system prompt with retrieved context and sends the full message list to the configured inference provider (Ollama, OpenAI, or Gemini). If the conversation exceeds the context window, older messages are compacted first.
5. The LLM generates a response.
6. The response is streamed back to the user token-by-token via SSE (or returned as a complete JSON response). Chat history with timing metrics is persisted.
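On the client side, consuming the SSE stream from step 6 amounts to reading `data:` lines from the response body. A minimal parser sketch -- the payload framing and the OpenAI-style `[DONE]` terminator are assumptions here; see REST_API.md for the actual event schema:

```python
def parse_sse_tokens(raw: str) -> list:
    """Extract the data payload of each Server-Sent Event from raw SSE
    text, skipping an assumed '[DONE]' end-of-stream marker."""
    tokens = []
    for line in raw.splitlines():
        if line.startswith("data: "):
            payload = line[len("data: "):]
            if payload != "[DONE]":
                tokens.append(payload)
    return tokens
```

A real client would apply this incrementally as bytes arrive (the JS and C# SDKs expose this as async generators / IAsyncEnumerable) rather than buffering the whole response.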
- Backend: .NET 10 (C#), WatsonWebserver
- Frontend: React 19, Vite 6, JavaScript
- Database: SQLite (default), PostgreSQL, SQL Server, MySQL
- Vector Search: RecallDB backed by PostgreSQL with pgvector
- Document Processing: DocumentAtom (text extraction), Partio (chunking, embedding, summarization)
- Object Storage: Less3 (S3-compatible)
- Inference Providers: Ollama (local), OpenAI (cloud), Gemini (cloud)
- Containerization: Docker, Docker Compose
- Web Server (Dashboard): nginx
Client libraries are available for integrating with the AssistantHub API:
| SDK | Location | Description |
|---|---|---|
| JavaScript/TypeScript | sdk/js/ | Dual ESM/CJS output, native fetch, async generators for SSE streaming |
| Python | sdk/python/ | Pydantic v2 models, httpx client, PEP 561 compliant |
| C# | sdk/csharp/ | .NET 8.0, System.Text.Json, typed exceptions, IAsyncEnumerable streaming |
Each SDK directory contains its own README with installation instructions and usage examples.
- Bug Reports and Feature Requests -- Use the Issues tab to report bugs or request new features.
- Questions and Discussion -- Use the Discussions tab for general questions, ideas, and community feedback.
- Improvements -- We are happy to accept pull requests; please keep them focused and short
This project is licensed under the MIT License. See LICENSE.md for details.
Copyright (c) 2026 Joel Christner.