
AssistantHub


AssistantHub is a self-hosted RAG (Retrieval-Augmented Generation) data and chatbot platform. It enables you to create AI assistants that can answer questions grounded in your uploaded documents, powered by vector embeddings, hybrid search, and large language models. Upload PDFs, text files, HTML, and more -- AssistantHub automatically extracts content, summarizes, chunks, generates embeddings, and makes it searchable. Your assistants retrieve relevant context at query time and generate accurate, citation-ready responses.

AssistantHub ships as a fully orchestrated Docker Compose stack -- one command brings up the entire platform, including the LLM inference engine, document processing pipeline, vector database, object storage, and a browser-based management dashboard.

Slack support was added in v0.9.0, allowing each assistant to connect directly to Slack and process threaded Slack conversations through the same AssistantHub chat pipeline.



New in v0.9.0

  • Slack integration per assistant -- Configure Slack connectivity directly on assistant settings with Enable Slack, app token, bot token, channel ID, start-of-message indicator, and draft connectivity verification.
  • Shared chat execution rail -- Slack requests reuse the same retrieval, compaction, citation, inference, and history flow as AssistantHub chat instead of a separate inference path.
  • Thread-aware Slack replies -- Incoming Slack messages map to deterministic AssistantHub threads and replies are posted back to the originating Slack thread.
  • Slack verification API and dashboard flow -- Added POST /v1.0/assistants/{assistantId}/settings/slack/verify plus dashboard support for testing draft values before save.
  • Chat history origin tracking -- chat_history.origin now records request source such as web or slack.
  • Migration script: migrations/007_upgrade_to_v0.9.0.sql

Slack Integration (v0.9.0)

AssistantHub supports per-assistant Slack connectivity through Assistant Settings.

  • Enable Slack on an assistant and provide:
    • App Token (xapp-...)
    • Bot Token (xoxb-...)
    • Channel ID
    • Start-of-Message Indicator
  • Use Verify Connectivity in the dashboard before saving
  • AssistantHub maintains one Socket Mode connection per Slack-enabled assistant
  • In configured channels, messages are processed when they start with the configured indicator or mention the bot
  • Direct messages to the bot are also supported
  • Slack conversations reuse the same non-streaming chat execution rail as AssistantHub chat, including retrieval, citations, compaction, and history persistence
  • Slack responses are posted back into the originating Slack thread
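Before saving, draft values can be tested against the verification endpoint (POST /v1.0/assistants/{assistantId}/settings/slack/verify). A minimal payload-building sketch follows; the JSON field names are illustrative assumptions, not the confirmed schema — see REST_API.md for the actual request body.

```python
import json

def build_slack_verify_payload(app_token, bot_token, channel_id, indicator):
    """Assemble draft Slack settings for connectivity verification.
    Field names are hypothetical; consult REST_API.md for the real schema."""
    return {
        "AppToken": app_token,                    # xapp-... (Socket Mode app token)
        "BotToken": bot_token,                    # xoxb-... (bot user OAuth token)
        "ChannelId": channel_id,
        "StartOfMessageIndicator": indicator,     # e.g. "!ask"
    }

payload = build_slack_verify_payload("xapp-...", "xoxb-...", "C0123456789", "!ask")
body = json.dumps(payload)  # POST this body to .../settings/slack/verify
```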

Operational notes:

  • Slack tokens are stored in the AssistantHub database in plaintext, so rely on your deployment's at-rest protections
  • The Slack app must have Socket Mode enabled and be invited to any private channels it should service
  • AssistantHub consumes the EasySlack NuGet package at version 1.0.1

New in v0.7.0

  • Metadata filtering for chat completions -- Filter RAG retrieval to only return documents matching specified labels and/or tags. Labels are simple string lists (required/excluded). Tags are key-value conditions supporting operators: Equals, NotEquals, Contains, StartsWith, EndsWith, GreaterThan, LessThan, IsNull, IsNotNull. Filters can be configured as defaults on an assistant (applied to every conversation) and/or supplied per-request via the metadata_filter field on the chat completion request body. When both are present, they are merged (required labels/tags unioned, excluded labels/tags unioned).
  • Per-request metadata_filter on chat completions -- The POST /v1.0/assistants/{id}/chat endpoint accepts an optional metadata_filter object in the request body. This is an AssistantHub extension to the OpenAI-compatible chat schema. Clients that omit it get standard unfiltered retrieval. Example:
    {
      "messages": [{"role": "user", "content": "What were the Q4 results?"}],
      "metadata_filter": {
        "required_labels": ["finance", "quarterly-report"],
        "excluded_labels": ["draft"],
        "required_tags": [
          {"key": "department", "condition": "Equals", "value": "accounting"}
        ]
      }
    }
  • Assistant-level default filters -- New RetrievalLabelFilter and RetrievalTagFilter settings on each assistant. Configure via the dashboard (Retrieval Filters section) or API. These defaults are applied to every chat retrieval for that assistant.
  • Filter discovery endpoints -- Four new API endpoints to discover available filter values:
    • GET /v1.0/collections/{collectionId}/labels/distinct (admin)
    • GET /v1.0/collections/{collectionId}/tags/distinct (admin)
    • GET /v1.0/assistants/{assistantId}/labels/distinct (public)
    • GET /v1.0/assistants/{assistantId}/tags/distinct (public)
  • Dashboard -- Retrieval Filters configuration in assistant settings, collapsible metadata filter panel in the chat UI for per-session filtering, and metadata filter display in the history detail view
  • Auditing -- The effective merged filter is stored in ChatHistory.MetadataFilter and displayed in the History View modal
  • Docker image tags updated to v0.7.0
  • See CHANGELOG.md for full details
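The merge behavior described above (assistant defaults combined with a per-request metadata_filter, with required/excluded labels and tags unioned) can be sketched as follows. This is an illustrative reconstruction of the stated semantics, not AssistantHub's internal implementation.

```python
def merge_filters(default, per_request):
    """Union an assistant's default filter with a per-request filter,
    per the v0.7.0 description: labels and tags are unioned."""
    merged = {}
    for key in ("required_labels", "excluded_labels"):
        merged[key] = sorted(set(default.get(key, [])) | set(per_request.get(key, [])))
    for key in ("required_tags", "excluded_tags"):
        seen, tags = set(), []
        for tag in default.get(key, []) + per_request.get(key, []):
            ident = (tag["key"], tag["condition"], tag.get("value"))
            if ident not in seen:          # de-duplicate identical conditions
                seen.add(ident)
                tags.append(tag)
        merged[key] = tags
    return merged

default = {"required_labels": ["finance"], "excluded_labels": ["draft"]}
request = {"required_labels": ["finance", "quarterly-report"],
           "required_tags": [{"key": "department", "condition": "Equals",
                              "value": "accounting"}]}
merged = merge_filters(default, request)
```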

v0.6.0

  • LLM-based re-ranking -- After initial retrieval, an LLM scores each chunk's relevance to the user's query and filters out low-quality results before context injection
  • See CHANGELOG.md for full details

v0.5.0

  • Native web crawlers -- Built-in web crawling engine that automatically discovers, retrieves, and ingests website content. Configure a URL, schedule, and ingestion rule, and AssistantHub handles the rest
  • Crawl plans and scheduling -- Persistent crawler configurations with automatic recurring execution on configurable intervals (one-time, minutes, hours, days, weeks)
  • Delta-based crawling -- Subsequent crawls compare against the previous enumeration to process only new, changed, and deleted content
  • Document traceability -- Every crawled document is linked back to its source crawler and operation. Filter the Documents view by crawler to see all ingested content
  • On-demand controls -- Start, stop, test connectivity, and preview discovered content from the dashboard or API
  • Full dashboard integration -- Crawlers management view, operations viewer with statistics, enumeration browser, and Documents view integration
  • 16 new API endpoints -- Complete CRUD, lifecycle control, statistics, and enumeration access for crawl plans and operations
  • See CHANGELOG.md for full details

v0.4.0

  • Query rewrite -- LLM-based query rewriting for improved retrieval recall
  • Full multi-tenancy -- Row-level tenant isolation, three-tier authorization, auto-provisioning, tenant-scoped routes
  • See CHANGELOG.md for full details

v0.3.0

  • Initial release with multi-assistant platform, automated document ingestion, flexible search modes, streaming chat, and browser-based dashboard
  • See CHANGELOG.md for full details

Features

  • Assistants -- Create and manage multiple AI assistants, each with their own configuration, personality, and knowledge base.
  • Documents -- Upload documents (PDF, text, HTML, and more) to build a knowledge base for each assistant. Documents are automatically chunked, embedded, and indexed.
  • Crawlers -- Native web crawling engine that automatically discovers, retrieves, and ingests website content on a schedule. Supports delta-based crawling (only new/changed/deleted content is processed), configurable depth, parallelism, throttling, content filtering, and web authentication (Basic, API Key, Bearer Token). Each crawled document is traceable back to its source crawler and operation.
  • Ingestion Rules -- Define reusable ingestion configurations that specify target S3 buckets, RecallDB collections, summarization, chunking strategies, and embedding settings. Documents reference an ingestion rule for processing.
  • Summarization -- Optionally summarize document content before or after chunking using configurable completion endpoints, improving retrieval quality for long documents.
  • Endpoint Management -- Manage embedding and completion (inference) endpoints on the Partio service directly from the dashboard or API.
  • Search -- Leverages pgvector and RecallDB for vector, full-text, and hybrid search. Configure per-assistant search modes with tunable scoring weights for optimal retrieval from your document corpus.
  • Retrieval Gate -- Optional LLM-based retrieval gate that intelligently decides whether each user message requires a new document search or can be answered from existing conversation context, reducing unnecessary retrieval calls.
  • Chat -- Public-facing chat endpoint that retrieves relevant context from your documents and generates responses using configurable LLM providers (Ollama, OpenAI, Gemini). Supports real-time SSE streaming.
  • Conversation Compaction -- Automatic summarization of older messages when the conversation approaches the context window limit, preserving continuity across long conversations.
  • Feedback -- Collect thumbs-up/thumbs-down feedback and free-text comments on assistant responses to monitor quality and improve over time.
  • Multi-Tenant -- Full row-level tenant isolation with three-tier authorization (Global Admin via API key or IsAdmin flag, Tenant Admin, User). Auto-provisioning of tenant resources, per-tenant S3 bucket isolation ({tenantId}_ prefix), and tenant-scoped RecallDB mapping.
  • Dashboard -- Browser-based management UI for configuring assistants, uploading documents, viewing feedback, managing endpoints, and testing chat.
  • Query rewrite -- Optionally rewrite user queries into multiple semantically varied phrasings before retrieval to broaden recall and capture synonyms, alternate phrasing, and conceptual restatements
  • LLM-based re-ranking -- Re-ranking scores each retrieved chunk for relevance using an LLM, filtering low-quality results before context injection.
  • Metadata filtering -- Filter RAG retrieval by document labels (required/excluded string lists) and tags (key-value conditions with conditional operators). Configure default filters per assistant and/or override per-conversation via the metadata_filter field on chat completion requests.
  • Source citations -- Optional per-assistant citation metadata that maps model claims to source documents with bracket notation, relevance scores, and text excerpts. Configurable document linking via presigned S3 URLs or authenticated download endpoints
  • RAG evaluation -- Built-in evaluation framework for measuring retrieval and response quality. Define ground-truth facts (question/expected-facts pairs) per assistant, run automated evaluation passes with LLM-based judging, and review per-fact results with pass/fail verdicts. Supports custom judge prompts and real-time SSE progress streaming.

Quick Start (Docker)

The fastest way to run AssistantHub and all its dependencies is with Docker Compose. This is the recommended deployment method.

cd docker
docker compose up -d

Once all services are healthy, open http://localhost:8801 to access the dashboard.

On a fresh startup, assistanthub-server now waits for partio-server to become healthy before it starts. This avoids the transient partio-server:8400 DNS/startup race that could previously abort AssistantHub startup immediately after a factory reset.

Note: Deploying individual services outside of Docker is also possible, but requires manual configuration and deployment of each dependency (PostgreSQL with pgvector, Ollama, Less3, DocumentAtom, Partio, RecallDB). The Docker Compose stack handles all service wiring, health checks, and startup ordering automatically, which is why manual setup documentation is not provided.
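After docker compose up -d returns, you can script a readiness probe against the server's unauthenticated root endpoint instead of polling the dashboard by hand. A minimal sketch, assuming the default port mapping of 8800 for assistanthub-server:

```python
import time
import urllib.request

def backoff_schedule(attempts, base=1.0, cap=10.0):
    """Exponential backoff delays (seconds) between health probes."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]

def wait_until_healthy(url, attempts=8):
    """Probe the unauthenticated health endpoint until it answers 200."""
    for delay in backoff_schedule(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # container not up yet; wait and retry
        time.sleep(delay)
    return False

# wait_until_healthy("http://localhost:8800/")
```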

Services

The Docker Compose stack orchestrates the following services:

Service Port Description
assistanthub-server 8800 The core AssistantHub REST API server (.NET 10). Handles all business logic: assistant management, document ingestion orchestration, chat with RAG, user authentication, and integration with all downstream services.
assistanthub-dashboard 8801 Browser-based management dashboard (React 19, served by nginx). Provides a full UI for configuring assistants, uploading documents, managing endpoints, viewing feedback/history, and live chat testing. Proxies API requests to the server.
ollama 11434 Local LLM inference engine. Runs language models (e.g., gemma3:4b) for chat completion, conversation compaction, retrieval gate classification, and title generation. Models are persisted in a Docker volume.
less3 8000 S3-compatible object storage server. Stores uploaded document files. AssistantHub uses the S3 API to write, read, and delete document objects during ingestion and cleanup.
less3-ui 8001 Web-based management UI for Less3. Allows direct browsing and management of S3 buckets and objects.
documentatom-server 8301 Document processing service. Extracts text content from uploaded files (PDF, DOCX, HTML, text, and more), returning structured cells that represent the document's content.
documentatom-dashboard 8302 Web-based management UI for DocumentAtom.
partio-server 8321 Text chunking, embedding, and summarization service. Splits extracted text into chunks using configurable strategies, computes vector embeddings via configurable embedding endpoints, and optionally summarizes content using a completion endpoint. Also manages embedding and completion endpoint configurations.
partio-dashboard 8322 Web-based management UI for Partio. Allows direct management of embedding and completion endpoints.
pgvector 5432 PostgreSQL with the pgvector extension. Provides the underlying vector storage and full-text search capabilities used by RecallDB. Supports cosine similarity search over high-dimensional embedding vectors.
recalldb-server 8401 Vector and full-text search database. Wraps pgvector with a REST API for storing, searching, and managing document embeddings. Supports vector search (semantic similarity), full-text search (keyword matching), and hybrid search (weighted combination).
recalldb-dashboard 8402 Web-based management UI for RecallDB. Allows direct browsing of collections, records, and search testing.

Using an External Ollama Instance

If you already have Ollama running on your host machine or on another server, you can skip the containerized Ollama and point AssistantHub at your existing instance instead.

1. Comment out the Ollama service in docker/compose.yaml:

Comment out (or remove) the ollama service and its volume:

services:

  # --- Infrastructure ---

  # ollama:
  #   image: ollama/ollama:latest
  #   container_name: ollama
  #   ports:
  #     - "11434:11434"
  #   environment:
  #     OLLAMA_NUM_PARALLEL: "4"
  #     OLLAMA_MAX_LOADED_MODELS: "4"
  #   volumes:
  #     - ollama-models:/root/.ollama
  #   restart: unless-stopped

Also comment out the ollama-models volume at the bottom of the file:

volumes:
  pgvector-data:
  # ollama-models:

And remove - ollama from the partio-server service's depends_on list.

2. Update docker/assistanthub/assistanthub.json to point to your Ollama instance:

In the Inference section, change the Endpoint from the container hostname to your Ollama instance's address:

"Inference": {
  "Provider": "Ollama",
  "Endpoint": "http://host.docker.internal:11434",
  "ApiKey": "default",
  "DefaultModel": "gemma3:4b"
}
  • Ollama on the same machine (Docker Desktop): Use http://host.docker.internal:11434. The special hostname host.docker.internal resolves to your host machine from inside Docker containers. Do not use localhost -- inside a container, localhost refers to the container itself, not your host machine.
  • Ollama on the same machine (Linux without Docker Desktop): Use http://172.17.0.1:11434 (the default Docker bridge gateway), or run the compose stack with network_mode: host. You may also need to set OLLAMA_HOST=0.0.0.0 in your Ollama configuration so it listens on all interfaces.
  • Ollama on another machine: Use that machine's IP or hostname, e.g. http://192.168.1.50:11434. Ensure the Ollama port is accessible from the Docker network.

3. Update docker/partio/partio.json to point to your Ollama instance:

In the DefaultEmbeddingEndpoints section, change the Endpoint from the container hostname to match the address you used above:

"DefaultEmbeddingEndpoints": [
  {
    "Model": "all-minilm",
    "Endpoint": "http://host.docker.internal:11434",
    "ApiFormat": "Ollama",
    "ApiKey": null
  }
]

4. Update embedding and completion endpoints in the Partio dashboard:

After startup, open the Partio dashboard at http://localhost:8322 and update both the embedding endpoints and completion endpoints to point to your Ollama instance:

  • Change the Endpoint URL from http://ollama:11434 to your instance's address (e.g. http://host.docker.internal:11434).
  • Change the Health Check URL from a relative path (/api/tags) to a fully-qualified URL (e.g. http://host.docker.internal:11434/api/tags). Health checks using relative paths will fail with an "invalid request URI" error.

Without these changes, document ingestion (embeddings) and chat completions will fail.
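The health-check fix in step 4 is a simple URL transformation: prepend the endpoint's base address to the relative path. A small helper sketch (the function name is illustrative):

```python
def qualify_health_url(endpoint, health_path):
    """Turn a relative health-check path into a fully qualified URL,
    since relative paths fail with an 'invalid request URI' error."""
    if health_path.startswith(("http://", "https://")):
        return health_path                     # already fully qualified
    return endpoint.rstrip("/") + "/" + health_path.lstrip("/")

url = qualify_health_url("http://host.docker.internal:11434", "/api/tags")
```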

5. Start the stack:

cd docker
docker compose up -d

Dashboards

Dashboard URL Default Credentials
AssistantHub http://localhost:8801 Email: admin@assistanthub, Password: password
Less3 http://localhost:8001 Admin API Key: less3admin, Access Key: default, Secret Key: default
DocumentAtom http://localhost:8302 No authentication configured by default
Partio http://localhost:8322 Email: admin@partio, Password: password, Admin API Key: partioadmin
RecallDB http://localhost:8402 Email: admin@recall, Password: password, Admin API Key: recalldbadmin

Important: Change all default passwords immediately after first login.


Configuration

The server reads configuration from assistanthub.json in the working directory. For Docker deployments, this file is located at docker/assistanthub/assistanthub.json and is mounted into the container.

{
  "Webserver": {
    "Hostname": "*",
    "Port": 8800,
    "Ssl": false
  },
  "Database": {
    "Type": "Sqlite",
    "Filename": "./data/assistanthub.db",
    "Hostname": "",
    "Port": 0,
    "DatabaseName": "",
    "Username": "",
    "Password": ""
  },
  "S3": {
    "Region": "USWest1",
    "BucketName": "default",
    "AccessKey": "default",
    "SecretKey": "default",
    "EndpointUrl": "http://less3:8000",
    "UseSsl": false,
    "BaseUrl": "http://less3:8000"
  },
  "DocumentAtom": {
    "Endpoint": "http://documentatom-server:8000",
    "AccessKey": "default"
  },
  "Chunking": {
    "Endpoint": "http://partio-server:8400",
    "AccessKey": "partioadmin",
    "EndpointId": "default"
  },
  "Embeddings": {
    "Endpoint": "http://partio-server:8400",
    "AccessKey": "partioadmin",
    "EndpointId": "default"
  },
  "Inference": {
    "Provider": "Ollama",
    "Endpoint": "http://ollama:11434",
    "ApiKey": "default",
    "DefaultModel": "gemma3:4b"
  },
  "RecallDb": {
    "Endpoint": "http://recalldb-server:8600",
    "AccessKey": "recalldbadmin"
  },
  "AdminApiKeys": [
    "changeme"
  ],
  "DefaultTenant": {
    "Id": "default",
    "Name": "Default"
  },
  "ProcessingLog": {
    "Directory": "./processing-logs/",
    "RetentionDays": 30
  },
  "ChatHistory": {
    "RetentionDays": 7
  },
  "Crawl": {
    "EnumerationDirectory": "./crawl-enumerations/"
  },
  "Logging": {
    "ConsoleLogging": true,
    "EnableColors": false,
    "FileLogging": true,
    "LogDirectory": "./logs/",
    "LogFilename": "assistanthub.log",
    "IncludeDateInFilename": true,
    "MinimumSeverity": 1,
    "Servers": []
  }
}
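A quick sanity check before mounting the file: confirm it parses as JSON and contains the top-level sections shown above. The required-section list below simply mirrors the example configuration; it is not an official schema.

```python
import json

# Top-level sections taken from the example config above (not an official schema).
REQUIRED_SECTIONS = {"Webserver", "Database", "S3", "DocumentAtom",
                     "Chunking", "Embeddings", "Inference", "RecallDb"}

def validate_config(text):
    """Parse assistanthub.json and report any missing top-level sections."""
    config = json.loads(text)                  # raises on malformed JSON
    missing = REQUIRED_SECTIONS - config.keys()
    return config, sorted(missing)

sample = '{"Webserver": {"Port": 8800}, "Inference": {"Provider": "Ollama"}}'
config, missing = validate_config(sample)
```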

Key Settings

Section Description
Webserver Hostname, port, and SSL toggle for the HTTP listener.
Database Database type (Sqlite, Postgresql, SqlServer, Mysql) and connection details.
S3 S3-compatible object storage (Less3) for uploaded documents.
DocumentAtom Endpoint and access key for the DocumentAtom document-processing service.
Chunking Endpoint, access key, and default endpoint ID for the Partio chunking service.
Embeddings Endpoint, access key, and default endpoint ID for the Partio embeddings service.
Inference LLM provider (Ollama, OpenAI, or Gemini), endpoint, API key, and default model.
RecallDb Endpoint and access key for the RecallDB vector database service.
AdminApiKeys List of API keys that grant global admin access (not tied to any tenant). Users with IsAdmin=true also receive global admin privileges.
DefaultTenant ID and name for the default tenant, auto-created on first run.
ProcessingLog Directory and retention for per-document processing logs (namespaced by tenant).
ChatHistory Retention period in days for chat history records (0 = keep indefinitely). Background cleanup runs hourly.
Crawl Directory for storing crawl enumeration files (delta snapshots used for change detection between crawl runs).
Logging Console/file logging toggles, severity level, log directory, and optional syslog servers.

Factory Reset (Docker)

To completely reset AssistantHub to a clean state, use the factory reset script:

cd docker
docker compose down
cd factory
./reset.sh        # Linux/macOS
reset.bat         # Windows

The script will prompt you to type RESET to confirm. This destroys all runtime data (databases, uploaded documents, logs, vector data) and restores factory-default databases. Configuration files are preserved. Downloaded Ollama models are kept by default; pass --include-models to remove them as well.

After the reset completes, start the environment again:

cd docker
docker compose up -d

Expected behavior after reset:

  • assistanthub-server will not start until partio-server is healthy
  • This is intentional and prevents AssistantHub from failing early while validating chunking and embeddings connectivity
  • If startup appears slower than before, wait for Partio to finish its health checks and model initialization

API Overview

AssistantHub exposes a versioned REST API at /v1.0/. All authenticated endpoints require a bearer token in the Authorization header or as a token query parameter.

For complete endpoint documentation including request/response schemas and examples, see REST_API.md.
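Both authentication styles mentioned above (Authorization header or token query parameter) are trivial to construct client-side:

```python
from urllib.parse import urlencode

def auth_header(token):
    """Bearer token in the Authorization header (the usual form)."""
    return {"Authorization": f"Bearer {token}"}

def auth_query(url, token):
    """Alternative form: pass the token as a 'token' query parameter."""
    sep = "&" if "?" in url else "?"
    return url + sep + urlencode({"token": token})

hdr = auth_header("changeme")
url = auth_query("http://localhost:8800/v1.0/whoami", "changeme")
```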

Endpoint Summary

Category Endpoints Description
Health GET /, HEAD / Server info and health check (unauthenticated)
Authentication POST /v1.0/authenticate Authenticate with email/password (+ optional TenantId) or bearer token
WhoAmI GET /v1.0/whoami Return current authentication context (tenant, role, user)
Tenants PUT/GET /v1.0/tenants, GET/PUT/DELETE/HEAD /v1.0/tenants/{id} Tenant management (global admin only)
Users PUT/GET /v1.0/tenants/{tenantId}/users, GET/PUT/DELETE/HEAD .../users/{id} Tenant-scoped user management
Credentials PUT/GET /v1.0/tenants/{tenantId}/credentials, GET/PUT/DELETE/HEAD .../credentials/{id} Tenant-scoped credential management
Buckets PUT/GET /v1.0/buckets, GET/DELETE/HEAD /v1.0/buckets/{name} S3 bucket management (tenant-scoped by {tenantId}_ prefix)
Bucket Objects GET/PUT/POST/DELETE /v1.0/buckets/{name}/objects S3 object management with upload, download, metadata, and directory creation (tenant-scoped)
Collections PUT/GET /v1.0/collections, GET/PUT/DELETE/HEAD /v1.0/collections/{id} RecallDB collection management (admin only)
Collection Records PUT/GET /v1.0/collections/{id}/records, GET/DELETE .../records/{recordId} Browse and manage records within collections (admin only)
Collection Metadata GET /v1.0/collections/{id}/labels/distinct, GET .../tags/distinct Discover distinct label values and tag keys in a collection (admin only)
Ingestion Rules PUT/GET /v1.0/ingestion-rules, GET/PUT/DELETE/HEAD /v1.0/ingestion-rules/{id} Document processing rule management
Embedding Endpoints PUT /v1.0/endpoints/embedding, POST .../enumerate, GET/PUT/DELETE/HEAD .../{id}, GET .../health, POST .../test Partio embedding endpoint management and smoke testing (admin only)
Completion Endpoints PUT /v1.0/endpoints/completion, POST .../enumerate, GET/PUT/DELETE/HEAD .../{id}, GET .../health, POST .../test Partio completion endpoint management and smoke testing (admin only)
Assistants PUT/GET /v1.0/assistants, GET/PUT/DELETE/HEAD /v1.0/assistants/{id} Assistant management (owner or admin)
Assistant Settings GET/PUT /v1.0/assistants/{id}/settings, POST .../settings/slack/verify Per-assistant endpoint, prompt, RAG, and Slack configuration. Includes draft Slack connectivity verification (owner or admin).
Crawl Plans PUT/GET /v1.0/crawlplans, GET/PUT/DELETE/HEAD /v1.0/crawlplans/{id}, POST .../start, POST .../stop, POST .../connectivity, GET .../enumerate Crawler management with schedule control, connectivity testing, and content preview
Crawl Operations GET /v1.0/crawlplans/{id}/operations, GET .../statistics, GET/DELETE .../operations/{id}, GET .../statistics, GET .../enumeration Crawl execution history, statistics, and enumeration file access
Documents PUT/GET /v1.0/documents, GET/DELETE/HEAD /v1.0/documents/{id}, GET .../processing-log Document upload, management, and processing log access
Feedback GET /v1.0/feedback, GET/DELETE /v1.0/feedback/{id} View and manage user feedback
History GET /v1.0/history, GET/DELETE /v1.0/history/{id} View and manage chat history with timing metrics
Threads GET /v1.0/threads List conversation threads
Models GET /v1.0/models, POST /v1.0/models/pull, GET .../pull/status, DELETE /v1.0/models/{modelName} List, pull, delete, and check pull status for inference models
Eval Facts PUT/GET /v1.0/eval/facts, GET/PUT/DELETE /v1.0/eval/facts/{factId} Ground-truth fact management for RAG evaluation
Eval Runs POST/GET /v1.0/eval/runs, GET/DELETE /v1.0/eval/runs/{runId}, GET .../results, GET .../stream Start, list, and stream evaluation runs with LLM-judged results
Eval Results GET /v1.0/eval/results/{resultId} Retrieve individual evaluation result details
Eval Judge Prompt GET /v1.0/eval/judge-prompt/default Retrieve the default judge prompt template
Configuration GET/PUT /v1.0/configuration View and update server configuration (admin only)
Public Chat POST /v1.0/assistants/{id}/chat Chat completion with RAG and optional metadata filtering (unauthenticated, SSE or JSON)
Public Generate POST /v1.0/assistants/{id}/generate Lightweight inference without RAG (unauthenticated)
Public Compact POST /v1.0/assistants/{id}/compact Force conversation compaction (unauthenticated)
Public Feedback POST /v1.0/assistants/{id}/feedback Submit feedback (unauthenticated)
Public Info GET /v1.0/assistants/{id}/public Get assistant public info and appearance (unauthenticated)
Public Metadata GET /v1.0/assistants/{id}/labels/distinct, GET .../tags/distinct Discover available label and tag filter values for an assistant's collection (unauthenticated)
Public Threads POST /v1.0/assistants/{id}/threads Create a conversation thread (unauthenticated)

Architecture

                          ┌──────────────────┐
                          │    Dashboard     │
                          │  (React / Vite)  │
                          │    Port 8801     │
                          └────────┬─────────┘
                                   │
                                   │ HTTP (nginx reverse proxy)
                                   ▼
                          ┌──────────────────┐
                          │  AssistantHub    │
                          │ Server (.NET 10) │
                          │    Port 8800     │
                          └──┬────┬────┬──┬──┘
                             │    │    │  │
              ┌──────────────┘    │    │  └──────────────┐
              │                   │    │                 │
              ▼                   ▼    ▼                 ▼
   ┌──────────────────┐ ┌────────────────┐    ┌──────────────────┐
   │   DocumentAtom   │ │   RecallDB     │    │      Less3       │
   │ (Doc Processing) │ │(Vector Search) │    │  (S3 Storage)    │
   │    Port 8301     │ │   Port 8401    │    │    Port 8000     │
   └────────┬─────────┘ └────────┬───────┘    └──────────────────┘
            │                    │
            ▼                    ▼
   ┌──────────────────┐ ┌──────────────────┐
   │     Partio       │ │    pgvector      │
   │ (Chunk/Embed)    │ │  (PostgreSQL)    │
   │    Port 8321     │ │    Port 5432     │
   └────────┬─────────┘ └──────────────────┘
            │
            ▼
   ┌──────────────────┐
   │     Ollama       │
   │  (LLM Inference) │
   │   Port 11434     │
   └──────────────────┘

Document Ingestion Data Flow

  ┌─────────┐       ┌──────────────┐       ┌──────────────┐
  │  User   │       │ AssistantHub │       │    Less3     │
  │(Browser │──1───►│   Server     │──2───►│ (S3 Storage) │
  │  or API)│       │              │       └──────────────┘
  └─────────┘       └──────┬───────┘
                           │
                      3    │
                           ▼
                    ┌──────────────┐
                    │ DocumentAtom │   Extracts text cells
                    │              │   from PDF, DOCX, HTML, etc.
                    └──────┬───────┘
                           │
                      4    │  Text cells
                           ▼
                    ┌──────────────┐
                    │    Partio    │   Optionally summarizes cells,
                    │              │   chunks text, computes embeddings
                    └──────┬───────┘
                           │
                      5    │  Chunks + embeddings
                           ▼
                    ┌──────────────┐
                    │   RecallDB   │   Stores chunks and vectors
                    │  (pgvector)  │   for retrieval
                    └──────────────┘
  1. User uploads a document via the API or dashboard, selecting an ingestion rule.
  2. The document file is stored in the ingestion rule's S3 bucket via Less3.
  3. DocumentAtom extracts text content from the document, returning structured cells.
  4. Partio processes the cells: optionally summarizes (pre- or post-chunking per the rule), splits into chunks using the rule's chunking strategy, and computes vector embeddings via the configured embedding endpoint.
  5. Chunks and embeddings are stored in the ingestion rule's RecallDB collection. Chunk record IDs are saved on the document for cleanup on deletion.
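The five steps above can be sketched as a single orchestration function. Every helper here is a hypothetical in-memory stand-in, not an AssistantHub internal API; the point is the shape of the pipeline and the bookkeeping of chunk record IDs for later cleanup.

```python
def store_in_s3(file_bytes, bucket):
    return f"{bucket}/doc-1"                       # step 2: write object, return key

def extract_cells(object_key):
    return ["cell A", "cell B"]                    # step 3: DocumentAtom text cells

def chunk_and_embed(cells):
    return [{"text": c, "vector": [0.0]} for c in cells]   # step 4: Partio

def index_chunks(chunks, collection):
    return [f"{collection}:{i}" for i in range(len(chunks))]  # step 5: RecallDB

def ingest_document(file_bytes, rule):
    """Illustrative orchestration of the ingestion flow above."""
    key = store_in_s3(file_bytes, rule["bucket"])
    cells = extract_cells(key)
    chunks = chunk_and_embed(cells)
    record_ids = index_chunks(chunks, rule["collection"])
    # Record IDs are saved on the document so deletion can clean up chunks.
    return {"object_key": key, "chunk_record_ids": record_ids}

doc = ingest_document(b"...", {"bucket": "default", "collection": "kb"})
```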

Chat Data Flow

  ┌─────────┐       ┌──────────────┐       ┌──────────────┐
  │  User   │       │ AssistantHub │       │   RecallDB   │
  │(Browser │──1───►│   Server     │──2───►│  (pgvector)  │
  │  or API)│       │              │◄──3───│              │
  └─────────┘       └──────┬───────┘       └──────────────┘
       ▲                   │
       │                4  │  Context + messages
       │                   ▼
       │            ┌──────────────┐
       └─────6──────│    Ollama    │   Generates response
                    │  (Inference) │   (streaming or batch)
                    └──────────────┘
  1. User sends a message to the chat endpoint with conversation history.
  2. If RAG is enabled (and the retrieval gate permits), the server embeds the query and searches RecallDB using the assistant's configured search mode (vector, full-text, or hybrid).
  3. RecallDB returns relevant document chunks ranked by similarity score.
  4. The server assembles the system prompt with retrieved context and sends the full message list to the configured inference provider (Ollama, OpenAI, or Gemini). If the conversation exceeds the context window, older messages are compacted first.
  5. The LLM generates a response.
  6. The response is streamed back to the user token-by-token via SSE (or returned as a complete JSON response). Chat history with timing metrics is persisted.
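On the client side, consuming the SSE stream in step 6 amounts to collecting data: lines into events and decoding each event's JSON. A minimal parser sketch; the {"delta": ...} payload shape is an assumption for illustration — see REST_API.md for the actual streaming schema.

```python
import json

def parse_sse_events(lines):
    """Minimal SSE parser: accumulate 'data:' lines, emit the decoded
    JSON payload at each blank-line event boundary."""
    buffer = []
    for line in lines:
        if line.startswith("data:"):
            buffer.append(line[5:].strip())
        elif line == "" and buffer:            # blank line ends an event
            yield json.loads("\n".join(buffer))
            buffer = []

stream = [
    'data: {"delta": "Hello"}',
    "",
    'data: {"delta": " world"}',
    "",
]
tokens = [event["delta"] for event in parse_sse_events(stream)]
```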

Tech Stack

  • Backend: .NET 10 (C#), WatsonWebserver
  • Frontend: React 19, Vite 6, JavaScript
  • Database: SQLite (default), PostgreSQL, SQL Server, MySQL
  • Vector Search: RecallDB backed by PostgreSQL with pgvector
  • Document Processing: DocumentAtom (text extraction), Partio (chunking, embedding, summarization)
  • Object Storage: Less3 (S3-compatible)
  • Inference Providers: Ollama (local), OpenAI (cloud), Gemini (cloud)
  • Containerization: Docker, Docker Compose
  • Web Server (Dashboard): nginx

SDKs

Client libraries are available for integrating with the AssistantHub API:

SDK Location Description
JavaScript/TypeScript sdk/js/ Dual ESM/CJS output, native fetch, async generators for SSE streaming
Python sdk/python/ Pydantic v2 models, httpx client, PEP 561 compliant
C# sdk/csharp/ .NET 8.0, System.Text.Json, typed exceptions, IAsyncEnumerable streaming

Each SDK directory contains its own README with installation instructions and usage examples.


Issues, Feedback, and Improvements

  • Bug Reports and Feature Requests -- Use the Issues tab to report bugs or request new features.
  • Questions and Discussion -- Use the Discussions tab for general questions, ideas, and community feedback.
  • Improvements -- We are happy to accept pull requests; please keep them focused and short.

License

This project is licensed under the MIT License. See LICENSE.md for details.

Copyright (c) 2026 Joel Christner.
