Abstract AssistantServer behind ABC for pluggable voice frameworks #14

Open
sboily wants to merge 2 commits into ServiceNow:main from sboily:abstract-assistant-server

sboily commented Mar 24, 2026

Why

We're building RoomKit — a multi-channel voice/AI framework — and we'd love to use EVA to benchmark our voice agents. Today, the assistant server is tightly coupled to Pipecat, which means testing any other framework requires forking and rewriting internals.

We think EVA's evaluation methodology (bot-to-bot conversations, accuracy + experience metrics) is excellent and would benefit from being framework-agnostic, letting the community benchmark their own voice stacks against the same scenarios and metrics.

What

This PR introduces an AssistantServerBase ABC that codifies the existing contract between ConversationWorker and the assistant server (which was already narrow — just start(), stop(), and get_conversation_stats()). The current Pipecat implementation becomes PipecatAssistantServer, one concrete subclass behind a factory with lazy imports.
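As a rough illustration of the contract described above, here is a minimal sketch. The three method names come from this PR; everything else is an assumption made for the example, including whether `start()` and `stop()` are coroutines, the return type of `get_conversation_stats()`, and the toy `EchoAssistantServer` subclass, which is hypothetical and not part of the PR.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any


class AssistantServerBase(ABC):
    """Contract the worker relies on (sketch; signatures are assumed)."""

    @abstractmethod
    async def start(self) -> None:
        """Bring the server up so a conversation can connect."""

    @abstractmethod
    async def stop(self) -> None:
        """Shut the server down and release its resources."""

    @abstractmethod
    def get_conversation_stats(self) -> dict[str, Any]:
        """Return summary statistics for the finished conversation."""


class EchoAssistantServer(AssistantServerBase):
    """Hypothetical toy subclass, only here to show the contract being met."""

    def __init__(self) -> None:
        self.running = False
        self.turns = 0

    async def start(self) -> None:
        self.running = True

    async def stop(self) -> None:
        self.running = False

    def get_conversation_stats(self) -> dict[str, Any]:
        # The "turns" key is illustrative, not EVA's real stats schema.
        return {"turns": self.turns}
```

Because the three methods are abstract, instantiating `AssistantServerBase` directly raises `TypeError`, so a framework that forgets one of them fails at construction time rather than mid-conversation.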

After this change, plugging in a new framework requires:

  1. One new file implementing AssistantServerBase
  2. One line in the factory registry
  3. Expanding the Literal type in RunConfig.framework

No changes to the orchestrator, metrics, or evaluation pipeline.

Heads up

This is a first proposal to open a discussion. We understand this is a meaningful architectural change and we're not expecting it to be merged as-is. We'd like to hear from the maintainers:

  • Are you open to making the assistant server pluggable?
  • Does the ABC contract look right, or would you draw the boundary differently?
  • Any concerns about the renaming (pipecat_logs -> framework_logs)?

Happy to iterate on any of this based on your feedback.

🤖 Generated with Claude Code

@tara-servicenow
Collaborator

Thank you for creating this PR! We would also love to make EVA framework-agnostic so that it's possible to test with frameworks other than Pipecat.

This makes sense as a start. However, the bigger challenge we see is implementing any AssistantServer so that it replicates the Pipecat logging exactly. Renaming the logs is fine, but our evaluation logic is tightly coupled to the Pipecat log entries and format, so any AssistantServer would need to produce exactly the same type of logs. It would help us if you could include the AssistantServer class you plan to use, in addition to the abstract base class, so we can see how it would work.

Extract AssistantServerBase ABC from the Pipecat-coupled AssistantServer
so that alternative voice frameworks can be plugged in without modifying
the orchestrator, metrics, or evaluation pipeline.

Changes:
- New AssistantServerBase ABC (src/eva/assistant/base.py) defining the
  server contract: start(), stop(), get_conversation_stats()
- Rename AssistantServer -> PipecatAssistantServer (backward-compat alias)
- Factory function with lazy-import registry in assistant/__init__.py
- Add EVA_FRAMEWORK config field to RunConfig (default: "pipecat")
- Worker uses factory instead of direct import
- Rename pipecat_logs_path -> framework_logs_path throughout
- Remove dead execute_realtime_tool from ToolExecutor
- Move nvidia-riva-client to optional [nvidia] dep (conflicts with
  roomkit's websockets/deepgram-sdk requirements)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sboily force-pushed the abstract-assistant-server branch 2 times, most recently from f9f15e8 to ac045a8 (March 25, 2026 16:26)
Concrete implementation of AssistantServerBase using RoomKit's voice
pipeline while reusing EVA's AgenticSystem, AuditLog, and ToolExecutor
for LLM reasoning.

Architecture:
- TwilioWebSocketBackend (from roomkit.voice.backends.twilio_ws)
  bridges EVA's Twilio WebSocket protocol to RoomKit's VoiceChannel
- RoomKit VoiceChannel handles STT (Deepgram), TTS (ElevenLabs), VAD
- RoomKit WavFileRecorder with ALL mode for audio output
  (inbound + outbound + mixed WAV files)
- RoomKit hooks produce framework_logs.jsonl events
- EVA's AgenticSystem handles LLM reasoning + tool calling unchanged
- Dedicated write queue for full-duplex WebSocket audio

Tested end-to-end: full conversations with correct metrics,
clean audio recordings, user_behavioral_fidelity = 1.0.

Usage: EVA_FRAMEWORK=roomkit EVA_MODEL__STT=deepgram \
       EVA_MODEL__TTS=elevenlabs python main.py --debug

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sboily force-pushed the abstract-assistant-server branch from ac045a8 to 9ff42b0 (March 26, 2026 00:32)
@sboily
Author

sboily commented Mar 26, 2026

Thanks for the feedback! I've updated the PR with a concrete RoomKitAssistantServer implementation.

What I did:

  • Added a working RoomKit implementation using RoomKit's VoiceChannel (Deepgram STT, ElevenLabs TTS, VAD, audio recording) while reusing EVA's AgenticSystem, AuditLog, and
    ToolExecutor unchanged
  • Ran the full EVA metrics pipeline end-to-end: 12 metrics computed successfully, including task_completion, faithfulness, conciseness, agent_speech_fidelity, tool_call_validity, and
    more
  • The framework_logs.jsonl format contract works — RoomKit hooks produce the same events the metrics pipeline expects
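One way to check log compatibility between two frameworks is to compare which event types each log contains. The sketch below is only an illustration under assumptions: it supposes each line of framework_logs.jsonl is a JSON object carrying a `type` field, which this thread does not confirm, and the helper names are hypothetical.

```python
import json


def event_types(jsonl_text: str, key: str = "type") -> set[str]:
    """Collect the distinct values of `key` across all JSONL records."""
    types: set[str] = set()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if key in record:
            types.add(record[key])
    return types


def missing_event_types(reference: str, candidate: str, key: str = "type") -> set[str]:
    """Event types present in the reference log but absent from the candidate."""
    return event_types(reference, key) - event_types(candidate, key)
```

An empty result from `missing_event_types(pipecat_log, roomkit_log)` would only show that the same kinds of events appear; it says nothing about field-level or ordering compatibility, which the evaluation logic may also depend on.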

Dependency challenge:
One significant challenge was dependency conflicts. Pipecat pins deepgram-sdk<4 and nvidia-riva-client (which pins websockets==15.0.1), while RoomKit needs deepgram-sdk>=6 and
websockets>=16. They can't coexist in the same venv.

I propose restructuring pyproject.toml to separate framework-specific deps into optional extras:

  pip install eva[pipecat]   # Pipecat + its providers
  pip install eva[roomkit]   # RoomKit + its providers

Core deps (LLM, metrics, evaluation) stay in dependencies. This makes the framework choice explicit and avoids version conflicts. The PR includes this restructuring.

Current status:
This is a working first implementation. Before fine-tuning further, I'd like your feedback on:

  • The overall approach and ABC contract
  • The dependency restructuring in pyproject.toml
  • Any concerns about the log format compatibility

Happy to iterate based on your input.
