Abstract AssistantServer behind ABC for pluggable voice frameworks by sboily · Pull Request #14 · ServiceNow/eva

sboily · 2026-03-24T22:06:07Z

Why

We're building RoomKit — a multi-channel voice/AI framework — and we'd love to use EVA to benchmark our voice agents. Today, the assistant server is tightly coupled to Pipecat, which means testing any other framework requires forking and rewriting internals.

We think EVA's evaluation methodology (bot-to-bot conversations, accuracy + experience metrics) is excellent and would benefit from being framework-agnostic, letting the community benchmark their own voice stacks against the same scenarios and metrics.

What

This PR introduces an AssistantServerBase ABC that codifies the existing contract between ConversationWorker and the assistant server (which was already narrow — just start(), stop(), and get_conversation_stats()). The current Pipecat implementation becomes PipecatAssistantServer, one concrete subclass behind a factory with lazy imports.
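To make the contract concrete for discussion, here is a minimal sketch of what such an ABC could look like. The three method names come from the existing contract; the async signatures, the `dict` return type, and the `NullAssistantServer` and `run_once` helpers are assumptions for illustration, not the code in this PR.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any


class AssistantServerBase(ABC):
    """Contract between ConversationWorker and an assistant server (sketch)."""

    @abstractmethod
    async def start(self) -> None:
        """Start serving the conversation."""

    @abstractmethod
    async def stop(self) -> None:
        """Stop the server and release its resources."""

    @abstractmethod
    def get_conversation_stats(self) -> dict[str, Any]:
        """Return statistics about the conversation."""


class NullAssistantServer(AssistantServerBase):
    """Hypothetical no-op server, only to show what a subclass must provide."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        self.running = True

    async def stop(self) -> None:
        self.running = False

    def get_conversation_stats(self) -> dict[str, Any]:
        return {"running": self.running}


async def run_once(server: AssistantServerBase) -> dict[str, Any]:
    """Drive a server the way a worker might: start, read stats, stop."""
    await server.start()
    try:
        return server.get_conversation_stats()
    finally:
        await server.stop()
```

Because the worker would depend only on the base class, `asyncio.run(run_once(NullAssistantServer()))` works the same way for any framework-specific subclass.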

After this change, plugging in a new framework requires:

- One new file implementing AssistantServerBase
- One line in the factory registry
- Expanding the Literal type in RunConfig.framework

No changes to the orchestrator, metrics, or evaluation pipeline.

Heads up

This is a first proposal to open a discussion. We understand this is a meaningful architectural change and we're not expecting it to be merged as-is. We'd like to hear from the maintainers:

- Are you open to making the assistant server pluggable?
- Does the ABC contract look right, or would you draw the boundary differently?
- Any concerns about the renaming (pipecat_logs → framework_logs)?

Happy to iterate on any of this based on your feedback.

🤖 Generated with Claude Code

tara-servicenow · 2026-03-24T23:08:07Z

Thank you for creating this PR! We would also love to make EVA framework-agnostic so that it's possible to test with frameworks other than Pipecat.

This makes sense as a start. However, the bigger challenge we see is implementing any AssistantServer so that it replicates the Pipecat logging exactly. Renaming the logs is fine, but our evaluation logic is tightly coupled to the Pipecat log entries and format, so any AssistantServer would need to produce exactly the same type of logs. It would help us if you could include the AssistantServer class you plan to use, in addition to the abstract base class, so we can see how it would work.

Extract AssistantServerBase ABC from the Pipecat-coupled AssistantServer so that alternative voice frameworks can be plugged in without modifying the orchestrator, metrics, or evaluation pipeline.

Changes:
- New AssistantServerBase ABC (src/eva/assistant/base.py) defining the server contract: start(), stop(), get_conversation_stats()
- Rename AssistantServer -> PipecatAssistantServer (backward-compat alias)
- Factory function with lazy-import registry in assistant/__init__.py
- Add EVA_FRAMEWORK config field to RunConfig (default: "pipecat")
- Worker uses factory instead of direct import
- Rename pipecat_logs_path -> framework_logs_path throughout
- Remove dead execute_realtime_tool from ToolExecutor
- Move nvidia-riva-client to optional [nvidia] dep (conflicts with roomkit's websockets/deepgram-sdk requirements)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Concrete implementation of AssistantServerBase using RoomKit's voice pipeline while reusing EVA's AgenticSystem, AuditLog, and ToolExecutor for LLM reasoning.

Architecture:
- TwilioWebSocketBackend (from roomkit.voice.backends.twilio_ws) bridges EVA's Twilio WebSocket protocol to RoomKit's VoiceChannel
- RoomKit VoiceChannel handles STT (Deepgram), TTS (ElevenLabs), VAD
- RoomKit WavFileRecorder with ALL mode for audio output (inbound + outbound + mixed WAV files)
- RoomKit hooks produce framework_logs.jsonl events
- EVA's AgenticSystem handles LLM reasoning + tool calling unchanged
- Dedicated write queue for full-duplex WebSocket audio

Tested end-to-end: full conversations with correct metrics, clean audio recordings, user_behavioral_fidelity = 1.0.

Usage:
  EVA_FRAMEWORK=roomkit EVA_MODEL__STT=deepgram \
    EVA_MODEL__TTS=elevenlabs python main.py --debug

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sboily · 2026-03-26T00:36:19Z

Thanks for the feedback! I've updated the PR with a concrete RoomKitAssistantServer implementation.

What I did:

- Added a working RoomKit implementation using RoomKit's VoiceChannel (Deepgram STT, ElevenLabs TTS, VAD, audio recording) while reusing EVA's AgenticSystem, AuditLog, and ToolExecutor unchanged
- Ran the full EVA metrics pipeline end-to-end: 12 metrics computed successfully, including task_completion, faithfulness, conciseness, agent_speech_fidelity, tool_call_validity, and more
- The framework_logs.jsonl format contract works: RoomKit hooks produce the same events the metrics pipeline expects
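One way the log-format compatibility could be checked mechanically is to compare the event shapes in a reference Pipecat log against those in a log from another framework. The sketch below is only an illustration: the `type` field name and the sample events are hypothetical, since the actual framework_logs.jsonl schema is not described here.

```python
import json
from collections import Counter


def event_signatures(jsonl_text: str, type_key: str = "type") -> Counter:
    """Count (event type, sorted field names) pairs in a JSONL log."""
    signatures: Counter = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between events
        event = json.loads(line)
        signatures[(event.get(type_key), tuple(sorted(event)))] += 1
    return signatures


def missing_signatures(reference: str, candidate: str) -> set:
    """Event shapes present in the reference log but absent from the candidate."""
    return set(event_signatures(reference)) - set(event_signatures(candidate))


# Hypothetical events, for illustration only.
reference_log = "\n".join([
    json.dumps({"type": "turn_start", "timestamp": 0.0}),
    json.dumps({"type": "tts_text", "timestamp": 0.4, "text": "Hello"}),
])
candidate_log = json.dumps({"type": "turn_start", "timestamp": 0.1})

gaps = missing_signatures(reference_log, candidate_log)
```

Here `gaps` holds the `tts_text` shape that the candidate log never emits. A check like this only covers structure, not timing or content, so it would complement rather than replace an end-to-end metrics run.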

Dependency challenge:
One significant challenge was dependency conflicts. Pipecat pins deepgram-sdk<4 and nvidia-riva-client (which pins websockets==15.0.1), while RoomKit needs deepgram-sdk>=6 and websockets>=16. They can't coexist in the same venv.

I propose restructuring pyproject.toml to separate framework-specific deps into optional extras:

  pip install eva[pipecat]   # Pipecat + its providers
  pip install eva[roomkit]   # RoomKit + its providers

Core deps (LLM, metrics, evaluation) stay in dependencies. This makes the framework choice explicit and avoids version conflicts. The PR includes this restructuring.

Current status:
This is a working first implementation. Before fine-tuning further, I'd like your feedback on:

- The overall approach and ABC contract
- The dependency restructuring in pyproject.toml
- Any concerns about the log format compatibility

Happy to iterate based on your input.

sboily force-pushed the abstract-assistant-server branch 2 times, most recently from f9f15e8 to ac045a8 · March 25, 2026 16:26

sboily force-pushed the abstract-assistant-server branch from ac045a8 to 9ff42b0 · March 26, 2026 00:32

sboily wants to merge 2 commits into ServiceNow:main from sboily:abstract-assistant-server
