The STREAMIND challenge

Welcome to the STREAMIND Grand Challenge!

The objective of the challenge is to design and implement a complete streaming pipeline, bridging the gap between live audio reception and structured language model inference. Participants will build their solutions on top of Juturna, an open-source Python framework developed by Meetecho for real-time AI data pipeline prototyping.

Important dates

Registration deadline: April 3, 2026
Paper submission deadline: June 19, 2026
Acceptance notification: July 17, 2026

To register, submit a form here.

Challenge overview

The core task of this challenge is the implementation of a pipeline that consumes live audio streams as inputs, and produces textual summaries and keywords as outputs.

Roughly speaking, such a pipeline can be organized around six main processing steps:

Audio reception via WebRTC/RTP: a live audio stream is delivered as an Opus-encoded mono RTP stream on a dedicated port.
Incremental ASR transcription: the incoming audio stream is incrementally transcribed by an ASR node operating on short, consecutive, partially overlapping audio chunks of fixed duration.
Novel chunk extraction: novel transcript text is isolated while content from previous audio chunks is discarded.
Window aggregation: transcription chunks are accumulated into context windows of 300 seconds.
Summarization: aggregated windows are processed to produce structured textual outputs.
Transmission: summary objects are stored locally on the filesystem and transmitted to a destination endpoint through POST requests.
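As a rough illustration of the novel chunk extraction and window aggregation steps, here is a minimal sketch in plain Python. It does not use the Juturna node API. The names `extract_novel_text` and `WindowAggregator` are hypothetical, and the word-level overlap matching is only one possible strategy, assuming consecutive transcripts repeat the overlapping words verbatim. The 300-second window length comes from the step list above; how much new audio each chunk contributes is left to the caller.

```python
from dataclasses import dataclass, field


def extract_novel_text(previous: str, current: str) -> str:
    """Return the part of `current` that follows its longest word-level
    overlap with the end of `previous` (hypothetical helper)."""
    prev_words = previous.split()
    curr_words = current.split()
    max_overlap = min(len(prev_words), len(curr_words))
    # Try the longest candidate overlap first, then shrink it.
    for size in range(max_overlap, 0, -1):
        if prev_words[-size:] == curr_words[:size]:
            return " ".join(curr_words[size:])
    # No overlap found: the whole transcript is treated as novel.
    return " ".join(curr_words)


@dataclass
class WindowAggregator:
    """Accumulate novel text until `window_seconds` of audio is covered."""

    window_seconds: float = 300.0
    _texts: list = field(default_factory=list)
    _elapsed: float = 0.0

    def add(self, text: str, duration: float):
        """Add novel text covering `duration` seconds of new audio.

        Returns the aggregated window text once the window is full,
        otherwise None.
        """
        if text:
            self._texts.append(text)
        self._elapsed += duration
        if self._elapsed >= self.window_seconds:
            window = " ".join(self._texts)
            self._texts = []
            self._elapsed = 0.0
            return window
        return None
```

Each time the ASR step emits a transcript, the novel part would be passed to `WindowAggregator.add` together with the duration of new (non-overlapping) audio it covers; a non-`None` return value is then a complete window ready for summarization.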

Evaluation metrics

Each chunk submitted for an audio source is assigned a composite score.

Chunk $i$, associated with the reference summary $ref_i$, contains the generated summary $gen_i$ and the keyword list $[k_1, k_2, k_3]_i$. Its summary score $S_i$ is computed with the LLM-as-a-judge technique and normalized to the range 0 to 30.

$$ S_i = \text{Judge}(ref_i, gen_i, [k_1, k_2, k_3]_i), \quad 0 \leq S_i \leq 30 $$

The final chunk score $C_i$ is obtained by subtracting the latency associated with chunk $i$ from its summary score.

$$ C_{i} = S_i - {\text{latency}_i} $$

The final score for an audio source is the average of all scores of its chunks.
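The scoring rules above can be sketched in a few lines of Python. This is only an illustration, not the official evaluation code: the function names are hypothetical, and the sketch assumes the latency value has already been expressed in score units, since the unit of the latency penalty is not specified here.

```python
def chunk_score(summary_score: float, latency: float) -> float:
    """C_i = S_i - latency_i, with S_i expected in [0, 30]."""
    if not 0 <= summary_score <= 30:
        raise ValueError("summary score must lie between 0 and 30")
    return summary_score - latency


def source_score(chunks) -> float:
    """Average the chunk scores of one audio source.

    `chunks` is an iterable of (summary_score, latency) pairs.
    """
    scores = [chunk_score(s, lat) for s, lat in chunks]
    if not scores:
        raise ValueError("at least one chunk is required")
    return sum(scores) / len(scores)
```

For example, two chunks with summary scores 24 and 18 and latency penalties 2 and 4 give chunk scores 22 and 14, so `source_score([(24.0, 2.0), (18.0, 4.0)])` evaluates to 18.0.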

References and Resources

Juturna framework: https://github.com/meetecho/juturna
Janus WebRTC server: https://github.com/meetecho/janus-gateway
