State-of-the-art local audio transcription with speaker diarization for macOS.
100% local. No cloud. No API keys. No data leaves your machine.
- Transcription — Accurate speech-to-text powered by OpenAI Whisper (via WhisperKit)
- Speaker diarization — Identify who said what, powered by pyannote (via SpeakerKit)
- Apple Silicon optimized — Runs on CoreML and the Apple Neural Engine
- Multiple output formats — Plain text, JSON (with word timestamps), SRT, VTT
- 99 languages — Supports all languages that Whisper supports
- Fast — Processes audio faster than real-time on Apple Silicon
## Installation

Install with Homebrew:

```sh
brew install theam/tap/scribe
```

Or build from source:

```sh
git clone https://github.com/theam/scribe.git
cd scribe
swift build -c release
cp .build/release/scribe /usr/local/bin/
```

## Usage

Transcribe an audio file:

```sh
scribe transcribe meeting.wav
```

With speaker diarization:

```sh
scribe transcribe meeting.wav --diarize
```

If you know the number of speakers in advance:

```sh
scribe transcribe meeting.wav --diarize --speakers 4
```

Choose an output format:

```sh
scribe transcribe meeting.wav --format txt   # plain text (default)
scribe transcribe meeting.wav --format json  # structured JSON with word timestamps
scribe transcribe meeting.wav --format srt   # SRT subtitles
scribe transcribe meeting.wav --format vtt   # WebVTT subtitles
```

Write the result to a file:

```sh
scribe transcribe meeting.wav --format json --output transcript.json
```

Pick a model:

```sh
scribe transcribe meeting.wav --model large-v3       # best accuracy
scribe transcribe meeting.wav --model large-v3-turbo # best speed/accuracy (default)
scribe transcribe meeting.wav --model small          # fastest, lower accuracy
```

Set the language explicitly:

```sh
scribe transcribe meeting.wav --language es  # Spanish
scribe transcribe meeting.wav --language fr  # French
```

Manage models:

```sh
scribe models list                     # see available/downloaded models
scribe models download large-v3-turbo  # download a model
scribe models remove large-v3          # remove a downloaded model
```

Example plain-text output with diarization:

```
[00:00:12] Speaker 1: Hello everyone, welcome to the meeting.
[00:00:18] Speaker 2: Thanks for joining. Let's start with the agenda.
```
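For quick scripting against the plain-text output, lines in the `[HH:MM:SS] Speaker N: text` layout shown above can be parsed with a few lines of Python (an illustrative sketch, not part of scribe itself):

```python
import re

# Matches '[HH:MM:SS] Speaker N: text' transcript lines.
LINE = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\]\s+(.+?):\s+(.*)")

def parse_line(line):
    """Return {'offset', 'speaker', 'text'} for a transcript line, or None."""
    m = LINE.match(line)
    if not m:
        return None
    h, mn, s, speaker, text = m.groups()
    seconds = int(h) * 3600 + int(mn) * 60 + int(s)
    return {"offset": seconds, "speaker": speaker, "text": text}

print(parse_line("[00:00:12] Speaker 1: Hello everyone, welcome to the meeting."))
```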
Example JSON output:

```json
{
  "metadata": {
    "duration": 3612.5,
    "diarization": true
  },
  "segments": [
    {
      "start": 12.0,
      "end": 16.5,
      "text": "Hello everyone, welcome to the meeting.",
      "speaker": "Speaker 1",
      "words": [
        { "start": 12.0, "end": 12.4, "text": "Hello" }
      ]
    }
  ]
}
```

## Requirements

- macOS 14 (Sonoma) or later
- Apple Silicon (M1 or later) recommended — Intel Macs fall back to CPU inference
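The JSON output shown earlier is straightforward to post-process. As an illustration, here is a sketch that groups segment text by speaker; the sample data is hypothetical and simply mirrors the schema in the example above:

```python
import json

# Hypothetical sample matching the JSON schema shown earlier.
raw = """
{
  "metadata": { "duration": 3612.5, "diarization": true },
  "segments": [
    { "start": 12.0, "end": 16.5,
      "text": "Hello everyone, welcome to the meeting.",
      "speaker": "Speaker 1" },
    { "start": 18.0, "end": 21.2,
      "text": "Thanks for joining. Let's start with the agenda.",
      "speaker": "Speaker 2" }
  ]
}
"""

transcript = json.loads(raw)

# Group segment text by speaker label.
by_speaker = {}
for seg in transcript["segments"]:
    by_speaker.setdefault(seg["speaker"], []).append(seg["text"])

for speaker, lines in by_speaker.items():
    print(f"{speaker}: {' '.join(lines)}")
```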
## Models

The default model (`large-v3-turbo`, ~632MB) is downloaded automatically on first use. Other models can be downloaded with `scribe models download`.
| Model | Size | Speed | Accuracy | Languages |
|---|---|---|---|---|
| tiny | ~75MB | Fastest | Lower | 99 |
| base | ~142MB | Very fast | Fair | 99 |
| small | ~466MB | Fast | Good | 99 |
| medium | ~1.5GB | Moderate | Very good | 99 |
| large-v3-turbo | ~632MB | Fast | Very good | 99 |
| large-v3 | ~3.1GB | Slower | Best | 99 |
## Acknowledgments

scribe is built on the shoulders of excellent open-source projects:
- OpenAI Whisper (Apache 2.0) — The speech recognition model that powers transcription
- WhisperKit (MIT) by Argmax — CoreML implementation of Whisper for Apple Silicon
- SpeakerKit (MIT) by Argmax — CoreML speaker diarization
- pyannote.audio (MIT) by Hervé Bredin — The diarization model architecture that SpeakerKit builds on
- swift-argument-parser (Apache 2.0) by Apple — CLI argument parsing
## License

Apache 2.0 — Copyright 2026 The Agile Monkeys Inc. See LICENSE.