websource

Turn websites into reusable structured data sources — through conversation, not code.

websource is a local-first CLI tool that analyzes a URL, detects extractable fields (title, price, image, date…), and generates a reusable extraction config that runs on demand or on a schedule. All data stays on your machine in SQLite.

Quick start

git clone https://github.com/2fe2000/websource.git
cd websource
npm install
npx playwright install chromium

# Interactive setup wizard
npx tsx bin/websource.ts init https://books.toscrape.com

Commands

Command	Description
`init [url]`	Guided setup for a new data source
`scan <url>`	Analyze a page without saving
`sources list`	List all sources
`sources show <id>`	Show source details
`preview <id>`	Dry-run extraction (no DB write)
`extract <id>`	Run extraction and save
`diff <id>`	Show changes since last run
`schedule <id> <expr>`	Set a cron refresh schedule
`serve`	Start local REST API + scheduler
`export <id>`	Export to JSON/CSV
`doctor`	Run health checks

Claude Code integration (optional)

If you use Claude Code, websource exposes a full MCP server so Claude can call extraction tools directly — no bash commands needed.

MCP server (project-level — automatic)

When you open this project in Claude Code, it automatically picks up .mcp.json and connects to the websource MCP server. No extra setup required.

MCP server (user-level — any directory)

Register once to use the /websource wizard from any directory:

claude mcp add websource -s user -- npx tsx /absolute/path/to/websource/bin/mcp-server.ts

Claude Code skill (interactive wizard)

Install the /websource slash command skill:

bash scripts/install-skill.sh

Then use /websource or paste a URL in any Claude Code chat to launch the guided wizard: category discovery → field selection → schedule → source creation.

Available MCP tools

Tool	Description
`websource_discover_sections`	Detect category/tab structure on a page
`websource_analyze_page`	Detect fields, blocks, and pagination
`websource_create_source`	Create and persist a data source
`websource_preview_extraction`	Dry-run extraction (no DB write)
`websource_run_extraction`	Run extraction and save results
`websource_list_sources`	List all saved sources

Configuration

All config is optional. Copy .env.example to .env to override defaults:

Variable	Default	Description
`WEBSOURCE_DATA_DIR`	`~/.local/share/websource`	Database and log location
`WEBSOURCE_CONFIG_DIR`	`~/.config/websource`	Config file location
`LOG_LEVEL`	`warn`	`trace` / `debug` / `info` / `warn` / `error`

Data storage

All extracted data is stored locally in a single SQLite database:

~/.local/share/websource/
├── websource.db   ← all data
└── logs/          ← log files (production mode only)

Table	Contents
`sources`	Source list (name, URL, status)
`extraction_configs`	Field selectors, fetchMode, and other settings
`runs`	Extraction run history (time, record counts, status)
`snapshots`	The actual extracted records
`diffs`	Added / changed / removed records between runs
`schedules`	Cron schedule settings

Export extracted data:

# JSON
npx tsx bin/websource.ts export <sourceId> --format json

# CSV
npx tsx bin/websource.ts export <sourceId> --format csv

# REST API
npx tsx bin/websource.ts serve
# GET http://localhost:3847/sources/:id/data

Change the storage location — add to .env:

WEBSOURCE_DATA_DIR=/your/custom/path

Architecture

Node.js + TypeScript (ESM, strict)
Cheerio for static HTML parsing, Playwright for JS-rendered pages
SQLite (better-sqlite3) for all local persistence
Fastify for the local REST API
node-cron for scheduling

See docs/ARCHITECTURE.md for details.

Documentation

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
bin		bin
docs		docs
scripts		scripts
skills/websource		skills/websource
src		src
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.mcp.json		.mcp.json
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

websource

Quick start

Commands

Claude Code integration (optional)

MCP server (project-level — automatic)

MCP server (user-level — any directory)

Claude Code skill (interactive wizard)

Available MCP tools

Configuration

Data storage

Architecture

Documentation

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

websource

Quick start

Commands

Claude Code integration (optional)

MCP server (project-level — automatic)

MCP server (user-level — any directory)

Claude Code skill (interactive wizard)

Available MCP tools

Configuration

Data storage

Architecture

Documentation

Contributing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages