
fix: reorder extraction prompts for LLM prompt caching #1873

Open
hafezparast wants to merge 1 commit into unclecode:develop from
hafezparast:fix/prompt-caching-order-1699

Conversation

@hafezparast
Contributor

Summary

  • Reorders all 4 extraction prompt templates in prompts.py to put instructions before URL/HTML content
  • Enables LLM prompt caching — instruction prefix stays constant across calls and gets cached
  • Cached input tokens are billed at up to a 90% discount by Anthropic and 50% by OpenAI

What changed

crawl4ai/prompts.py — 4 templates reordered:

  • PROMPT_EXTRACT_BLOCKS
  • PROMPT_EXTRACT_BLOCKS_WITH_INSTRUCTION
  • PROMPT_EXTRACT_SCHEMA_WITH_INSTRUCTION
  • PROMPT_EXTRACT_INFERRED_SCHEMA

Before:

URL → HTML content → Instructions

After:

Instructions → URL → HTML content
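
The structural change above can be sketched as follows. These are simplified stand-in templates, not the actual (much longer) strings in crawl4ai/prompts.py:

```python
# Before: variable content ({URL}, {HTML}) comes first, so the prompt
# differs from the very first token on every call and no shared prefix
# exists for the provider to cache.
PROMPT_BEFORE = """URL: {URL}

HTML:
{HTML}

Instructions: extract the main content blocks and return them as JSON."""

# After: the fixed instructions come first and form a constant prefix;
# only the {URL}/{HTML} suffix varies between calls.
PROMPT_AFTER = """Instructions: extract the main content blocks and return them as JSON.

URL: {URL}

HTML:
{HTML}"""

# Both orderings keep the same placeholders, so existing .format() call
# sites need no changes.
filled = PROMPT_AFTER.format(URL="https://example.com", HTML="<p>hi</p>")
```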

Why this works

LLM providers cache input token prefixes. When the same prefix appears across multiple requests, cached tokens are billed at a discount:

  • Anthropic: 90% discount on cached input tokens
  • OpenAI: 50% discount on cached input tokens

Since instructions are identical across pages in a crawl session but URL/HTML change every call, putting instructions first makes them a cacheable prefix.
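
The cacheable-prefix property can be demonstrated with two consecutive calls. The `build_prompt` helper and `INSTRUCTIONS` text below are hypothetical stand-ins, not crawl4ai's actual code:

```python
import os.path

# Hypothetical instruction text standing in for the real extraction prompt.
INSTRUCTIONS = (
    "Extract the main content blocks from the page below and "
    "return them as a JSON list."
)

def build_prompt(url: str, html: str) -> str:
    # Instructions first: every call starts with the same bytes.
    return f"{INSTRUCTIONS}\n\nURL: {url}\n\nHTML:\n{html}"

p1 = build_prompt("https://a.example/page1", "<p>first page</p>")
p2 = build_prompt("https://b.example/page2", "<p>second page</p>")

# The shared prefix covers the entire instruction block, which is the
# part providers can serve from cache on the second and later calls.
shared = os.path.commonprefix([p1, p2])
assert shared.startswith(INSTRUCTIONS)
```

Note the providers' own constraints still apply: OpenAI only caches prefixes of roughly 1024+ tokens automatically, and Anthropic requires an explicit `cache_control` breakpoint on the message, so the instruction block must be long enough to qualify.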

Risk

Low. The prompt content is unchanged; only the ordering of sections differs, so extraction output should be practically equivalent. Only the token billing changes. All template variables ({URL}, {HTML}, {REQUEST}, {SCHEMA}) remain intact.

Test plan

  • All 4 template variables verified present after reorder
  • Python import of all prompts succeeds
  • 15/15 unit tests pass
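
The placeholder check in the test plan amounts to something like the following. The template strings and the per-template placeholder sets here are assumptions inferred from the template names; the real test would import the actual strings from crawl4ai.prompts:

```python
# Stand-in templates mirroring only the placeholder layout; assumed,
# not copied from crawl4ai/prompts.py.
TEMPLATES = {
    "PROMPT_EXTRACT_BLOCKS": "Instructions...\nURL: {URL}\nHTML:\n{HTML}",
    "PROMPT_EXTRACT_BLOCKS_WITH_INSTRUCTION": (
        "Instructions...\n{REQUEST}\nURL: {URL}\nHTML:\n{HTML}"
    ),
    "PROMPT_EXTRACT_SCHEMA_WITH_INSTRUCTION": (
        "Instructions...\n{REQUEST}\nSchema:\n{SCHEMA}\nURL: {URL}\nHTML:\n{HTML}"
    ),
    "PROMPT_EXTRACT_INFERRED_SCHEMA": "Instructions...\nURL: {URL}\nHTML:\n{HTML}",
}

# Which placeholders each template must keep after the reorder
# (assumed mapping based on the template names).
REQUIRED = {
    "PROMPT_EXTRACT_BLOCKS": {"{URL}", "{HTML}"},
    "PROMPT_EXTRACT_BLOCKS_WITH_INSTRUCTION": {"{URL}", "{HTML}", "{REQUEST}"},
    "PROMPT_EXTRACT_SCHEMA_WITH_INSTRUCTION": {"{URL}", "{HTML}", "{REQUEST}", "{SCHEMA}"},
    "PROMPT_EXTRACT_INFERRED_SCHEMA": {"{URL}", "{HTML}"},
}

def verify(templates: dict, required: dict) -> list:
    """Return (name, missing_placeholders) pairs for any broken template."""
    failures = []
    for name, tpl in templates.items():
        missing = {ph for ph in required[name] if ph not in tpl}
        if missing:
            failures.append((name, missing))
    return failures

assert verify(TEMPLATES, REQUIRED) == []
```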

Closes #1699

🤖 Generated with Claude Code

Move instructions before URL/HTML content in all 4 extraction prompt
templates. This enables LLM providers (Anthropic, OpenAI) to cache
the instruction prefix across calls, reducing input token costs by
up to 90% (Anthropic) or 50% (OpenAI) for batch extraction jobs.

Before: URL → HTML → Instructions (instructions not cacheable)
After:  Instructions → URL → HTML (instructions cached as prefix)

No behavioral change expected: the prompt content is identical and only
the section ordering differs. Only the token billing is affected.

Closes unclecode#1699

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JonasPapinigis

Hahaha no way, this is why he's the goat

