Prompt Cache ROI Calculator

Project monthly savings and breakeven from prompt caching across Anthropic and OpenAI models with embedded June 2026 pricing.

Prompt structure

Static prefix tokens: System + few-shot + retrieved context that stays constant.
Query tokens: Per-call user content (not cached).
Output tokens: Average model response length.
Calls per day

Cache hit rate: 85%

Fraction of calls landing on a warm cache. 100% = always warm; 0% = every call writes fresh.

Selected model

Input: $3.00 / 1M
Cache write: $3.75 / 1M
Cached read: $0.30 / 1M
Output: $15.00 / 1M

Cache read 90% off input; first-write surcharge 25%.

No-cache / month: $2250
Cached / month: $1202
Monthly savings: $1048 (46.6%)

Breakeven calls

~0.0h spread

Calls / month: 60,000

Model	Provider	No-cache / mo	Cached / mo	Savings	Savings %	Breakeven
Claude Fable 5	Anthropic	$7500	$4008	$3492	46.6%	1 call
Claude Opus 4.8	Anthropic	$3750	$2004	$1746	46.6%	1 call
Claude Sonnet 4.6	Anthropic	$2250	$1202	$1048	46.6%	1 call
Claude Haiku 4.5	Anthropic	$750	$401	$349	46.6%	1 call
GPT-4o	OpenAI	$1755	$1245	$510	29.1%	—
GPT-4o Mini	OpenAI	$105	$74.70	$30.60	29.1%	—

How the math works

Without caching, every call pays the full input rate on (static + query) tokens. With caching, a cold call writes the static prefix at the cache-write rate (Anthropic adds a 25% surcharge over the input rate on writes; OpenAI does not), and each warm call reads the prefix at the cached-read rate instead (90% off input for Anthropic, 50% off for OpenAI). Query and output tokens are billed at the normal rates either way. The hit rate sets what share of your monthly calls land on a warm prefix.
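The calculation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the tool's actual source: the `Pricing` class and `monthly_costs` function are hypothetical names, and the token counts (8,000 static, 500 query, 800 output) are assumptions chosen because they reproduce the Claude Sonnet 4.6 figures shown on this page; the page itself does not state its default inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens (hypothetical container)."""
    input: float
    cache_write: float
    cached_read: float
    output: float


def monthly_costs(p, static_tokens, query_tokens, output_tokens,
                  calls_per_month, hit_rate):
    """Return (no_cache_cost, cached_cost) in USD per month."""
    per_million = 1_000_000
    # Query and output tokens are billed identically in both scenarios.
    shared = (query_tokens * p.input + output_tokens * p.output) / per_million
    # Without caching, the static prefix pays the full input rate every call.
    no_cache_call = static_tokens * p.input / per_million + shared
    # With caching, warm calls pay the cached-read rate on the prefix and
    # cold calls pay the cache-write rate; the hit rate blends the two.
    prefix_rate = hit_rate * p.cached_read + (1 - hit_rate) * p.cache_write
    cached_call = static_tokens * prefix_rate / per_million + shared
    return no_cache_call * calls_per_month, cached_call * calls_per_month


# Claude Sonnet 4.6 prices as listed on this page; token counts are assumed.
sonnet = Pricing(input=3.00, cache_write=3.75, cached_read=0.30, output=15.00)
no_cache, cached = monthly_costs(sonnet, 8000, 500, 800, 60_000, 0.85)
print(round(no_cache), round(cached), round(no_cache - cached))
# prints: 2250 1202 1048
```

Under these assumptions the output matches the Sonnet 4.6 tiles and table row ($2250, $1202, $1048). For a provider with no write surcharge, setting `cache_write` equal to `input` gives the same formula.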

What This Tool Does

Prompt Cache ROI Calculator is built for deterministic developer and agent workflows.

Compute Anthropic and OpenAI prompt-caching breakeven and projected monthly savings vs no-cache, with embedded June 2026 pricing for Claude and GPT models.

See "How to Use" for step-by-step instructions and the FAQ for constraints, policies, and edge cases.

Last updated: July 19, 2026

This tool is provided as-is for convenience. Output should be verified before use in any production or critical context.

Agent Invocation

Best Path For Builders

Browser workflow

Runs instantly in the browser with private local processing and copy/export-ready output.

/prompt-cache-roi-calculator/

For automation planning, fetch the canonical contract at /api/tool/prompt-cache-roi-calculator.json.

How to Use Prompt Cache ROI Calculator

1. Describe your prompt structure

Enter the static prefix size (system, few-shot, retrieved context that stays constant), per-call query tokens, average output tokens, and how many calls per day you expect. Each field accepts plain integers.

2. Set the cache hit rate

The slider controls what fraction of monthly calls land on a warm prefix. Real-world hit rates depend on TTL and traffic shape; try 80-95% for steady traffic and lower values for spiky workloads.

3. Pick a model

Each model card shows input, cached read, cache write (Anthropic only), and output prices per million tokens. The June 2026 figures cover Claude Fable 5, Opus 4.8, Sonnet 4.6, Haiku 4.5, GPT-4o, and GPT-4o-mini.

4. Read savings and breakeven

Headline tiles show no-cache vs cached monthly cost, dollar savings, and how many warm reads pay back the first cache write. The full comparison table runs the same calculation across every embedded model.

Frequently Asked Questions

Where does the pricing come from?

The June 2026 public list prices for Anthropic (Fable 5, Opus 4.8, Sonnet 4.6, Haiku 4.5) and OpenAI (GPT-4o, GPT-4o-mini), expressed per million tokens. Numbers cover input, cached read, cache write where charged, and output.

How is breakeven calculated?

Anthropic charges a 25% premium to write a cache entry. Breakeven divides that one-time premium by the per-call savings of cached reads vs uncached input. OpenAI has no write premium, so breakeven is immediate when caching applies.
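As a rough sketch of that division (the function name `breakeven_reads` is hypothetical, and rounding up to a whole call is an assumption consistent with the breakeven column in the comparison table), the prefix token count cancels out, so only the three per-million prices matter:

```python
import math


def breakeven_reads(input_price, cache_write_price, cached_read_price):
    """Warm reads needed to recover the cache-write premium (prices per 1M tokens)."""
    # Extra cost of writing the prefix to the cache versus sending it uncached.
    premium = cache_write_price - input_price
    # Amount each warm read saves versus paying the uncached input rate.
    saving_per_read = input_price - cached_read_price
    if premium <= 0:
        return 0  # no write premium: caching pays off immediately
    if saving_per_read <= 0:
        raise ValueError("cached reads are not cheaper than uncached input")
    return math.ceil(premium / saving_per_read)


# Claude Sonnet 4.6 prices from this page: premium 0.75, saving 2.70 per 1M tokens.
print(breakeven_reads(3.00, 3.75, 0.30))  # prints: 1
```

With a 25% write premium and a 90% read discount, the ratio is about 0.28, so a single warm read already covers the write.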

What does cache hit rate mean here?

The fraction of your monthly calls that land on a warm prefix. Cold calls pay the cache-write rate on the prefix (for OpenAI, which has no write premium, the normal input rate); warm calls pay the cached-read rate. Adjust the slider to see how your TTL and traffic shape change projected savings.
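One consequence of this model is that there is a minimum hit rate below which caching costs more than it saves whenever a write premium applies. The sketch below derives that threshold by setting the blended prefix rate, hit_rate x cached read + (1 - hit_rate) x cache write, equal to the plain input rate. The function name `min_hit_rate` is hypothetical, and the result assumes every cold call pays the full write rate, as described above.

```python
def min_hit_rate(input_price, cache_write_price, cached_read_price):
    """Hit rate at which the blended prefix rate equals the uncached input rate."""
    # Solve h * read + (1 - h) * write = input for h.
    return (cache_write_price - input_price) / (cache_write_price - cached_read_price)


# Claude Sonnet 4.6 prices from this page (USD per 1M tokens).
threshold = min_hit_rate(3.00, 3.75, 0.30)
print(f"{threshold:.1%}")  # prints: 21.7%
```

Under these assumptions, caching on the Anthropic models listed here only saves money above roughly a 22% hit rate; with no write premium the threshold is 0%, so any warm read is a net saving.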

Does it send my data to a server?

No. All math happens in your browser using the embedded pricing table. Token counts, cache assumptions, and cost projections never leave the page.

Why are GPT-4o savings smaller than Claude savings?

OpenAI cached input is roughly 50% off uncached input, while Anthropic cached read is 90% off. Anthropic also charges a 25% premium on cache writes, but a single warm read more than recovers it, so the deeper read discount dominates once the prefix is reused.
