Free Tool · AI Cost

LLM API Cost Calculator — Compare GPT, Claude & Gemini Pricing

Free LLM API cost calculator: estimate and compare monthly OpenAI, Anthropic Claude and Google Gemini API costs from your token usage — and see how task routing and prompt caching cut the bill 30–50%.

For engineering and finance leaders sizing or trimming an AI bill.

Your current model

Requests / month

Avg input tokens / request

Avg output tokens / request

Optimization levers

Route simple tasks to a cheaper model — 40%

Cheaper model for routed tasks

Cacheable input (prompt caching) — 30%

Current spend

$562.5

/ month

Optimized spend

$316.93

/ month

You could save

44%

$245.57/mo · $2,947/yr

Prices as of 2026-06-01 — verify against the provider before budgeting.

Same usage, every model

Model	Provider	$ / month	vs. yours
Gemini 1.5 Flash	Google	$13.13	-98%
GPT-4.1 nano	OpenAI	$17.5	-97%
Gemini 2.5 Flash-Lite	Google	$17.5	-97%
Gemini 2.0 Flash	Google	$17.5	-97%
GPT-4o mini	OpenAI	$26.25	-95%
Grok 4 Fast (reasoning)	xAI	$27.5	-95%
GPT-5.4 nano	OpenAI	$46.25	-92%
DeepSeek-V3	DeepSeek	$47.75	-92%
GPT-4.1 mini	OpenAI	$70	-88%
Gemini 2.5 Flash	Google	$85	-85%
Grok Build 0.1	xAI	$125	-78%
Grok 4.3	xAI	$156.25	-72%
GPT-5.4 mini	OpenAI	$168.75	-70%
o4-mini	OpenAI	$192.5	-66%
Claude Haiku 4.5	Anthropic	$200	-64%
Gemini 1.5 Pro	Google	$218.75	-61%
Grok 4.20	xAI	$300	-47%
Gemini 3.5 Flash	Google	$337.5	-40%
Gemini 2.5 Pro	Google	$343.75	-39%
GPT-4.1	OpenAI	$350	-38%
o3	OpenAI	$350	-38%
Grok-2	xAI	$400	-29%
GPT-4o	OpenAI	$437.5	-22%
Gemini 3.1 Pro	Google	$450	-20%
GPT-5.4YOURS	OpenAI	$562.5	—
Claude Sonnet 4.6	Anthropic	$600	+7%
Claude Opus 4.8	Anthropic	$1,000	+78%

These are estimates. Want the real per-application breakdown?

A calculator projects forward; attributing a shared AI invoice across apps and reconciling it to the actual bill is what the Anchor platform does. I help teams cut AI spend 30–50% without losing capability.

Audit my AI spend See the Anchor case study

How to estimate your LLM API cost

Your monthly LLM API bill is driven by three numbers: how many requests you make, how many input (prompt) tokens and output (completion) tokens each request uses, and the per-token price of the model you call. This LLM API cost calculator multiplies them out across OpenAI (GPT-5.4, GPT-4.1, o3, o4-mini), Anthropic Claude (Opus 4.8, Sonnet 4.6, Haiku 4.5) and Google Gemini (3.1 Pro, 2.5 Pro, 2.5 Flash) — plus xAI Grok and DeepSeek — so you can compare the real cost of the same workload on every model at once.

The two levers that cut an AI bill 30–50%

Most teams overpay because every request hits a frontier model regardless of difficulty, and because repeated context is billed at full price on every call. The calculator lets you model the two highest- leverage fixes:

Task-based routing.Send the share of simple, high-volume requests that do not need a frontier model to a cheaper one (for example GPT-4o mini, Claude 3.5 Haiku or Gemini Flash) and watch the blended cost fall.
Prompt caching. When a large slice of your input is shared context — system prompts, retrieved documents, few-shot examples — caching it bills those tokens at a fraction of the standard input rate.

These mirror the playbook in the deep-dive, Cut your AI bill 30–50% without losing capability.

From estimate to attributed spend

A calculator projects forward; it cannot tell you which of your applications drove last month's shared invoice. That requires capturing real usage and reconciling it to the actual bill — the approach in the Anchor cost-attribution case study. If you want a defensible, per-application breakdown of your AI spend, get in touch.

Frequently asked questions

How do I estimate my monthly LLM API cost?

Multiply your input and output tokens per request by the per-token price of your model, then by your monthly request volume. This calculator does that across OpenAI, Anthropic and Google models at once, and adds the effect of prompt caching and routing simple tasks to a cheaper model.

Why is my OpenAI / Claude / Gemini bill higher than expected?

The most common causes are routing every request to a frontier model regardless of difficulty, paying full input price for repeated context that could be cached, and verbose outputs. Right-sizing the model per task and caching shared context typically removes 30–50% of spend without losing capability.

Are these prices exact?

They are sensible, dated defaults based on each provider’s published list prices. Providers change pricing and meter some endpoints differently, so treat the result as a planning estimate and confirm against the current provider pricing page before budgeting.

How can I get an accurate per-application breakdown of shared AI spend?

A calculator estimates forward; attributing a real shared invoice across applications requires capturing actual usage and reconciling to the bill. That is exactly what the Anchor case study covers — invoice-anchored, per-application cost attribution.

Read: Cut your AI bill 30–50%

Case study: Anchor cost attribution

Hire me to audit your AI spend

More free tools

JSON to SQL SchemaPaste JSON, get a typed CREATE TABLE statement. Infers column types and nullability for PostgreSQL, MySQL or SQLite — nothing leaves your browser.

Available for new work

Have a data, analytics, or AI problem worth solving?

From ETL pipelines to cloud warehouses and self-hosted AI, let's scope the work with clear outcomes. I reply within one business day.

Start a Project Connect on LinkedIn