LLM API Cost Calculator — Compare GPT, Claude & Gemini Pricing
Free LLM API cost calculator: estimate and compare monthly OpenAI, Anthropic Claude and Google Gemini API costs from your token usage — and see how task routing and prompt caching cut the bill 30–50%.
For engineering and finance leaders sizing or trimming an AI bill.
Optimization levers
Current spend: $562.5 / month
Optimized spend: $316.93 / month
You could save: 44% ($245.57/mo · $2,947/yr)
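As a quick check of the summary figures above, the savings line can be reproduced from the two spend numbers. This is only a sketch of the arithmetic; the function name `savings_summary` is made up for illustration and is not part of the calculator.

```python
def savings_summary(current: float, optimized: float) -> tuple[float, float, float]:
    """Return (percent saved, monthly saving, annual saving) in USD."""
    monthly_saving = current - optimized
    percent_saved = monthly_saving / current * 100
    annual_saving = monthly_saving * 12
    return percent_saved, monthly_saving, annual_saving


# Figures taken from the summary above.
pct, monthly, annual = savings_summary(562.50, 316.93)
print(f"{pct:.0f}% saved, ${monthly:,.2f}/mo, ${annual:,.0f}/yr")
```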
Same usage, every model
| Model | Provider | $ / month | vs. yours |
|---|---|---|---|
| Gemini 1.5 Flash | Google | $13.13 | -98% |
| GPT-4.1 nano | OpenAI | $17.5 | -97% |
| Gemini 2.5 Flash-Lite | Google | $17.5 | -97% |
| Gemini 2.0 Flash | Google | $17.5 | -97% |
| GPT-4o mini | OpenAI | $26.25 | -95% |
| Grok 4 Fast (reasoning) | xAI | $27.5 | -95% |
| GPT-5.4 nano | OpenAI | $46.25 | -92% |
| DeepSeek-V3 | DeepSeek | $47.75 | -92% |
| GPT-4.1 mini | OpenAI | $70 | -88% |
| Gemini 2.5 Flash | Google | $85 | -85% |
| Grok Build 0.1 | xAI | $125 | -78% |
| Grok 4.3 | xAI | $156.25 | -72% |
| GPT-5.4 mini | OpenAI | $168.75 | -70% |
| o4-mini | OpenAI | $192.5 | -66% |
| Claude Haiku 4.5 | Anthropic | $200 | -64% |
| Gemini 1.5 Pro | Google | $218.75 | -61% |
| Grok 4.20 | xAI | $300 | -47% |
| Gemini 3.5 Flash | Google | $337.5 | -40% |
| Gemini 2.5 Pro | Google | $343.75 | -39% |
| GPT-4.1 | OpenAI | $350 | -38% |
| o3 | OpenAI | $350 | -38% |
| Grok-2 | xAI | $400 | -29% |
| GPT-4o | OpenAI | $437.5 | -22% |
| Gemini 3.1 Pro | Google | $450 | -20% |
| GPT-5.4 (yours) | OpenAI | $562.5 | — |
| Claude Sonnet 4.6 | Anthropic | $600 | +7% |
| Claude Opus 4.8 | Anthropic | $1,000 | +78% |
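The "vs. yours" column appears to be the percentage difference between each model's monthly cost and the cost of the currently selected model, rounded to a whole percent. A minimal sketch under that assumption (the helper name `vs_yours` is hypothetical):

```python
def vs_yours(model_cost: float, your_cost: float) -> str:
    """Percentage difference of a model's monthly cost relative to the selected model."""
    delta_pct = (model_cost - your_cost) / your_cost * 100
    return f"{delta_pct:+.0f}%"


your_cost = 562.5  # GPT-5.4 row in the table above

# A few rows from the table, as (model, monthly cost in USD).
for model, cost in [
    ("Gemini 1.5 Flash", 13.13),
    ("Claude Haiku 4.5", 200.0),
    ("Claude Sonnet 4.6", 600.0),
    ("Claude Opus 4.8", 1000.0),
]:
    print(f"{model}: {vs_yours(cost, your_cost)}")
```

Run against the table's own dollar figures, this reproduces the listed percentages for those rows (-98%, -64%, +7%, +78%).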
These are estimates. Want the real per-application breakdown?
A calculator projects forward; attributing a shared AI invoice across apps and reconciling it to the actual bill is what the Anchor platform does. I help teams cut AI spend 30–50% without losing capability.
How to estimate your LLM API cost
Your monthly LLM API bill is driven by three numbers: how many requests you make, how many input (prompt) tokens and output (completion) tokens each request uses, and the per-token price of the model you call. This LLM API cost calculator multiplies them out across OpenAI (GPT-5.4, GPT-4.1, o3, o4-mini), Anthropic Claude (Opus 4.8, Sonnet 4.6, Haiku 4.5) and Google Gemini (3.1 Pro, 2.5 Pro, 2.5 Flash) — plus xAI Grok and DeepSeek — so you can compare the real cost of the same workload on every model at once.
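The multiplication described above can be sketched in a few lines. The prices and workload below are illustrative placeholders, not any provider's actual rates, and the names `ModelPrice` and `monthly_cost` are assumptions made for this example rather than the calculator's own code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_mtok: float   # USD per 1M input tokens (placeholder value)
    output_per_mtok: float  # USD per 1M output tokens (placeholder value)


def monthly_cost(requests: int, input_tokens: int, output_tokens: int,
                 price: ModelPrice) -> float:
    """Monthly cost in USD: requests x (input tokens x input price + output tokens x output price)."""
    tokens_cost_per_request = (input_tokens * price.input_per_mtok
                               + output_tokens * price.output_per_mtok)
    return requests * tokens_cost_per_request / 1_000_000


# Hypothetical workload: 100k requests/month, 1,000 input and 500 output tokens each.
example_price = ModelPrice(input_per_mtok=2.00, output_per_mtok=8.00)
cost = monthly_cost(100_000, 1_000, 500, example_price)
print(f"${cost:,.2f} / month")
```

Because output tokens are usually priced higher than input tokens, the output side can dominate the bill even when responses are shorter than prompts, which is why the two token counts are kept separate.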
The two levers that cut an AI bill 30–50%
Most teams overpay because every request hits a frontier model regardless of difficulty, and because repeated context is billed at full price on every call. The calculator lets you model the two highest-leverage fixes:
- Task-based routing. Send the share of simple, high-volume requests that do not need a frontier model to a cheaper one (for example GPT-4o mini, Claude 3.5 Haiku or Gemini Flash) and watch the blended cost fall.
- Prompt caching. When a large slice of your input is shared context — system prompts, retrieved documents, few-shot examples — caching it bills those tokens at a fraction of the standard input rate.
These mirror the playbook in the deep-dive, Cut your AI bill 30–50% without losing capability.
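To see how the two levers combine, here is a rough sketch of a blended-cost estimate. Everything in it is an assumption for illustration: the prices, the 40% routed share, the 50% cached share, the cached-token rate of 10% of the standard input price, and the simplification that caching applies equally on both models. Actual cache pricing and eligibility differ by provider.

```python
def monthly_cost(requests, input_tokens, output_tokens, in_price, out_price,
                 cached_share=0.0, cached_rate=1.0):
    """USD per month; prices are per 1M tokens.

    cached_share of the input tokens is billed at cached_rate x in_price.
    """
    billed_input = input_tokens * ((1 - cached_share) + cached_share * cached_rate)
    return requests * (billed_input * in_price + output_tokens * out_price) / 1_000_000


def optimized_cost(requests, input_tokens, output_tokens, frontier, cheap,
                   routed_share, cached_share, cached_rate):
    """Blend a frontier and a cheaper model, with prompt caching applied to both."""
    frontier_part = monthly_cost(requests * (1 - routed_share), input_tokens,
                                 output_tokens, *frontier, cached_share, cached_rate)
    cheap_part = monthly_cost(requests * routed_share, input_tokens,
                              output_tokens, *cheap, cached_share, cached_rate)
    return frontier_part + cheap_part


# Placeholder (input, output) prices in USD per 1M tokens, not real list prices.
FRONTIER = (2.00, 8.00)
CHEAP = (0.20, 0.80)

baseline = monthly_cost(100_000, 1_000, 500, *FRONTIER)
optimized = optimized_cost(100_000, 1_000, 500, FRONTIER, CHEAP,
                           routed_share=0.4, cached_share=0.5, cached_rate=0.1)
print(f"baseline ${baseline:,.2f}, optimized ${optimized:,.2f}, "
      f"saving {(1 - optimized / baseline) * 100:.0f}%")
```

With these made-up inputs the blended bill drops from about $600 to about $326 a month. The result is sensitive to the routed and cached shares, so measure them from real traffic before relying on the estimate.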
From estimate to attributed spend
A calculator projects forward; it cannot tell you which of your applications drove last month's shared invoice. That requires capturing real usage and reconciling it to the actual bill — the approach in the Anchor cost-attribution case study. If you want a defensible, per-application breakdown of your AI spend, get in touch.
Frequently asked questions
How do I estimate my monthly LLM API cost?
Multiply your input and output tokens per request by the per-token price of your model, then by your monthly request volume. This calculator does that across OpenAI, Anthropic and Google models at once, and adds the effect of prompt caching and routing simple tasks to a cheaper model.
Why is my OpenAI / Claude / Gemini bill higher than expected?
The most common causes are routing every request to a frontier model regardless of difficulty, paying full input price for repeated context that could be cached, and verbose outputs. Right-sizing the model per task and caching shared context typically removes 30–50% of spend without losing capability.
Are these prices exact?
They are sensible, dated defaults based on each provider’s published list prices. Providers change pricing and meter some endpoints differently, so treat the result as a planning estimate and confirm against the current provider pricing page before budgeting.
How can I get an accurate per-application breakdown of shared AI spend?
A calculator estimates forward; attributing a real shared invoice across applications requires capturing actual usage and reconciling to the bill. That is exactly what the Anchor case study covers — invoice-anchored, per-application cost attribution.