LLM Pricing Comparison 2025: All Models
Which LLM is cheapest for your use case? It depends on the task. GPT-4o-mini costs 16.7x less than GPT-4o on input tokens, but it's only a safe swap for tasks that don't require advanced reasoning. This guide breaks down every major model's pricing and optimal use cases.
Last updated: December 2025. Prices change frequently—verify with provider before committing to production.
Complete Pricing Table
| Provider | Model | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| OpenAI | o1-preview | $15.00 | $60.00 | 128K |
| Anthropic | Claude 3 Haiku | $0.25 | $1.25 | 200K |
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Anthropic | Claude 3 Opus | $15.00 | $75.00 | 200K |
| Google | Gemini 1.5 Flash | $0.075 | $0.30 | 1M |
| Google | Gemini 1.5 Pro | $1.25 | $5.00 | 2M |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
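The arithmetic used throughout this guide is simple enough to script. Here's a minimal Python helper built from the table above; the dictionary keys are informal labels, not official API model IDs, and prices should be re-verified before use:

```python
# List prices from the table above, in dollars per million tokens.
# Verify against each provider's pricing page before relying on these.
PRICES = {
    "gpt-4o-mini":       {"input": 0.15,  "output": 0.60},
    "gpt-4o":            {"input": 2.50,  "output": 10.00},
    "o1-preview":        {"input": 15.00, "output": 60.00},
    "claude-3-haiku":    {"input": 0.25,  "output": 1.25},
    "claude-3.5-sonnet": {"input": 3.00,  "output": 15.00},
    "claude-3-opus":     {"input": 15.00, "output": 75.00},
    "gemini-1.5-flash":  {"input": 0.075, "output": 0.30},
    "gemini-1.5-pro":    {"input": 1.25,  "output": 5.00},
    "gemini-2.0-flash":  {"input": 0.10,  "output": 0.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 2,000-token extraction prompt with a 100-token answer.
print(f"${request_cost('gemini-1.5-flash', 2_000, 100):.6f}")  # ~$0.00018
```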
Cheapest Model by Use Case
Classification & Extraction
Winner: Gemini 1.5 Flash ($0.075/1M input)
For simple tasks like "Is this email spam?" or "Extract name and email from this text," use the cheapest model. Gemini Flash is 2x cheaper than GPT-4o-mini and performs equivalently.
| Model | Cost per 1K requests (2,000 input tokens each) | Quality |
|---|---|---|
| Gemini 1.5 Flash | $0.15 | 95% |
| GPT-4o-mini | $0.30 | 96% |
| Claude 3 Haiku | $0.50 | 95% |
Customer Support Q&A
Winner: GPT-4o-mini ($0.15/1M input)
For answering FAQ-style questions from a knowledge base, GPT-4o-mini offers the best balance of cost and quality. Gemini Flash is cheaper but occasionally misses context.
Summarization
Winner: Gemini 2.0 Flash ($0.10/1M input) or GPT-4o-mini ($0.15/1M)
Both handle summarization well. Gemini is 33% cheaper but GPT-4o-mini has slightly better nuance.
Code Generation
Winner: Claude 3.5 Sonnet ($3/1M input)
Despite being 20x more expensive than GPT-4o-mini on input tokens, Claude Sonnet has an 8-12% higher first-run code success rate (see the table below). Worth the premium for production code.
| Model | Cost per 1K functions | First-run Success |
|---|---|---|
| Claude Sonnet | $6.00 (2,000 input tokens/req) | 76% |
| GPT-4o | $5.00 | 68% |
| GPT-4o-mini | $0.30 | 58% |
Long-Form Content Writing
Winner: GPT-4o ($2.50/1M input)
For blog posts, articles, and marketing copy, GPT-4o offers the best quality-to-cost ratio. Claude Sonnet writes well but costs 20% more per input token.
Complex Reasoning
Winner: o1-preview ($15/1M input) or Claude Opus ($15/1M)
For multi-step logic, mathematical reasoning, or edge case handling, premium models justify their cost. o1-preview excels at reasoning, Claude Opus at nuanced understanding.
Cost at Scale: Monthly Spend Examples
Note: for simplicity, these estimates price all tokens at input rates. Generated tokens are billed at output rates (4-5x higher; see Hidden Costs below), so treat these figures as lower bounds.
Scenario 1: Customer Support Chatbot (1M requests/month, 150 tokens avg)
| Model | Total Tokens | Monthly Cost |
|---|---|---|
| Gemini Flash | 150M | $11.25 |
| GPT-4o-mini | 150M | $22.50 |
| GPT-4o | 150M | $375.00 |
Recommendation: Use GPT-4o-mini for 80% of queries (FAQ), GPT-4o for 20% (complex issues). Blended cost: ~$93/month.
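A quick sanity check on that blended figure, as a Python sketch using the input prices from the table above:

```python
# Blended monthly cost for the 80/20 split above, at input rates.
REQUESTS = 1_000_000
TOKENS_PER_REQUEST = 150

def tier_cost(input_price_per_m: float, traffic_share: float) -> float:
    """Monthly input cost for the share of traffic routed to one model."""
    return REQUESTS * traffic_share * TOKENS_PER_REQUEST * input_price_per_m / 1_000_000

faq_tier = tier_cost(0.15, 0.80)    # GPT-4o-mini handles FAQ traffic
escalated = tier_cost(2.50, 0.20)   # GPT-4o handles complex issues
print(f"${faq_tier + escalated:.2f}/month")  # $93.00/month
```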
Scenario 2: Code Generation API (10K requests/month, 2K tokens avg)
| Model | Total Tokens | Monthly Cost |
|---|---|---|
| Claude Sonnet | 20M | $60 |
| GPT-4o | 20M | $50 |
| GPT-4o-mini | 20M | $3 |
Recommendation: Claude Sonnet worth $10/month premium for 8% higher success rate (fewer debugging cycles).
Scenario 3: Content Generation Service (5K articles/month, 1.5K words avg)
| Model | Total Tokens | Monthly Cost |
|---|---|---|
| GPT-4o | 10M | $25 |
| Claude Sonnet | 10M | $30 |
| Gemini Pro | 10M | $12.50 |
Recommendation: GPT-4o for the best balance. Gemini Pro is cheaper, but its output quality is 10-15% lower for creative writing.
Hidden Costs to Consider
1. Context Window Costs
A long prompt is expensive regardless of a model's maximum window. If you're sending 50K tokens of context per request:
- GPT-4o: 50K × $2.50/1M = $0.125 per request
- Claude Sonnet: 50K × $3.00/1M = $0.15 per request
Solution: Use prompt compression to reduce unnecessary context.
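One common form of compression is simply dropping low-relevance context before the call. A minimal sketch, assuming your retriever already scores chunks; it uses a rough 4-characters-per-token estimate, so swap in a real tokenizer for accurate counts:

```python
def trim_context(chunks: list[tuple[float, str]], budget_tokens: int) -> str:
    """Keep the highest-scoring chunks that fit within a token budget.

    chunks: (relevance_score, text) pairs, e.g. from a retriever.
    Token counts are estimated at ~4 characters per token.
    """
    kept, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        est_tokens = len(text) // 4 + 1
        if used + est_tokens > budget_tokens:
            continue  # skip chunks that would blow the budget
        kept.append(text)
        used += est_tokens
    return "\n\n".join(kept)

# Example: cap context at 8K tokens instead of sending all 50K.
# context = trim_context(retrieved_chunks, budget_tokens=8_000)
```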
2. Output Verbosity
Output tokens cost 4-5x more than input. If Model A produces 500 tokens and Model B produces 800 tokens for the same task, Model B costs 60% more even if input pricing is equal.
3. Retry Costs
If a model produces incorrect output roughly 40% of the time (as GPT-4o-mini does for code generation, per the table above), you pay for retries:
- 1st attempt: $0.30
- 2nd attempt (40% probability): $0.12
- 3rd attempt (16% probability): $0.05
- Expected cost: $0.47
Compare to Claude Sonnet with 76% first-run success:
- 1st attempt: $6.00
- 2nd attempt (24% probability): $1.44
- Expected cost: $7.44
Claude Sonnet is still more expensive but the gap narrows from 20x to 16x when accounting for retries.
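The expected-cost arithmetic above is a truncated geometric series, easy to reuse for any model. A sketch using this guide's own figures (cost per 1K functions, ~40% and 24% failure rates):

```python
def expected_retry_cost(cost_per_attempt: float, success_rate: float,
                        max_attempts: int = 3) -> float:
    """Expected spend when failed outputs trigger a retry: a geometric
    series truncated at max_attempts, matching the arithmetic above."""
    failure = 1.0 - success_rate
    return sum(cost_per_attempt * failure**k for k in range(max_attempts))

print(expected_retry_cost(0.30, 0.60))                  # GPT-4o-mini: ~$0.47
print(expected_retry_cost(6.00, 0.76, max_attempts=2))  # Claude Sonnet: $7.44
```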
Money-Saving Tips
Tip #1: Use Model Tiers
Route 70-80% of tasks to cheap models, 15-20% to mid-tier, 5-10% to premium. Average cost drops 60-70%.
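In practice, tiering needs a router in front of the models. A deliberately crude heuristic sketch; the keyword list and length thresholds here are illustrative assumptions, and production routers typically use a trained classifier:

```python
def pick_tier(prompt: str) -> str:
    """Route cheap by default; escalate on rough signals of complexity."""
    hard_signals = ("prove", "step by step", "debug", "refactor", "optimize")
    if len(prompt) > 4_000 or any(s in prompt.lower() for s in hard_signals):
        return "premium"  # e.g. o1-preview / Claude Opus
    if len(prompt) > 1_000:
        return "mid"      # e.g. GPT-4o / Claude Sonnet
    return "cheap"        # e.g. GPT-4o-mini / Gemini Flash

print(pick_tier("Is this email spam? ..."))  # cheap
```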
Tip #2: Compress Prompts
Reducing prompt from 2,000 to 800 tokens saves 60% on input costs.
Tip #3: Cache Repeated Context
If you're sending the same 10K-token knowledge base with every request, use prompt caching (supported by Anthropic and OpenAI). The first request pays full cost (Anthropic adds a cache-write surcharge); subsequent requests pay roughly 10% of the input price for cached content on Anthropic, or 50% on OpenAI.
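With Anthropic's SDK, caching is a matter of marking the reusable prefix. A minimal sketch; the model string and the KNOWLEDGE_BASE constant are placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

KNOWLEDGE_BASE = "...your ~10K-token knowledge base..."  # placeholder

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # example model ID; check current docs
    max_tokens=512,
    system=[{
        "type": "text",
        "text": KNOWLEDGE_BASE,
        # Mark the shared prefix as cacheable; later requests reusing this
        # exact prefix are billed at the much cheaper cache-read rate.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "What is your refund policy?"}],
)
print(response.content[0].text)
```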
Tip #4: Batch Requests
OpenAI offers 50% discount for batch API requests with 24-hour SLA. If real-time isn't needed, use batch mode.
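With OpenAI's Python SDK, batch mode means uploading a JSONL file of requests and polling for results. A minimal sketch; requests.jsonl is a hypothetical file you'd generate yourself:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# requests.jsonl holds one request per line, e.g.:
# {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "..."}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # the discounted 24-hour tier
)
print(batch.id, batch.status)  # poll client.batches.retrieve(batch.id) until done
```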
Optimize Costs with AI Gateway
AI Gateway automatically routes to the cheapest model for each task. Save 40-50% with intelligent routing across OpenAI, Anthropic, and Google.
Try Free for 14 Days →

Related: Complete Guide to LLM Cost Optimization • Intelligent LLM Routing Guide