
LLM Pricing Comparison 2025: All Models

Which LLM is cheapest for your use case? It depends on the task. GPT-4o-mini is 16.7x cheaper than GPT-4o, but only for tasks that don't require advanced reasoning. This guide breaks down every major model's pricing and optimal use cases.

Last updated: December 2025. Prices change frequently—verify with provider before committing to production.

Complete Pricing Table

| Provider | Model | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| OpenAI | o1-preview | $15.00 | $60.00 | 128K |
| Anthropic | Claude 3 Haiku | $0.25 | $1.25 | 200K |
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Anthropic | Claude 3 Opus | $15.00 | $75.00 | 200K |
| Google | Gemini 1.5 Flash | $0.075 | $0.30 | 1M |
| Google | Gemini 1.5 Pro | $1.25 | $5.00 | 2M |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
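The table above drops straight into a small helper for estimating per-request cost. A minimal sketch — the prices are a snapshot and the model keys are illustrative, not official API model IDs:

```python
# Per-million-token prices from the table above (USD).
# Verify against provider pricing pages before relying on them.
PRICES = {
    "gpt-4o-mini":       (0.15, 0.60),
    "gpt-4o":            (2.50, 10.00),
    "o1-preview":        (15.00, 60.00),
    "claude-3-haiku":    (0.25, 1.25),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-opus":     (15.00, 75.00),
    "gemini-1.5-flash":  (0.075, 0.30),
    "gemini-1.5-pro":    (1.25, 5.00),
    "gemini-2.0-flash":  (0.10, 0.40),
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of a single request for a given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 1,000-token prompt with a 300-token reply on GPT-4o-mini:
print(round(request_cost("gpt-4o-mini", 1000, 300), 6))  # → 0.00033
```

Keeping prices in one dictionary makes it trivial to re-run every comparison in this post when providers change rates.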

Cheapest Model by Use Case

Classification & Extraction

Winner: Gemini 1.5 Flash ($0.075/1M input)

For simple tasks like "Is this email spam?" or "Extract name and email from this text," use the cheapest model. Gemini Flash is 2x cheaper than GPT-4o-mini and performs equivalently.

| Model | Cost per 1K requests (300 tokens/req) | Quality |
|---|---|---|
| Gemini 1.5 Flash | $0.15 | 95% |
| GPT-4o-mini | $0.30 | 96% |
| Claude Haiku | $0.50 | 95% |

Customer Support Q&A

Winner: GPT-4o-mini ($0.15/1M input)

For answering FAQ-style questions from a knowledge base, GPT-4o-mini offers the best balance of cost and quality. Gemini Flash is cheaper but occasionally misses context.

Summarization

Winner: Gemini 2.0 Flash ($0.10/1M input) or GPT-4o-mini ($0.15/1M)

Both handle summarization well. Gemini is 33% cheaper but GPT-4o-mini has slightly better nuance.

Code Generation

Winner: Claude 3.5 Sonnet ($3/1M input)

Despite being 20x more expensive than GPT-4o-mini, Claude Sonnet has 8-12% higher first-run code success rate. Worth the premium for production code.

| Model | Cost per 1K functions (2,000 tokens/req) | First-run Success |
|---|---|---|
| Claude Sonnet | $6.00 | 76% |
| GPT-4o | $5.00 | 68% |
| GPT-4o-mini | $0.30 | 58% |

Long-Form Content Writing

Winner: GPT-4o ($2.50/1M input)

For blog posts, articles, and marketing copy, GPT-4o offers the best quality-to-cost ratio. Claude Sonnet writes well but costs 20% more per input token ($3.00 vs. $2.50).

Complex Reasoning

Winner: o1-preview ($15/1M input) or Claude Opus ($15/1M)

For multi-step logic, mathematical reasoning, or edge case handling, premium models justify their cost. o1-preview excels at reasoning, Claude Opus at nuanced understanding.

Cost at Scale: Monthly Spend Examples

Scenario 1: Customer Support Chatbot (1M requests/month, 150 tokens avg)

| Model | Total Tokens | Monthly Cost |
|---|---|---|
| Gemini Flash | 150M | $11.25 |
| GPT-4o-mini | 150M | $22.50 |
| GPT-4o | 150M | $375.00 |

Recommendation: Use GPT-4o-mini for 80% of queries (FAQ), GPT-4o for 20% (complex issues). Blended cost: ~$90/month.
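The blended figure is easy to reproduce. A quick sketch using Scenario 1's token volume and the 80/20 split recommended above:

```python
def blended_monthly_cost(total_tokens, tiers):
    """Monthly input cost for traffic split across model tiers.

    tiers: list of (share_of_traffic, input_price_per_1M) pairs.
    """
    return sum(total_tokens * share * price / 1_000_000
               for share, price in tiers)

# 150M tokens/month: 80% to GPT-4o-mini ($0.15/1M), 20% to GPT-4o ($2.50/1M)
cost = blended_monthly_cost(150_000_000, [(0.80, 0.15), (0.20, 2.50)])
print(round(cost, 2))  # → 93.0, i.e. the "~$90/month" above
```

Note the expensive 20% of traffic accounts for $75 of the $93 — routing accuracy on hard queries matters far more than the cheap tier's price.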

Scenario 2: Code Generation API (10K requests/month, 2K tokens avg)

| Model | Total Tokens | Monthly Cost |
|---|---|---|
| Claude Sonnet | 20M | $60 |
| GPT-4o | 20M | $50 |
| GPT-4o-mini | 20M | $3 |

Recommendation: Claude Sonnet is worth the $10/month premium over GPT-4o for its 8% higher first-run success rate (fewer debugging cycles).

Scenario 3: Content Generation Service (5K articles/month, 1.5K words avg)

| Model | Total Tokens | Monthly Cost |
|---|---|---|
| GPT-4o | 10M | $25 |
| Claude Sonnet | 10M | $30 |
| Gemini Pro | 10M | $12.50 |

Recommendation: GPT-4o for the best balance. Gemini Pro is cheaper, but its output quality is 10-15% lower for creative writing.

Hidden Costs to Consider

1. Context Window Costs

Longer context windows invite larger prompts, and input cost scales linearly with prompt size. If you send 50K tokens of context per request at GPT-4o's $2.50/1M input rate, that's $0.125 per request before the model produces a single output token.

Solution: Use prompt compression to strip unnecessary context.
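The arithmetic is worth making explicit. A sketch assuming GPT-4o's $2.50/1M input rate and an illustrative volume of 10K requests/month:

```python
def context_cost(context_tokens, input_price_per_1m, requests_per_month):
    """Per-request and monthly cost of a fixed context prefix (input tokens only)."""
    per_request = context_tokens * input_price_per_1m / 1_000_000
    return per_request, per_request * requests_per_month

per_req, monthly = context_cost(50_000, 2.50, 10_000)  # GPT-4o input rate
print(per_req, monthly)  # → 0.125 1250.0
```

At that volume the context alone costs $1,250/month — compressing it to 10K tokens would cut the bill to $250 before touching model choice.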

2. Output Verbosity

Output tokens cost 4-5x more than input. If Model A produces 500 tokens and Model B produces 800 tokens for the same task, Model B costs 60% more even if input pricing is equal.

3. Retry Costs

If a model produces incorrect output 42% of the time (as GPT-4o-mini does for code generation, per the table above), you pay for retries: at a 58% first-run success rate you need 1/0.58 ≈ 1.7 attempts per working function on average, raising the effective cost from $0.30 to about $0.52 per 1K functions.

Compare to Claude Sonnet with 76% first-run success: 1/0.76 ≈ 1.3 attempts, for an effective cost of about $7.90 per 1K functions.

Claude Sonnet is still more expensive, but the gap narrows from 20x to roughly 15x when accounting for retries.
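Under the simplifying assumption that retries are independent, the adjustment reduces to dividing cost by success rate:

```python
def effective_cost(cost_per_1k, first_run_success):
    """Expected cost per 1K *working* outputs, assuming independent retries.

    Expected attempts per success = 1 / first_run_success.
    """
    return cost_per_1k / first_run_success

mini   = effective_cost(0.30, 0.58)   # GPT-4o-mini, code generation
sonnet = effective_cost(6.00, 0.76)   # Claude 3.5 Sonnet
print(round(mini, 2), round(sonnet, 2), round(sonnet / mini, 1))
# → 0.52 7.89 15.3
```

This ignores the human cost of reviewing failed attempts, which usually widens the case for the more reliable model further.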

Money-Saving Tips

Tip #1: Use Model Tiers

Route 70-80% of tasks to cheap models, 15-20% to mid-tier, 5-10% to premium. Average cost drops 60-70%.

Tip #2: Compress Prompts

Reducing prompt from 2,000 to 800 tokens saves 60% on input costs.

Tip #3: Cache Repeated Context

If you're sending the same 10K-token knowledge base with every request, use prompt caching (supported by both Anthropic and OpenAI). The first request pays full price to populate the cache; subsequent requests pay a reduced rate for the cached prefix — roughly 10% of the input price with Anthropic, 50% with OpenAI.
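A back-of-the-envelope sketch of the savings, assuming a 10K-token cached prefix at Claude 3.5 Sonnet's $3/1M input rate, Anthropic's ~10% cache-read factor, and ignoring the one-time cache-write surcharge:

```python
def caching_savings(prefix_tokens, input_price_per_1m, requests,
                    cache_read_factor=0.10):
    """Monthly cost of a repeated prefix, without and with prompt caching.

    cache_read_factor: fraction of the input price charged for cached reads
    (~0.10 for Anthropic, ~0.50 for OpenAI; verify current pricing).
    """
    full = prefix_tokens * requests * input_price_per_1m / 1_000_000
    # At volume, the first full-price request is negligible.
    cached = full * cache_read_factor
    return full, cached

full, cached = caching_savings(10_000, 3.00, 100_000)
print(round(full, 2), round(cached, 2))  # → 3000.0 300.0
```

At 100K requests/month the cached prefix drops from $3,000 to about $300 — one of the largest single levers in this list.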

Tip #4: Batch Requests

OpenAI offers a 50% discount for Batch API requests with a 24-hour completion window. If real-time responses aren't needed, use batch mode.

Optimize Costs with AI Gateway

AI Gateway automatically routes to the cheapest model for each task. Save 40-50% with intelligent routing across OpenAI, Anthropic, and Google.

Try Free for 14 Days →

Related: Complete Guide to LLM Cost Optimization · Intelligent LLM Routing Guide