
LLM Pricing Comparison 2025: All Models

Which LLM is cheapest for your use case? It depends on the task. GPT-4o-mini is 16.7x cheaper than GPT-4o, but only for tasks that don't require advanced reasoning. This guide breaks down every major model's pricing and optimal use cases.

Last updated: December 2025. Prices change frequently—verify with provider before committing to production.

Complete Pricing Table

| Provider | Model | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| OpenAI | o1-preview | $15.00 | $60.00 | 128K |
| Anthropic | Claude 3 Haiku | $0.25 | $1.25 | 200K |
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Anthropic | Claude 3 Opus | $15.00 | $75.00 | 200K |
| Google | Gemini 1.5 Flash | $0.075 | $0.30 | 1M |
| Google | Gemini 1.5 Pro | $1.25 | $5.00 | 2M |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
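The table above drops straight into a small helper for estimating per-request cost. A minimal sketch — the prices are a snapshot and the model keys are illustrative, not official API model IDs:

```python
# Per-million-token prices from the table above (USD).
# Verify against provider pricing pages before relying on them.
PRICES = {
    "gpt-4o-mini":       (0.15, 0.60),
    "gpt-4o":            (2.50, 10.00),
    "o1-preview":        (15.00, 60.00),
    "claude-3-haiku":    (0.25, 1.25),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-opus":     (15.00, 75.00),
    "gemini-1.5-flash":  (0.075, 0.30),
    "gemini-1.5-pro":    (1.25, 5.00),
    "gemini-2.0-flash":  (0.10, 0.40),
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of a single request for a given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 1,000-token prompt with a 300-token reply on GPT-4o-mini:
print(round(request_cost("gpt-4o-mini", 1000, 300), 6))  # → 0.00033
```

Keeping prices in one dictionary makes it trivial to re-run every comparison in this post when providers change rates.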

Cheapest Model by Use Case

Classification & Extraction

Winner: Gemini 1.5 Flash ($0.075/1M input)

For simple tasks like "Is this email spam?" or "Extract name and email from this text," use the cheapest model. Gemini Flash is 2x cheaper than GPT-4o-mini and performs equivalently.

| Model | Cost per 1K requests (300 tokens/req) | Quality |
|---|---|---|
| Gemini 1.5 Flash | $0.15 | 95% |
| GPT-4o-mini | $0.30 | 96% |
| Claude Haiku | $0.50 | 95% |

Customer Support Q&A

Winner: GPT-4o-mini ($0.15/1M input)

For answering FAQ-style questions from a knowledge base, GPT-4o-mini offers the best balance of cost and quality. Gemini Flash is cheaper but occasionally misses context.

Summarization

Winner: Gemini 2.0 Flash ($0.10/1M input) or GPT-4o-mini ($0.15/1M)

Both handle summarization well. Gemini is 33% cheaper but GPT-4o-mini has slightly better nuance.

Code Generation

Winner: Claude 3.5 Sonnet ($3/1M input)

Despite being 20x more expensive than GPT-4o-mini, Claude Sonnet has 8-12% higher first-run code success rate. Worth the premium for production code.

| Model | Cost per 1K functions (2,000 tokens/req) | First-run Success |
|---|---|---|
| Claude Sonnet | $6.00 | 76% |
| GPT-4o | $5.00 | 68% |
| GPT-4o-mini | $0.30 | 58% |

Long-Form Content Writing

Winner: GPT-4o ($2.50/1M input)

For blog posts, articles, and marketing copy, GPT-4o offers the best quality-to-cost ratio. Claude Sonnet writes well but costs 20% more per input token ($3.00 vs. $2.50).

Complex Reasoning

Winner: o1-preview ($15/1M input) or Claude Opus ($15/1M)

For multi-step logic, mathematical reasoning, or edge case handling, premium models justify their cost. o1-preview excels at reasoning, Claude Opus at nuanced understanding.

Cost at Scale: Monthly Spend Examples

Scenario 1: Customer Support Chatbot (1M requests/month, 150 tokens avg)

| Model | Total Tokens | Monthly Cost |
|---|---|---|
| Gemini Flash | 150M | $11.25 |
| GPT-4o-mini | 150M | $22.50 |
| GPT-4o | 150M | $375.00 |

Recommendation: Use GPT-4o-mini for 80% of queries (FAQ), GPT-4o for 20% (complex issues). Blended cost: ~$90/month.
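The blended figure is easy to reproduce. A quick sketch using Scenario 1's token volume and the 80/20 split recommended above:

```python
def blended_monthly_cost(total_tokens, tiers):
    """Monthly input cost for traffic split across model tiers.

    tiers: list of (share_of_traffic, input_price_per_1M) pairs.
    """
    return sum(total_tokens * share * price / 1_000_000
               for share, price in tiers)

# 150M tokens/month: 80% to GPT-4o-mini ($0.15/1M), 20% to GPT-4o ($2.50/1M)
cost = blended_monthly_cost(150_000_000, [(0.80, 0.15), (0.20, 2.50)])
print(round(cost, 2))  # → 93.0, i.e. the "~$90/month" above
```

Note the expensive 20% of traffic accounts for $75 of the $93 — routing accuracy on hard queries matters far more than the cheap tier's price.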

Scenario 2: Code Generation API (10K requests/month, 2K tokens avg)

| Model | Total Tokens | Monthly Cost |
|---|---|---|
| Claude Sonnet | 20M | $60 |
| GPT-4o | 20M | $50 |
| GPT-4o-mini | 20M | $3 |

Recommendation: Claude Sonnet is worth the $10/month premium over GPT-4o for its 8% higher first-run success rate (fewer debugging cycles).

Scenario 3: Content Generation Service (5K articles/month, 1.5K words avg)

| Model | Total Tokens | Monthly Cost |
|---|---|---|
| GPT-4o | 10M | $25 |
| Claude Sonnet | 10M | $30 |
| Gemini Pro | 10M | $12.50 |

Recommendation: GPT-4o for the best balance. Gemini Pro is cheaper, but its output quality is 10-15% lower for creative writing.

Hidden Costs to Consider

1. Context Window Costs

Longer context windows invite larger prompts, and input cost scales linearly with prompt size. If you send 50K tokens of context per request at GPT-4o's $2.50/1M input rate, that's $0.125 per request before the model produces a single output token.

Solution: Use prompt compression to strip unnecessary context.
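The arithmetic is worth making explicit. A sketch assuming GPT-4o's $2.50/1M input rate and an illustrative volume of 10K requests/month:

```python
def context_cost(context_tokens, input_price_per_1m, requests_per_month):
    """Per-request and monthly cost of a fixed context prefix (input tokens only)."""
    per_request = context_tokens * input_price_per_1m / 1_000_000
    return per_request, per_request * requests_per_month

per_req, monthly = context_cost(50_000, 2.50, 10_000)  # GPT-4o input rate
print(per_req, monthly)  # → 0.125 1250.0
```

At that volume the context alone costs $1,250/month — compressing it to 10K tokens would cut the bill to $250 before touching model choice.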

2. Output Verbosity

Output tokens cost 4-5x more than input. If Model A produces 500 tokens and Model B produces 800 tokens for the same task, Model B costs 60% more even if input pricing is equal.

3. Retry Costs

If a model produces incorrect output 42% of the time (as GPT-4o-mini does for code generation, per the table above), you pay for retries: at a 58% first-run success rate you need 1/0.58 ≈ 1.7 attempts per working function on average, raising the effective cost from $0.30 to about $0.52 per 1K functions.

Compare to Claude Sonnet with 76% first-run success: 1/0.76 ≈ 1.3 attempts, for an effective cost of about $7.90 per 1K functions.

Claude Sonnet is still more expensive, but the gap narrows from 20x to roughly 15x when accounting for retries.
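Under the simplifying assumption that retries are independent, the adjustment reduces to dividing cost by success rate:

```python
def effective_cost(cost_per_1k, first_run_success):
    """Expected cost per 1K *working* outputs, assuming independent retries.

    Expected attempts per success = 1 / first_run_success.
    """
    return cost_per_1k / first_run_success

mini   = effective_cost(0.30, 0.58)   # GPT-4o-mini, code generation
sonnet = effective_cost(6.00, 0.76)   # Claude 3.5 Sonnet
print(round(mini, 2), round(sonnet, 2), round(sonnet / mini, 1))
# → 0.52 7.89 15.3
```

This ignores the human cost of reviewing failed attempts, which usually widens the case for the more reliable model further.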

Money-Saving Tips

Tip #1: Use Model Tiers

Route 70-80% of tasks to cheap models, 15-20% to mid-tier, 5-10% to premium. Average cost drops 60-70%.

Tip #2: Compress Prompts

Reducing prompt from 2,000 to 800 tokens saves 60% on input costs.

Tip #3: Cache Repeated Context

If you're sending the same 10K-token knowledge base with every request, use prompt caching (supported by both Anthropic and OpenAI). The first request pays full price to populate the cache; subsequent requests pay a reduced rate for the cached prefix — roughly 10% of the input price with Anthropic, 50% with OpenAI.
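A back-of-the-envelope sketch of the savings, assuming a 10K-token cached prefix at Claude 3.5 Sonnet's $3/1M input rate, Anthropic's ~10% cache-read factor, and ignoring the one-time cache-write surcharge:

```python
def caching_savings(prefix_tokens, input_price_per_1m, requests,
                    cache_read_factor=0.10):
    """Monthly cost of a repeated prefix, without and with prompt caching.

    cache_read_factor: fraction of the input price charged for cached reads
    (~0.10 for Anthropic, ~0.50 for OpenAI; verify current pricing).
    """
    full = prefix_tokens * requests * input_price_per_1m / 1_000_000
    # At volume, the first full-price request is negligible.
    cached = full * cache_read_factor
    return full, cached

full, cached = caching_savings(10_000, 3.00, 100_000)
print(round(full, 2), round(cached, 2))  # → 3000.0 300.0
```

At 100K requests/month the cached prefix drops from $3,000 to about $300 — one of the largest single levers in this list.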

Tip #4: Batch Requests

OpenAI offers a 50% discount for Batch API requests with a 24-hour completion window. If real-time responses aren't needed, use batch mode.

Optimize Costs with AI Gateway

AI Gateway automatically routes to the cheapest model for each task. Save 40-50% with intelligent routing across OpenAI, Anthropic, and Google.

Try Free for 14 Days →

Related: Complete Guide to LLM Cost Optimization · Intelligent LLM Routing Guide