I just finished building ProspectEngine, a Clay.com alternative for B2B lead enrichment. During development, I hit a wall: our AI personalization pipeline was calling the Claude API 10,000+ times per day, with costs spiraling past $1,000/month.
The problem? Every API call sent the same 500-token system prompt. We were paying Anthropic to process identical instructions 10,000 times.
After implementing an AI Gateway caching pattern, we cut costs by 42% and reduced latency by 60%. Here's the exact system we built.
Most Make.com scenarios that call LLM APIs look like this:
1. HTTP Module → Call Claude API
2. Send request:
{
  "system": "You are writing cold email openers for B2B sales...",
  "prompt": "Generate opener for {{contact.name}} at {{contact.company}}..."
}
3. Parse response
4. Save to database
What's wrong with this?
Every run resends the same 500-token system prompt, so you pay full input-token price, on every single call, for instructions that never change.
But it gets worse. If you're generating similar content (e.g., 100 contacts at the same company), you're also making near-duplicate requests that could be cached.
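Here's the back-of-the-envelope math on the repeated system prompt alone, assuming Claude 3.5 Sonnet's $3-per-million-token input rate (the same rate used in the gateway code below):
// Rough cost of resending an identical 500-token system prompt on every call
// (assumes $3 per million input tokens for Claude 3.5 Sonnet)
const SYSTEM_PROMPT_TOKENS = 500;
const CALLS_PER_DAY = 10000;
const INPUT_PRICE_PER_TOKEN = 3 / 1000000;

const wastedPerDay = SYSTEM_PROMPT_TOKENS * CALLS_PER_DAY * INPUT_PRICE_PER_TOKEN;
console.log(`$${wastedPerDay.toFixed(2)}/day on repeated instructions`); // $15.00/day
console.log(`$${(wastedPerDay * 30).toFixed(2)}/month`);                 // $450.00/month
That's roughly $450 of the monthly bill spent re-processing instructions that never change.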
For an early-stage SaaS, $1,080/month on AI was unsustainable. We needed a smarter pattern.
An AI Gateway sits between your Make.com scenario and the LLM API and handles the caching, cost tracking, and rate limiting in one place:
┌──────────────┐
│ Make.com │
│ Scenario │
└──────┬───────┘
│ HTTP Request
▼
┌─────────────────────────────────────────────┐
│ AI Gateway (Node.js + Redis) │
│ │
│ 1. Generate cache key from request │
│ 2. Check Redis for cached response │
│ 3. If MISS → Call Claude API │
│ 4. Store response in Redis (24h TTL) │
│ 5. Return response to Make.com │
└─────────────────────────────────────────────┘
│ Cached or Fresh
▼
┌──────────────┐
│ Response │
│ (JSON) │
└──────────────┘
We use Upstash Redis (free tier: 10K requests/day, 256MB storage).
// Sign up at upstash.com
// Get your REDIS_URL (looks like: rediss://...@us1-xxx.upstash.io:6379)
// Install Redis client
npm install ioredis
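Before deploying anything, it's worth a one-off connectivity check; a minimal sketch, assuming REDIS_URL is set to your Upstash rediss:// URL:
// check-redis.js — confirm the Upstash instance is reachable (run: node check-redis.js)
const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL);

redis.ping()
  .then((reply) => console.log('Redis reachable:', reply)) // expect "PONG"
  .catch((err) => console.error('Redis connection failed:', err.message))
  .finally(() => redis.quit());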
Deploy this as a Netlify Function or Vercel Serverless Function:
// netlify/functions/ai-gateway.js
const Redis = require('ioredis');
const crypto = require('crypto');

const redis = new Redis(process.env.REDIS_URL);

exports.handler = async (event) => {
  const body = JSON.parse(event.body);
  const { system, prompt, model = 'claude-3-5-sonnet-20241022' } = body;

  // Generate cache key from system + prompt
  const cacheKey = `ai:${crypto
    .createHash('md5')
    .update(`${system}:${prompt}`)
    .digest('hex')}`;

  // Check cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    console.log('✅ CACHE HIT:', cacheKey);
    return {
      statusCode: 200,
      body: JSON.stringify({
        response: cached,
        cached: true,
        cost: 0 // No API call = $0
      })
    };
  }

  // CACHE MISS - Call Claude API
  console.log('❌ CACHE MISS - Calling Claude API');
  const apiResponse = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': process.env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json'
    },
    body: JSON.stringify({
      model,
      max_tokens: 1024,
      system,
      messages: [{ role: 'user', content: prompt }]
    })
  });

  // Don't cache failures; surface the API error to the caller
  if (!apiResponse.ok) {
    const errorBody = await apiResponse.text();
    console.error('Claude API error:', apiResponse.status, errorBody);
    return { statusCode: 502, body: errorBody };
  }

  const data = await apiResponse.json();
  const response = data.content[0].text;

  // Calculate cost (Claude 3.5 Sonnet: $3/M input tokens, $15/M output tokens)
  const inputTokens = data.usage.input_tokens;
  const outputTokens = data.usage.output_tokens;
  const cost = (inputTokens * 0.003 / 1000) + (outputTokens * 0.015 / 1000);

  // Store in cache (24 hour TTL)
  await redis.set(cacheKey, response, 'EX', 86400);

  return {
    statusCode: 200,
    body: JSON.stringify({
      response,
      cached: false,
      cost,
      tokens: { input: inputTokens, output: outputTokens }
    })
  };
};
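Before wiring it into Make.com, you can smoke-test the deployed function from any machine with Node 18+; a quick sketch, with the URL standing in for your own deployment:
// smoke-test.js — call the gateway twice; the second call should come back cached
const GATEWAY_URL = 'https://your-site.netlify.app/.netlify/functions/ai-gateway'; // your deployment

async function callGateway() {
  const res = await fetch(GATEWAY_URL, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      system: 'You are writing cold email openers for B2B sales.',
      prompt: 'Generate opener for Jane Doe at Acme Corp in logistics'
    })
  });
  return res.json();
}

(async () => {
  console.log('First call:', await callGateway());  // expect cached: false, cost > 0
  console.log('Second call:', await callGateway()); // expect cached: true, cost: 0
})();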
In your Make.com scenario, replace direct Claude API calls with:
HTTP Module:
URL: https://your-site.netlify.app/.netlify/functions/ai-gateway
Method: POST
Body (JSON):
{
  "system": "You are writing cold email openers for B2B sales. Keep it 2-3 sentences, focus on pain points.",
  "prompt": "Generate opener for {{contact.name}} at {{contact.company}} in {{contact.industry}}"
}
The gateway returns:
{
  "response": "I noticed {{company}} is hiring 3 {{title}}s...",
  "cached": true,  // or false if API was called
  "cost": 0.0023   // $0 if cached
}
Anthropic now offers native prompt caching, which bills cached system-prompt tokens at 10% of the normal input rate on cache reads, a 90% reduction on those tokens:
// Add to your gateway
body: JSON.stringify({
  model,
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: system,
      cache_control: { type: 'ephemeral' } // ← Cache this!
    }
  ],
  messages: [{ role: 'user', content: prompt }]
})
Pricing with prompt caching: for Claude 3.5 Sonnet, cache reads cost $0.30 per million tokens (10% of the $3.00 base input rate) and cache writes cost $3.75 per million tokens (a one-time 25% premium), so a system prompt reused thousands of times a day is close to free after the first call.
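If you want the gateway's cost field to stay accurate once prompt caching is on, account for the cached tokens separately; a sketch assuming the cache_creation_input_tokens and cache_read_input_tokens fields that Anthropic's usage object reports when caching is active:
// Per-message cost with prompt caching (Claude 3.5 Sonnet: $3/M input, $15/M output;
// cache writes bill at 1.25x the input rate, cache reads at 0.1x)
function messageCost(usage) {
  const INPUT = 3 / 1000000;
  const OUTPUT = 15 / 1000000;
  return (
    (usage.input_tokens || 0) * INPUT +
    (usage.cache_creation_input_tokens || 0) * INPUT * 1.25 +
    (usage.cache_read_input_tokens || 0) * INPUT * 0.1 +
    (usage.output_tokens || 0) * OUTPUT
  );
}

// In the gateway: const cost = messageCost(data.usage);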
If you're enriching 100 contacts at the same company, you often generate near-identical content (the example opener above is really about the company, not the person). For those templates, key the cache by company and template instead of by contact:
// Company-level cache key: every contact at the same company shares one entry
// (companyId and templateId come from the Make.com request body)
const cacheKey = `ai:${companyId}:${templateId}:${crypto
  .createHash('md5')
  .update(system) // the hash invalidates the entry if the template's instructions change
  .digest('hex')}`;
This prevents regenerating the same opener for John Smith and Jane Doe at Acme Corp.
Prevent runaway scenarios from draining your budget:
// Add to gateway
const dailyKey = `rate:${new Date().toISOString().split('T')[0]}`;
const dailyCount = await redis.incr(dailyKey);
if (dailyCount === 1) {
  await redis.expire(dailyKey, 86400); // set the TTL once, when the day's counter is created
}

if (dailyCount > 10000) {
  return {
    statusCode: 429,
    body: JSON.stringify({
      error: 'Daily rate limit exceeded (10,000 requests/day)'
    })
  };
}
| Metric | Before Gateway | After Gateway | Improvement |
|---|---|---|---|
| Daily API Calls | 12,000 | 4,800 | 60% reduction |
| Cache Hit Rate | 0% | 65% | — |
| Daily Cost | $36.00 | $20.88 | 42% savings |
| Monthly Cost | $1,080 | $626 | $454/month saved |
| Avg Response Time | 2.3s | 0.9s | 61% faster |
Don't use overly generic cache keys:
❌ Too generic: ai:${prompt}
✅ Scoped: ai:${templateId}:${contactId}:${md5(prompt)}
If your prompts reference time-sensitive data, use shorter TTLs:
// For content with company news, use 6-hour TTL
await redis.set(cacheKey, response, 'EX', 21600);
Don't cache content that should be unique per contact, for example openers that reference an individual's own LinkedIn post, recent promotion, or other personal details; serving a cached copy of that to a different person defeats the point of personalization.
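One lightweight way to enforce this is a per-template cache policy checked before the Redis lookup and before the SET; a sketch with hypothetical template IDs:
// Hypothetical per-template cache policy
const templates = {
  'company-hiring-opener': { cacheable: true },   // company-level content, safe to share
  'personal-linkedin-post': { cacheable: false }  // references the individual, never cache
};

function shouldCache(templateId) {
  const template = templates[templateId];
  return template ? template.cacheable : false; // default to not caching when unsure
}

// In the gateway: skip both redis.get() and redis.set() when shouldCache(templateId) is false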
Add logging to track cache performance:
// Log every request (capture startTime at the top of the handler: const startTime = Date.now();)
console.log(JSON.stringify({
  timestamp: new Date().toISOString(),
  cacheKey,
  cached,
  cost,
  latency: Date.now() - startTime
}));

// Track daily aggregates in a Redis hash, incremented on every request
const statsKey = `stats:${new Date().toISOString().split('T')[0]}`;
await redis.hincrby(statsKey, 'total', 1);
await redis.hincrby(statsKey, cached ? 'hits' : 'misses', 1);
await redis.hincrbyfloat(statsKey, 'cost', cost);
await redis.expire(statsKey, 7 * 86400); // keep a week of history

// Daily summary (run via cron)
const stats = await redis.hgetall(statsKey); // hash fields come back as strings
const total = Number(stats.total || 0);
const hits = Number(stats.hits || 0);
console.log('Today:', {
  totalRequests: total,
  cacheHits: hits,
  cacheMisses: Number(stats.misses || 0),
  hitRate: total ? `${((hits / total) * 100).toFixed(1)}%` : 'n/a',
  totalCost: `$${Number(stats.cost || 0).toFixed(2)}`
});
For even more savings, route requests to cheaper models based on complexity:
function selectModel(prompt) {
  const wordCount = prompt.split(' ').length;

  if (wordCount < 50) {
    return 'claude-3-haiku-20240307';    // $0.00025/1K input tokens
  } else if (wordCount < 200) {
    return 'claude-3-5-sonnet-20241022'; // $0.003/1K input tokens
  } else {
    return 'claude-3-opus-20240229';     // $0.015/1K input tokens
  }
}
Potential savings: If 60% of your prompts can use Haiku instead of Sonnet, that's another 30% cost reduction.
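The actual number depends on your token mix, so here's a rough estimator you can plug your own volumes into; the token counts in the example are illustrative, and the rates are Claude 3 Haiku and Claude 3.5 Sonnet list prices:
// Blended daily cost when a share of prompts is routed to Haiku instead of Sonnet
function estimateDailyCost({ calls, haikuShare, inputTokens, outputTokens }) {
  const RATES = {
    haiku:  { input: 0.25 / 1000000, output: 1.25 / 1000000 }, // Claude 3 Haiku
    sonnet: { input: 3 / 1000000,    output: 15 / 1000000 }    // Claude 3.5 Sonnet
  };
  const perCall = (r) => inputTokens * r.input + outputTokens * r.output;
  return calls * (haikuShare * perCall(RATES.haiku) + (1 - haikuShare) * perCall(RATES.sonnet));
}

// Example: 4,800 daily API calls, 60% routed to Haiku, ~700 input / ~150 output tokens per call
console.log(`$${estimateDailyCost({ calls: 4800, haikuShare: 0.6, inputTokens: 700, outputTokens: 150 }).toFixed(2)}/day`);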
We implement AI Gateway patterns for agencies and SaaS companies running high-volume LLM workflows. Typical savings: 40-60% on AI costs.
Schedule a Strategy Call →
If you're running LLM-powered automation at scale, caching isn't optional. It's the difference between $1,000/month and $400/month.
The AI Gateway pattern is especially powerful for:
- Scenarios that resend the same long system prompt on every call
- Enrichment runs that generate near-duplicate content for contacts at the same company
- High-volume workflows where a runaway scenario could burn through your API budget overnight
Action items:
1. Set up a free Upstash Redis instance and deploy the gateway as a Netlify or Vercel function
2. Point your Make.com HTTP modules at the gateway instead of calling the Claude API directly
3. Enable Anthropic's prompt caching for your system prompts
4. Add the rate limit and logging snippets, then review your cache hit rate after a week
In our case, this single optimization saved $454/month—enough to cover our entire Redis hosting, serverless functions, and still pocket $400.
Questions? Found a bug in the code? Email me: chris@resultantai.com