
How to Cache Claude API Calls and Save 40%

January 29, 2026 12 min read
⚠️ Real Cost Impact: If you're calling Claude API 1,000 times/day with the same 500-token system prompt, you're burning $15/day ($450/month) on redundant API calls. This pattern will cut that to $9/day ($270/month), saving roughly $180/month.

I just finished building ProspectEngine, a Clay.com alternative for B2B lead enrichment. During development, I hit a wall: our AI personalization pipeline was calling Claude API 10,000+ times per day, with costs spiraling past $1,000/month.

The problem? Every API call sent the same 500-token system prompt. We were paying Anthropic to process identical instructions 10,000 times.

After implementing an AI Gateway caching pattern, we cut costs by 42% and reduced latency by 60%. Here's the exact system we built.

The Problem: Naive AI API Usage Burns Money

Most Make.com scenarios that call LLM APIs look like this:

1. HTTP Module → Call Claude API
2. Send request:
   {
     "system": "You are writing cold email openers for B2B sales...",
     "prompt": "Generate opener for {{contact.name}} at {{contact.company}}..."
   }
3. Parse response
4. Save to database

What's wrong with this? Every request re-sends the identical 500-token system prompt, so you're paying input-token rates to process the same instructions thousands of times a day.

But it gets worse. If you're generating similar content (e.g., 100 contacts at the same company), you're also making near-duplicate requests that could be cached.

Real Example: ProspectEngine Costs (Before Optimization)

Daily API Calls: 12,000
Daily Cost: $36
Monthly Cost: $1,080
Cache Hit Rate: 0%

For an early-stage SaaS, $1,080/month on AI was unsustainable. We needed a smarter pattern.

The Solution: AI Gateway Pattern

An AI Gateway sits between your Make.com scenario and the LLM API. It handles:

  1. Prompt Caching - Cache system prompts across requests
  2. Response Deduplication - Don't regenerate identical content
  3. Rate Limiting - Prevent bill shock from runaway scenarios
  4. Cost Tracking - Log every request for budget monitoring

Architecture Overview

┌──────────────┐
│  Make.com    │
│  Scenario    │
└──────┬───────┘
       │ HTTP Request
       ▼
┌─────────────────────────────────────────────┐
│         AI Gateway (Node.js + Redis)        │
│                                             │
│  1. Generate cache key from request         │
│  2. Check Redis for cached response         │
│  3. If MISS → Call Claude API               │
│  4. Store response in Redis (24h TTL)       │
│  5. Return response to Make.com             │
└─────────────────────────────────────────────┘
       │ Cached or Fresh
       ▼
┌──────────────┐
│  Response    │
│  (JSON)      │
└──────────────┘

Implementation: Step-by-Step

Step 1: Set Up Redis (Free Tier)

We use Upstash Redis (free tier: 10K requests/day, 256MB storage).

// Sign up at upstash.com
// Get your REDIS_URL (looks like: rediss://...@us1-xxx.upstash.io:6379)

// Install Redis client
npm install ioredis
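
Before writing the gateway, it's worth a quick connectivity check. A minimal sketch, assuming REDIS_URL is already set in your local environment:

// Quick sanity check: confirm the Upstash connection works
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

(async () => {
  await redis.set('healthcheck', 'ok', 'EX', 60); // expires after 60 seconds
  console.log(await redis.get('healthcheck'));    // → "ok"
  redis.disconnect();
})();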

Step 2: Build the AI Gateway (Node.js)

Deploy this as a Netlify Function or Vercel Serverless Function:

// netlify/functions/ai-gateway.js
const Redis = require('ioredis');
const crypto = require('crypto');

const redis = new Redis(process.env.REDIS_URL);

exports.handler = async (event) => {
  const body = JSON.parse(event.body);
  const { system, prompt, model = 'claude-3-5-sonnet-20241022' } = body;

  // Generate cache key from system + prompt
  const cacheKey = `ai:${crypto
    .createHash('md5')
    .update(`${system}:${prompt}`)
    .digest('hex')}`;

  // Check cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    console.log('✅ CACHE HIT:', cacheKey);
    return {
      statusCode: 200,
      body: JSON.stringify({
        response: cached,
        cached: true,
        cost: 0 // No API call = $0
      })
    };
  }

  // CACHE MISS - Call Claude API
  console.log('❌ CACHE MISS - Calling Claude API');

  const apiResponse = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': process.env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json'
    },
    body: JSON.stringify({
      model,
      max_tokens: 1024,
      system,
      messages: [{ role: 'user', content: prompt }]
    })
  });

  const data = await apiResponse.json();

  // Surface API errors instead of caching a broken response
  if (!apiResponse.ok) {
    return {
      statusCode: apiResponse.status,
      body: JSON.stringify({ error: data.error?.message || 'Claude API error' })
    };
  }

  const response = data.content[0].text;

  // Calculate cost
  const inputTokens = data.usage.input_tokens;
  const outputTokens = data.usage.output_tokens;
  const cost = (inputTokens * 0.003 / 1000) + (outputTokens * 0.015 / 1000);

  // Store in cache (24 hour TTL)
  await redis.set(cacheKey, response, 'EX', 86400);

  return {
    statusCode: 200,
    body: JSON.stringify({
      response,
      cached: false,
      cost,
      tokens: { input: inputTokens, output: outputTokens }
    })
  };
};
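
Once deployed, you can smoke-test the function from any Node 18+ environment before touching Make.com. The URL below is a placeholder for your own Netlify site:

// Smoke test for the deployed gateway (Node 18+ ships global fetch)
(async () => {
  const res = await fetch('https://your-site.netlify.app/.netlify/functions/ai-gateway', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      system: 'You are writing cold email openers for B2B sales.',
      prompt: 'Generate opener for Jane Doe at Acme Corp in logistics'
    })
  });
  console.log(await res.json()); // { response, cached, cost, tokens }
})();

Run it twice: the first call should come back with "cached": false and a non-zero cost, the second with "cached": true and cost 0.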

Step 3: Update Make.com to Use Gateway

In your Make.com scenario, replace direct Claude API calls with:

HTTP Module:
URL: https://your-site.netlify.app/.netlify/functions/ai-gateway
Method: POST
Body (JSON):
{
  "system": "You are writing cold email openers for B2B sales. Keep it 2-3 sentences, focus on pain points.",
  "prompt": "Generate opener for {{contact.name}} at {{contact.company}} in {{contact.industry}}"
}

The gateway returns:

{
  "response": "I noticed {{company}} is hiring 3 {{title}}s...",
  "cached": true,  // or false if API was called
  "cost": 0.0023   // $0 if cached
}

Advanced Optimizations

Optimization 1: Prompt Caching (Anthropic Native)

Anthropic now offers native prompt caching that reduces system prompt costs by 90%:

// Add to your gateway
body: JSON.stringify({
  model,
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: system,
      cache_control: { type: 'ephemeral' }  // ← Cache this!
    }
  ],
  messages: [{ role: 'user', content: prompt }]
})

Pricing with prompt caching: the cached portion of the prompt is written once at a small premium (about 25% over the normal input rate) and read back at roughly 10% of the input price on every subsequent call.
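
When prompt caching is active, the API response reports cache activity in the usage object, so the gateway's cost calculation needs two extra terms. A sketch using Claude 3.5 Sonnet rates at the time of writing (verify against Anthropic's current pricing):

// Cost calc with prompt caching: cache writes bill at ~1.25x the input
// rate, cache reads at ~0.1x (Sonnet 3.5 rates, per 1K tokens)
const u = data.usage;
const cost = (
  u.input_tokens * 0.003 +
  (u.cache_creation_input_tokens || 0) * 0.00375 +
  (u.cache_read_input_tokens || 0) * 0.0003 +
  u.output_tokens * 0.015
) / 1000;

One caveat: Anthropic enforces a minimum cacheable prompt length (1,024 tokens for Sonnet at the time of writing), so a very short system prompt may need to be bundled with other stable context before it qualifies.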

💡 Pro Tip: Combine Redis caching (for identical requests) with prompt caching (for same system prompt, different user prompts). This stacks savings—we went from $36/day to $9/day.

Optimization 2: Deduplication by Contact ID

If you're enriching 100 contacts at the same company, you might generate near-identical content. Add contact-level caching:

// More specific cache key
const cacheKey = `ai:${contactId}:${templateId}:${crypto
  .createHash('md5')
  .update(prompt)
  .digest('hex')}`;

This way, a re-run or retry for John Smith hits the cache instantly, while Jane Doe at the same Acme Corp still gets her own opener instead of a recycled copy of John's.

Optimization 3: Rate Limiting

Prevent runaway scenarios from draining your budget:

// Add to gateway
const dailyKey = `rate:${new Date().toISOString().split('T')[0]}`;
const dailyCount = await redis.incr(dailyKey);
if (dailyCount === 1) await redis.expire(dailyKey, 86400); // set TTL once, when the first request of the day creates the key

if (dailyCount > 10000) {
  return {
    statusCode: 429,
    body: JSON.stringify({
      error: 'Daily rate limit exceeded (10,000 requests/day)'
    })
  };
}
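
Request counts are a blunt instrument. Since the gateway already computes cost per call, you can also cap daily spend directly; a sketch with an arbitrary $25/day budget:

// Optional: hard cap on daily spend (the $25 budget is illustrative)
const today = new Date().toISOString().split('T')[0];
const dailySpend = parseFloat(await redis.get(`cost:${today}`) || '0');

if (dailySpend > 25) {
  return {
    statusCode: 429,
    body: JSON.stringify({ error: 'Daily budget exceeded ($25/day)' })
  };
}

// ...and after each uncached API call:
await redis.incrbyfloat(`cost:${today}`, cost);
await redis.expire(`cost:${today}`, 86400);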

Results: Before vs. After

Metric               Before Gateway   After Gateway   Improvement
Daily API Calls      12,000           4,800           60% reduction
Cache Hit Rate       0%               65%
Daily Cost           $36.00           $20.88          42% savings
Monthly Cost         $1,080           $626            $454/month saved
Avg Response Time    2.3s             0.9s            61% faster

Cost Breakdown: Where Savings Come From

Without caching, every request pays full price. With a 65% hit rate, roughly two-thirds of requests are served straight from Redis and never touch the Anthropic API at all.
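
A back-of-envelope model makes the relationship clear. It ignores cache writes, retries, and requests that bypass the cache, which is why measured savings (42%) land below the raw hit rate:

// Rough daily-cost model: only cache misses pay the API bill
function estimateDailyCost(requestsPerDay, hitRate, avgCostPerRequest) {
  return requestsPerDay * (1 - hitRate) * avgCostPerRequest;
}

console.log(estimateDailyCost(12000, 0, 0.003));    // 36   → no caching
console.log(estimateDailyCost(12000, 0.65, 0.003)); // 12.6 → theoretical floor at a 65% hit rate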

📊 Key Insight: The higher your cache hit rate, the more you save. In production, we see 65-75% hit rates, but your mileage will vary based on how unique your prompts are.

When to Use This Pattern

✅ Great for: high-volume scenarios that reuse the same system prompt, templated outreach where many requests are identical or near-identical, and batch enrichment jobs that re-run over the same records.

❌ Not ideal for: prompts built around time-sensitive data, content that must be genuinely unique for every contact, and low-volume workflows where the cache rarely sees the same request twice.

Common Pitfalls to Avoid

1. Cache Key Collisions

Don't use overly generic cache keys:

❌ Bad: ai:${prompt}
✅ Good: ai:${templateId}:${contactId}:${md5(prompt)}
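
In code, the good key might come from a small helper like this (the names are illustrative):

const crypto = require('crypto');

// Namespaced, collision-resistant cache key
function buildCacheKey(templateId, contactId, prompt) {
  const hash = crypto.createHash('md5').update(prompt).digest('hex');
  return `ai:${templateId}:${contactId}:${hash}`;
}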

2. Stale Cache Issues

If your prompts reference time-sensitive data, use shorter TTLs:

// For content with company news, use 6-hour TTL
await redis.set(cacheKey, response, 'EX', 21600);
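
If different steps have different freshness requirements, a small TTL map beats one global constant (the values here are illustrative defaults, not recommendations):

// TTLs in seconds, chosen per content type
const TTL = {
  stableTemplate: 7 * 24 * 3600, // instructions that rarely change
  companyNews: 6 * 3600,         // time-sensitive research
  contactOpener: 24 * 3600       // the gateway default above
};

await redis.set(cacheKey, response, 'EX', TTL.companyNews);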

3. Over-Caching Personalization

Don't cache content that should be unique per contact. If the prompt references a specific person's name, role, or recent activity, either include the contact ID in the cache key (as in Optimization 2) or skip caching for that step entirely.

Monitoring & Debugging

Add logging to track cache performance:

// Log every request (inside the gateway handler).
// Capture `const startTime = Date.now();` at the top of the handler.
console.log(JSON.stringify({
  timestamp: new Date().toISOString(),
  cacheKey,
  cached: Boolean(cached),
  cost,
  latency: Date.now() - startTime
}));

// Daily summary (run via cron). Assumes the gateway increments a
// 'stats:daily' hash on every request, e.g.:
//   await redis.hincrby('stats:daily', 'total', 1);
//   await redis.hincrby('stats:daily', cached ? 'hits' : 'misses', 1);
//   await redis.hincrbyfloat('stats:daily', 'cost', cost);
const dailyStats = await redis.hgetall('stats:daily');
console.log('Today:', {
  totalRequests: Number(dailyStats.total),
  cacheHits: Number(dailyStats.hits),
  cacheMisses: Number(dailyStats.misses),
  hitRate: `${(dailyStats.hits / dailyStats.total * 100).toFixed(1)}%`,
  totalCost: `$${Number(dailyStats.cost).toFixed(2)}`
});

Next-Level Optimization: Model Routing

For even more savings, route requests to cheaper models based on complexity:

function selectModel(prompt) {
  const wordCount = prompt.split(' ').length;

  if (wordCount < 50) {
    return 'claude-3-haiku-20240307';  // $0.00025/1K in
  } else if (wordCount < 200) {
    return 'claude-3-5-sonnet-20241022';  // $0.003/1K in
  } else {
    return 'claude-3-opus-20240229';  // $0.015/1K in
  }
}
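
To wire this into the gateway, resolve the model just before the API call and let callers override it when they need to (a sketch):

// Inside the gateway handler, before calling the Claude API
const { system, prompt } = body;
const model = body.model || selectModel(prompt); // explicit model wins, otherwise auto-route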

Potential savings: If 60% of your prompts can use Haiku instead of Sonnet, that's another 30% cost reduction.

Want Us to Build This for You?

We implement AI Gateway patterns for agencies and SaaS companies running high-volume LLM workflows. Typical savings: 40-60% on AI costs.

Schedule a Strategy Call →

Conclusion

If you're running LLM-powered automation at scale, caching isn't optional—it's the difference between $1,000/month and $400/month.

The AI Gateway pattern is especially powerful for high-volume enrichment pipelines, templated content generation, and any scenario that re-sends the same system prompt on every call.

Action items:

  1. Audit your current LLM API usage (check Anthropic/OpenAI billing)
  2. Calculate potential savings (requests/day × 0.65 cache hit rate × cost/request)
  3. Deploy the AI Gateway pattern (copy the code above)
  4. Monitor cache hit rate and adjust TTLs
  5. Iterate on cache keys for better deduplication

In our case, this single optimization saved $454/month—enough to cover our entire Redis hosting, serverless functions, and still pocket $400.

Questions? Found a bug in the code? Email me: chris@resultantai.com

Related: How to Prevent AI Bill Shock in Make.com