I just finished building ProspectEngine, a Clay.com alternative for B2B lead enrichment. During development, I hit a wall: our AI personalization pipeline was calling the Claude API 10,000+ times per day, with costs spiraling past $1,000/month.
The problem? Every API call sent the same 500-token system prompt. We were paying Anthropic to process identical instructions 10,000 times.
After implementing an AI Gateway caching pattern, we cut costs by 42% and reduced latency by 60%. Here's the exact system we built.
Most Make.com scenarios that call LLM APIs look like this:
1. HTTP Module → Call Claude API
2. Send request:
{
  "system": "You are writing cold email openers for B2B sales...",
  "prompt": "Generate opener for {{contact.name}} at {{contact.company}}..."
}
3. Parse response
4. Save to database
What's wrong with this?
Every run resends the same 500-token system prompt, so you pay full input-token price, on every single call, for instructions that never change.
But it gets worse. If you're generating similar content (e.g., 100 contacts at the same company), you're also making near-duplicate requests that could be cached.
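Here's the back-of-the-envelope math on the repeated system prompt alone, assuming Claude 3.5 Sonnet's $3-per-million-token input rate (the same rate used in the gateway code below):
// Rough cost of resending an identical 500-token system prompt on every call
// (assumes $3 per million input tokens for Claude 3.5 Sonnet)
const SYSTEM_PROMPT_TOKENS = 500;
const CALLS_PER_DAY = 10000;
const INPUT_PRICE_PER_TOKEN = 3 / 1000000;

const wastedPerDay = SYSTEM_PROMPT_TOKENS * CALLS_PER_DAY * INPUT_PRICE_PER_TOKEN;
console.log(`$${wastedPerDay.toFixed(2)}/day on repeated instructions`); // $15.00/day
console.log(`$${(wastedPerDay * 30).toFixed(2)}/month`);                 // $450.00/month
That's roughly $450 of the monthly bill spent re-processing instructions that never change.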
For an early-stage SaaS, $1,080/month on AI was unsustainable. We needed a smarter pattern.
An AI Gateway sits between your Make.com scenario and the LLM API and handles the caching, cost tracking, and rate limiting in one place:
┌──────────────┐
│ Make.com │
│ Scenario │
└──────┬───────┘
│ HTTP Request
▼
┌─────────────────────────────────────────────┐
│ AI Gateway (Node.js + Redis) │
│ │
│ 1. Generate cache key from request │
│ 2. Check Redis for cached response │
│ 3. If MISS → Call Claude API │
│ 4. Store response in Redis (24h TTL) │
│ 5. Return response to Make.com │
└─────────────────────────────────────────────┘
│ Cached or Fresh
▼
┌──────────────┐
│ Response │
│ (JSON) │
└──────────────┘
We use Upstash Redis (free tier: 10K requests/day, 256MB storage).
// Sign up at upstash.com
// Get your REDIS_URL (looks like: rediss://...@us1-xxx.upstash.io:6379)
// Install Redis client
npm install ioredis
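Before deploying anything, it's worth a one-off connectivity check; a minimal sketch, assuming REDIS_URL is set to your Upstash rediss:// URL:
// check-redis.js — confirm the Upstash instance is reachable (run: node check-redis.js)
const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL);

redis.ping()
  .then((reply) => console.log('Redis reachable:', reply)) // expect "PONG"
  .catch((err) => console.error('Redis connection failed:', err.message))
  .finally(() => redis.quit());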
Deploy this as a Netlify Function or Vercel Serverless Function:
// netlify/functions/ai-gateway.js
const Redis = require('ioredis');
const crypto = require('crypto');

const redis = new Redis(process.env.REDIS_URL);

exports.handler = async (event) => {
  const body = JSON.parse(event.body);
  const { system, prompt, model = 'claude-3-5-sonnet-20241022' } = body;

  // Generate cache key from system + prompt
  const cacheKey = `ai:${crypto
    .createHash('md5')
    .update(`${system}:${prompt}`)
    .digest('hex')}`;

  // Check cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    console.log('✅ CACHE HIT:', cacheKey);
    return {
      statusCode: 200,
      body: JSON.stringify({
        response: cached,
        cached: true,
        cost: 0 // No API call = $0
      })
    };
  }

  // CACHE MISS - Call Claude API
  console.log('❌ CACHE MISS - Calling Claude API');
  const apiResponse = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': process.env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json'
    },
    body: JSON.stringify({
      model,
      max_tokens: 1024,
      system,
      messages: [{ role: 'user', content: prompt }]
    })
  });

  // Don't cache failures; surface the API error to the caller
  if (!apiResponse.ok) {
    const errorBody = await apiResponse.text();
    console.error('Claude API error:', apiResponse.status, errorBody);
    return { statusCode: 502, body: errorBody };
  }

  const data = await apiResponse.json();
  const response = data.content[0].text;

  // Calculate cost (Claude 3.5 Sonnet: $3/M input tokens, $15/M output tokens)
  const inputTokens = data.usage.input_tokens;
  const outputTokens = data.usage.output_tokens;
  const cost = (inputTokens * 0.003 / 1000) + (outputTokens * 0.015 / 1000);

  // Store in cache (24 hour TTL)
  await redis.set(cacheKey, response, 'EX', 86400);

  return {
    statusCode: 200,
    body: JSON.stringify({
      response,
      cached: false,
      cost,
      tokens: { input: inputTokens, output: outputTokens }
    })
  };
};
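Before wiring it into Make.com, you can smoke-test the deployed function from any machine with Node 18+; a quick sketch, with the URL standing in for your own deployment:
// smoke-test.js — call the gateway twice; the second call should come back cached
const GATEWAY_URL = 'https://your-site.netlify.app/.netlify/functions/ai-gateway'; // your deployment

async function callGateway() {
  const res = await fetch(GATEWAY_URL, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      system: 'You are writing cold email openers for B2B sales.',
      prompt: 'Generate opener for Jane Doe at Acme Corp in logistics'
    })
  });
  return res.json();
}

(async () => {
  console.log('First call:', await callGateway());  // expect cached: false, cost > 0
  console.log('Second call:', await callGateway()); // expect cached: true, cost: 0
})();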
In your Make.com scenario, replace direct Claude API calls with:
HTTP Module:
URL: https://your-site.netlify.app/.netlify/functions/ai-gateway
Method: POST
Body (JSON):
{
  "system": "You are writing cold email openers for B2B sales. Keep it 2-3 sentences, focus on pain points.",
  "prompt": "Generate opener for {{contact.name}} at {{contact.company}} in {{contact.industry}}"
}
The gateway returns:
{
  "response": "I noticed {{company}} is hiring 3 {{title}}s...",
  "cached": true,  // or false if API was called
  "cost": 0.0023   // $0 if cached
}
Anthropic now offers native prompt caching, which bills cached system-prompt tokens at 10% of the normal input rate on cache reads, a 90% reduction on those tokens:
// Add to your gateway
body: JSON.stringify({
  model,
  max_tokens: 1024,
  system: [
    {
      type: 'text',
      text: system,
      cache_control: { type: 'ephemeral' } // ← Cache this!
    }
  ],
  messages: [{ role: 'user', content: prompt }]
})
Pricing with prompt caching: for Claude 3.5 Sonnet, cache reads cost $0.30 per million tokens (10% of the $3.00 base input rate) and cache writes cost $3.75 per million tokens (a one-time 25% premium), so a system prompt reused thousands of times a day is close to free after the first call.
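If you want the gateway's cost field to stay accurate once prompt caching is on, account for the cached tokens separately; a sketch assuming the cache_creation_input_tokens and cache_read_input_tokens fields that Anthropic's usage object reports when caching is active:
// Per-message cost with prompt caching (Claude 3.5 Sonnet: $3/M input, $15/M output;
// cache writes bill at 1.25x the input rate, cache reads at 0.1x)
function messageCost(usage) {
  const INPUT = 3 / 1000000;
  const OUTPUT = 15 / 1000000;
  return (
    (usage.input_tokens || 0) * INPUT +
    (usage.cache_creation_input_tokens || 0) * INPUT * 1.25 +
    (usage.cache_read_input_tokens || 0) * INPUT * 0.1 +
    (usage.output_tokens || 0) * OUTPUT
  );
}

// In the gateway: const cost = messageCost(data.usage);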
If you're enriching 100 contacts at the same company, you often generate near-identical content (the example opener above is really about the company, not the person). For those templates, key the cache by company and template instead of by contact:
// Company-level cache key: every contact at the same company shares one entry
// (companyId and templateId come from the Make.com request body)
const cacheKey = `ai:${companyId}:${templateId}:${crypto
  .createHash('md5')
  .update(system) // the hash invalidates the entry if the template's instructions change
  .digest('hex')}`;
This prevents regenerating the same opener for John Smith and Jane Doe at Acme Corp.
Prevent runaway scenarios from draining your budget:
// Add to gateway
const dailyKey = `rate:${new Date().toISOString().split('T')[0]}`;
const dailyCount = await redis.incr(dailyKey);
if (dailyCount === 1) {
  await redis.expire(dailyKey, 86400); // set the TTL once, when the day's counter is created
}

if (dailyCount > 10000) {
  return {
    statusCode: 429,
    body: JSON.stringify({
      error: 'Daily rate limit exceeded (10,000 requests/day)'
    })
  };
}
| Metric | Before Gateway | After Gateway | Improvement |
|---|---|---|---|
| Daily API Calls | 12,000 | 4,800 | 60% reduction |
| Cache Hit Rate | 0% | 65% | — |
| Daily Cost | $36.00 | $20.88 | 42% savings |
| Monthly Cost | $1,080 | $626 | $454/month saved |
| Avg Response Time | 2.3s | 0.9s | 61% faster |
Don't use overly generic cache keys:
❌ Too generic: ai:${prompt}
✅ Scoped: ai:${templateId}:${contactId}:${md5(prompt)}
If your prompts reference time-sensitive data, use shorter TTLs:
// For content with company news, use 6-hour TTL
await redis.set(cacheKey, response, 'EX', 21600);
Don't cache content that should be unique per contact, for example openers that reference an individual's own LinkedIn post, recent promotion, or other personal details; serving a cached copy of that to a different person defeats the point of personalization.
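One lightweight way to enforce this is a per-template cache policy checked before the Redis lookup and before the SET; a sketch with hypothetical template IDs:
// Hypothetical per-template cache policy
const templates = {
  'company-hiring-opener': { cacheable: true },   // company-level content, safe to share
  'personal-linkedin-post': { cacheable: false }  // references the individual, never cache
};

function shouldCache(templateId) {
  const template = templates[templateId];
  return template ? template.cacheable : false; // default to not caching when unsure
}

// In the gateway: skip both redis.get() and redis.set() when shouldCache(templateId) is false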
Add logging to track cache performance:
// Log every request (capture startTime at the top of the handler: const startTime = Date.now();)
console.log(JSON.stringify({
  timestamp: new Date().toISOString(),
  cacheKey,
  cached,
  cost,
  latency: Date.now() - startTime
}));

// Track daily aggregates in a Redis hash, incremented on every request
const statsKey = `stats:${new Date().toISOString().split('T')[0]}`;
await redis.hincrby(statsKey, 'total', 1);
await redis.hincrby(statsKey, cached ? 'hits' : 'misses', 1);
await redis.hincrbyfloat(statsKey, 'cost', cost);
await redis.expire(statsKey, 7 * 86400); // keep a week of history

// Daily summary (run via cron)
const stats = await redis.hgetall(statsKey); // hash fields come back as strings
const total = Number(stats.total || 0);
const hits = Number(stats.hits || 0);
console.log('Today:', {
  totalRequests: total,
  cacheHits: hits,
  cacheMisses: Number(stats.misses || 0),
  hitRate: total ? `${((hits / total) * 100).toFixed(1)}%` : 'n/a',
  totalCost: `$${Number(stats.cost || 0).toFixed(2)}`
});
For even more savings, route requests to cheaper models based on complexity:
function selectModel(prompt) {
  const wordCount = prompt.split(' ').length;

  if (wordCount < 50) {
    return 'claude-3-haiku-20240307';    // $0.00025/1K input tokens
  } else if (wordCount < 200) {
    return 'claude-3-5-sonnet-20241022'; // $0.003/1K input tokens
  } else {
    return 'claude-3-opus-20240229';     // $0.015/1K input tokens
  }
}
Potential savings: If 60% of your prompts can use Haiku instead of Sonnet, that's another 30% cost reduction.
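The actual number depends on your token mix, so here's a rough estimator you can plug your own volumes into; the token counts in the example are illustrative, and the rates are Claude 3 Haiku and Claude 3.5 Sonnet list prices:
// Blended daily cost when a share of prompts is routed to Haiku instead of Sonnet
function estimateDailyCost({ calls, haikuShare, inputTokens, outputTokens }) {
  const RATES = {
    haiku:  { input: 0.25 / 1000000, output: 1.25 / 1000000 }, // Claude 3 Haiku
    sonnet: { input: 3 / 1000000,    output: 15 / 1000000 }    // Claude 3.5 Sonnet
  };
  const perCall = (r) => inputTokens * r.input + outputTokens * r.output;
  return calls * (haikuShare * perCall(RATES.haiku) + (1 - haikuShare) * perCall(RATES.sonnet));
}

// Example: 4,800 daily API calls, 60% routed to Haiku, ~700 input / ~150 output tokens per call
console.log(`$${estimateDailyCost({ calls: 4800, haikuShare: 0.6, inputTokens: 700, outputTokens: 150 }).toFixed(2)}/day`);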
We implement AI Gateway patterns for agencies and SaaS companies running high-volume LLM workflows. Typical savings: 40-60% on AI costs.
Schedule a Strategy Call →
If you're running LLM-powered automation at scale, caching isn't optional. It's the difference between $1,000/month and $400/month.
The AI Gateway pattern is especially powerful for:
- Scenarios that resend the same long system prompt on every call
- Enrichment runs that generate near-duplicate content for contacts at the same company
- High-volume workflows where a runaway scenario could burn through your API budget overnight
Action items:
1. Set up a free Upstash Redis instance and deploy the gateway as a Netlify or Vercel function
2. Point your Make.com HTTP modules at the gateway instead of calling the Claude API directly
3. Enable Anthropic's prompt caching for your system prompts
4. Add the rate limit and logging snippets, then review your cache hit rate after a week
In our case, this single optimization saved $454/month—enough to cover our entire Redis hosting, serverless functions, and still pocket $400.
Questions? Found a bug in the code? Email me: chris@resultantai.com