# API & HTTP Caching: Cache-Control, ETags, and the 30x-Faster API That Costs Nothing Extra
If your SaaS has APIs serving the same data to many users in 2026, every uncached request is wasted compute. The naive implementation: every request hits the DB; every page generates fresh; CDN caches nothing because Cache-Control isn't set. The result: slow APIs, expensive scaling, blown budgets. The fix isn't more servers — it's HTTP caching done right. Cache-Control headers + ETag conditional GET + stale-while-revalidate gets you 30-100x performance improvements at zero infrastructure cost. Most indie SaaS leaves this on the table; mid-market often gets it half-right; only the disciplined teams ship it correctly.
A working HTTP-caching strategy answers: which responses are cacheable (mostly GETs of public / semi-public data), what TTL (seconds for fresh / minutes for stale-OK / hours for static), how to invalidate (Cache-Tag + invalidate on writes), how to handle authenticated content (private cache; vary headers), how to use ETags (conditional GET; saves bandwidth), and how to debug (Cache-Status header from CDN).
This guide is the implementation playbook for HTTP / API caching. Companion to Caching Strategies (Redis/DB), Performance Optimization, HTTP Retry & Backoff, and Multi-region Deployment.
## Why HTTP Caching Matters
Get the value clear first.
Help me understand the impact.
The economics:
**Without caching**:
- Every request hits origin
- Origin scales linearly with traffic
- Slow at distance from server (200-500ms)
- Cost per request: full server compute
**With HTTP caching**:
- 80-99% of requests served from CDN edge
- Origin handles only cache-misses
- 10-30ms TTFB (CDN edge close to user)
- Cost: near-zero per cached request
**Real numbers**:
For a typical SaaS:
- Without cache: $500-2000/mo on origin compute for 10M req/mo
- With aggressive caching: $50-200/mo (10-20%)
Plus latency: 200ms → 30ms perceived = users feel app is faster.
**Where caching makes the difference**:
- Marketing pages (always cacheable)
- Public API endpoints (often cacheable)
- Status / health endpoints
- Image / asset delivery
- API responses with public-but-changes-occasionally data
- Long-poll / streaming endpoints (with care)
**Where caching can't help**:
- Real-time / live data
- Per-user authenticated content (private cache only)
- Mutations (POST / PUT / DELETE)
For my product: [endpoints]
Output:
1. Cacheable endpoint inventory
2. Cost estimate (without caching)
3. Caching priority
The biggest unforced error: shipping with no Cache-Control headers. CDNs default to "don't cache." Origin handles every request. Performance + cost both suffer.
## Cache-Control Header: The Foundation
Help me set Cache-Control right.
The directives:
**public**:
Response can be cached by ANY cache (CDN; browser; intermediate proxy).
Use for: public data; non-personalized.
**private**:
Response can be cached by browser only; not shared caches.
Use for: per-user data; user-specific responses.
**no-store**:
Don't cache anywhere.
Use for: sensitive data; one-time tokens.
**no-cache**:
Cache, but revalidate (ETag) before using.
Use for: data that changes; want freshness check.
**max-age=[seconds]**:
How long a cache may use the response before it must refetch.
**s-maxage=[seconds]**:
Like max-age but for shared caches (CDN). Lets you set different TTLs for browser vs CDN.
**stale-while-revalidate=[seconds]**:
Use stale content while revalidating in background. Massive UX win.
**stale-if-error=[seconds]**:
If origin fails, use stale content (gracefully degrade).
**must-revalidate**:
Don't use stale beyond max-age.
**immutable**:
Tells browser this won't change (asset with hash in URL); skip revalidation.
**The standard recipes**:
**Static asset** (JS / CSS bundle with hash):
Cache-Control: public, max-age=31536000, immutable
1 year + immutable. Hash changes filename when content changes.
**Public API response** (changes every minute):
Cache-Control: public, s-maxage=60, stale-while-revalidate=86400
Cache 60s; serve stale up to 24h while revalidating.
**Page (HTML; changes occasionally)**:
Cache-Control: public, s-maxage=300, stale-while-revalidate=86400
**User-specific API response**:
Cache-Control: private, max-age=0, must-revalidate
Browser-only; revalidate every time.
**Sensitive data** (tokens / personal info):
Cache-Control: no-store
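To keep these recipes consistent across endpoints, they can live in one helper that every handler pulls from, so nothing ships without an explicit policy. A minimal sketch; the response-class names and the `cacheControlFor` helper are our own labels, not a standard:

```typescript
// Our own classification of responses; map each to its recipe from above.
type CacheClass = 'static' | 'publicApi' | 'page' | 'userPrivate' | 'sensitive';

const CACHE_RECIPES: Record<CacheClass, string> = {
  static: 'public, max-age=31536000, immutable',
  publicApi: 'public, s-maxage=60, stale-while-revalidate=86400',
  page: 'public, s-maxage=300, stale-while-revalidate=86400',
  userPrivate: 'private, max-age=0, must-revalidate',
  sensitive: 'no-store',
};

function cacheControlFor(cls: CacheClass): string {
  return CACHE_RECIPES[cls];
}
```

Each route handler then sets `{'Cache-Control': cacheControlFor('publicApi')}` instead of hand-typing directives, which makes the audit in step 3 below a grep for `cacheControlFor`.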
For my responses: [audit per endpoint]
Output:
1. Per-endpoint Cache-Control recipe
2. Static vs dynamic split
3. Audit script
The single most-impactful header: stale-while-revalidate. Users get instant response from cache; revalidation happens in background; next user gets fresh. Massive UX + perceived performance win.
## ETag: Conditional GET
Help me use ETags.
The pattern:
Server sends response with ETag header (hash of content):
```
HTTP/1.1 200 OK
ETag: "abc123"
Content-Length: 5000

[body]
```
Client caches response with ETag.
On next request, client sends:
```
GET /api/data
If-None-Match: "abc123"
```
Server compares. If unchanged:
```
HTTP/1.1 304 Not Modified
ETag: "abc123"

[no body]
```
Client uses cached body. Saved: bandwidth (no body transfer); database query (often).
**Implementation**:
```typescript
// Generate ETag (often hash of content or version)
const data = await db.query(...);
const etag = `"${crypto.createHash('sha256').update(JSON.stringify(data)).digest('hex')}"`;
// Check If-None-Match
const ifNoneMatch = req.headers.get('if-none-match');
if (ifNoneMatch === etag) {
return new Response(null, { status: 304, headers: { ETag: etag } });
}
return new Response(JSON.stringify(data), {
status: 200,
headers: {
'ETag': etag,
'Cache-Control': 'public, max-age=60',
'Content-Type': 'application/json',
},
});
```

**Smart ETag generation**:

For data that's expensive to compute:
- Use an `updated_at` or version column from the DB as the ETag
- Skip body generation when `If-None-Match` matches

```typescript
// The ETag check costs one timestamp query; the full fetch runs only on change.
const lastUpdated = await db.query('SELECT max(updated_at) FROM table');
const etag = `"${lastUpdated.toISOString()}"`;
if (req.headers.get('if-none-match') === etag) {
  // Saved: the full data fetch and serialization
  return new Response(null, { status: 304, headers: { ETag: etag } });
}
const data = await fullQuery();
return new Response(JSON.stringify(data), {
  status: 200,
  headers: { 'ETag': etag, 'Content-Type': 'application/json' },
});
```

This makes the ETag check ultra-fast (a single timestamp query); the full body fetch happens only on change.
**Last-Modified alternative**:

```
Last-Modified: Tue, 30 Apr 2026 10:00:00 GMT
```

Client returns:

```
If-Modified-Since: Tue, 30 Apr 2026 10:00:00 GMT
```

Same idea, at second precision; less granular than ETag.
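A sketch of that exchange as a handler, using the standard Fetch `Request` / `Response` types; `getLastUpdated` and `loadBody` are hypothetical stand-ins for your data layer:

```typescript
// Conditional GET via Last-Modified / If-Modified-Since.
async function handleGet(
  req: Request,
  getLastUpdated: () => Promise<Date>,
  loadBody: () => Promise<string>,
): Promise<Response> {
  const lastUpdated = await getLastUpdated();
  lastUpdated.setMilliseconds(0); // HTTP dates have second precision
  const ifModifiedSince = req.headers.get('if-modified-since');
  if (ifModifiedSince && new Date(ifModifiedSince) >= lastUpdated) {
    // Client's copy is current: no body, no full data fetch.
    return new Response(null, {
      status: 304,
      headers: { 'Last-Modified': lastUpdated.toUTCString() },
    });
  }
  return new Response(await loadBody(), {
    status: 200,
    headers: {
      'Last-Modified': lastUpdated.toUTCString(),
      'Cache-Control': 'public, max-age=60',
    },
  });
}
```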
For my API: [audit]
Output:
- ETag strategy
- Implementation
- Backend optimization
The non-obvious win: **ETag comparison BEFORE expensive query**. Client cache hit → 304 in 5ms. Without ETag: full database query + serialization + response = 200ms. 40x improvement on cache hits.
## CDN Caching: Cloudflare / Vercel / Fastly
Help me set up CDN caching.
The 2026 CDN landscape:
Cloudflare:
- Largest CDN; free tier robust
- Cache-Control respect: yes (in defaults; Cache Rules customize)
- Cache Tags / Purge: paid plans
- Edge Cache + Browser Cache TTL configurable
Vercel CDN:
- Bundled with Vercel deployments
- Global edge network
- Cache-Control respect: yes
- Cache Tags + on-demand invalidation: built-in
- ISR (Incremental Static Regeneration): Next.js-native
Fastly:
- Premium CDN; advanced features
- VCL (Varnish Configuration Language): full control
- Cache Tags: native
- Real-time invalidation
AWS CloudFront:
- Enterprise; flexible; AWS-locked
- Cache behaviors per path
- Invalidation: slow (5-15 min); paid beyond a monthly free allotment
The Vercel CDN pattern:
```typescript
// Next.js Route Handler
export async function GET() {
  const data = await fetchData();
  return Response.json(data, {
    headers: {
      'Cache-Control': 'public, s-maxage=60, stale-while-revalidate=86400',
      'Vercel-CDN-Cache-Control': 'public, s-maxage=60', // Vercel-specific
    },
  });
}
```
The "Cache-Tag" pattern (Cloudflare / Fastly / Vercel):
Tag responses for grouped invalidation:
Cache-Tag: products,category-electronics,user-12345
When data changes:
```typescript
// On product update
await invalidateCacheByTag('products');
// All product responses purge instantly
```
Vercel: revalidateTag('products') API.
Cloudflare: Purge by Tag (Enterprise tier).
Fastly: native.
Edge-side CDN-only TTL:
Set s-maxage higher than max-age:
Cache-Control: public, max-age=0, s-maxage=3600
Browser revalidates every time; CDN caches 1 hour. Maximum CDN benefit; users always get fresh from CDN.
For my CDN: [pick]
Output:
- CDN config
- Cache-Tag strategy
- Invalidation flow
The 2026 default for Vercel apps: **`s-maxage` + `stale-while-revalidate` + `Cache-Tag` for invalidation**. Three-line change; massive performance win.
## Authenticated Content: Vary Headers + Private Cache
Help me cache authenticated content.
The challenge: response varies by user. Naive cache = wrong user gets another user's data.
The solutions:
Option 1: Don't cache authenticated
Simplest:
Cache-Control: private, max-age=0
Browser caches; CDN doesn't. User gets cached locally; no shared-cache issues.
Option 2: Vary header
If different responses per user, tell CDN to vary:
Vary: Authorization
CDN caches separately per Authorization value. But: usually defeats CDN benefit (every user is unique). Use rarely.
Option 3: Cache by segment
Cache by user-segment:
Vary: X-User-Segment
If you compute X-User-Segment header (free / pro / enterprise), CDN caches 3 versions. Useful for tier-based pricing pages.
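A sketch of Option 3, assuming an upstream auth layer has already resolved the user's plan; the `X-User-Segment` header name is a convention, not a standard:

```typescript
type Plan = 'free' | 'pro' | 'enterprise';

// Origin handler: emit the segment and tell shared caches to key on it.
// The CDN then stores at most three variants of this response.
function pricingResponse(html: string, plan: Plan): Response {
  return new Response(html, {
    headers: {
      'Content-Type': 'text/html',
      'Cache-Control': 'public, s-maxage=300',
      'Vary': 'X-User-Segment',   // cache key includes the segment
      'X-User-Segment': plan,     // echoed for debugging
    },
  });
}
```

An edge middleware sets `X-User-Segment` on the incoming request from the session; `Vary` keys the shared cache on that request header, so each plan tier hits its own cached copy.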
Option 4: ESI / per-user fragments
Cache the page; insert per-user data via Edge Side Includes (ESI) or React Suspense:
- Static page cached at edge
- Dynamic per-user component fetched separately
- Combined client-side
Modern Next.js handles this natively (Server Components + Suspense + dynamic = hybrid).
Option 5: Token-aware private cache
Cache-Control: private, max-age=300
Browser caches per-user; can use for 5 min. CDN doesn't.
User gets fast subsequent requests; first hit goes to origin.
For my auth: [scope]
Output:
- Per-endpoint strategy
- Vary if needed
- ESI / Suspense pattern
The discipline: **default to `private` for authenticated**. Most authenticated content benefits from browser cache + per-user. Don't fight the protocol.
## Stale-While-Revalidate: Magic for UX
Help me use SWR.
The pattern (Cache-Control level):
Cache-Control: public, s-maxage=60, stale-while-revalidate=86400
Behavior:
- 0-60 seconds: serve from cache (fast)
- 60s-24h: serve stale immediately + revalidate in background
- 24h+: cache miss; fetch from origin
The user-perceived effect:
- Always fast (cache hit); never blocking on origin
- Eventually fresh (background revalidation)
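The timeline above can be expressed as a tiny decision function (our own sketch, with age measured from when the entry was stored):

```typescript
type CacheDecision = 'fresh' | 'stale-while-revalidate' | 'miss';

// How a shared cache treats an entry of a given age under
// `s-maxage=<sMaxage>, stale-while-revalidate=<swr>` (values in seconds).
function decide(ageSeconds: number, sMaxage: number, swr: number): CacheDecision {
  if (ageSeconds <= sMaxage) return 'fresh';               // serve from cache
  if (ageSeconds <= sMaxage + swr) return 'stale-while-revalidate'; // serve stale, refresh in background
  return 'miss';                                           // block on origin
}
```

With `s-maxage=60, stale-while-revalidate=86400`: an entry 30s old is fresh, one an hour old is served stale while revalidating, and one older than ~24h is a miss.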
Where SWR shines:
- Marketing pages (changes occasionally; freshness within day OK)
- Public API responses (counts; lists; aggregations)
- Pricing pages (don't need millisecond freshness)
- Dashboard data (background-refresh feels modern)
Where SWR doesn't fit:
- Real-time data (chat / live feeds)
- Time-sensitive data (auction prices)
- Per-second-fresh requirements
Frontend SWR (TanStack Query / SWR library):
Server-side SWR (Cache-Control header) and client-side SWR (TanStack Query / SWR library) are different layers; both improve UX:
```typescript
// Client-side
const { data } = useSWR('/api/products', fetcher, {
  refreshInterval: 60_000,
  revalidateOnFocus: true,
});
```
Combined with server-side SWR: client uses cached data; revalidates from CDN; CDN serves cached or revalidates from origin. Layered cache.
For my endpoints: [SWR strategy]
Output:
- Per-endpoint SWR config
- Client + server layers
- UX expectations
The win that compounds: **stale-while-revalidate at the CDN level**. Users always fast; data is always fresh-eventually. Best of both worlds.
## Cache Invalidation: The Hardest Problem
Help me invalidate cache.
The two strategies:
Time-based (TTL):
Set short TTL; cache expires; eventually fresh.
Pros: simple. Cons: stale window; over-fetching after expiry.
Use for: data where stale-OK (marketing pages)
Event-based (Tag invalidation):
Tag responses; invalidate when underlying data changes.
```typescript
// On product update
await db.product.update({ id, name });
await revalidateTag(`product-${id}`);
```
Pros: instant freshness; long TTLs OK. Cons: requires invalidation discipline; coupled to writes.
Use for: data where freshness matters
Hybrid (recommended):
Cache-Control: public, s-maxage=3600, stale-while-revalidate=86400
Cache-Tag: product-123, products
- Long TTL with SWR (handles slow writes)
- Tag invalidation on writes (handles fast updates)
Best of both.
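The write path for the hybrid looks like this; `updateProduct` and `invalidateTags` are hypothetical stand-ins (on Vercel the latter would wrap `revalidateTag` from `next/cache`):

```typescript
// Write first, then purge both the specific tag and the collection tag.
// Purging after the commit avoids re-caching stale data mid-write.
async function saveProduct(
  id: string,
  patch: Record<string, unknown>,
  updateProduct: (id: string, patch: Record<string, unknown>) => Promise<void>,
  invalidateTags: (tags: string[]) => Promise<void>,
): Promise<void> {
  await updateProduct(id, patch);
  await invalidateTags([`product-${id}`, 'products']);
}
```

Keeping the invalidation inside the same function as the write is the discipline: no caller can update a product without purging its tags.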
Invalidation patterns:
- On write: `revalidateTag` immediately after DB update
- Cron: refresh popular content periodically
- On demand: API endpoint to trigger invalidation
- Webhook: external system signals invalidation
Common invalidation bugs:
- Forgetting to invalidate (stale data)
- Over-invalidating (cache thrash)
- Invalidating wrong tag (still stale)
- Race conditions (read after write before invalidation)
For my invalidation: [audit]
Output:
- Tag strategy
- Invalidation triggers
- Testing pattern
The discipline: **invalidate on write, with tags**. Without it, you're choosing between fresh-but-slow and stale-but-fast. Tag invalidation gives you both.
## Debugging Cache: Cache-Status Header
Help me debug caching.
The debug tools:
Cache-Status header (RFC 9211):
CDN returns:
```
Cache-Status: Vercel; hit, Cloudflare; hit
```
Or:
```
Cache-Status: Vercel; miss, Cloudflare; fwd=stale; key=...
```
Tells you:
- Did edge cache hit / miss?
- Which cache?
- Why missed (expired / stale / no-cache)?
Per-CDN headers:
- Cloudflare: `cf-cache-status`
- Vercel: `x-vercel-cache`
- Fastly: `x-served-by` + `x-cache`
- CloudFront: `x-cache`
Tools:
- Browser DevTools: Network tab shows cache status
- `curl -I` to see headers
- Cloudflare / Vercel dashboards show cache hit-rate
Common debugging steps:
- Check Cache-Control on response. Is it set?
- Check CDN dashboard cache hit-rate. Is it >80%?
- Check x-vercel-cache header. Why miss?
- Check Vary headers. Are they fragmenting cache?
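Steps 1-3 can be scripted. A hypothetical helper that classifies a captured header set, using the per-CDN header names listed above:

```typescript
// Classify a response's cache headers: is it cacheable, and did the CDN hit?
function summarizeCache(headers: Record<string, string>): string {
  // Normalize header names to lowercase for lookup.
  const h = Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), v]),
  );
  if (!h['cache-control']) return 'NOT CACHEABLE: no Cache-Control header';
  const status =
    h['cache-status'] ?? h['x-vercel-cache'] ?? h['cf-cache-status'] ?? h['x-cache'];
  if (!status) return `Cache-Control set (${h['cache-control']}), but no CDN status header`;
  return `Cache-Control: ${h['cache-control']} | CDN status: ${status}`;
}
```

Pipe `curl -I` output through a small parser into this function, or paste headers from DevTools; the first branch catches the most common failure (no `Cache-Control` at all).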
Standard cache hit-rate targets:
- Static assets: 99%+
- Public API: 70-90%
- Marketing pages: 80-95%
- Authenticated: per-user-dependent (often N/A for shared cache)
For my system: [debug]
Output:
- Debug commands
- CDN dashboard review
- Common issues
The debugging fundamental: **`curl -I https://yoursite.com/api/endpoint`**. Headers tell you everything. Cache-Control set? Cache-Status hit? Vary fragmenting?
## Common Caching Mistakes
Help me avoid mistakes.
The 10 mistakes:
1. **No Cache-Control headers.** Default = "don't cache"; performance left on the table.
2. **Same Cache-Control everywhere.** Static + dynamic + sensitive all get the same policy; wrong for some.
3. **Caching sensitive data.** `public` on a token / PII response = leak.
4. **Long cache; no invalidation.** Stale data shown; customers complain.
5. **Vary on Authorization.** CDN fragments per user; defeats the benefit.
6. **ETag without the conditional check.** Sets ETag but never returns 304; wasted.
7. **No SWR.** Users wait on every cache miss; missed UX win.
8. **Caching user data as `public`.** The wrong user's data gets shown.
9. **Forgetting the CDN tier.** Cloudflare's free tier doesn't cache HTML by default.
10. **Cache-Tag without invalidation.** Tags set; never purged; stale forever.
For my system: [risks]
Output:
- Top 3 risks
- Mitigations
- Audit
The single most-painful mistake: **caching authenticated content with `public`**. User A sees user B's data. Privacy breach. Always `private` for authenticated.
## What Done Looks Like
A working HTTP caching strategy:
- Cache-Control set on every response
- Static assets: `public, max-age=31536000, immutable` with hash-in-filename
- Public API: `public, s-maxage=60, stale-while-revalidate=86400`
- Authenticated: `private, max-age=0, must-revalidate`
- Sensitive: `no-store`
- ETag on responses worth conditional-GET
- 304 returned when If-None-Match matches
- Cache-Tag for grouped invalidation
- Invalidate on writes (revalidateTag)
- CDN cache hit-rate monitored (target 80%+ on public)
- Cache-Status header debugging available
- 30-100x performance improvement on cached endpoints
The proof you got it right: cache hit-rate dashboards show 80%+ on public endpoints; origin compute is 1/10 what it would be without caching; users perceive app as fast even on slow networks.
## See Also
- [Caching Strategies](caching-strategies-chat.md) — Redis / DB caching companion
- [Performance Optimization](performance-optimization-chat.md) — broader perf
- [HTTP Retry & Backoff](http-retry-backoff-chat.md) — companion HTTP layer
- [Multi-region Deployment](multi-region-deployment-chat.md) — region + cache interplay
- [Database Indexing Strategy](database-indexing-strategy-chat.md) — indexed queries first; cache second
- [Webhook Signature Verification](webhook-signature-verification-chat.md) — adjacent HTTP discipline
- [API Pagination Patterns](api-pagination-patterns-chat.md) — pagination + caching
- [API Versioning](api-versioning-chat.md) — version + cache interplay
- [Public API](public-api-chat.md) — broader API design
- [Schema Validation with Zod](schema-validation-zod-chat.md) — validation companion
- [VibeReference: CDN Providers](https://vibereference.dev/cloud-and-hosting/cdn-providers) — Cloudflare / Vercel / Fastly / CloudFront
- [VibeReference: Vercel](https://vibereference.dev/cloud-and-hosting/vercel) — Vercel-specific cache features
- [VibeReference: Cloudflare](https://vibereference.dev/cloud-and-hosting/cloudflare) — Cloudflare cache rules