# API & HTTP Caching: Cache-Control, ETags, and the 30x-Faster API That Costs Nothing Extra
If your SaaS has APIs serving the same data to many users in 2026, every uncached request is wasted compute. The naive implementation: every request hits the DB; every page generates fresh; CDN caches nothing because Cache-Control isn't set. The result: slow APIs, expensive scaling, blown budgets. The fix isn't more servers — it's HTTP caching done right. Cache-Control headers + ETag conditional GET + stale-while-revalidate gets you 30-100x performance improvements at zero infrastructure cost. Most indie SaaS leaves this on the table; mid-market often gets it half-right; only the disciplined teams ship it correctly.
A working HTTP-caching strategy answers: which responses are cacheable (mostly GETs of public / semi-public data), what TTL (seconds for fresh / minutes for stale-OK / hours for static), how to invalidate (Cache-Tag + invalidate on writes), how to handle authenticated content (private cache; vary headers), how to use ETags (conditional GET; saves bandwidth), and how to debug (Cache-Status header from CDN).
This guide is the implementation playbook for HTTP / API caching. Companion to Caching Strategies (Redis/DB), Performance Optimization, HTTP Retry & Backoff, and Multi-region Deployment.
## Why HTTP Caching Matters
Get the value clear first.
Help me understand the impact.
The economics:
**Without caching**:
- Every request hits origin
- Origin scales linearly with traffic
- Slow at distance from server (200-500ms)
- Cost per request: full server compute
**With HTTP caching**:
- 80-99% of requests served from CDN edge
- Origin handles only cache-misses
- 10-30ms TTFB (CDN edge close to user)
- Cost: near-zero per cached request
**Real numbers**:
For a typical SaaS:
- Without cache: $500-2000/mo on origin compute for 10M req/mo
- With aggressive caching: $50-200/mo (10-20%)
Plus latency: 200ms → 30ms perceived = users feel app is faster.
**Where caching makes the difference**:
- Marketing pages (always cacheable)
- Public API endpoints (often cacheable)
- Status / health endpoints
- Image / asset delivery
- API responses with public-but-changes-occasionally data
- Long-poll / streaming endpoints (with care)
**Where caching can't help**:
- Real-time / live data
- Per-user authenticated content (private cache only)
- Mutations (POST / PUT / DELETE)
For my product: [endpoints]
Output:
1. Cacheable endpoint inventory
2. Cost estimate (without caching)
3. Caching priority
The biggest unforced error: shipping with no Cache-Control headers. CDNs default to "don't cache." Origin handles every request. Performance + cost both suffer.
## Cache-Control Header: The Foundation
Help me set Cache-Control right.
The directives:
**public**:
Response can be cached by ANY cache (CDN; browser; intermediate proxy).
Use for: public data; non-personalized.
**private**:
Response can be cached by browser only; not shared caches.
Use for: per-user data; user-specific responses.
**no-store**:
Don't cache anywhere.
Use for: sensitive data; one-time tokens.
**no-cache**:
Cache, but revalidate (ETag) before using.
Use for: data that changes; want freshness check.
**max-age=[seconds]**:
How long a cache may use the response before it must refetch.
**s-maxage=[seconds]**:
Like max-age but for shared caches (CDN). Lets you set different TTLs for browser vs CDN.
**stale-while-revalidate=[seconds]**:
Use stale content while revalidating in background. Massive UX win.
**stale-if-error=[seconds]**:
If origin fails, use stale content (gracefully degrade).
**must-revalidate**:
Don't use stale beyond max-age.
**immutable**:
Tells browser this won't change (asset with hash in URL); skip revalidation.
**The standard recipes**:
**Static asset** (JS / CSS bundle with hash):
Cache-Control: public, max-age=31536000, immutable
1 year + immutable. Hash changes filename when content changes.
**Public API response** (changes every minute):
Cache-Control: public, s-maxage=60, stale-while-revalidate=86400
Cache 60s; serve stale up to 24h while revalidating.
**Page (HTML; changes occasionally)**:
Cache-Control: public, s-maxage=300, stale-while-revalidate=86400
**User-specific API response**:
Cache-Control: private, max-age=0, must-revalidate
Browser-only; revalidate every time.
**Sensitive data** (tokens / personal info):
Cache-Control: no-store
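To keep these recipes consistent across endpoints, they can live in one helper that every handler pulls from, so nothing ships without an explicit policy. A minimal sketch; the response-class names and the `cacheControlFor` helper are our own labels, not a standard:

```typescript
// Our own classification of responses; map each to its recipe from above.
type CacheClass = 'static' | 'publicApi' | 'page' | 'userPrivate' | 'sensitive';

const CACHE_RECIPES: Record<CacheClass, string> = {
  static: 'public, max-age=31536000, immutable',
  publicApi: 'public, s-maxage=60, stale-while-revalidate=86400',
  page: 'public, s-maxage=300, stale-while-revalidate=86400',
  userPrivate: 'private, max-age=0, must-revalidate',
  sensitive: 'no-store',
};

function cacheControlFor(cls: CacheClass): string {
  return CACHE_RECIPES[cls];
}
```

Each route handler then sets `{'Cache-Control': cacheControlFor('publicApi')}` instead of hand-typing directives, which makes the audit in step 3 below a grep for `cacheControlFor`.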
For my responses: [audit per endpoint]
Output:
1. Per-endpoint Cache-Control recipe
2. Static vs dynamic split
3. Audit script
The single most-impactful header: stale-while-revalidate. Users get instant response from cache; revalidation happens in background; next user gets fresh. Massive UX + perceived performance win.
## ETag: Conditional GET
Help me use ETags.
The pattern:
Server sends response with ETag header (hash of content):
```
HTTP/1.1 200 OK
ETag: "abc123"
Content-Length: 5000

[body]
```
Client caches response with ETag.
On next request, client sends:
```
GET /api/data
If-None-Match: "abc123"
```
Server compares. If unchanged:
```
HTTP/1.1 304 Not Modified
ETag: "abc123"

[no body]
```
Client uses cached body. Saved: bandwidth (no body transfer); database query (often).
**Implementation**:
```typescript
// Generate ETag (often hash of content or version)
const data = await db.query(...);
const etag = `"${crypto.createHash('sha256').update(JSON.stringify(data)).digest('hex')}"`;
// Check If-None-Match
const ifNoneMatch = req.headers.get('if-none-match');
if (ifNoneMatch === etag) {
return new Response(null, { status: 304, headers: { ETag: etag } });
}
return new Response(JSON.stringify(data), {
status: 200,
headers: {
'ETag': etag,
'Cache-Control': 'public, max-age=60',
'Content-Type': 'application/json',
},
});
```

**Smart ETag generation**:

For data that's expensive to compute:
- Use an `updated_at` or version column from the DB as the ETag
- Skip body generation when `If-None-Match` matches

```typescript
// The ETag check costs one timestamp query; the full fetch runs only on change.
const lastUpdated = await db.query('SELECT max(updated_at) FROM table');
const etag = `"${lastUpdated.toISOString()}"`;
if (req.headers.get('if-none-match') === etag) {
  // Saved: the full data fetch and serialization
  return new Response(null, { status: 304, headers: { ETag: etag } });
}
const data = await fullQuery();
return new Response(JSON.stringify(data), {
  status: 200,
  headers: { 'ETag': etag, 'Content-Type': 'application/json' },
});
```

This makes the ETag check ultra-fast (a single timestamp query); the full body fetch happens only on change.
**Last-Modified alternative**:

```
Last-Modified: Tue, 30 Apr 2026 10:00:00 GMT
```

Client returns:

```
If-Modified-Since: Tue, 30 Apr 2026 10:00:00 GMT
```

Same idea, at second precision; less granular than ETag.
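A sketch of that exchange as a handler, using the standard Fetch `Request` / `Response` types; `getLastUpdated` and `loadBody` are hypothetical stand-ins for your data layer:

```typescript
// Conditional GET via Last-Modified / If-Modified-Since.
async function handleGet(
  req: Request,
  getLastUpdated: () => Promise<Date>,
  loadBody: () => Promise<string>,
): Promise<Response> {
  const lastUpdated = await getLastUpdated();
  lastUpdated.setMilliseconds(0); // HTTP dates have second precision
  const ifModifiedSince = req.headers.get('if-modified-since');
  if (ifModifiedSince && new Date(ifModifiedSince) >= lastUpdated) {
    // Client's copy is current: no body, no full data fetch.
    return new Response(null, {
      status: 304,
      headers: { 'Last-Modified': lastUpdated.toUTCString() },
    });
  }
  return new Response(await loadBody(), {
    status: 200,
    headers: {
      'Last-Modified': lastUpdated.toUTCString(),
      'Cache-Control': 'public, max-age=60',
    },
  });
}
```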
For my API: [audit]
Output:
- ETag strategy
- Implementation
- Backend optimization
The non-obvious win: **ETag comparison BEFORE expensive query**. Client cache hit → 304 in 5ms. Without ETag: full database query + serialization + response = 200ms. 40x improvement on cache hits.
## CDN Caching: Cloudflare / Vercel / Fastly
Help me set up CDN caching.
The 2026 CDN landscape:
Cloudflare:
- Largest CDN; free tier robust
- Cache-Control respect: yes (in defaults; Cache Rules customize)
- Cache Tags / Purge: paid plans
- Edge Cache + Browser Cache TTL configurable
Vercel CDN:
- Bundled with Vercel deployments
- Global edge network
- Cache-Control respect: yes
- Cache Tags + on-demand invalidation: built-in
- ISR (Incremental Static Regeneration): Next.js-native
Fastly:
- Premium CDN; advanced features
- VCL (Varnish Configuration Language): full control
- Cache Tags: native
- Real-time invalidation
AWS CloudFront:
- Enterprise; flexible; AWS-locked
- Cache behaviors per path
- Invalidation: slow (5-15 min); paid beyond a monthly free allotment
The Vercel CDN pattern:
```typescript
// Next.js Route Handler
export async function GET() {
  const data = await fetchData();
  return Response.json(data, {
    headers: {
      'Cache-Control': 'public, s-maxage=60, stale-while-revalidate=86400',
      'Vercel-CDN-Cache-Control': 'public, s-maxage=60', // Vercel-specific
    },
  });
}
```
The "Cache-Tag" pattern (Cloudflare / Fastly / Vercel):
Tag responses for grouped invalidation:
Cache-Tag: products,category-electronics,user-12345
When data changes:
```typescript
// On product update
await invalidateCacheByTag('products');
// All product responses purge instantly
```
Vercel: revalidateTag('products') API.
Cloudflare: Purge by Tag (Enterprise tier).
Fastly: native.
Edge-side CDN-only TTL:
Set s-maxage higher than max-age:
Cache-Control: public, max-age=0, s-maxage=3600
Browser revalidates every time; CDN caches 1 hour. Maximum CDN benefit; users always get fresh from CDN.
For my CDN: [pick]
Output:
- CDN config
- Cache-Tag strategy
- Invalidation flow
The 2026 default for Vercel apps: **`s-maxage` + `stale-while-revalidate` + `Cache-Tag` for invalidation**. Three-line change; massive performance win.
## Authenticated Content: Vary Headers + Private Cache
Help me cache authenticated content.
The challenge: response varies by user. Naive cache = wrong user gets another user's data.
The solutions:
Option 1: Don't cache authenticated
Simplest:
Cache-Control: private, max-age=0
Browser caches; CDN doesn't. User gets cached locally; no shared-cache issues.
Option 2: Vary header
If different responses per user, tell CDN to vary:
Vary: Authorization
CDN caches separately per Authorization value. But: usually defeats CDN benefit (every user is unique). Use rarely.
Option 3: Cache by segment
Cache by user-segment:
Vary: X-User-Segment
If you compute X-User-Segment header (free / pro / enterprise), CDN caches 3 versions. Useful for tier-based pricing pages.
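A sketch of Option 3, assuming an upstream auth layer has already resolved the user's plan; the `X-User-Segment` header name is a convention, not a standard:

```typescript
type Plan = 'free' | 'pro' | 'enterprise';

// Origin handler: emit the segment and tell shared caches to key on it.
// The CDN then stores at most three variants of this response.
function pricingResponse(html: string, plan: Plan): Response {
  return new Response(html, {
    headers: {
      'Content-Type': 'text/html',
      'Cache-Control': 'public, s-maxage=300',
      'Vary': 'X-User-Segment',   // cache key includes the segment
      'X-User-Segment': plan,     // echoed for debugging
    },
  });
}
```

An edge middleware sets `X-User-Segment` on the incoming request from the session; `Vary` keys the shared cache on that request header, so each plan tier hits its own cached copy.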
Option 4: ESI / per-user fragments
Cache the page; insert per-user data via Edge Side Includes (ESI) or React Suspense:
- Static page cached at edge
- Dynamic per-user component fetched separately
- Combined client-side
Modern Next.js handles this natively (Server Components + Suspense + dynamic = hybrid).
Option 5: Token-aware private cache
Cache-Control: private, max-age=300
Browser caches per-user; can use for 5 min. CDN doesn't.
User gets fast subsequent requests; first hit goes to origin.
For my auth: [scope]
Output:
- Per-endpoint strategy
- Vary if needed
- ESI / Suspense pattern
The discipline: **default to `private` for authenticated**. Most authenticated content benefits from browser cache + per-user. Don't fight the protocol.
## Stale-While-Revalidate: Magic for UX
Help me use SWR.
The pattern (Cache-Control level):
Cache-Control: public, s-maxage=60, stale-while-revalidate=86400
Behavior:
- 0-60 seconds: serve from cache (fast)
- 60s-24h: serve stale immediately + revalidate in background
- 24h+: cache miss; fetch from origin
The user-perceived effect:
- Always fast (cache hit); never blocking on origin
- Eventually fresh (background revalidation)
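The timeline above can be expressed as a tiny decision function (our own sketch, with age measured from when the entry was stored):

```typescript
type CacheDecision = 'fresh' | 'stale-while-revalidate' | 'miss';

// How a shared cache treats an entry of a given age under
// `s-maxage=<sMaxage>, stale-while-revalidate=<swr>` (values in seconds).
function decide(ageSeconds: number, sMaxage: number, swr: number): CacheDecision {
  if (ageSeconds <= sMaxage) return 'fresh';               // serve from cache
  if (ageSeconds <= sMaxage + swr) return 'stale-while-revalidate'; // serve stale, refresh in background
  return 'miss';                                           // block on origin
}
```

With `s-maxage=60, stale-while-revalidate=86400`: an entry 30s old is fresh, one an hour old is served stale while revalidating, and one older than ~24h is a miss.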
Where SWR shines:
- Marketing pages (changes occasionally; freshness within day OK)
- Public API responses (counts; lists; aggregations)
- Pricing pages (don't need millisecond freshness)
- Dashboard data (background-refresh feels modern)
Where SWR doesn't fit:
- Real-time data (chat / live feeds)
- Time-sensitive data (auction prices)
- Per-second-fresh requirements
Frontend SWR (TanStack Query / SWR library):
Server-side SWR (Cache-Control header) and client-side SWR (TanStack Query / SWR library) are different layers; both improve UX:
```typescript
// Client-side
const { data } = useSWR('/api/products', fetcher, {
  refreshInterval: 60_000,
  revalidateOnFocus: true,
});
```
Combined with server-side SWR: client uses cached data; revalidates from CDN; CDN serves cached or revalidates from origin. Layered cache.
For my endpoints: [SWR strategy]
Output:
- Per-endpoint SWR config
- Client + server layers
- UX expectations
The win that compounds: **stale-while-revalidate at the CDN level**. Users always fast; data is always fresh-eventually. Best of both worlds.
## Cache Invalidation: The Hardest Problem
Help me invalidate cache.
The two strategies:
Time-based (TTL):
Set short TTL; cache expires; eventually fresh.
Pros: simple. Cons: stale window; over-fetching after expiry.
Use for: data where stale-OK (marketing pages)
Event-based (Tag invalidation):
Tag responses; invalidate when underlying data changes.
```typescript
// On product update
await db.product.update({ id, name });
await revalidateTag(`product-${id}`);
```
Pros: instant freshness; long TTLs OK. Cons: requires invalidation discipline; coupled to writes.
Use for: data where freshness matters
Hybrid (recommended):
Cache-Control: public, s-maxage=3600, stale-while-revalidate=86400
Cache-Tag: product-123, products
- Long TTL with SWR (handles slow writes)
- Tag invalidation on writes (handles fast updates)
Best of both.
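The write path for the hybrid looks like this; `updateProduct` and `invalidateTags` are hypothetical stand-ins (on Vercel the latter would wrap `revalidateTag` from `next/cache`):

```typescript
// Write first, then purge both the specific tag and the collection tag.
// Purging after the commit avoids re-caching stale data mid-write.
async function saveProduct(
  id: string,
  patch: Record<string, unknown>,
  updateProduct: (id: string, patch: Record<string, unknown>) => Promise<void>,
  invalidateTags: (tags: string[]) => Promise<void>,
): Promise<void> {
  await updateProduct(id, patch);
  await invalidateTags([`product-${id}`, 'products']);
}
```

Keeping the invalidation inside the same function as the write is the discipline: no caller can update a product without purging its tags.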
Invalidation patterns:
- On write: `revalidateTag` immediately after DB update
- Cron: refresh popular content periodically
- On demand: API endpoint to trigger invalidation
- Webhook: external system signals invalidation
Common invalidation bugs:
- Forgetting to invalidate (stale data)
- Over-invalidating (cache thrash)
- Invalidating wrong tag (still stale)
- Race conditions (read after write before invalidation)
For my invalidation: [audit]
Output:
- Tag strategy
- Invalidation triggers
- Testing pattern
The discipline: **invalidate on write, with tags**. Without it, you're choosing between fresh-but-slow and stale-but-fast. Tag invalidation gives you both.
## Debugging Cache: Cache-Status Header
Help me debug caching.
The debug tools:
Cache-Status header (RFC 9211):
CDN returns:
```
Cache-Status: Vercel; hit, Cloudflare; hit
```
Or:
```
Cache-Status: Vercel; miss, Cloudflare; fwd=stale; key=...
```
Tells you:
- Did edge cache hit / miss?
- Which cache?
- Why missed (expired / stale / no-cache)?
Per-CDN headers:
- Cloudflare: `cf-cache-status`
- Vercel: `x-vercel-cache`
- Fastly: `x-served-by` + `x-cache`
- CloudFront: `x-cache`
Tools:
- Browser DevTools: Network tab shows cache status
- `curl -I` to see headers
- Cloudflare / Vercel dashboards show cache hit-rate
Common debugging steps:
- Check Cache-Control on response. Is it set?
- Check CDN dashboard cache hit-rate. Is it >80%?
- Check x-vercel-cache header. Why miss?
- Check Vary headers. Are they fragmenting cache?
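Steps 1-3 can be scripted. A hypothetical helper that classifies a captured header set, using the per-CDN header names listed above:

```typescript
// Classify a response's cache headers: is it cacheable, and did the CDN hit?
function summarizeCache(headers: Record<string, string>): string {
  // Normalize header names to lowercase for lookup.
  const h = Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), v]),
  );
  if (!h['cache-control']) return 'NOT CACHEABLE: no Cache-Control header';
  const status =
    h['cache-status'] ?? h['x-vercel-cache'] ?? h['cf-cache-status'] ?? h['x-cache'];
  if (!status) return `Cache-Control set (${h['cache-control']}), but no CDN status header`;
  return `Cache-Control: ${h['cache-control']} | CDN status: ${status}`;
}
```

Pipe `curl -I` output through a small parser into this function, or paste headers from DevTools; the first branch catches the most common failure (no `Cache-Control` at all).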
Standard cache hit-rate targets:
- Static assets: 99%+
- Public API: 70-90%
- Marketing pages: 80-95%
- Authenticated: per-user-dependent (often N/A for shared cache)
For my system: [debug]
Output:
- Debug commands
- CDN dashboard review
- Common issues
The debugging fundamental: **`curl -I https://yoursite.com/api/endpoint`**. Headers tell you everything. Cache-Control set? Cache-Status hit? Vary fragmenting?
## Common Caching Mistakes
Help me avoid mistakes.
The 10 mistakes:
1. **No Cache-Control headers.** Default = "don't cache"; performance left on the table.
2. **Same Cache-Control everywhere.** Static + dynamic + sensitive all get the same policy; wrong for some.
3. **Caching sensitive data.** `public` on a token / PII response = leak.
4. **Long cache; no invalidation.** Stale data shown; customers complain.
5. **Vary on Authorization.** CDN fragments per user; defeats the benefit.
6. **ETag without the conditional check.** Sets ETag but never returns 304; wasted.
7. **No SWR.** Users wait on every cache miss; missed UX win.
8. **Caching user data as `public`.** The wrong user's data gets shown.
9. **Forgetting the CDN tier.** Cloudflare's free tier doesn't cache HTML by default.
10. **Cache-Tag without invalidation.** Tags set; never purged; stale forever.
For my system: [risks]
Output:
- Top 3 risks
- Mitigations
- Audit
The single most-painful mistake: **caching authenticated content with `public`**. User A sees user B's data. Privacy breach. Always `private` for authenticated.
## What Done Looks Like
A working HTTP caching strategy:
- Cache-Control set on every response
- Static assets: `public, max-age=31536000, immutable` with hash-in-filename
- Public API: `public, s-maxage=60, stale-while-revalidate=86400`
- Authenticated: `private, max-age=0, must-revalidate`
- Sensitive: `no-store`
- ETag on responses worth conditional-GET
- 304 returned when If-None-Match matches
- Cache-Tag for grouped invalidation
- Invalidate on writes (revalidateTag)
- CDN cache hit-rate monitored (target 80%+ on public)
- Cache-Status header debugging available
- 30-100x performance improvement on cached endpoints
The proof you got it right: cache hit-rate dashboards show 80%+ on public endpoints; origin compute is 1/10 what it would be without caching; users perceive app as fast even on slow networks.
## See Also
- [Caching Strategies](caching-strategies-chat.md) — Redis / DB caching companion
- [Performance Optimization](performance-optimization-chat.md) — broader perf
- [HTTP Retry & Backoff](http-retry-backoff-chat.md) — companion HTTP layer
- [Multi-region Deployment](multi-region-deployment-chat.md) — region + cache interplay
- [Database Indexing Strategy](database-indexing-strategy-chat.md) — indexed queries first; cache second
- [Webhook Signature Verification](webhook-signature-verification-chat.md) — adjacent HTTP discipline
- [API Pagination Patterns](api-pagination-patterns-chat.md) — pagination + caching
- [API Versioning](api-versioning-chat.md) — version + cache interplay
- [Public API](public-api-chat.md) — broader API design
- [Schema Validation with Zod](schema-validation-zod-chat.md) — validation companion
- [VibeReference: CDN Providers](https://vibereference.dev/cloud-and-hosting/cdn-providers) — Cloudflare / Vercel / Fastly / CloudFront
- [VibeReference: Vercel](https://vibereference.dev/cloud-and-hosting/vercel) — Vercel-specific cache features
- [VibeReference: Cloudflare](https://vibereference.dev/cloud-and-hosting/cloudflare) — Cloudflare cache rules