# Caching Strategies: Layers, Invalidation, TTLs, and Shipping Caches Without Stale-Data Bugs
If you're running a SaaS in 2026, the cache decisions you make now will dictate how the product feels at 10x traffic. Most founders skip caching too long, then panic-add Redis after the first slow page complaint, then discover three months later that customers are seeing each other's data because the cache key didn't include tenant_id. The cache layer is one of the highest-leverage and highest-risk parts of your stack — fast, mostly invisible when it works, and capable of producing the worst class of bugs (stale data, leaked data) when it doesn't.
A working caching strategy answers: which layer caches what, what's the TTL per layer, what triggers invalidation, and how do we avoid leaking one tenant's data to another. Done well, the app feels instant and the database stays calm. Done badly, you're on a Sunday call with a Sev-1 tenant-leak incident wondering how a cache miss became a security incident.
This guide is the implementation playbook for caching that scales — the layered architecture, the invalidation patterns, the TTL math, and the rules that prevent stale-data and cross-tenant disasters.
## The Cache Pyramid: Five Layers, Each With a Job
Caching isn't one decision; it's five. Each layer has a different latency, a different scope, and a different invalidation pattern. Get the layer assignment right; everything else follows.
Help me design the cache layers.
The five layers (top to bottom):
**1. Browser cache (client-side)**
- Lives in the user's browser
- Set via Cache-Control / ETag headers
- TTL: minutes to days
- Scope: per-user (good); doesn't help cold visitors
Use for:
- Static assets (JS, CSS, images, fonts) — long TTL, immutable
- API responses that are user-specific and rarely change
Don't use for:
- Anything sensitive that shouldn''t persist after logout
- Data that changes frequently (defeats the purpose)
**2. CDN edge cache**
- Lives on CDN PoPs (Cloudflare, Vercel, CloudFront — per [cdn-providers](https://www.vibereference.com/cloud-and-hosting/cdn-providers))
- Set via Cache-Control / Surrogate-Key
- TTL: minutes to hours (for dynamic) or weeks (for assets)
- Scope: global (anyone in same region gets cached response)
Use for:
- Public marketing site
- Static assets
- Public API responses (be careful — see "tenant isolation" below)
- ISR pages (Next.js / etc.)
Don't use for:
- Authenticated, per-user content (without scoped keys)
- Tenant-private data without strong key isolation
**3. Application-level cache (in-process)**
- Lives in your Node / Python / Go process memory
- TTL: seconds to minutes (process restarts wipe it)
- Scope: per-process
Use for:
- Hot config / feature flags
- Computed values (expensive joins / aggregations)
- Rate-limit counters (when not distributed)
Don't use for:
- Anything that must be consistent across instances
- Large data (memory pressure)
**4. Distributed cache (Redis / Memcached)**
- Lives in shared cache server (per [database-providers](https://www.vibereference.com/backend-and-data/database-providers))
- TTL: seconds to days
- Scope: shared across all instances
- Atomicity (Redis): pipelines, transactions, Lua scripts
Use for:
- Session data
- API rate limiting
- Job queues / pub-sub
- Cross-instance shared cache
- Computed data with complex invalidation
Don't use for:
- Source of truth for revenue / inventory data (use database)
- Anything you can't reproduce from origin if the cache fails
**5. Database query cache / materialized views**
- Lives in your DB (Postgres materialized views) or DB-adjacent (read replicas)
- TTL: depends on refresh strategy
- Scope: query-level
Use for:
- Expensive aggregations refreshed periodically
- Read-heavy reports
- "Top N" rankings, dashboards
Don''t use for:
- Real-time data (refresh latency)
**The pyramid mapping**:
| Use Case | Best Layer | TTL | Why |
|---|---|---|---|
| Static asset (JS / CSS / image) | Browser + CDN | Days-weeks | Immutable; far edge wins |
| Marketing page | CDN | Hours-days | Public; rarely changes |
| Authenticated dashboard data | Distributed (Redis) | Seconds-minutes | Per-tenant isolation |
| Hot config / feature flag | Application | Seconds | Low write rate |
| Session data | Distributed | Hours-days | Shared across instances |
| Computed leaderboard | Database materialized view | Hourly refresh | Heavy compute |
| API response (public) | CDN | Minutes | Cacheable for many users |
| API response (authenticated) | Distributed | Minutes | Tenant-scoped |
| Rate-limit counter | Distributed | Window | Shared state |
For my app:
- The 5 layers and what each holds
- The TTL per layer
- The invalidation strategy per layer
Output:
1. The cache architecture diagram
2. The data classification (public / private / static / dynamic)
3. The layer assignment per data type
4. The TTL table
The biggest unforced error: caching authenticated tenant data at the CDN without scoping the cache key. A user signs in, hits /api/dashboard, the response gets cached at CDN, and the next user hitting the same URL gets the previous user's data. This is a tenant-leak incident. The fix: never cache authenticated responses at CDN unless the cache key explicitly includes user / tenant identity (Surrogate-Key, custom header) AND Cache-Control: private. When in doubt, mark Cache-Control: private, no-store and cache at distributed (Redis) instead.
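That rule is easier to enforce in code than to remember per route. A minimal sketch, assuming a framework-agnostic request shape (the `cacheControlFor` helper and its header logic are illustrative, not a specific framework's API):

```typescript
// Pick a Cache-Control header based on whether the request is authenticated.
// Authenticated responses must never land in a shared (CDN) cache.
type RequestLike = { headers: Record<string, string | undefined> };

function cacheControlFor(req: RequestLike, publicTtlSeconds: number): string {
  const authenticated =
    req.headers['authorization'] !== undefined ||
    req.headers['cookie'] !== undefined;
  // Authenticated: forbid shared caches AND browser persistence.
  if (authenticated) return 'private, no-store';
  // Public: safe for CDN with an explicit TTL.
  return `public, max-age=${publicTtlSeconds}`;
}

console.log(cacheControlFor({ headers: { authorization: 'Bearer x' } }, 60));
// private, no-store
console.log(cacheControlFor({ headers: {} }, 60));
// public, max-age=60
```

Wired into middleware, this makes the safe behavior the default: a route author has to opt *in* to CDN caching rather than remember to opt out.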
## TTL Math: Pick the Number, Justify It
Every cache TTL has a tradeoff: shorter = fresher data, more origin load; longer = staler data, less origin load. Pick deliberately.
Help me set TTLs systematically.
The TTL framework:
**Step 1: Categorize the data**
| Category | Update frequency | Example | TTL band |
|---|---|---|---|
| Immutable | Never | Versioned JS bundle | 1 year (max) |
| Static | Rarely | Marketing copy | 1 hour - 1 day |
| Slow-changing | Hourly | Pricing tier list | 5-15 minutes |
| Medium-changing | Per-minute | Dashboard metrics | 30-60 seconds |
| Fast-changing | Per-second | Active-user list | 5-10 seconds (or no cache) |
| Real-time | Sub-second | Trading prices | No cache; SSE / WebSocket |
**Step 2: Compute origin-load impact**
Without cache: every request hits origin.
With cache + TTL T: origin gets at most 1 request per TTL window per cache key.
Example:
- Endpoint gets 1000 req/sec across all users
- Cache key per user: 100 active users
- TTL = 60 seconds
- Origin load = 100 requests / 60 seconds = ~1.7 req/sec to origin
- Reduction: 99.8%
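The arithmetic in Step 2 generalizes to a small helper worth keeping next to your TTL table; a sketch (function and field names are mine):

```typescript
// Worst-case origin load behind a TTL cache: each distinct cache key
// hits origin at most once per TTL window.
function originLoad(reqPerSec: number, distinctKeys: number, ttlSeconds: number) {
  const originReqPerSec = distinctKeys / ttlSeconds;
  const reductionPct = 100 * (1 - originReqPerSec / reqPerSec);
  return { originReqPerSec, reductionPct };
}

// The example above: 1000 req/sec, 100 per-user keys, 60s TTL.
const { originReqPerSec, reductionPct } = originLoad(1000, 100, 60);
// originReqPerSec ≈ 1.67, reductionPct ≈ 99.83
```

Running it for a few candidate TTLs makes the fresher-vs-cheaper tradeoff concrete before you commit a number.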
**Step 3: Compute stale-data impact**
Worst case: data was updated at T=0; cache TTL is 60s; user hits at T=59.
How bad is 59 seconds of staleness?
- Marketing copy: fine
- Dashboard analytics: probably fine
- Inventory count: borderline
- Account balance: NOT fine
- Permission change: dangerous (security implication)
**Step 4: Pick TTL based on tolerance**
- Stale-tolerant: long TTL (minutes to hours)
- Stale-sensitive: short TTL (seconds) or invalidate on write
- Stale-intolerant: no cache; or invalidate-on-write strict
**The "p99 freshness" rule**:
With reads spread evenly across the TTL window, the average staleness a user sees is TTL/2 (worst case: the full TTL).
If you can't tolerate average staleness > X seconds, set TTL = 2X.
**Specific TTL guidelines**:
| Data | Suggested TTL |
|---|---|
| User profile | 60-300 sec (invalidate on write) |
| Settings / preferences | 60 sec (invalidate on write) |
| Permission / role data | 0 (no cache) or short with strict invalidation |
| Feature flags | 30-60 sec |
| Content (blog post, doc) | 5-15 min (invalidate on publish) |
| Marketing page | 1-24 hr (invalidate on deploy) |
| Search results | 1-5 min |
| Aggregated metrics | 1-5 min |
| Pricing tier definitions | 5-15 min |
| Static asset (versioned) | 1 year |
| Static asset (unversioned) | 1 hour |
**The "explain this TTL in 1 sentence" rule**:
For every cache TTL, you should be able to say:
> "We cache X for Y seconds because Z change at most every Y/2 seconds, and customers tolerate Y/2 seconds of staleness."
If you can't: rethink. Random TTLs (300, 600, 900) without justification compound into bugs.
For my app:
- TTL per cached data type
- Justification per TTL (the 1-sentence test)
- The explicit "no cache" list
Output:
1. The TTL table with justifications
2. The "no cache" explicit list
3. The TTL review cadence
The biggest TTL mistake: picking 1-hour TTLs because "they sound reasonable" without checking what data updates more often than that. A user-permission change that takes 60 minutes to propagate is a 60-minute security exposure. A pricing update that's stale for 1 hour is 1 hour of customers seeing wrong prices. TTLs need empirical grounding: how often does this data ACTUALLY change, and what's the cost of staleness?
## Invalidation: The Hard Part
"There are only two hard problems in computer science: cache invalidation and naming things." Invalidation is the part that bites. Plan for it.
Help me design cache invalidation patterns.
The four invalidation strategies:
**Strategy 1: TTL-based (lazy invalidation)**
- Cache expires after fixed time
- Reads after expiration miss → fetch from origin → re-cache
- No active invalidation; just wait
Pros: simple; no infrastructure
Cons: stale data within TTL window; not suitable for must-be-fresh data
Use for: most static / semi-static data
**Strategy 2: Write-through invalidation**
- Cache key gets invalidated when underlying data changes
- On write, application explicitly purges the cache key
- Next read repopulates
Pros: fresh after writes
Cons: requires app to know all cache keys; risk of missing invalidation paths
Use for: user profile, settings, anything must-be-fresh-after-update
**Strategy 3: Tag-based invalidation**
- Each cache entry has tags (e.g., `user:123`, `tenant:abc`, `feature:billing`)
- On data change, purge all entries with matching tag
- One write can invalidate many keys
Pros: handles complex invalidation cleanly
Cons: requires cache backend that supports tags (Cloudflare Cache Tags, Vercel cacheTag, Redis with tag mappings)
Use for: complex data with multiple read paths
**Strategy 4: Event-driven invalidation**
- Database triggers / change-data-capture (CDC) emits events
- Cache listener invalidates affected keys
- Scales to many readers
Pros: automatic; no app-level coupling
Cons: requires CDC infrastructure (Debezium, Postgres logical replication)
Use for: large-scale systems with many reader services
**The Vercel-native pattern (Next.js 15 / 16)**:
Vercel's `cacheTag` + `updateTag` (per [vercel-runtime-cache](https://www.vibereference.com/cloud-and-hosting/vercel-functions)) gives you tag-based invalidation natively:
```typescript
// In a server function
import { cacheTag, updateTag } from 'next/cache';

export async function getUser(userId: string) {
  'use cache';
  cacheTag(`user:${userId}`);
  return await db.users.findById(userId);
}

// On update
export async function updateUser(userId: string, data: any) {
  await db.users.update(userId, data);
  updateTag(`user:${userId}`); // invalidates cached user
}
```
Pattern: tag-on-read; invalidate-tag-on-write.
**The Redis-native pattern**:
```typescript
// Read (cache-aside with a 300s TTL)
const cached = await redis.get(`user:${userId}`);
if (cached) return JSON.parse(cached);
const user = await db.users.findById(userId);
await redis.set(`user:${userId}`, JSON.stringify(user), 'EX', 300);
return user;

// Write
async function updateUser(userId: string, data: any) {
  await db.users.update(userId, data);
  await redis.del(`user:${userId}`); // invalidate
}
```
For multi-key invalidation (tag-based on Redis):
```typescript
// Tag mapping: SADD "tag:tenant:123" "user:abc" "user:def" ...
// On invalidate:
const keys = await redis.smembers('tag:tenant:123');
if (keys.length > 0) {
  await redis.del(...keys);
  await redis.del('tag:tenant:123');
}
```
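The write side of that tag mapping has to register each key under its tags at set time. A sketch, assuming an ioredis-style client with `pipeline()` (the `setWithTags` helper is mine):

```typescript
// Write side of Redis tag-based invalidation: store the value AND register
// the key under each tag set, batched in one pipeline round trip.
async function setWithTags(
  redis: any, // assumed ioredis-style client
  key: string,
  value: string,
  ttlSeconds: number,
  tags: string[],
) {
  const pipeline = redis.pipeline();
  pipeline.set(key, value, 'EX', ttlSeconds);
  for (const tag of tags) {
    pipeline.sadd(`tag:${tag}`, key);
    // Let the tag set itself expire so orphaned members eventually age out.
    pipeline.expire(`tag:${tag}`, ttlSeconds * 2);
  }
  await pipeline.exec();
}

// Usage: await setWithTags(redis, 'user:abc', json, 300, ['tenant:123']);
```

Without this registration step, the `smembers`-then-`del` invalidation above has nothing to enumerate.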
**The "every write invalidates" rule**:
Every code path that writes to data X must invalidate the cache for X.
This is non-negotiable. Code review checklist:
- Direct DB write: invalidate cache?
- Background job updates: invalidate cache?
- Admin tool changes: invalidate cache?
- Webhook receives external change: invalidate cache?
- Migration backfill: invalidate cache?
Missed invalidations are the #1 cache bug.
Anti-patterns:
- "We'll just rely on TTL for must-be-fresh data" — staleness becomes a feature complaint
- Manual cache flushes via SSH ("just nuke Redis if anything's wrong") — not a strategy
- Different services invalidate cache differently — drift; bugs
- Cache keys without consistent naming — can''t invalidate what you can''t enumerate
For my app:
- The invalidation strategy per data type
- The "every write" audit
- The cache-key naming convention
Output:
- The invalidation strategy matrix
- The audit of write paths
- The cache-key naming convention
The biggest invalidation mistake: **forgetting one write path.** You wire up `/api/users/:id` PUT to invalidate `user:123` cache. Six months later, an admin tool writes directly to the database and the cache stays stale. A user calls support. It takes 3 hours to debug. The fix: list every write path on day one; ensure each invalidates; add an integration test that checks "after write, cache is fresh." Treat cache invalidation as part of the contract for any data-mutation function.
## Cache Keys: Naming, Scoping, and the Tenant-Leak Problem
Bad cache keys produce the worst bug class — leaking one user's data to another. Spend disproportionate effort on key design.
Help me design cache keys safely.
**The cache-key naming convention**:
Pattern: `{namespace}:{entity}:{id}:{variant?}`
Examples:
- `user:profile:123`
- `tenant:abc:dashboard:metrics`
- `tenant:abc:user:123:permissions`
- `feature-flag:billing-redesign`
- `pricing:tier-list:v2`
Required components for tenant data:
- Tenant identifier (always; never optional)
- Entity type (user / dashboard / report / etc.)
- Entity ID (specific record)
- Variant (locale, format, version) if applicable
**The tenant-isolation rule**:
ANY data that's tenant-private MUST include `tenant:X` in the cache key.
Examples:
- ❌ `dashboard:metrics` — leaks across tenants
- ✅ `tenant:abc:dashboard:metrics` — tenant-scoped
**The user-isolation rule**:
ANY data that's user-private (within a tenant) MUST include `user:Y` in the cache key.
Examples:
- ❌ `tenant:abc:notifications` — leaks across users in the same tenant
- ✅ `tenant:abc:user:123:notifications` — user-scoped
**The "negative cache key" anti-pattern**:
Some apps cache "no result" responses to avoid hammering the DB on missing-record lookups. This is fine, but the key must include enough context that inserting the real record later correctly invalidates the cached miss.
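One way to implement a safe negative cache is a sentinel value with a deliberately short TTL. A sketch (the sentinel, TTLs, and `findEmail` helper are illustrative; `redis` and `db` are assumed to be your existing clients):

```typescript
const NEGATIVE_SENTINEL = '__none__'; // marks a cached "no result"
const NEGATIVE_TTL = 30; // seconds; keep cached misses short-lived

async function getUserEmail(redis: any, db: any, userId: string): Promise<string | null> {
  const key = `user:email:${userId}`;
  const cached = await redis.get(key);
  if (cached === NEGATIVE_SENTINEL) return null; // known-missing, skip DB
  if (cached !== null) return cached;
  const email = await db.findEmail(userId);
  if (email === null) {
    // Cache the miss briefly so repeated lookups don't hammer the DB.
    await redis.set(key, NEGATIVE_SENTINEL, 'EX', NEGATIVE_TTL);
  } else {
    await redis.set(key, email, 'EX', 300);
  }
  return email;
}
// The write path that creates the record must still DEL this key, so the
// negative entry doesn't outlive the record's creation.
```

The short negative TTL bounds the damage if an invalidation is missed; the sentinel keeps "missing" distinguishable from "not cached".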
**The HTTP-cache-key trap**:
CDN cache keys default to URL only. If `/api/dashboard` returns different data per user, you must:
- Add a `Vary: Authorization` header (so the cache key includes the auth header)
- Or use `Cache-Control: private` (forbids CDN caching; only the browser caches)
- Or include user identity in the URL (`/api/users/123/dashboard`)
- Or use Surrogate-Key + per-user tags for explicit control
Cache keys for static / public data:
- `marketing:home:v3`
- `pricing:public:v2`
- `blog:post:slug-here`
These are intentionally not tenant-scoped (the data is public).
**Cache-key versioning (for safe migrations)**:
When you change the data shape:
- Increment the version: `user:profile:v2:123`
- Old keys age out via TTL
- No risk of mixing old + new data during deploy
The cache-key audit:
Quarterly: review all cache keys
- Are tenant-scoped where required?
- Are they consistent in naming?
- Any "unsafe" keys (missing tenant)?
Anti-patterns:
- Cache keys that don't include user / tenant for private data
- Inconsistent prefixes across services
- Composite keys without separators (`user123dashboard` vs `user:123:dashboard`)
- Hash-based keys without context (can't debug; can't enumerate)
Tooling: type your cache-key builder in TypeScript:
```typescript
type CacheKey =
  | { type: 'user-profile'; userId: string }
  | { type: 'tenant-dashboard'; tenantId: string }
  | { type: 'feature-flag'; flagId: string };

function buildKey(k: CacheKey): string {
  switch (k.type) {
    case 'user-profile': return `user:profile:${k.userId}`;
    case 'tenant-dashboard': return `tenant:${k.tenantId}:dashboard`;
    case 'feature-flag': return `flag:${k.flagId}`;
  }
}
```
This makes it impossible to construct an invalid key.
For my app:
- The cache-key naming convention
- The tenant-leak audit
- The type-safe builder
Output:
- The naming convention doc
- The tenant-leak audit list
- The type-safe key builder
The biggest cache-key mistake: **caching tenant data with a non-tenant-scoped key.** This is a security incident waiting to fire. The first time a customer reports "I see another company's data on my dashboard," your cache architecture is the problem. The fix is preventive: every cache key for private data MUST include tenant; lint / test for it; never let a `cache.set(key, data)` ship without a key that includes tenant.
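That "lint / test for it" guard can be a plain assertion run in CI over the keys your builder produces. A sketch (the pattern and helper name are mine; adapt the regex to your own naming convention):

```typescript
// Guard: every key for tenant-private data must contain a tenant segment.
// Run this over the output of your key builder in a unit test.
const TENANT_SEGMENT = /(^|:)tenant:[^:]+:/;

function assertTenantScoped(key: string): void {
  if (!TENANT_SEGMENT.test(key)) {
    throw new Error(`cache key missing tenant scope: ${key}`);
  }
}

assertTenantScoped('tenant:abc:dashboard:metrics'); // passes
// assertTenantScoped('dashboard:metrics');         // would throw
```

It won't catch every mistake (it can't know which data is private), but paired with a type-safe builder it turns the most common tenant-leak bug into a failing test instead of an incident.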
## Cache Stampede and the Thundering Herd
When a hot cache entry expires, all concurrent requests miss simultaneously, all hit origin, origin melts. Plan for this.
Help me handle cache stampede.
The problem:
- Cache key X has TTL 60 seconds
- 1000 req/sec for X
- Cache expires at T=0
- All 1000 requests at T=0 miss; all 1000 hit DB
- DB melts under 1000 concurrent reads
- Cache repopulates for one request; others wait or fail
This is "cache stampede" or "thundering herd."
Mitigations:
1. Probabilistic early refresh (XFetch algorithm)
- Some fraction of requests just before TTL expires, refresh cache
- Spreads load across the TTL window
2. Stale-while-revalidate
- Serve the stale cached value immediately
- Asynchronously refresh in the background
- Set via the `Cache-Control: stale-while-revalidate=N` directive, e.g. `Cache-Control: max-age=60, stale-while-revalidate=300`
- Means: cache for 60s; after that, serve stale for up to 300s while refreshing in the background
3. Single-flight (request coalescing)
- When multiple concurrent requests miss the cache
- Only ONE goes to origin; others wait for that result
- Implement via in-memory mutex or Redis lock
```typescript
async function getWithCoalescing(key: string) {
  // Try cache first
  const cached = await cache.get(key);
  if (cached) return cached;
  // Acquire lock (NX: only if absent; EX 5: auto-expire after 5s)
  const lockAcquired = await redis.set(`lock:${key}`, '1', 'NX', 'EX', 5);
  if (lockAcquired) {
    // Lock holder fetches from origin and repopulates the cache
    const value = await origin.fetch(key);
    await cache.set(key, value, 'EX', 60);
    await redis.del(`lock:${key}`);
    return value;
  } else {
    // Wait briefly; retry cache
    await sleep(100);
    return await cache.get(key);
  }
}
```
4. Background refresh
- Cron / worker proactively refreshes hot cache entries before expiration
- Origin load is predictable
5. Jittered TTLs
- Don't set TTL = exactly 60 seconds for all entries
- Set TTL = 60 ± random(10) seconds
- Spreads expiration across time
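Mitigation 5 is small enough to standardize as a helper so every `cache.set` call site gets it for free; a sketch (names are mine):

```typescript
// TTL with ±jitterFraction randomness, so hot keys written at the same
// moment don't all expire in the same instant.
function jitteredTtl(baseSeconds: number, jitterFraction = 0.1): number {
  const jitter = (Math.random() * 2 - 1) * jitterFraction * baseSeconds;
  return Math.max(1, Math.round(baseSeconds + jitter));
}

// jitteredTtl(60) returns an integer in roughly [54, 66]
```

Use it wherever a TTL is passed today, e.g. `redis.set(key, value, 'EX', jitteredTtl(60))`.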
The pragmatic recipe for indie SaaS:
- Use stale-while-revalidate at CDN
- Use Redis lock for hot keys at app level
- Add jitter to TTLs (10% randomness)
- Monitor cache hit rate (should be >90% for hot data)
When NOT to optimize for stampede:
- Low-traffic endpoints (no stampede possible)
- Endpoints with heterogeneous keys (each key a few req/sec)
Premature stampede protection adds complexity. Add when monitoring shows it.
For my app:
- The stampede risk per cached endpoint
- The mitigations to implement
- The monitoring to detect
Output:
- The hot-key list
- The stampede mitigations
- The cache hit-rate dashboard
The biggest stampede mistake: **assuming it won't happen until it does.** A homepage / dashboard endpoint serving 10K req/sec with a 60-second TTL is one cache miss away from a database meltdown. Stale-while-revalidate is a 1-line header change that prevents this; add it before you need it. Single-flight is harder; add it when monitoring shows hot-key contention.
## Cache Observability: Hit Rate, Latency, Memory
A cache without monitoring is a cache without correctness guarantees.
Help me observe my cache.
The metrics:
1. Hit rate
- (cache hits) / (cache hits + cache misses)
- Goal: >90% for hot data; >80% overall
- Drop in hit rate signals: TTL too short / invalidation bug / data shape change
2. Latency
- p50, p95, p99 cache read latency
- Redis: ~1ms p95 typical
- Distributed cache: <5ms p95
- Application cache: <0.1ms
3. Memory usage
- Cache size (bytes)
- Eviction rate (when full, LRU eviction)
- Goal: <80% of available memory; alert at 90%
4. Top keys
- Most-frequently-accessed keys
- Largest keys (memory consumers)
- Helps spot stampede risk + memory hogs
5. Stale-data incidents
- How often does production data not match cache?
- Hard to measure without explicit checks
- Useful: random 1% of reads also fetch from origin and compare
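That "random 1% of reads also fetch from origin and compare" check can be wired as a thin wrapper around any cached read; a sketch (the function shape and `recordStale` hook are mine):

```typescript
// Shadow-read sampler: for a small fraction of cache hits, also fetch from
// origin in the background and record whether the cached value matched.
// The user-facing read path never blocks on the comparison.
async function sampledGet<T>(
  key: string,
  cacheGet: (k: string) => Promise<T | null>,
  originGet: (k: string) => Promise<T>,
  recordStale: (k: string) => void, // e.g. increment a metric counter
  sampleRate = 0.01,
): Promise<T> {
  const cached = await cacheGet(key);
  if (cached === null) return originGet(key);
  if (Math.random() < sampleRate) {
    originGet(key)
      .then((fresh) => {
        if (JSON.stringify(fresh) !== JSON.stringify(cached)) recordStale(key);
      })
      .catch(() => {}); // sampling must never throw into the request path
  }
  return cached;
}
```

The stale counter this feeds is the only direct measurement of cache correctness; hit rate and latency can both look perfect while the cache serves wrong data.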
Tools:
- Redis Insight / Redis CLI (`INFO stats`, `MEMORY STATS`)
- Datadog / Grafana with Redis exporter
- App-level metrics (StatsD / OpenTelemetry)
- Custom dashboard in your observability stack (per [error-monitoring-providers](https://www.vibereference.com/devops-and-tools/error-monitoring-providers))
Alerts:
- Hit rate <50%: investigate (regression)
- Memory >90%: scale up Redis
- Stampede detected (single-flight contention high): review hot keys
- Stale-data report from customer: drop everything; debug
Cache "tests":
Write tests that verify:
- After write, cache returns new value (not old)
- Cache key includes tenant for tenant-scoped data
- TTL behaves as expected (test with mocked clock)
- Invalidation on write actually clears cache
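A minimal version of the first and last of those tests, written against an in-memory cache-aside pair so it runs anywhere (the `CachedUsers` shape is illustrative; swap in your real cache client and repository):

```typescript
// Invalidation test sketch: after a write, the next read must return the
// new value, not the cached old one. In-memory stand-ins keep it fast.
class CachedUsers {
  private cache = new Map<string, string>();
  constructor(private db: Map<string, string>) {}
  async get(id: string): Promise<string | undefined> {
    if (this.cache.has(id)) return this.cache.get(id);
    const v = this.db.get(id);
    if (v !== undefined) this.cache.set(id, v);
    return v;
  }
  async update(id: string, name: string): Promise<void> {
    this.db.set(id, name);
    this.cache.delete(id); // the contract under test: every write invalidates
  }
}

// The test: read (warms cache), write, read again must be fresh.
async function testWriteInvalidates() {
  const users = new CachedUsers(new Map([['123', 'old']]));
  await users.get('123'); // warm the cache
  await users.update('123', 'new');
  const after = await users.get('123');
  if (after !== 'new') throw new Error(`stale read after write: ${after}`);
}
```

Run the same pattern against every write path (API, admin tool, background job) and the "forgotten invalidation" bug class becomes a failing test instead of a support ticket.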
For my app:
- The metrics I track today
- The metrics I need to add
- The dashboard / alerting plan
Output:
- The cache observability plan
- The dashboard mockup
- The alert thresholds
The biggest observability mistake: **shipping cache without metrics.** A cache that "works most of the time" is invisible until it doesn't. Hit rate, latency, and memory are the minimum; add them on day one, not month six. Without metrics, the next stale-data incident takes 4 hours instead of 4 minutes to debug.
## Quarterly Cache Review
Caches drift. Build the review.
The quarterly cache review:
1. Hit-rate audit
- Per cached endpoint: hit rate this quarter
- Drops indicate: TTL too short, invalidation bug, data shape change
- Investigate any rate <50%
2. Memory pressure
- Are we approaching Redis memory limits?
- Top 20 keys by memory consumption
- Eviction rate: any keys getting evicted before TTL?
3. Stale-data incidents
- Customer reports of stale data this quarter
- Root cause per incident
- Pattern detection (always same endpoint? always same write path?)
4. New cache opportunities
- Slow endpoints that aren''t cached yet
- New features that should be cached on launch
5. TTL review
- Are TTLs still appropriate?
- Has data update frequency changed?
6. Tenant-isolation audit
- New cache keys added this quarter
- Any missing tenant scope?
Cache decommissioning:
Some caches outlive their utility. Quarterly: which caches can we remove?
- Rarely-hit (<1% of requests)
- Underlying query is now fast (DB optimization made cache redundant)
- Data flow changed (cache now wrong layer)
Removing dead caches simplifies code + reduces stale-data risk.
Output:
- The QBR template
- The owner (eng lead)
- The decision log
The biggest review-cadence mistake: **never reviewing.** Caches added in year 1 might be stale assumptions in year 2. A cache designed for 1K req/sec might be wrong at 100K req/sec. Quarterly review keeps the cache layer aligned with current reality. Without it, you're carrying assumptions that compound into bugs.
---
## What "Done" Looks Like
A working caching strategy in 2026 has:
- 5 layers explicitly assigned roles (browser / CDN / app / distributed / DB)
- TTL per data type with empirical justification (not vibes)
- Tenant-scoped cache keys for all private data
- Invalidation wired into every write path
- Stale-while-revalidate or single-flight for hot keys
- Hit-rate / latency / memory monitoring + alerts
- Type-safe cache-key builder
- Quarterly review baked in
The hidden cost of weak caching: **either a slow product (no cache; database overloaded) or stale-data bugs (cache without invalidation discipline).** Both kill trust. The middle ground — explicit layers, tenant-scoped keys, write-time invalidation, monitored hit rates — is more work upfront but compounds into a fast, correct, debuggable system. Skip it; pay later.
## See Also
- [Performance Optimization](performance-optimization-chat.md) — broader perf context
- [Database Migrations](database-migrations-chat.md) — schema changes affect cache shape
- [Multi-Tenancy](multi-tenancy-chat.md) — tenant-isolation principles
- [Audit Logs](audit-logs-chat.md) — cache invalidation events
- [Rate Limiting & Abuse](rate-limiting-abuse-chat.md) — cache + rate-limit overlap
- [Public API](public-api-chat.md) — API caching strategy
- [Service Level Agreements](service-level-agreements-chat.md) — SLA depends on cache reliability
- [Real-Time Collaboration](real-time-collaboration-chat.md) — when NOT to cache
- [Backups & Disaster Recovery](backups-disaster-recovery-chat.md) — cache loss recovery
- [VibeReference: CDN Providers](https://www.vibereference.com/cloud-and-hosting/cdn-providers) — CDN layer
- [VibeReference: Database Providers](https://www.vibereference.com/backend-and-data/database-providers) — Redis / Postgres
- [VibeReference: Vercel Functions](https://www.vibereference.com/cloud-and-hosting/vercel-functions) — Vercel runtime cache
- [VibeReference: Error Monitoring Providers](https://www.vibereference.com/devops-and-tools/error-monitoring-providers) — observability
- [LaunchWeek: SEO Strategy](https://www.launchweek.com/2-content/seo-strategy) — TTFB / Core Web Vitals depend on cache
[⬅️ Day 6: Grow Overview](README.md)