# Error Handling & Custom Error Pages: When Things Go Wrong, Don't Let Users See Your Stack Traces

If your SaaS in 2026 still shows users default browser "404 Not Found" pages, raw stack traces on 500 errors, or "Something went wrong" messages with no recovery path, you're losing trust on every error. The naive approach: ship; let frameworks default-render errors; hope nothing breaks. The reality: 5-15% of user sessions hit some error (404, 500, expired session, network timeout, deleted resource); how you handle those errors is half the difference between a polished product and an amateur one. Most indie SaaS skips error UX, then realizes it's a real problem when support tickets fill up with screenshots of stack traces.

A working error-handling strategy answers: what error states exist (404, 403, 500, 503, network, app-level), how do you display each (generic vs specific), how do you preserve context (don't lose user's work), how do you log (capture context for debugging), how do you recover (retry / fallback / contact-support links), how do you avoid information leakage (no internal paths in production), and how do you measure (error rates as KPIs).

This guide is the implementation playbook for error UX + handling. Companion to Logging Strategy & Structured Logs, HTTP Retry & Backoff, Incident Response, Schema Validation with Zod, and Form Validation UX.

## Why Error UX Matters

Get the failure modes clear first.

Help me understand error UX failures.

The 8 categories:

**1. Default browser 404**
A bare "404 Not Found" page: generic; no recovery; users bounce.

**2. Stack-trace 500s in production**
Internal paths visible; security risk; unprofessional.

**3. "Something went wrong" with no context**
User can't tell: their fault? Server's fault? Network? Try again? Email support?

**4. Lost user work on error**
Form crashes; user retypes 200 chars. Editor dies; lost edits.

**5. Infinite loading on network failure**
Spinner forever; user waits 5 min before giving up.

**6. Generic message hiding actionable info**
Server returns "rate limited; retry in 30s"; UI shows "Error."

**7. Wrong error to wrong user**
Showing admin-only debug info to end users; or showing user-friendly message to admins debugging.

**8. Errors that aren't logged**
User reports issue; logs show nothing; can't reproduce.

For my product:
- Top error scenarios
- Current handling
- Worst user experience

Output:
1. Top 3 errors
2. UX gaps
3. Logging gaps

The biggest unforced error: assuming defaults are good enough. A bare default 404 looks like a broken site. The default Next.js error page shows a code overlay in dev and, in production, a generic message with no recovery path. Custom pages signal "we ship intentionally."

## The Error Categories You Need to Handle

Help me categorize errors.

The 6 categories:

**1. 404 — Not Found**

Triggers:
- Invalid URL (typo)
- Deleted resource (post deleted; URL still shared)
- Wrong tenant context (visiting another org's URL)

UX needed:
- Friendly explanation
- Suggested actions (search; home; recently visited)
- Links to popular content
- Search box

**2. 403 — Forbidden / unauthorized**

Triggers:
- User logged in but not authorized for this resource
- Permission revoked
- Tenant boundary

UX needed:
- Clear "you don't have access" message
- Actions: contact admin; switch account; go home
- Don't reveal that the resource exists if the user shouldn't know about it (respond with "not found" instead)

**3. 401 — Unauthenticated / session expired**

Triggers:
- User logged out (session expired)
- Token invalid / revoked

UX needed:
- Redirect to login (preserve return URL)
- Friendly "your session expired" message at login
- Don't dump them on home page

**4. 500 / 502 / 503 — Server errors**

Triggers:
- App bug
- Upstream down
- Database connection issue
- Capacity exhausted

UX needed:
- Acknowledge our fault (not user's)
- Retry button
- Contact support link
- Status page link
- Log error with context

**5. Network errors (frontend-side)**

Triggers:
- User's connection dropped
- Request timeout
- CORS issue

UX needed:
- Distinguish from server errors
- "Check your connection" message
- Retry option
- Save user's work locally if possible

**6. Application-level errors**

Triggers:
- Validation failed
- Quota exceeded
- Resource state changed (item already deleted by other user)
- Rate limited

UX needed:
- Specific message (per error)
- Actionable next step
- Often inline (not a full error page)

For my product:
- Per-category audit
- Top error per category

Output:
1. Per-category UX
2. Implementation priority
3. Tests

The principle: 404 vs 403 vs 500 are NOT the same. Each needs a different message + recovery path. Generic "Error" defeats the purpose.
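
One way to keep those messages distinct is to route every failure through a single mapping. The sketch below is illustrative only: `ErrorCategory`, `ErrorPresentation`, and `presentError` are hypothetical names, the action strings are placeholders for whatever buttons your UI renders, and it assumes the caller passes `null` when the request never reached the server.

```typescript
// Hypothetical names throughout; not part of any framework.
type ErrorCategory =
  | "not-found"
  | "forbidden"
  | "unauthenticated"
  | "server"
  | "network"
  | "application";

interface ErrorPresentation {
  category: ErrorCategory;
  message: string;
  actions: string[]; // recovery options to render as buttons or links
}

// Map a failed HTTP status (or null when no response arrived) to a
// category-specific message and recovery path.
function presentError(status: number | null, detail?: string): ErrorPresentation {
  if (status === null) {
    return { category: "network", message: "Check your connection and try again.", actions: ["retry"] };
  }
  if (status === 401) {
    return { category: "unauthenticated", message: "Your session expired. Please sign in again.", actions: ["login"] };
  }
  if (status === 403) {
    return {
      category: "forbidden",
      message: "You don't have access to this page.",
      actions: ["contact-admin", "switch-account", "home"],
    };
  }
  if (status === 404) {
    return { category: "not-found", message: "We couldn't find that page.", actions: ["search", "home"] };
  }
  if (status >= 500) {
    return {
      category: "server",
      message: "Something went wrong on our end.",
      actions: ["retry", "contact-support", "status-page"],
    };
  }
  // Remaining 4xx: validation, quota, conflict, rate limit.
  // Prefer the server's specific message when there is one.
  return {
    category: "application",
    message: detail ?? "That request couldn't be completed.",
    actions: ["fix-and-retry"],
  };
}
```

For example, `presentError(429, "Rate limited; retry in 30s")` keeps the server's actionable text instead of collapsing it to "Error." If a 403 should not reveal that a resource exists, pass 404 to the function instead.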

## Custom 404 Page

Help me build a 404 page.

The components:

**Headline**: "We couldn't find that page."

Don't: "Error 404." (technical jargon)
Don't: "Oops!" (twee; doesn't help)

**Explanation (1-2 sentences)**:
"The link you followed may be broken, or the page may have been removed."

**Actions** (the most important part):
- Search box
- "Go to home" button
- 3-5 popular destinations: "Try [Dashboard], [Pricing], [Docs]"
- "Report broken link" link → contact / form

**Visual**:
- Brand-consistent
- Friendly illustration (optional, not required)
- Calm tone (not "DANGER")

**Implementation (Next.js 16 App Router)**:

```typescript
// app/not-found.tsx
import Link from 'next/link';
import { SearchBox } from '@/components/SearchBox'; // your own search component (path is a placeholder)

export default function NotFound() {
  return (
    <div className="min-h-screen flex items-center justify-center p-8">
      <div className="max-w-md text-center">
        <h1 className="text-4xl font-bold mb-4">Page not found</h1>
        <p className="text-gray-600 mb-8">
          The link you followed may be broken, or the page may have been removed.
        </p>

        <div className="space-y-4">
          <SearchBox />

          <div className="flex gap-2 justify-center">
            <Link href="/" className="btn-primary">Go home</Link>
            <Link href="/dashboard" className="btn-secondary">Dashboard</Link>
          </div>

          <p className="text-sm text-gray-500">
            <Link href="/contact" className="underline">Report broken link</Link>
          </p>
        </div>
      </div>
    </div>
  );
}
```

Triggering 404:

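// app/posts/[slug]/page.tsx
import { notFound } from 'next/navigation';
import { db } from '@/lib/db'; // your data layer (path is a placeholder)
import { PostView } from '@/components/PostView'; // placeholder component

// In Next.js 15 and later, `params` is a Promise and must be awaited.
export default async function Page({ params }: { params: Promise<{ slug: string }> }) {
  const { slug } = await params;
  const post = await db.post.findUnique({ where: { slug } });
  if (!post) notFound(); // Renders not-found.tsx
  return <PostView post={post} />;
}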

HTTP status: must return 404 (not 200). SEO + analytics depend on it.

In Next.js, notFound() returns 404 automatically. In other frameworks, set explicitly:

res.status(404);

For my framework: [adapt]

Output:
1. 404 page
2. Trigger logic
3. Search / suggestion components

The win that compounds: **a useful 404 with search keeps users on-site**. Default 404 = bounce. Useful 404 = re-engage. Saved sessions matter when summed across thousands of mistyped URLs.
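
The status-code requirement is easy to miss when the 404 page is rendered by hand: a friendly page served with a 200 is a "soft 404". A minimal framework-agnostic sketch, assuming a hypothetical `renderPostPage` helper and an in-memory `posts` map standing in for a real lookup:

```typescript
// Hypothetical in-memory stand-in for a database lookup.
const posts = new Map<string, { title: string }>([
  ["hello-world", { title: "Hello, world" }],
]);

interface PageResult {
  status: number;
  view: "post" | "not-found"; // which template the server should render
}

function renderPostPage(slug: string): PageResult {
  const post = posts.get(slug);
  if (!post) {
    // Friendly page AND a real 404 status, so crawlers and analytics
    // can tell this apart from a successful page.
    return { status: 404, view: "not-found" };
  }
  return { status: 200, view: "post" };
}
```

Whatever framework you use, the check is the same: request a URL that does not exist and confirm the response status is 404, not 200.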

## Custom 500 / 503 Page

Help me build a 500 page.

The components:

**Headline**: "Something went wrong on our end."

Note: "our end." Not "your end." Not "weird browser."

**Explanation**: "We've been notified and are working on it."

(Only true if you actually log + page on errors. If you don't, fix that first.)

**Actions**:
- Retry button (refresh page)
- "Go home" button
- Contact support: support@yourdomain.com
- Status page link: status.yourdomain.com (if applicable)
- Error ID for support reference: "Error: a4b2c8d1"

**The error ID (critical detail)**:

When the server logs the error, generate a UUID, include it in the response, and show it on the error page.

// Server-side
const errorId = crypto.randomUUID();
logger.error('Internal error', { errorId, error: err.stack, requestPath: req.url });

return Response.json({ error: 'Internal error', errorId }, { status: 500 });

// Client-side
<p>Error ID: {errorId}</p>
<p>Include this when contacting support.</p>

User can copy-paste error ID; you find the log instantly. Beats "what time was it; what were you doing"-detective work.

Implementation (Next.js):

// app/error.tsx: handles errors thrown while rendering a route segment
'use client';

import Link from 'next/link';

export default function Error({
  error,
  reset,
}: {
  error: Error & { digest?: string };
  reset: () => void;
}) {
  return (
    <div className="min-h-screen flex items-center justify-center p-8">
      <div className="max-w-md text-center">
        <h1 className="text-4xl font-bold mb-4">Something went wrong</h1>
        <p className="text-gray-600 mb-2">
          We've been notified and are working on it.
        </p>
        {error.digest && (
          <p className="text-sm text-gray-500 mb-6">Error ID: {error.digest}</p>
        )}
        
        <div className="flex gap-2 justify-center mb-4">
          <button onClick={reset} className="btn-primary">Try again</button>
          <Link href="/" className="btn-secondary">Go home</Link>
        </div>
        
        <p className="text-sm">
          Still broken? <a href="mailto:support@yourdomain.com">Contact support</a>
          {' '}or{' '}
          <a href="https://status.yourdomain.com">check status</a>
        </p>
      </div>
    </div>
  );
}

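// app/global-error.tsx: fallback for errors thrown in the root layout
'use client';

import ErrorPage from './error';

export default function GlobalError({
  error,
  reset,
}: {
  error: Error & { digest?: string };
  reset: () => void;
}) {
  // global-error replaces the root layout, so it must render <html> and <body> itself.
  return (
    <html>
      <body>
        <ErrorPage error={error} reset={reset} />
      </body>
    </html>
  );
}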

HTTP status: 500 (or 502 / 503 as appropriate).

Production vs dev:

- Development: show stack trace (helps debug)
- Production: show user-friendly message + error ID

Frameworks usually handle this. Don't expose stack to production users.

For my app: [framework]

Output:
1. 500 page
2. Error ID flow
3. Logging integration

The single most-useful detail: **error ID shown to user**. Costs you 1 line of code; saves hours of support detective-work. Always include.
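
The error ID flow can be sketched as one helper called from a catch block. This is a sketch under assumptions, not a prescribed API: `toErrorResponse`, `LogEntry`, and `LogFn` are hypothetical names, and the logger is passed in so any logging library can be plugged in.

```typescript
import { randomUUID } from "node:crypto";

interface LogEntry {
  errorId: string;
  message: string;
  stack?: string;
  path: string;
}
type LogFn = (entry: LogEntry) => void;

interface PublicErrorBody {
  error: string;
  errorId: string;
}

// Call from a catch block. The full error goes to the server log;
// the client only receives a generic message plus the matching ID.
function toErrorResponse(
  err: unknown,
  path: string,
  log: LogFn,
): { status: 500; body: PublicErrorBody } {
  const errorId = randomUUID();
  const e = err instanceof Error ? err : new Error(String(err));
  log({ errorId, message: e.message, stack: e.stack, path });
  return { status: 500, body: { error: "Internal error", errorId } };
}
```

Because the same ID appears in the log entry and in the response body, a support ticket that quotes the ID leads straight to the stack trace without exposing it to the user.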

## Error Boundaries (React-Specific)

Help me set up error boundaries.

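React error boundaries catch render-time errors in components. They do not catch errors in event handlers or async code; handle those with try/catch.

- Without: one broken component crashes the entire page.
- With: the failure is contained to that component; the rest of the UI works.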

// ErrorBoundary.tsx
'use client';
import { Component, ReactNode } from 'react';

interface Props { children: ReactNode; fallback?: ReactNode }
interface State { hasError: boolean; error?: Error }

export class ErrorBoundary extends Component<Props, State> {
  state: State = { hasError: false };
  
  static getDerivedStateFromError(error: Error) {
    return { hasError: true, error };
  }
  
  componentDidCatch(error: Error, info: { componentStack: string }) {
    console.error('Component error', error, info);
    // Send to Sentry / etc.
  }
  
  render() {
    if (this.state.hasError) {
      return this.props.fallback ?? (
        <div className="p-4 border border-red-300 bg-red-50 rounded">
          <p>This section failed to load.</p>
          <button onClick={() => this.setState({ hasError: false })}>
            Retry
          </button>
        </div>
      );
    }
    return this.props.children;
  }
}

Where to wrap:

<Layout>
  <ErrorBoundary>
    <Sidebar /> {/* If broken, sidebar shows fallback; rest works */}
  </ErrorBoundary>
  
  <ErrorBoundary>
    <MainContent />
  </ErrorBoundary>
  
  <ErrorBoundary>
    <Footer />
  </ErrorBoundary>
</Layout>

Each major surface gets its own boundary. Granularity: not too coarse (one failure blanks the entire app); not too fine (a boundary around every button adds noise without helping recovery).

Next.js 16 App Router:

error.tsx in any route segment IS the error boundary for that segment. Wrap nested routes in their own error.tsx for granular handling.

Suspense for loading + ErrorBoundary for errors:

<ErrorBoundary fallback={<ErrorFallback />}>
  <Suspense fallback={<Spinner />}>
    <AsyncComponent />
  </Suspense>
</ErrorBoundary>

For my app: [framework]

Output:
1. ErrorBoundary component
2. Where to wrap
3. Per-section fallbacks

The discipline: **wrap major surfaces in error boundaries**. One bug in the comments component shouldn't kill the article. Granular boundaries = graceful degradation.

## Network Errors and Loading States

Help me handle network failures.

The progression:

- Request fires → loading state shown
- Request succeeds → data shown
- Request fails → error state shown
- Request times out → timeout state shown

The 4-state UI:

import { useCallback, useEffect, useState } from 'react';

type FetchState<T> =
  | { status: 'idle' }
  | { status: 'loading' }
  | { status: 'success'; data: T }
  | { status: 'error'; error: Error };

function useData<T>(url: string): { state: FetchState<T>; retry: () => void } {
  const [state, setState] = useState<FetchState<T>>({ status: 'idle' });
  const [attempt, setAttempt] = useState(0); // bumping this re-runs the effect

  useEffect(() => {
    setState({ status: 'loading' });

    let cancelled = false; // set on unmount / URL change so our own abort isn't reported as an error
    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(), 30_000);

    fetch(url, { signal: controller.signal })
      .then(r => {
        if (!r.ok) throw new Error(`HTTP ${r.status}`);
        return r.json() as Promise<T>;
      })
      .then(data => { if (!cancelled) setState({ status: 'success', data }); })
      .catch(err => { if (!cancelled) setState({ status: 'error', error: err }); })
      .finally(() => clearTimeout(timeout));

    return () => { cancelled = true; clearTimeout(timeout); controller.abort(); };
  }, [url, attempt]);

  const retry = useCallback(() => setAttempt(n => n + 1), []);
  return { state, retry };
}

// Usage
function Component() {
  const { state, retry } = useData<User[]>('/api/users');

  if (state.status === 'idle' || state.status === 'loading') return <Spinner />;
  if (state.status === 'error') return (
    <div>
      <p>Failed to load.</p>
      {/* With cleanup aborts ignored, AbortError here means the 30s timer fired */}
      {state.error.name === 'AbortError'
        ? <p>Request timed out. Check connection. <button onClick={retry}>Retry</button></p>
        : <p>Something went wrong. <button onClick={retry}>Retry</button></p>
      }
    </div>
  );
  return <UserList users={state.data} />;
}

Modern: TanStack Query / SWR:

These libraries handle the 4 states + retry + caching:

import { useQuery } from '@tanstack/react-query';

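const { data, isLoading, error, refetch } = useQuery({
  queryKey: ['users'],
  queryFn: async () => {
    const r = await fetch('/api/users');
    // fetch only rejects on network failure; throw on HTTP errors so `error` gets set
    if (!r.ok) throw new Error(`HTTP ${r.status}`);
    return r.json();
  },
  retry: 3,
  staleTime: 60_000,
});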

if (isLoading) return <Spinner />;
if (error) return <ErrorState error={error} onRetry={refetch} />;
return <UserList users={data} />;

Distinguish error types:

- Network failure (connection): "Check your internet"
- Timeout: "Took too long; try again"
- 4xx: user error → specific message
- 5xx: server error → contact support / retry

Don't infinite-spin:

If something takes >30s, show timeout state. User should never wonder if it's working.

For my data fetching: [library]

Output:
1. State machine for fetch
2. Library pick
3. Error UI per type

The reliability win: **TanStack Query (or SWR) handles most of the network-error plumbing automatically**. Retries, caching, deduplication, stale-while-revalidate. Skip writing custom hooks.
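
Even with a library doing the fetching, the UI still has to decide which message to show. Below is a small sketch of the error-type distinction listed above. `classifyFailure` and `failureMessages` are hypothetical names, and the sketch assumes any abort comes from your own timeout rather than a user cancel.

```typescript
type FetchFailure = "timeout" | "network" | "client" | "server";

// Pass the HTTP status when a response arrived, or the thrown error
// when the request itself rejected.
function classifyFailure(input: { status: number } | { error: unknown }): FetchFailure {
  if ("status" in input) {
    return input.status >= 500 ? "server" : "client";
  }
  const err = input.error;
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  // AbortController.abort() rejects with AbortError;
  // AbortSignal.timeout() rejects with TimeoutError.
  if (name === "AbortError" || name === "TimeoutError") return "timeout";
  // fetch rejects with a TypeError when no response was received
  // (offline, DNS failure, blocked by CORS).
  return "network";
}

const failureMessages: Record<FetchFailure, string> = {
  timeout: "That took too long. Try again.",
  network: "Check your internet connection.",
  client: "That request couldn't be completed. Check your input and try again.",
  server: "Something went wrong on our end. Retry or contact support.",
};
```

In a component, `failureMessages[classifyFailure({ error })]` picks the copy, and the category decides whether to show a retry button, a support link, or both.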

## Don't Lose User Work

Help me preserve user work on errors.

The pain: user types 500-character message; hits send; error; loses message.

Anti-patterns:

- Form clears on error
- Editor loses unsaved content
- Cart empties on checkout error
- Settings reset on save error

The 4 strategies:

1. Optimistic UI: Show success immediately; revert if error. User sees "saved!" → if it fails, "saved... actually didn't" + retry.

async function saveDraft(text: string) {
  setOptimisticText(text); // Show as saved
  try {
    await api.save(text);
  } catch (e) {
    setOptimisticText(null); // Revert
    setError(e);
  }
}

2. Local persistence (localStorage / IndexedDB):

useEffect(() => {
  const draft = formData.message;
  if (draft) {
    localStorage.setItem('compose-draft', draft);
  }
}, [formData.message]);

useEffect(() => {
  // On mount, restore
  const saved = localStorage.getItem('compose-draft');
  if (saved) setFormData(prev => ({ ...prev, message: saved }));
}, []);

// On successful send: clear
async function send() {
  await api.send(formData);
  localStorage.removeItem('compose-draft');
}

User refreshes / errors / closes tab → draft persists; restored.

3. Debounced server save:

For longer-form (editor):

import { useDebouncedCallback } from 'use-debounce'; // assuming the use-debounce package

const debouncedSave = useDebouncedCallback(async (content: string) => {
  await api.saveDraft({ content });
}, 2000);

useEffect(() => {
  if (content) debouncedSave(content);
}, [content]);

Auto-save on idle; user never loses more than 2 seconds.

4. Error preserves form state:

When mutation fails, don't clear form:

async function submit() {
  setStatus('submitting');
  try {
    await api.submit(formData);
    setStatus('success');
    setFormData(initialState); // Clear ONLY on success
  } catch (e) {
    setStatus('error');
    setError(e);
    // formData preserved for retry
  }
}

For my forms / editors:
- High-pain loss scenarios
- Pick strategy per surface

Output:
1. Per-surface strategy
2. Implementation
3. Test cases

The win that earns user trust: **never lose work**. Users have been burned 100x by other apps. When yours preserves their work, they notice. Subtle but huge for retention.
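
Local persistence has one sharp edge: storage itself can throw (quota exceeded, storage blocked by browser settings), and a failed backup should never break typing. A minimal sketch of a guarded draft store. `KeyValueStorage`, `createDraftStore`, and `memoryStorage` are hypothetical names, and in a browser you would pass `window.localStorage` as the storage.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

function createDraftStore(storage: KeyValueStorage, key: string) {
  return {
    // Save non-empty text; treat empty text as "no draft".
    save(text: string): void {
      try {
        if (text) storage.setItem(key, text);
        else storage.removeItem(key);
      } catch {
        // Losing the backup must not interrupt the user.
      }
    },
    // Returns the saved draft, or "" when there is none or storage fails.
    restore(): string {
      try {
        return storage.getItem(key) ?? "";
      } catch {
        return "";
      }
    },
    // Call after a successful send.
    clear(): void {
      try {
        storage.removeItem(key);
      } catch {
        // Nothing useful to do here.
      }
    },
  };
}

// In-memory stand-in for localStorage, for tests or server rendering.
function memoryStorage(): KeyValueStorage {
  const data = new Map<string, string>();
  return {
    getItem: (k) => data.get(k) ?? null,
    setItem: (k, v) => { data.set(k, v); },
    removeItem: (k) => { data.delete(k); },
  };
}
```

Call `save` on every change, `restore` once on mount, and `clear` only after the server confirms the send, so a failed submit leaves the draft intact.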

## Logging Errors with Context

Help me log errors.

The principles:

1. Capture context:

- Bad: "Error: undefined"
- Good: "Error rendering Post page; userId=abc; postId=xyz; trace=..."

logger.error('Post page render failed', {
  errorMessage: error.message,
  errorStack: error.stack,
  userId: getUserId(req),
  postId: params.slug,
  url: req.url,
  userAgent: req.headers.get('user-agent'),
  errorId: crypto.randomUUID(),
});

2. Use Sentry / Bugsnag / Rollbar:

These tools handle:

- Stack-trace deobfuscation (sourcemaps)
- Aggregation (group similar errors)
- Alerts (Slack on new error types)
- Release tracking (which version introduced)
- User context

// Sentry setup (Next.js)
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: env.SENTRY_DSN,
  tracesSampleRate: 0.1,
  // ...
});

// In error boundary / handler
Sentry.captureException(error, {
  user: { id: userId },
  contexts: { post: { id: postId } },
});

3. Don't log PII unintentionally:

Whitelist what to log; don't log entire request bodies.

- Bad: logger.error('Error', { request: req.body }) (could include passwords / tokens)
- Good: logger.error('Error', { userId: req.userId, action: 'create_post' })

4. Sample / batch high-volume errors:

If one error fires 10K times a minute, you pay for every log line. Sample or aggregate.

5. Distinguish levels:

- ERROR: unexpected; investigate
- WARN: expected-but-bad (rate limit hit; user error)
- INFO: normal operations
- DEBUG: development only

Don't log expected user mistakes (a 404 from a typo) at ERROR level.

For my logging: [tool]

Output:
1. Logging context strategy
2. Tool integration
3. PII discipline

The single most-useful tool to install: **Sentry / Bugsnag / Rollbar** with sourcemaps. Every JS error in production has full stack trace in your dashboard within seconds. $26/mo for indie tier; pays for itself the first day.
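
Monitoring tools store whatever you send them, so the PII rule from the logging principles still applies. A minimal allowlist sketch follows. `LOGGABLE_FIELDS` and `pickLoggable` are hypothetical names, and the field list is an example rather than a recommendation.

```typescript
// Example allowlist; adjust to the fields your own logs need.
const LOGGABLE_FIELDS: readonly string[] = ["userId", "action", "postId", "errorId", "status"];

type LogContext = Record<string, unknown>;

// Copy only allowlisted top-level keys. Anything not named is dropped,
// so a new request field cannot leak into logs by accident.
function pickLoggable(
  context: LogContext,
  allowed: readonly string[] = LOGGABLE_FIELDS,
): LogContext {
  const safe: LogContext = {};
  for (const key of allowed) {
    if (Object.prototype.hasOwnProperty.call(context, key)) {
      safe[key] = context[key];
    }
  }
  return safe;
}
```

The filter is shallow: an allowlisted key whose value is a nested object is copied as-is, so keep allowlisted fields to IDs and simple values.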

## Don't Leak Information

Help me think about information leakage.

What NOT to expose to end users:

- Internal file paths ("/var/www/app/posts/..." in stack trace)
- Environment variables ("DATABASE_URL = postgres://...")
- Database schema ("relation 'users_temp' does not exist")
- Internal service names ("auth-service-prod")
- Raw SQL ("SELECT * FROM users WHERE...")
- Stack traces in production
- API keys / tokens
- User IDs (other users' IDs that hint at scale)

Safe to show:

- Generic message ("Something went wrong")
- Error ID (for support reference)
- Action-relevant info ("Email format invalid"; "Quota exceeded")
- Status page link
- Support contact

Frameworks default OK in production:

Next.js / Rails / Django all hide stack traces in prod by default. Don't override.

Watch for:

- Custom error responses that include too much
- Logs accidentally rendered in HTML
- Debug pages left enabled (/debug, /admin)
- Open dev-tools showing console errors with internal info
- Network tab showing internal-API URLs

Audit checklist:

- Production stack traces are not shown to users
- API errors return generic messages + error ID
- Internal-only routes have auth gates
- No console.log(env) in production code
- Sourcemaps aren't publicly accessible (upload them to your error monitor instead of serving them)

For my app: [audit]

Output:
1. Leakage audit
2. Production-mode tests
3. Gates

The single embarrassing mistake: **shipping with `NODE_ENV !== 'production'`**. Stack traces leak; full env visible; debug routes open. Always verify production builds in production-like staging before going live.
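
One defensive habit is to make error detail opt-in rather than opt-out, so a missing or misspelled environment value behaves like production. A sketch under that assumption; `ClientError` and `serializeForClient` are hypothetical names.

```typescript
interface ClientError {
  message: string;
  errorId: string;
  stack?: string;
}

// Decide what an error response may contain based on the environment name.
function serializeForClient(
  err: Error,
  errorId: string,
  nodeEnv: string | undefined,
): ClientError {
  const base: ClientError = { message: "Something went wrong", errorId };
  // Fail closed: only an explicit "development" unlocks details.
  // undefined, "staging", or a typo all get the production shape.
  if (nodeEnv === "development") {
    return { ...base, message: err.message, stack: err.stack };
  }
  return base;
}
```

On the server you would call it as `serializeForClient(err, errorId, process.env.NODE_ENV)`. The point of the design is that forgetting to set the variable hides detail instead of exposing it.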

## Measuring Error UX

Help me measure errors.

The KPIs:

Error rates:

- 404 rate (% of pageviews)
- 500 / 503 rate
- Client-side error rate (Sentry / Bugsnag)
- Failed-form-submission rate
- Network-error rate

Targets (typical):

- 500/503: <0.1% of pageviews
- 404: <2% (some always); investigate spikes
- Client errors: <1% of sessions

Tools:

- Sentry / Bugsnag / Rollbar: error aggregation
- PostHog / FullStory: session replay (see what the user did)
- Datadog / New Relic / Honeycomb: server-side error tracking
- Vercel Analytics / Cloudflare: edge-level error rates

Alerts:

- New error type → alert
- Spike in known error → alert
- Sustained 5xx > threshold → page on-call

Weekly review:

Top 10 errors by frequency. For each:

- Is it a real bug? Fix.
- Is it expected (4xx)? OK.
- Is it user-facing? Improve UX or fix.

For my reporting: [tools]

Output:
1. KPI dashboard
2. Alert rules
3. Cadence

The discipline: **review top errors weekly**. Errors compound: a small ignored one becomes a wave. 30 minutes/week on top 10 errors keeps the system clean.
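
The weekly review is easier when the targets are checked by code rather than by eye. The sketch below computes the three rates from raw counts and flags any that miss the typical targets quoted above. `Counts`, `TARGETS`, and `evaluateErrorRates` are hypothetical names, and the numbers in the usage note are made up.

```typescript
interface Counts {
  pageviews: number;
  sessions: number;
  notFound: number; // 404 responses
  serverErrors: number; // 500 / 503 responses
  sessionsWithClientError: number;
}

// Typical targets from this guide, expressed as fractions.
const TARGETS = {
  serverErrorRate: 0.001, // < 0.1% of pageviews
  notFoundRate: 0.02, // < 2% of pageviews
  clientErrorRate: 0.01, // < 1% of sessions
};

function rate(count: number, total: number): number {
  return total > 0 ? count / total : 0;
}

function evaluateErrorRates(c: Counts) {
  const rates = {
    serverErrorRate: rate(c.serverErrors, c.pageviews),
    notFoundRate: rate(c.notFound, c.pageviews),
    clientErrorRate: rate(c.sessionsWithClientError, c.sessions),
  };
  // A metric is in breach when it is at or above its target ceiling.
  const breaches = (Object.keys(rates) as Array<keyof typeof rates>).filter(
    (k) => rates[k] >= TARGETS[k],
  );
  return { rates, breaches };
}
```

With 100,000 pageviews and 250 server errors, the server error rate is 0.25%, which is above the 0.1% ceiling, so `breaches` contains `serverErrorRate`. Feeding the same function from your analytics export each week gives the review a fixed starting point.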

## Common Error-Handling Mistakes

Help me avoid mistakes.

The 10 mistakes:

**1. Default browser 404**
Looks broken; bounces users.

**2. Stack traces in production**
Information leak; unprofessional.

**3. Generic "error" with no context**
Users can't tell what to do.

**4. Lost user work on form errors**
Retypes drive retention down.

**5. No error ID for support reference**
Detective work to reproduce.

**6. No error boundaries (React)**
One broken component crashes whole page.

**7. Infinite loading on network failure**
Users wait forever; never told.

**8. Logging without context**
Logs say "Error: undefined"; useless.

**9. PII in logs**
Compliance + security risk.

**10. No error monitoring**
Bugs in production no one knows about.

For my app: [risks]

Output:
1. Top 3 risks
2. Mitigations
3. Audit plan

The single most-painful mistake: **shipping without Sentry / equivalent**. Bugs happen; users hit them; bug reports are vague; you can't reproduce. Install error monitoring Day 1; it's the cheapest insurance.

## What Done Looks Like

A working error-handling strategy delivers:
- Custom 404 page with search + suggestions
- Custom 500 page with error ID + retry + support link
- Error boundaries around major React surfaces
- 4-state fetch (idle / loading / success / error) UI
- TanStack Query / SWR for data fetching
- Local persistence for in-progress user work
- Sentry / Bugsnag / Rollbar with sourcemaps
- Error logs with context (user, action, request)
- No PII in logs; no stack traces to users
- Weekly review of top errors
- Status page link in error pages
- 500 / 503 rate < 0.1%
- Distinct messages for 401 / 403 / 404 / 500

The proof you got it right: a customer who hits an obscure error gets a clear message, an error ID, and a path forward. They contact support; you find the log instantly via error ID; fix in days. Trust preserved.

## See Also

- [Logging Strategy & Structured Logs](logging-strategy-structured-logs-chat.md) — log discipline
- [HTTP Retry & Backoff](http-retry-backoff-chat.md) — handles upstream errors
- [Incident Response](incident-response-chat.md) — for ongoing 5xx rates
- [Schema Validation with Zod](schema-validation-zod-chat.md) — 4xx errors come from here
- [Form Validation UX](form-validation-ux-chat.md) — form-side error handling
- [Customer Support Chat](customer-support-chat.md) — escalation from error pages
- [Service Level Agreements](service-level-agreements-chat.md) — uptime impact
- [Status Page (vibeweek)](status-page-chat.md) — link from error pages
- [Audit Logs](audit-logs-chat.md) — admin-facing error visibility
- [Performance Optimization](performance-optimization-chat.md) — slow ≈ error
- [VibeReference: Error Monitoring Providers](https://vibereference.dev/devops-and-tools/error-monitoring-providers) — Sentry / Bugsnag / Rollbar landscape
- [VibeReference: Status Page Providers](https://vibereference.dev/cloud-and-hosting/status-page-providers) — link from error pages
- [VibeReference: Observability Providers](https://vibereference.dev/devops-and-tools/observability-providers) — broader monitoring