VibeWeek

# Inbound Webhooks: Receive Events From Stripe, GitHub, and Your Customers Without Losing Data


Inbound Webhook Strategy for Your New SaaS

Goal: Build an inbound-webhook receiving infrastructure that handles every event from third parties (Stripe, GitHub, customers' systems) reliably — no lost events, no duplicate processing, no missed signature verification, no 6-hour outages from a botched deploy. Avoid the failure modes where founders process webhooks inline (exposing the app to third-party flakiness) or skip signature verification (creating a security hole the size of the integration).

Process: Follow this chat pattern with your AI coding tool such as Claude or v0.app. Pay attention to the notes in [brackets] and replace the bracketed text with your own content.

Timeframe: Webhook receiver pattern shipped in 1-2 days. Idempotency + retry + monitoring in week 1. Replay tooling for incident recovery in week 2. Quarterly review of webhook health baked in from launch onward.


## Why Most Founder Webhook Handlers Are Broken

Three failure modes hit founders the same way:

  • Inline processing. Founder writes a webhook handler that does all the work synchronously inside the HTTP request. Stripe sends a payment_intent.succeeded event; your handler updates the database, sends an email, fires analytics events, writes audit logs. The handler takes 8 seconds; Stripe gives up waiting and retries; your code processes the same event twice. Customer gets two welcome emails. By the time you notice, you have 3,000 duplicate records.
  • No signature verification. The webhook receiver accepts any POST request to /webhooks/stripe. An attacker discovers the endpoint and crafts a fake payment_intent.succeeded for a $0 invoice with their email. Your system creates a paid account they never paid for. Free customer; angry support team.
  • No replay capability. A bug ships; webhook handler crashes for 12 hours; the third-party retries some events, drops others. When you fix the bug, there's no way to recover the missing events. Customers' subscriptions don't activate; orders don't fulfill; the founder spends 3 days reconciling state by hand.

The version that works is structured: receive the webhook in a thin HTTP handler that ONLY validates the signature and queues the event; process events asynchronously via a queue with idempotency; track every event with a unique ID; provide replay tooling for incident recovery; monitor webhook health continuously.

This guide assumes you have already done Public API (the outbound side; this guide is the inbound complement), have shipped Audit Logs (webhook events feed audit), and have considered Background Jobs Providers (the queue layer this guide depends on).


## 1. Understand the Common Inbound-Webhook Sources

Most indie SaaS receive webhooks from a small set of providers. Plan for them all upfront.

Help me catalog the inbound-webhook sources for [your product] at [your-domain.com].

Common sources for indie B2B SaaS in 2026:

**1. Payment provider** (per [Payment Providers](https://www.vibereference.com/auth-and-payments/payment-providers))
- Stripe / Polar / Paddle / etc.
- Events: payment_intent.succeeded, customer.subscription.updated, charge.dispute.created, invoice.paid, etc.
- Frequency: high (every customer transaction)
- Critical: subscription state must stay in sync with payment provider

**2. Auth provider**
- Clerk / Supabase / Better Auth / WorkOS
- Events: user.created, user.deleted, session.revoked, organization.member.added (for SSO)
- Frequency: moderate

**3. Email / messaging providers**
- Resend / Postmark / SendGrid
- Events: email.delivered, email.bounced, email.spam_complaint, email.unsubscribed
- Critical for [Email Deliverability](email-deliverability-chat.md) tracking

**4. GitHub / version control**
- For products that integrate with code repos
- Events: push, pull_request.opened, repository.created, etc.

**5. Customer integrations**
- If your product offers webhook subscriptions for customer integrations (per [Public API](public-api-chat.md))
- Customer's system sends events TO yours
- Reverse of the outbound webhook system

**6. Third-party SaaS your product integrates with**
- Slack, Microsoft, Salesforce, HubSpot, Linear, etc.
- Events vary by integration

**7. Internal services**
- Your own services calling each other
- May or may not use HTTP webhooks; could be queues directly

For each source, document:
- Provider name
- Endpoint URL pattern (e.g., /webhooks/stripe)
- Auth mechanism (signature header, basic auth, OAuth, etc.)
- Critical events your product depends on
- Failure mode if the webhook is missed (revenue impact, data integrity, customer impact)
- Source-specific quirks (Stripe's retry policy, GitHub's HMAC scheme, etc.)

Output the full catalog as a table, then prioritize by criticality.

The single most undervalued upfront work: cataloging which webhooks you actually depend on. Most teams discover this incrementally — usually after a missed event causes a customer incident.


## 2. Always Verify Signatures

Skipping signature verification creates a security hole. Never skip.

Help me implement signature verification for each webhook source.

The pattern:

**Stripe**:
- Header: `Stripe-Signature`
- Format: `t=<timestamp>,v1=<hmac>`
- Verification: HMAC-SHA256 of `<timestamp>.<body>` using your webhook secret
- Reject if: timestamp is older than 5 minutes (replay protection); HMAC mismatch
- Code:
```ts
import Stripe from 'stripe'

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!)
// constructEvent() verifies the signature and timestamp; throws if invalid.
// Pass the RAW request body, not parsed JSON.
const event = stripe.webhooks.constructEvent(rawBody, signature, webhookSecret)
```

GitHub:

  • Header: X-Hub-Signature-256
  • Format: sha256=<hmac>
  • Verification: HMAC-SHA256 of body using your webhook secret
  • Reject on mismatch

Resend / Postmark:

  • Various; check provider docs
  • Each has specific signature scheme

Custom inbound from your customers:

  • You define the signature scheme
  • Recommend HMAC-SHA256 with timestamp (matches Stripe pattern); document in your docs

Critical implementation rules:

  1. Use the raw body, not the parsed JSON. Some frameworks auto-parse; you need the raw bytes for HMAC.
  2. Timing-safe comparison. Use crypto.timingSafeEqual() not ===. Prevents timing attacks.
  3. Reject before processing. The signature check is the FIRST thing your handler does. If it fails, return 401 and don't queue anything.
  4. Log signature failures. Repeated failures from same IP signal an attack; alert.
  5. Rotate secrets periodically. Most providers support multiple active secrets during rotation.

Don't:

  • Use the same webhook secret across environments (dev / staging / prod each have their own)
  • Store the secret in client-side code
  • Skip verification "for testing" in production
  • Trust HTTP basic auth alone; add a signature

Output:

  1. The signature-verification middleware for [your framework]
  2. The per-provider verification code
  3. The rotation procedure for compromised secrets
  4. The alerting rule for repeated signature failures

Three principles:

- **Signature verification is non-negotiable.** Every webhook handler that processes anything meaningful must verify. Skipping creates a security hole proportional to what the handler does.
- **Use raw body bytes, not parsed JSON.** Re-serializing a parsed body can change key order and whitespace, so the HMAC no longer matches. Always preserve the raw body for the signature check.
- **Timing-safe comparison.** Prevents timing attacks on signature comparison.
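A minimal sketch of the timing-safe check for a GitHub-style `sha256=<hmac>` header, using Node's built-in crypto module (the function name is ours; adapt the header format per provider):

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Sketch: HMAC-SHA256 over the RAW body, compared with timingSafeEqual
// rather than === to avoid timing attacks. Illustrative name, not a library API.
export function verifySignature(rawBody: Buffer, header: string, secret: string): boolean {
  const expected = `sha256=${createHmac('sha256', secret).update(rawBody).digest('hex')}`
  const a = Buffer.from(expected)
  const b = Buffer.from(header)
  // timingSafeEqual throws on length mismatch, so guard the lengths first
  return a.length === b.length && timingSafeEqual(a, b)
}
```

The length guard matters: `timingSafeEqual` throws on unequal buffers, and an attacker-controlled header can be any length.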

---

## 3. Receive Thinly; Process Asynchronously

The most consequential pattern. Decouples HTTP receipt from business logic.

Help me design the receive-thin / process-async pattern.

The flow:

Phase 1: Receive (HTTP handler, <500ms target)

  • Verify signature (per step 2)
  • Parse minimal headers (event type, event ID)
  • Insert event into a webhook_events table:
    CREATE TABLE webhook_events (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      source TEXT NOT NULL,                  -- 'stripe', 'github', etc.
      external_event_id TEXT NOT NULL,       -- the source's ID for the event
      event_type TEXT NOT NULL,
      raw_body JSONB NOT NULL,               -- the full body, for replay
      received_at TIMESTAMP NOT NULL DEFAULT NOW(),
      processed_at TIMESTAMP,                -- NULL if not yet processed
      processing_status TEXT NOT NULL DEFAULT 'pending', -- pending / processing / processed / failed
      failure_count INT NOT NULL DEFAULT 0,
      last_error TEXT,
      UNIQUE(source, external_event_id)       -- idempotency
    );
    
  • Enqueue processing job (per Background Jobs Providers)
  • Return 200 OK to the source immediately
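The Phase 1 flow above can be sketched framework-agnostically. Here an in-memory Set and array stand in for the webhook_events table's UNIQUE constraint and the job queue; all names are illustrative:

```typescript
// Sketch of the thin Phase-1 handler: dedupe, persist, enqueue, ack.
// No business logic runs here. Store and queue are in-memory stand-ins.
type WebhookEvent = { source: string; externalEventId: string; eventType: string; rawBody: string }

const seen = new Set<string>()    // stands in for UNIQUE(source, external_event_id)
const queue: WebhookEvent[] = []  // stands in for the background-job queue

export function receiveWebhook(evt: WebhookEvent): { status: number } {
  const key = `${evt.source}:${evt.externalEventId}`
  if (seen.has(key)) return { status: 200 } // duplicate delivery: ack, don't re-enqueue
  seen.add(key)                             // persist first (insert in the real table)...
  queue.push(evt)                           // ...then enqueue Phase 2
  return { status: 200 }                    // ack immediately; processing happens async
}
```

In a real handler the signature check from step 2 runs before any of this, and the insert/enqueue pair should be transactional (or use an outbox pattern) so an event can't be persisted but never enqueued.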

Phase 2: Process (async job, can take longer)

  • Job worker picks up the event from the queue
  • Marks event as 'processing'
  • Executes business logic (update DB, send email, fire downstream events, etc.)
  • Marks event as 'processed' on success
  • On failure: increment failure_count, mark 'failed' if max retries exceeded, schedule retry otherwise

Why this pattern matters:

  • Source SLA: Stripe expects a timely 2xx response, on the order of seconds. Inline processing risks timing out during slow operations (DB locks, email delivery, downstream API calls).
  • Idempotency: the UNIQUE constraint on (source, external_event_id) means duplicate webhooks are detected at insert time; Phase 2 either skips or merges.
  • Replay: every event is in the table; recovering from bugs means re-running Phase 2 against processing_status = 'failed' events.
  • Observability: dashboards show webhook receive rate, processing rate, failure rate over time.

Critical:

  • Never run the business logic in the HTTP handler
  • Never skip the database persistence step
  • Never assume the job queue is "fast enough": Phase 1 must durably persist the event even if Phase 2 is delayed for hours

Output:

  1. The webhook_events schema migration
  2. The Phase 1 (thin receive) handler code
  3. The Phase 2 (job processor) code
  4. The retry / dead-letter / max-failure logic
  5. The dashboard queries for webhook health

The single most important insight: **the HTTP handler's only job is to record the event.** Everything else happens asynchronously. This decouples your reliability from the source's expectation.

---

## 4. Idempotency Is Mandatory

Webhooks retry. Your handler must process duplicates safely.

Design the idempotency strategy.

Why duplicates happen:

  • Source's network blips; they retry the webhook
  • Your handler returns 200 but the source didn't receive the response
  • Manual replays you trigger for incident recovery
  • Source's own duplicate-emission bugs (rare but real)

The pattern:

Layer 1: HTTP-level deduplication (Phase 1)

  • The UNIQUE constraint on (source, external_event_id) blocks duplicate inserts
  • On conflict: return 200 OK without queueing (already processed or processing)
  • This is the cheap defense

Layer 2: Logic-level idempotency (Phase 2)

  • Even with Layer 1, a duplicate enqueue could happen during partial failures
  • The processing logic must be safe to run multiple times against the same event
  • Patterns:
    • Use idempotency keys for downstream API calls
    • Check current state before mutation: "Has subscription already been activated? If yes, skip"
    • Use database transactions with conditional updates: UPDATE subscriptions SET status='active' WHERE id=? AND status='inactive'

Layer 3: Side-effect deduplication (Phase 2)

  • Sending email twice is a different problem than updating DB twice
  • Pattern: persist a "side-effect log" — INSERT INTO sent_emails (event_id, type, sent_at) — then check before sending
  • Idempotency keys on third-party calls (Stripe, Resend, etc.)
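The side-effect log idea can be sketched with an in-memory Set standing in for a database table with a unique constraint (all names are illustrative):

```typescript
// Sketch of side-effect deduplication: claim the key before performing
// the send, and skip if already claimed. In production this is an INSERT
// into a sent_emails table with a unique constraint, not a Set.
const sentEmails = new Set<string>()

export function sendOnce(eventId: string, emailType: string, send: () => void): boolean {
  const key = `${eventId}:${emailType}`
  if (sentEmails.has(key)) return false // already sent for this event: skip
  sentEmails.add(key)                   // claim first (unique insert in a real DB)
  send()
  return true
}
```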

The hard cases:

  • Out-of-order events: webhook for subscription.cancelled arrives BEFORE subscription.activated

    • Pattern: process events with a lookup of current state; merge gracefully
    • Example: cancellation arrives but no subscription exists yet → wait for the activation event; or create a tombstone
  • Stale events: webhook for subscription.activated arrives 12 hours late after a cancellation

    • Pattern: check the source's current state via API before applying old-event logic
    • For Stripe: stripe.subscriptions.retrieve(id) returns current state; trust that over the late event
  • Conflicting events: simultaneous events that contradict

    • Pattern: process serially per-account; the later-received-and-processed wins (which may not match the chronologically-newer event; mitigated by stale-event check)

Output:

  1. The idempotency-key strategy for downstream API calls
  2. The conditional-update pattern for state mutations
  3. The side-effect log schema (for emails, notifications, etc.)
  4. The out-of-order event handling logic
  5. The stale-event detection logic

The biggest mistake: **assuming "we'll just process events in order."** Webhooks are NOT ordered. Across providers, across networks, across retries — order is not guaranteed. Design for arbitrary order.

---

## 5. Build Replay Tooling From Day 1

Bugs happen. The replay tool is what saves you when they do.

Design the webhook replay tooling.

Internal admin UI (per Internal Admin Tools):

A page at /admin/webhook-events that shows:

  • Recent events (filterable by source, event_type, status, date range)
  • Failed events (most useful view; default filter)
  • Per-event detail: full body, processing history, error messages, timing

For each event, actions:

  • Replay: re-run Phase 2 against this event (preserves the original received_at; updates processed_at)
  • Skip: mark as processed without running (for events that are no longer relevant, e.g., a cancellation event for a customer who's already moved on)
  • Inspect: show full body, headers, processing logs

Bulk replay:

  • Filter to a set of events (e.g., "all failed events between 14:00 and 16:00 yesterday")
  • Replay all in batch
  • Track progress; surface any new failures

Audit-logged:

  • Every replay action logged per Audit Logs
  • Who replayed, when, which events

API for replay (optional):

  • Endpoint: POST /admin/webhook-events/:id/replay
  • Authenticated; admin-only
  • For programmatic replay during incidents

Common replay scenarios:

  1. Bug deployed; events failed for 2 hours:

    • Filter to source X, event_type Y, processing_status='failed', between [bug time] and [fix time]
    • Bulk replay
    • Watch for new failures (might indicate the bug isn't fully fixed)
  2. Source claims they sent an event we never received:

    • Their dashboard might show a delivery failure on their side
    • Re-trigger from their UI if possible (Stripe, GitHub allow this)
    • If we received but processed wrong: replay against our stored event
  3. Recovering from a major outage:

    • Source's webhook deliveries were dropped for hours
    • Source's API exposes "list events since X timestamp" — query and re-deliver to ourselves
    • Or: ask source to redeliver via their support

Output:

  1. The admin UI mockup
  2. The replay function code
  3. The bulk-replay logic with progress tracking
  4. The audit-log integration
  5. The runbook: "we discovered a webhook bug" → "events recovered" workflow

The single most useful artifact: **the bulk-replay UI.** When a webhook bug ships, you need to replay 1,000 events from a 3-hour window without reprocessing the rest. A good admin UI saves hours; a bad one means manual SQL queries at 2am.
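The bulk-replay loop can be sketched as follows, with `process` standing in for your Phase 2 handler. All names are illustrative; a real version would also batch database updates and report progress as it goes:

```typescript
// Sketch of bulk replay over stored events: re-run Phase 2 against every
// failed event in the filtered set, and surface any that fail again.
type StoredEvent = { id: string; status: 'pending' | 'processed' | 'failed' }

export async function bulkReplay(
  events: StoredEvent[],
  process: (e: StoredEvent) => Promise<void>,
): Promise<{ replayed: number; stillFailing: string[] }> {
  const failed = events.filter((e) => e.status === 'failed')
  const stillFailing: string[] = []
  for (const e of failed) {
    try {
      await process(e)        // re-run Phase 2 (which must be idempotent)
      e.status = 'processed'
    } catch {
      stillFailing.push(e.id) // new failure: the bug may not be fully fixed
    }
  }
  return { replayed: failed.length - stillFailing.length, stillFailing }
}
```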

---

## 6. Monitor Webhook Health Continuously

Webhooks fail silently. Monitor.

Build the webhook-health dashboard.

Required metrics (per source):

1. Receive rate

  • Events received per hour, per day, by source + event_type
  • Sudden drops indicate the source isn't sending; sudden spikes indicate either source-side issues or DDoS

2. Processing rate

  • Phase 2 processing throughput
  • Should match receive rate (with some lag); persistent gap indicates queue backlog

3. Failure rate

  • % of events that fail processing
  • Healthy: <0.1% steady-state
  • Investigate if >1%

4. Latency

  • Time from received_at to processed_at
  • p50 / p95 / p99
  • Healthy: p95 under 30 seconds for most events

5. Signature verification failures

  • Rate of incoming requests that fail verification
  • Steady low rate (a few per day) is normal background noise
  • Spikes indicate either an attack or a misconfigured legitimate sender

6. Per-source health

  • Each source has its own metrics
  • Stripe / GitHub / customer webhooks tracked separately

Alerts:

  • Receive rate drops >50% from baseline → alert (source not sending)
  • Failure rate >5% over 5 minutes → alert (bug in handler)
  • Queue backlog > 100 events for >10 min → alert (worker capacity issue)
  • Signature failure rate > 10/min → alert (potential attack)
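The receive-rate-drop alert reduces to a baseline comparison. A sketch, with the 50% threshold from the rule above (the function name and inputs are illustrative):

```typescript
// Sketch of the receive-rate-drop check: compare the current window's
// event count against a trailing hourly baseline.
export function receiveRateAlert(currentHourCount: number, baselineHourlyAvg: number): boolean {
  if (baselineHourlyAvg === 0) return false // no baseline yet: don't alert
  const drop = 1 - currentHourCount / baselineHourlyAvg
  return drop > 0.5                         // more than 50% below baseline
}
```

In practice the baseline should be per source and per event_type, and ideally time-of-day aware, since many sources have strong daily traffic patterns.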

Quarterly review:

  • Top failure reasons; investigate root cause
  • Average processing latency trend
  • Per-source health changes
  • Audit handlers that haven't been updated in 6+ months (drift risk)

Output:

  1. The PostHog / observability dashboard config
  2. The alert rules with thresholds
  3. The quarterly review template
  4. The runbook: "webhook health alert fires" → triage workflow

The single most consequential alert: **receive-rate drop.** A handler that's silently failing to receive webhooks looks healthy (no errors, no failures) but is bleeding events. Track the receive-rate baseline and alert on deviations.

---

## 7. Handle Each Source's Quirks

Each provider has specific gotchas. Document them.

Document the per-source quirks for the most common providers.

Stripe:

  • Retry policy: failed deliveries are retried with increasing delays for up to ~3 days
  • Stripe disables a webhook endpoint after prolonged failures; you'll see this in the Stripe Dashboard
  • Events ARE eventually delivered if your endpoint comes back within the retry window
  • Use stripe.events.retrieve(id) to fetch the canonical version of any event
  • Test-mode and live-mode event IDs both use the evt_ prefix; distinguish them by the livemode flag, and use separate endpoints and secrets per mode

GitHub:

  • Retry policy: GitHub does NOT automatically retry failed deliveries; redeliver manually via the delivery history UI or API
  • Delivery history visible in repo settings
  • Event types: pushes, PRs, issues, releases, etc.
  • HMAC-SHA256 signature; previous SHA-1 deprecated

Resend / Postmark:

  • Real-time delivery
  • Limited retry policy compared to Stripe
  • Track bounce / spam events specifically for Email Deliverability

Customer-sent inbound webhooks (you defined the spec):

  • Per Public API: customers send webhook events TO your product
  • You define signature scheme, retry expectations, dead-letter behavior
  • Document explicitly in your API docs

Slack / Discord / Microsoft:

  • Various; check provider docs
  • Most use HMAC-style signatures
  • Microsoft in particular: respect Retry-After headers and the documented status-code semantics

Each provider's "verify your webhook" tool:

  • Stripe: webhook endpoint testing in Dashboard
  • GitHub: webhook delivery history with redeliver button
  • Use these during development; never test signature verification by skipping it

Test environment isolation:

  • Each environment (dev / staging / prod) has its own webhook secrets
  • Each environment has its own URL endpoints
  • Never have prod and dev share secrets

Output:

  1. The per-source quirks reference (one section per provider you receive from)
  2. The "what to read in vendor docs first" cheat sheet
  3. The development-testing process: how to test webhook handlers locally (ngrok / Cloudflare Tunnel / vendor's testing tool)
  4. The environment-isolation rules

The single most undervalued tool: **vendor testing dashboards.** Stripe's "Send test event" button + GitHub's "Recent Deliveries" + Resend's webhook test feature all exist for a reason. Use them during development; don't test signature verification by disabling it.

---

## 8. Use a Webhook-Specific Service (Optional)

For some teams, dedicated webhook infrastructure is worth the cost.

Decide whether to use a webhook-specific service.

Options:

Hookdeck — managed inbound webhook infrastructure

  • Receives webhooks; queues; replays
  • Reduces "thin receive" work to a vendor
  • Strong for high-volume teams or teams that don't want to operate the queue layer
  • Pricing: $0 → $30+/mo

Inngest — event-driven workflow platform per Background Jobs Providers

  • Can directly receive webhooks and trigger functions
  • Strong if you're already using Inngest for background jobs
  • Combines webhook receipt with the broader workflow system

Trigger.dev — similar to Inngest

RequestBin / Webhook.site — for development debugging only

  • Not for production
  • Lets you inspect webhook payloads during integration

When to use a service vs. DIY:

  • DIY (your own webhook_events table + queue): indie scale, you want full control, simple
  • Hookdeck / Inngest / Trigger.dev: when webhook volume is high, multiple sources, replay tooling is valuable, willing to pay for managed

For most indie SaaS in 2026: DIY is fine. Add a service when:

  • You receive webhooks from 5+ sources
  • You're processing 10K+ events per day
  • Replay / observability is becoming a real time sink
  • You're already using Inngest / Trigger.dev for jobs (extend it)

Output:

  1. The DIY-vs-service decision tree for my situation
  2. The recommended service if I'm choosing one
  3. The migration plan from DIY to managed (or staying DIY)

The realistic answer for most indie SaaS in 2026: **start DIY; revisit at $50K-$100K MRR or 10K events/day.** The DIY pattern is small; the managed services are worth it when scale demands them.

---

## What Done Looks Like

By end of week 2 of building inbound-webhook discipline:
1. **Catalog of webhook sources** with criticality
2. **Signature verification** wired for every source
3. **Thin-receive / async-process pattern** implemented
4. **webhook_events table** persisting every event
5. **Replay tooling** in admin UI
6. **Health monitoring** with alerts
7. **Per-source quirks** documented in runbook

Within 90 days:
- Zero "lost webhook event" incidents
- Replay tooling used at least once successfully (recovering from a bug or vendor issue)
- Signature failures consistently low (background noise only)
- Webhook latency stable

Within 12 months:
- Webhook handling is invisible because it works
- Engineering velocity uncompromised by webhook concerns
- Audit trail intact for every external event
- Service migration considered if scale warrants

---

## Common Pitfalls

- **Inline processing.** Risks timeouts; process asynchronously.
- **Skipping signature verification.** Security hole.
- **No idempotency.** Duplicates corrupt state.
- **No replay tool.** Bug recovery becomes manual SQL.
- **Trusting event order.** Webhooks aren't ordered; design for arbitrary sequence.
- **Sharing secrets across environments.** Test events leak into prod or vice versa.
- **Body-parsing before signature verification.** Most signature schemes use raw bytes.
- **No monitoring.** Webhooks fail silently; you discover hours later.
- **Hard-coded webhook secrets.** Use env vars; rotate periodically.
- **No timeout on Phase 2 processing.** Stuck jobs hold queue capacity; configure max processing time.

---

## Where Inbound Webhooks Plug Into the Rest of the Stack

- [Public API](public-api-chat.md) — outbound complement; this guide covers inbound
- [Audit Logs](audit-logs-chat.md) — every received event audit-logged
- [Background Jobs Providers](https://www.vibereference.com/backend-and-data/background-jobs-providers) — the queue layer for async processing
- [Database Migrations](database-migrations-chat.md) — webhook_events schema needs migration
- [Multi-Tenant Data Isolation](multi-tenancy-chat.md) — events scoped per tenant
- [Internal Admin Tools](internal-admin-tools-chat.md) — replay UI lives in admin
- [Status Page](status-page-chat.md) — webhook outages communicated via status
- [Incident Response](incident-response-chat.md) — webhook bugs are a specific incident type
- [Refunds & Chargebacks](refunds-chargebacks-chat.md) — Stripe's `charge.dispute.created` flows here
- [Email Deliverability](email-deliverability-chat.md) — bounce/spam events flow here
- [Payment Providers](https://www.vibereference.com/auth-and-payments/payment-providers) — Stripe / Polar / Paddle send the most webhooks for most products
- [Customer Health Scoring](customer-health-scoring-chat.md) — usage signals from third parties feed scoring

---

## What's Next

Inbound webhook handling is one of those infrastructure topics that founders only think about after their first major outage. The team that builds the receive-thin / process-async pattern in week 2 of launch handles every third-party integration calmly; the team that ships inline-processing-with-no-idempotency spends quarter 2 cleaning up duplicate-state corruption.

Build the discipline now. The patterns are small; the failure modes are catastrophic. By year 2, the webhook layer is invisible because it works correctly.

---

[⬅️ Growth Overview](README.md)