# Sandbox & Test Mode for SaaS APIs: Chat Prompts

If your SaaS exposes APIs that customers integrate against, you need a test mode — a sandboxed parallel environment where developers can issue calls with fake data, simulated webhooks, and zero real-world consequences. Stripe's live/test toggle is the gold standard. Twilio has test credentials. Plaid has Sandbox + Development + Production. Without test mode, customers are forced to develop against production (creating real charges, real shipments, real emails), and your product feels broken for builders.

Building test mode is non-trivial. It's not just "a flag in the database." It's: separate API keys, separate data scopes, simulated external integrations (Stripe webhooks, email delivery, SMS sends, etc.), realistic test data that mimics production behavior, no cross-contamination between modes, observability for both, and a UX that makes the current mode unmistakable.

This is the chat-prompt playbook for shipping test mode that developers actually use, doesn't pollute production, and scales with your product complexity.

## When You Need Test Mode

Use test mode when:

- You have a public API that external developers integrate against
- Your product takes real-world actions (charges money, sends emails, makes API calls to other services, ships products)
- You want developers to onboard without fear of breaking things
- You support webhook deliveries that customers need to test

Don't bother when:

- Pure internal product (no external API)
- Read-only API (no real-world side effects)
- Product is pre-API; "test" doesn't yet have meaning

## Architecture: Two Modes, Same Codebase

The most important architectural decision: same code, mode flag. NOT a separate environment / repo / deployment.

I want to add test mode to my SaaS API. Help me design the architecture.

Pattern (Stripe-style):
- Each customer account has TWO sets of API keys: live + test
- API keys are prefixed: `sk_live_...` and `sk_test_...`
- Same API endpoints; mode determined by which key is used
- Same database, but every record has a `mode` column (`'live' | 'test'`)
- Queries are automatically scoped to the requesting key's mode
- Test-mode external integrations (Stripe webhooks, emails, etc.) are simulated; never reach the real world

Why one codebase + mode flag (not separate environments):
- Code paths stay identical → no "works in test, breaks in live" surprises
- Bug fixes apply to both modes
- New features ship to both at once
- Tests can run against either mode

Implement:
1. Add `mode` enum to all relevant tables (or to a higher-level scope like account)
2. Middleware that reads the API key, determines mode, sets a request-scoped `mode` variable
3. Database query helpers that auto-filter by mode (don't trust dev to remember)
4. External-integration shims that switch behavior by mode (e.g., test mode emails go to a queue + dashboard, not real recipients)

Stack: Next.js App Router + Drizzle + Postgres.
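As a rough illustration of step 2 in the prompt above, the sketch below derives the request mode from the `Authorization` header. The function name `resolveModeFromAuthHeader` and the exact key format (alphanumeric secret after the prefix) are assumptions made for this example. In a real system the prefix is only a hint: the authoritative mode should come from the stored key record once the secret has been verified.

```typescript
type Mode = "live" | "test";

interface ResolvedKey {
  mode: Mode;
  secret: string; // full presented key; still needs lookup + verification
}

// Hypothetical helper: parse "Bearer sk_live_..." / "Bearer sk_test_...".
// Returns null for a missing header or an unrecognized key format.
function resolveModeFromAuthHeader(header: string | null): ResolvedKey | null {
  if (!header) return null;
  const match = /^Bearer\s+(sk_(live|test)_[A-Za-z0-9]+)$/.exec(header.trim());
  if (!match) return null;
  const [, secret, mode] = match;
  if (!secret || (mode !== "live" && mode !== "test")) return null;
  return { mode, secret };
}
```

A request handler would call this once, reject the request on `null`, and pass the resulting mode to every downstream helper rather than re-reading the header.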

## API Key Scheme

Build the API key system with live + test modes:

Schema:
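```sql
-- Column types are illustrative assumptions; adjust to your conventions.
CREATE TABLE api_keys (
  id UUID PRIMARY KEY, account_id UUID NOT NULL,
  mode TEXT NOT NULL CHECK (mode IN ('live', 'test')),
  prefix TEXT NOT NULL,          -- 'sk_live_' or 'sk_test_'
  hashed_secret TEXT NOT NULL,   -- bcrypt or scrypt hash; never the raw secret
  label TEXT, created_by UUID, created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  last_used_at TIMESTAMPTZ, revoked_at TIMESTAMPTZ
);
CREATE TABLE api_key_scopes (
  api_key_id UUID NOT NULL REFERENCES api_keys (id),
  scope TEXT NOT NULL            -- read / write / specific resource permissions
);
```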

UI:

- Settings → Developers → API Keys
- Two tabs / sections: Live Keys, Test Keys
- Each section: list keys + "Create new key" button
- Create flow: name the key, choose scopes, see secret ONCE (then it's hashed)
- Revoke action; rotation action

Behavior:

- API endpoint accepts `Authorization: Bearer sk_live_xxx` or `sk_test_xxx`
- Middleware looks up the key, validates, sets `request.mode = 'live' | 'test'`
- All subsequent code reads `request.mode` for branching

Stack: Next.js + Drizzle + Argon2 / bcrypt for hashing.
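The "see secret ONCE, then it's hashed" flow can be sketched with Node's built-in `node:crypto` module. This example uses scrypt because it ships with Node; Argon2 or bcrypt would need a third-party package. The names `createApiKey`, `verifyApiKey`, and `StoredKey` are hypothetical, and how the stored row is located for a presented key (for example via a separate key ID) is left out.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

type Mode = "live" | "test";

// What gets persisted: never the raw secret, only a salted hash.
interface StoredKey {
  mode: Mode;
  prefix: string;
  salt: string;         // hex-encoded
  hashedSecret: string; // hex-encoded scrypt output
}

// Returns the raw secret (show it to the user once) plus the row to store.
function createApiKey(mode: Mode): { secret: string; stored: StoredKey } {
  const prefix = `sk_${mode}_`;
  const secret = prefix + randomBytes(24).toString("hex");
  const salt = randomBytes(16);
  const hashedSecret = scryptSync(secret, salt, 32).toString("hex");
  return { secret, stored: { mode, prefix, salt: salt.toString("hex"), hashedSecret } };
}

// Constant-time comparison of the presented key against the stored hash.
function verifyApiKey(presented: string, stored: StoredKey): boolean {
  if (!presented.startsWith(stored.prefix)) return false;
  const candidate = scryptSync(presented, Buffer.from(stored.salt, "hex"), 32);
  return timingSafeEqual(candidate, Buffer.from(stored.hashedSecret, "hex"));
}
```

Because the mode is stored next to the hash, a verified key yields its mode from the database row, so a tampered prefix cannot switch a test key into live mode.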


## Mode-Scoped Data

Implement mode-scoped data access so live and test data never mix:

Schema strategy:

- Add a `mode` column to every customer-data table (subscriptions, invoices, customers, etc.)
- ALL queries from API endpoints automatically filter by `request.mode`
- ALL inserts auto-stamp `mode: request.mode`

Implementation:

- A helper `getDb(mode)` returns a Drizzle query builder with mode baked in via a where clause
- OR: use Postgres Row-Level Security (RLS) with a session variable `app.current_mode`
- OR: use a middleware that intercepts queries and adds the filter

Recommended: typed query helpers (no RLS magic; explicit). Show me the implementation.

Edge cases:

- Cross-mode lookups (admin viewing both): require explicit override in code, never default
- Migrations that touch both modes: be explicit
- Counts / aggregations: always scope by mode

Stack: Next.js + Drizzle.
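The "typed helper" idea can be illustrated without a database. The sketch below is an in-memory stand-in, not Drizzle code: `scopedRepo` and `Customer` are hypothetical names, and a real version would wrap Drizzle queries instead of an array. The point it tries to show is that callers never write the mode filter themselves, and cannot supply their own `mode` on insert.

```typescript
type Mode = "live" | "test";

interface Customer {
  id: string;
  email: string;
  mode: Mode;
}

// Binds a mode once; every read filters by it and every insert is stamped with it.
function scopedRepo<T extends { mode: Mode }>(rows: T[], mode: Mode) {
  return {
    list: (predicate: (row: T) => boolean = () => true): T[] =>
      rows.filter((r) => r.mode === mode && predicate(r)),
    // Omit<T, "mode"> means the caller has no way to pass a mode explicitly.
    insert: (data: Omit<T, "mode">): T => {
      const row = { ...data, mode } as unknown as T;
      rows.push(row);
      return row;
    },
    count: (): number => rows.filter((r) => r.mode === mode).length,
  };
}
```

A request handler would create the repo from the request's mode, for example `scopedRepo<Customer>(table, request.mode)`, so a forgotten filter becomes impossible rather than merely unlikely.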


## External Integrations: Simulating in Test Mode

Test mode must simulate external side effects rather than executing them.

For each integration, decide test-mode behavior:

| Integration | Live | Test |
|---|---|---|
| Send email | Real email sent | Email logged to test-mode dashboard; never delivered |
| Charge payment | Stripe live | Stripe test (real Stripe API but in test mode) |
| Send SMS | Twilio live | Twilio test or simulated; logged not delivered |
| Webhook delivery | Real HTTP POST to customer's endpoint | Real HTTP POST (so customer can test their handler) |
| Ship product | Real shipping API call | Logged; no real shipment |
| Send Slack notification | Real Slack | Logged or sent to test channel |
| Generate AI content | Real LLM call (cost is real) | Real LLM call OR cached response |
| Push mobile notification | Real APNs / FCM | Skipped or test-token-only |

Key principle: webhooks SHOULD fire in test mode (developers need to test their webhook handlers). Other side effects (email, SMS, shipments) should NOT.

Implement:

- An `IntegrationClient` interface with live + test implementations
- Factory pattern: `getClient(integrationName, mode)` returns the right one
- Test-mode dashboard surfaces what would have happened (sent emails, attempted charges, etc.)

Stack: Next.js + Drizzle + your integration providers.
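One possible shape for the interface-plus-factory pattern, shown for email only. Everything here is an assumption for illustration: `EmailClient`, `getEmailClient`, the `testOutbox` array standing in for the test-mode dashboard's storage, and the injected `deliver` callback standing in for a real provider SDK call.

```typescript
type Mode = "live" | "test";

interface EmailMessage {
  to: string;
  subject: string;
  body: string;
}

interface EmailClient {
  send(message: EmailMessage): Promise<{ delivered: boolean }>;
}

// Test-mode messages are recorded here so a dashboard can display them.
const testOutbox: EmailMessage[] = [];

// Test implementation: records the message and never contacts a provider.
const testEmailClient: EmailClient = {
  async send(message) {
    testOutbox.push(message);
    return { delivered: false };
  },
};

// Live implementation: `deliver` is a placeholder for the provider call.
function makeLiveEmailClient(deliver: (m: EmailMessage) => Promise<void>): EmailClient {
  return {
    async send(message) {
      await deliver(message);
      return { delivered: true };
    },
  };
}

// Single choke point: application code never picks an implementation itself.
function getEmailClient(mode: Mode, deliver: (m: EmailMessage) => Promise<void>): EmailClient {
  return mode === "live" ? makeLiveEmailClient(deliver) : testEmailClient;
}
```

Keeping the branch inside one factory is what makes "test emails reached real recipients" a bug in one file instead of a risk at every call site.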


## Test Data: Realistic Without Being Real

Test mode needs realistic data without real-world consequences.

Patterns:

- Pre-seeded test data: when a developer creates a new account, populate test mode with a pre-built dataset of sample customers, transactions, invoices, etc.
- Realistic generators: when developer creates a "test customer" via API, accept any input (don't hard-validate) and let them create realistic but fake data
- Special test values that trigger predictable behavior:
  - Test card "4242 4242 4242 4242" → succeeds
  - Test card "4000 0000 0000 0002" → declined
  - Test email "test+webhook-fail@example.com" → simulates webhook delivery failure

Document these special values clearly in API docs.

Implement:

- A seeded-data generator that runs on first test-mode use per account
- Special test-input handling for predictable test cases

Stack: Next.js + Drizzle + Faker.js for data generation.
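The special test values listed in the prompt above could be handled with a small lookup, sketched below. The function names, the `reason` strings, and the decision to decline unrecognized card numbers are assumptions for this example; only the three magic values themselves come from the text.

```typescript
type ChargeOutcome =
  | { status: "succeeded" }
  | { status: "declined"; reason: string };

// Magic card numbers from the docs, stored without spaces.
const TEST_CARDS = new Map<string, ChargeOutcome>([
  ["4242424242424242", { status: "succeeded" }],
  ["4000000000000002", { status: "declined", reason: "card_declined" }],
]);

// Accepts "4242 4242 4242 4242", "4242-4242-4242-4242", or no separators.
function simulateTestCharge(cardNumber: string): ChargeOutcome {
  const normalized = cardNumber.replace(/[\s-]/g, "");
  // Assumption: unknown numbers are declined rather than silently accepted.
  return TEST_CARDS.get(normalized) ?? { status: "declined", reason: "unrecognized_test_card" };
}

// The magic email address that asks the simulator to fail webhook delivery.
function shouldSimulateWebhookFailure(email: string): boolean {
  return email.trim().toLowerCase() === "test+webhook-fail@example.com";
}
```

Keeping the table in one place also makes it easy to generate the "special test values" section of the API docs from the same source.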


## UX: Making Mode Unmistakable

The biggest danger: developer thinks they're in test mode but they're actually in live, and a "test" charge is real money.

UX patterns:

- Visible mode indicator at the top of every dashboard page when in test mode:
  - Banner: "🧪 You're in TEST mode: actions don't affect production"
  - Color: yellow / orange (not error red; not normal black)
- Subtle indicator in live mode (no banner; default state)
- Mode toggle in settings or top-right corner, prominent
- Per-resource indicators: invoices / customers / subscriptions in test mode show a small "TEST" badge
- Email + receipt copies in test mode include "TEST MODE - not a real receipt" header
- Webhook payloads in test mode include `livemode: false` field (Stripe convention) so customer's webhook handler can branch

Build:

- The mode-banner component
- The mode toggle + persistence
- Per-resource badges
- Email template variants for test mode

Stack: Next.js + Tailwind + shadcn/ui.
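The `livemode` convention from the list above is easy to show from both sides. This is a minimal sketch with hypothetical names (`WebhookEvent`, `buildWebhookEvent`, `handleInvoicePaid`) and an invented event shape: the sender stamps every payload, and the customer's handler branches on the flag.

```typescript
type Mode = "live" | "test";

interface WebhookEvent<T> {
  id: string;
  type: string;
  livemode: boolean; // false for every event generated in test mode
  data: T;
}

// Sender side: the flag is derived from the request mode, never passed by hand.
function buildWebhookEvent<T>(id: string, type: string, data: T, mode: Mode): WebhookEvent<T> {
  return { id, type, livemode: mode === "live", data };
}

// Customer side: an example handler that skips real fulfilment for test events.
function handleInvoicePaid(event: WebhookEvent<{ invoiceId: string }>): string {
  return event.livemode
    ? `fulfil order for ${event.data.invoiceId}`
    : `test event for ${event.data.invoiceId}; skipping fulfilment`;
}
```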


## API Documentation for Test Mode

Document test mode in your API docs:

Sections:

- Quick start in test mode: how to get test API keys; basic example
- Test card numbers / test inputs: predictable test values
- Webhook testing: how to set up a webhook URL; tools (ngrok / webhook.site / your inbox) to receive test webhooks; expected payloads
- Switching to live: what to update; common gotchas
- What's different in test mode: emails not delivered; SMS not sent; etc.
- Test data limits: do test accounts have lower rate limits? Storage quotas?

Use a clear visual distinction in code samples: "use `sk_test_...` for these examples".

For high-volume APIs:

- Provide a CLI tool for test-mode interaction (e.g., `myproduct test-charge` to simulate)
- Provide a Postman / Insomnia collection scoped to test endpoints

Stack: Mintlify / GitBook / your docs platform.


## Webhook Testing Tools

Build webhook testing tools for developers:

- Webhook playground: in dashboard, "Send test webhook" button that manually fires a webhook of any event type to the customer's configured URL with sample payload
- Webhook delivery log: list all webhook attempts with status, response code, latency, retry attempts
- Replay: re-send any past webhook (useful when customer's handler had a bug; they fixed; they want to replay missed events)
- CLI listener: a Stripe-style CLI tool that subscribes to test-mode events and logs to local terminal; eliminates ngrok for some use cases

Implement:

- Dashboard button for manual webhook send
- Persistent delivery log
- Replay action
- Optional: CLI tool

Stack: Next.js + Drizzle + your webhook delivery infrastructure.
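The delivery log and replay action can be sketched as a small in-memory structure. `createWebhookLog`, `DeliveryAttempt`, and the injected `Sender` are hypothetical; the sender is synchronous here only to keep the example short, whereas a real one would perform an async HTTP POST and the log would live in the database.

```typescript
interface DeliveryAttempt {
  eventId: string;
  url: string;
  payload: string;
  statusCode: number;
  attempt: number; // 1 for the first delivery, 2+ for retries and replays
}

// Stand-in for an HTTP POST; returns the response status code.
type Sender = (url: string, payload: string) => number;

function createWebhookLog(send: Sender) {
  const attempts: DeliveryAttempt[] = [];

  // Sends the payload and records the outcome, successful or not.
  function deliver(eventId: string, url: string, payload: string): DeliveryAttempt {
    const previous = attempts.filter((a) => a.eventId === eventId).length;
    const record: DeliveryAttempt = {
      eventId, url, payload,
      statusCode: send(url, payload),
      attempt: previous + 1,
    };
    attempts.push(record);
    return record;
  }

  // Re-sends the original payload of a past event; null if the event is unknown.
  function replay(eventId: string): DeliveryAttempt | null {
    const original = attempts.find((a) => a.eventId === eventId);
    return original ? deliver(original.eventId, original.url, original.payload) : null;
  }

  return { deliver, replay, attempts: (): readonly DeliveryAttempt[] => attempts };
}
```

Replaying the stored payload byte for byte, rather than regenerating it, means the customer's handler sees exactly what it missed the first time.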


## Common Pitfalls

**Single API key with a "test mode flag in body".** Easy to accidentally hit production. Always use separate test + live keys with distinct prefixes.

**Test data leaking into live queries.** Forgot to scope a query by mode; live customers see test customers in their dashboard. Use typed helpers; default-deny.

**Test mode emails reaching real recipients.** Forgot to short-circuit the email send for test mode; sent test emails to real customers. Centralize the integration shim.

**Test mode charges reaching real Stripe.** Forgot to use Stripe test keys in your test-mode integration. Strict separation; never share Stripe keys between modes.

**Webhooks not firing in test mode.** Customer can't test their webhook handler; gets to production and discovers bugs. Webhooks SHOULD fire in test mode: the payload contents are simulated, but the delivery is a real HTTP request.

**No visible mode indicator.** Developer in production thinks they're in test; charges customers real money. Always show mode prominently.

**Mode-switching by URL or query parameter.** Confusing; risky. Mode-switching by API key (immutable per request) is safer.

**No way to seed test data.** New developer in test mode sees an empty product; can't tell what's real. Pre-seed test accounts.

**Test mode rate limits same as production.** Developer iterating fast hits rate limits in test; bad DX. Higher rate limits for test mode.

**Different data models for test vs live.** Means code paths diverge; bugs in one don't surface in the other. Same model; different mode column.

**Test mode "free" without limits.** Some abusers create unlimited test accounts to misuse compute / storage. Rate limit + quota even in test.

**No way to clear test data.** Developer's test mode fills with junk over time; want to start fresh. "Reset test data" button per account.

**Test mode with different TLS / domain / region.** Should be same domain + endpoint; only key differs. Keeps integration code identical.

**Production code path that branches on mode for product behavior.** "If test mode, skip this validation" — creates divergence. Test mode = same product; only side effects differ.

**Switching modes loses state in the dashboard.** Developer toggles to live; their work in progress disappears. Persist UI state per mode where reasonable.

**Mode indicator only on certain pages.** Banner on dashboard but not on invoice detail page. Apply to ALL pages globally.

**No webhook replay.** Customer's handler has a bug; events missed; can't recover. Allow replay of any historical webhook.

**Test mode that costs real money internally.** Calling LLMs / other paid APIs in test mode adds up. Cache test responses or use cheaper models for test.

**Forgetting to namespace logs / observability.** Test traffic and live traffic blended in your logs / Datadog dashboards; signal lost. Tag every span / log with mode.

**Customer-side: integration code that works in test, breaks in live.** Often due to test mode being more permissive. Run integration tests against BOTH modes in CI.
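For the observability pitfall above, one low-effort guard is to make the mode a required argument of the logging helper. This is a sketch with a hypothetical `logLine` function emitting JSON; your logging library will have its own API for attaching fields.

```typescript
type Mode = "live" | "test";

// `mode` is written last so a stray "mode" key in `fields` cannot override it.
function logLine(mode: Mode, message: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({ ...fields, message, mode });
}
```

With every line tagged, dashboards and alerts can filter on `mode` instead of guessing from key prefixes or account names.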

## Customer-Facing Operations

Build the customer-facing UX for working with test mode:

- Mode toggle: top-right of dashboard; one-click switch between live + test
- API keys page: separate sections for live + test; create / rotate / revoke each
- Test data tools:
  - "Seed test data" button (re-populate with sample data)
  - "Clear all test data" (with confirmation)
- Webhook playground: manually send test webhooks
- Delivery log: see all webhook attempts with retry / replay actions
- Documentation links: contextual "How to test this" links throughout the dashboard

Implementation:

- Persist active mode per browser tab so multi-tab live + test workflows are possible. A cookie is shared by every tab in the browser, so per-tab mode needs tab-scoped state (e.g., sessionStorage); a cookie can only remember the last-used mode as a default
- All dashboard endpoints use the active mode for queries

Stack: Next.js + cookies + Drizzle.


## See Also

- [Public API](./public-api-chat.md) — the API this test mode runs against
- [Developer Portal & API Sandbox](./developer-portal-api-sandbox-chat.md) — broader developer experience
- [API Keys](./api-keys-chat.md) — key issuance, rotation, revocation
- [API Versioning](./api-versioning-chat.md)
- [API Pagination Patterns](./api-pagination-patterns-chat.md)
- [API HTTP Caching](./api-http-caching-chat.md)
- [Webhook Signature Verification](./webhook-signature-verification-chat.md)
- [Outbound Webhooks](./outbound-webhooks-chat.md)
- [Inbound Webhooks](./inbound-webhooks-chat.md)
- [Idempotency Patterns](./idempotency-patterns-chat.md)
- [Rate Limiting & Abuse](./rate-limiting-abuse-chat.md)
- [Quotas, Limits & Plan Enforcement](./quotas-limits-plan-enforcement-chat.md)
- [Logging Strategy / Structured Logs](./logging-strategy-structured-logs-chat.md)
- [Audit Logs](./audit-logs-chat.md)
- [Background Jobs & Queue Management](./background-jobs-queue-management-chat.md)
- [Cron / Scheduled Tasks](./cron-scheduled-tasks-chat.md)
- [Multi-Tenancy](./multi-tenancy-chat.md)
- [Roles & Permissions](./roles-permissions-chat.md)
- [Plan Upgrade, Downgrade & Mid-Cycle Billing Changes](./plan-upgrade-downgrade-billing-changes-chat.md)
- [Account Suspension & Fraud Holds](./account-suspension-fraud-holds-chat.md)
- [In-App Status Banners & System Notifications](./in-app-status-banners-system-notifications-chat.md)
- [Settings & Account Pages](./settings-account-pages-chat.md)
- [Microcopy & Product Copy Systems](./microcopy-product-copy-systems-chat.md)
- [Approval Workflows & Multi-Step Routing](./approval-workflows-multi-step-routing-chat.md)
- [Stripe (VibeReference)](https://viberef.dev/auth-and-payments/stripe.md) — example of best-in-class test mode
- [API Mocking & Mock Data Platforms (VibeReference)](https://viberef.dev/devops-and-tools/api-mocking-mock-data-platforms.md)