# Sandbox & Test Mode for SaaS APIs: Chat Prompts
If your SaaS exposes APIs that customers integrate against, you need a test mode: a sandboxed parallel environment where developers can issue calls with fake data, simulated webhooks, and zero real-world consequences. Stripe's live/test toggle is the gold standard. Twilio has test credentials. Plaid has a Sandbox environment alongside Production. Without test mode, customers are forced to develop against production (creating real charges, real shipments, real emails), and your product feels broken for builders.
Building test mode is non-trivial. It's not just "a flag in the database." It's: separate API keys, separate data scopes, simulated external integrations (Stripe webhooks, email delivery, SMS sends, etc.), realistic test data that mimics production behavior, no cross-contamination between modes, observability for both, and a UX that makes the current mode unmistakable.
This is the chat-prompt playbook for shipping test mode that developers actually use, doesn't pollute production, and scales with your product complexity.
## When You Need Test Mode
Use test mode when:
- You have a public API that external developers integrate against
- Your product takes real-world actions (charges money, sends emails, makes API calls to other services, ships products)
- You want developers to onboard without fear of breaking things
- You support webhook deliveries that customers need to test
Don't bother when:
- Pure internal product (no external API)
- Read-only API (no real-world side effects)
- Product is pre-API; "test" doesn't yet have meaning
## Architecture: Two Modes, Same Codebase
The most important architectural decision: same code, mode flag. NOT a separate environment / repo / deployment.
I want to add test mode to my SaaS API. Help me design the architecture.
Pattern (Stripe-style):
- Each customer account has TWO sets of API keys: live + test
- API keys are prefixed: `sk_live_...` and `sk_test_...`
- Same API endpoints; mode determined by which key is used
- Same database, but every record has `mode: 'live' | 'test'` column
- Queries are automatically scoped to the requesting key's mode
- Test-mode side effects (emails, SMS, shipments, etc.) are simulated or routed to the provider's own test mode; they never reach the real world
Why one codebase + mode flag (not separate environments):
- Code paths stay identical → no "works in test, breaks in live" surprises
- Bug fixes apply to both modes
- New features ship to both at once
- Tests can run against either mode
Implement:
1. Add `mode` enum to all relevant tables (or to a higher-level scope like account)
2. Middleware that reads the API key, determines mode, sets a request-scoped `mode` variable
3. Database query helpers that auto-filter by mode (don't trust dev to remember)
4. External-integration shims that switch behavior by mode (e.g., test mode emails go to a queue + dashboard, not real recipients)
Stack: Next.js App Router + Drizzle + Postgres.
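As a minimal sketch of step 2 above (deriving a request-scoped mode from the key), the snippet below maps a key prefix to a mode and hands it to a handler through a context object. The names `modeFromKey`, `withMode`, and `RequestContext` are hypothetical, and the sketch deliberately skips key validation, which a real middleware would need to perform first.

```typescript
type Mode = "live" | "test";

interface RequestContext {
  mode: Mode;
}

// Derive the mode from the key prefix; any other prefix is rejected.
function modeFromKey(apiKey: string): Mode | null {
  if (apiKey.startsWith("sk_live_")) return "live";
  if (apiKey.startsWith("sk_test_")) return "test";
  return null;
}

// Hypothetical wrapper: resolves the mode once per request, then passes an
// immutable context to the handler so downstream code never re-derives it.
function withMode<T>(apiKey: string, handler: (ctx: Readonly<RequestContext>) => T): T {
  const mode = modeFromKey(apiKey);
  if (mode === null) throw new Error("unrecognized API key prefix");
  return handler({ mode });
}
```

Because the mode is fixed when the context is created, a handler cannot switch modes halfway through a request.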
## API Key Scheme
Build the API key system with live + test modes:
Schema:
```sql
api_keys:
  id, account_id, mode (live | test), prefix (e.g., 'sk_live_' or 'sk_test_'),
  hashed_secret (Argon2 or bcrypt), label, created_by, created_at, last_used_at, revoked_at
api_key_scopes:
  api_key_id, scope (read / write / specific resource permissions)
```
UI:
- Settings → Developers → API Keys
- Two tabs / sections: Live Keys, Test Keys
- Each section: list keys + "Create new key" button
- Create flow: name the key, choose scopes, see secret ONCE (then it's hashed)
- Revoke action; rotation action
Behavior:
- API endpoint accepts `Authorization: Bearer sk_live_xxx` or `sk_test_xxx`
- Middleware looks up the key, validates, sets `request.mode = 'live' | 'test'`
- All subsequent code reads `request.mode` for branching
Stack: Next.js + Drizzle + Argon2 / bcrypt for hashing.
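The behavior above might be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `authenticate`, `ApiKeyRecord`, and `AuthResult` are hypothetical names, and `lookupVerified` is an injected stand-in for however you locate a key row and verify its hashed secret (that part depends on your hashing scheme and is not shown).

```typescript
type Mode = "live" | "test";

interface ApiKeyRecord {
  accountId: string;
  mode: Mode;
  revokedAt: Date | null;
}

type AuthResult =
  | { ok: true; accountId: string; mode: Mode }
  | { ok: false; reason: string };

function prefixMode(key: string): Mode | null {
  if (key.startsWith("sk_live_")) return "live";
  if (key.startsWith("sk_test_")) return "test";
  return null;
}

function authenticate(
  authorizationHeader: string | undefined,
  // Assumption: returns a record only if the presented secret verified.
  lookupVerified: (presentedKey: string) => ApiKeyRecord | undefined,
): AuthResult {
  const header = authorizationHeader ?? "";
  if (!header.startsWith("Bearer ")) return { ok: false, reason: "missing bearer token" };
  const key = header.slice("Bearer ".length).trim();
  const claimed = prefixMode(key);
  if (claimed === null) return { ok: false, reason: "unrecognized key prefix" };
  const record = lookupVerified(key);
  if (record === undefined) return { ok: false, reason: "unknown key" };
  if (record.revokedAt !== null) return { ok: false, reason: "revoked key" };
  // The stored mode is authoritative; a prefix that disagrees is rejected.
  if (record.mode !== claimed) return { ok: false, reason: "prefix does not match stored mode" };
  return { ok: true, accountId: record.accountId, mode: record.mode };
}
```

Returning a result object rather than throwing keeps the failure reason available for logging while letting the route decide which HTTP status to send.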
## Mode-Scoped Data
Implement mode-scoped data access so live and test data never mix:
Schema strategy:
- Add `mode` column to every customer-data table (subscriptions, invoices, customers, etc.)
- ALL queries from API endpoints automatically filter by `request.mode`
- ALL inserts auto-stamp `mode: request.mode`
Implementation:
- A helper `getDb(mode)` returns a Drizzle query builder with `mode` baked in via a where clause
- OR: use Postgres Row-Level Security (RLS) with a session variable `app.current_mode`
- OR: use a middleware that intercepts queries and adds the filter
Recommended: typed query helpers (no RLS magic; explicit). Show me the implementation.
Edge cases:
- Cross-mode lookups (admin viewing both): require explicit override in code, never default
- Migrations that touch both modes: be explicit
- Counts / aggregations: always scope by mode
Stack: Next.js + Drizzle.
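To illustrate the typed-helper approach, here is a small sketch that uses an in-memory array in place of a real table. A Drizzle version would wrap actual queries instead; the `Customer` shape, `customersTable`, and `unsafeListAllModes` are hypothetical names chosen for the example.

```typescript
type Mode = "live" | "test";

interface Customer {
  id: string;
  email: string;
  mode: Mode;
}

// In-memory stand-in for a table; a real version would wrap Drizzle queries.
const customersTable: Customer[] = [];

function getDb(mode: Mode) {
  return {
    customers: {
      // Inserts are stamped with the mode; callers cannot choose it.
      insert(row: Omit<Customer, "mode">): Customer {
        const stamped: Customer = { ...row, mode };
        customersTable.push(stamped);
        return stamped;
      },
      // Every read carries the mode filter.
      list(): Customer[] {
        return customersTable.filter((c) => c.mode === mode);
      },
      findById(id: string): Customer | undefined {
        return customersTable.find((c) => c.mode === mode && c.id === id);
      },
      count(): number {
        return customersTable.filter((c) => c.mode === mode).length;
      },
    },
  };
}

// Cross-mode access is a separate, explicitly named function, never a default.
function unsafeListAllModes(reason: string): Customer[] {
  if (reason.trim() === "") throw new Error("cross-mode access requires a reason");
  return [...customersTable];
}
```

Since `insert` takes `Omit<Customer, "mode">`, passing a `mode` in an object literal is a compile-time error, which is the "don't trust the developer to remember" property the prompt asks for.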
## External Integrations: Simulating in Test Mode
Test mode must simulate external side effects rather than executing them.
For each integration, decide test-mode behavior:
| Integration | Live | Test |
|---|---|---|
| Send email | Real email sent | Email logged to test-mode dashboard; never delivered |
| Charge payment | Stripe live | Stripe test (real Stripe API but in test mode) |
| Send SMS | Twilio live | Twilio test or simulated; logged not delivered |
| Webhook delivery | Real HTTP POST to customer's endpoint | Real HTTP POST (so customer can test their handler) |
| Ship product | Real shipping API call | Logged; no real shipment |
| Send Slack notification | Real Slack | Logged or sent to test channel |
| Generate AI content | Real LLM call (cost is real) | Real LLM call OR cached response |
| Push mobile notification | Real APNs / FCM | Skipped or test-token-only |
Key principle: webhooks SHOULD fire in test mode (developers need to test their webhook handlers). Other side effects (email, SMS, shipments) should NOT.
Implement:
- An `IntegrationClient` interface with live + test implementations
- Factory pattern: `getClient(integrationName, mode)` returns the right one
- Test-mode dashboard surfaces what would have happened (sent emails, attempted charges, etc.)
Stack: Next.js + Drizzle + your integration providers.
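The factory pattern described above could look roughly like this for a single integration (email). Everything here is illustrative: `sentToProvider` stands in for a real provider SDK call, `testOutbox` stands in for whatever storage backs the test-mode dashboard, and the one-method `IntegrationClient` shape is an assumption.

```typescript
type Mode = "live" | "test";

interface EmailMessage {
  to: string;
  subject: string;
}

interface IntegrationClient {
  send(message: EmailMessage): { delivered: boolean };
}

// Hypothetical stand-in for the real provider SDK call.
const sentToProvider: EmailMessage[] = [];
// Test-mode messages are captured here so a dashboard can display them.
const testOutbox: EmailMessage[] = [];

const emailClients: Record<Mode, IntegrationClient> = {
  live: {
    send(message) {
      sentToProvider.push(message);
      return { delivered: true };
    },
  },
  test: {
    send(message) {
      // Logged, never delivered.
      testOutbox.push(message);
      return { delivered: false };
    },
  },
};

const registry = { email: emailClients };

function getClient(integrationName: keyof typeof registry, mode: Mode): IntegrationClient {
  return registry[integrationName][mode];
}
```

Keeping the live/test switch inside the registry means business logic calls `getClient("email", mode).send(...)` and never branches on the mode itself.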
## Test Data: Realistic Without Being Real
Test mode needs realistic data without real-world consequences.
Patterns:
- Pre-seeded test data: when a developer creates a new account, populate test mode with sample customers, transactions, invoices, etc. — pre-built dataset
- Realistic generators: when developer creates a "test customer" via API, accept any input (don't hard-validate) and let them create realistic but fake data
- Special test values that trigger predictable behavior:
  - Test card "4242 4242 4242 4242" → succeeds
  - Test card "4000 0000 0000 0002" → declined
  - Test email "test+webhook-fail@example.com" → simulates webhook delivery failure
Document these special values clearly in API docs.
Implement:
- A seeded-data generator that runs on first test-mode use per account
- Special test-input handling for predictable test cases
Stack: Next.js + Drizzle + Faker.js for data generation.
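Special test-input handling can be as simple as a lookup table. The sketch below uses the card numbers and email address listed above; the outcome shape, the `reason` strings, and the choice to decline unlisted card numbers are assumptions made for illustration.

```typescript
type ChargeOutcome =
  | { status: "succeeded" }
  | { status: "declined"; reason: string };

// Card numbers from the list above, stored without spaces.
const TEST_CARDS = new Map<string, ChargeOutcome>([
  ["4242424242424242", { status: "succeeded" }],
  ["4000000000000002", { status: "declined", reason: "card_declined" }],
]);

function simulateTestCharge(cardNumber: string): ChargeOutcome {
  // Accept the number with or without spaces or dashes.
  const normalized = cardNumber.replace(/[\s-]/g, "");
  const outcome = TEST_CARDS.get(normalized);
  // Assumption: numbers outside the documented set are declined, not passed through.
  return outcome ?? { status: "declined", reason: "unknown_test_card" };
}

const WEBHOOK_FAIL_EMAIL = "test+webhook-fail@example.com";

function shouldSimulateWebhookFailure(customerEmail: string): boolean {
  return customerEmail.trim().toLowerCase() === WEBHOOK_FAIL_EMAIL;
}
```

Keeping the table in one place also makes it easy to generate the "special values" section of the API docs from the same source.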
## UX: Making Mode Unmistakable
The biggest danger: developer thinks they're in test mode but they're actually in live, and a "test" charge is real money.
UX patterns:
- Visible mode indicator at the top of every dashboard page when in test mode:
  - Banner: "🧪 You're in TEST mode — actions don't affect production"
  - Color: yellow / orange (not error red; not normal black)
- Subtle indicator in live mode (no banner; default state)
- Mode toggle in settings or top-right corner, prominent
- Per-resource indicators: invoices / customers / subscriptions in test mode show a small "TEST" badge
- Email + receipt copies in test mode include "TEST MODE - not a real receipt" header
- Webhook payloads in test mode include `livemode: false` field (Stripe convention) so customer's webhook handler can branch
Build:
- The mode-banner component
- The mode toggle + persistence
- Per-resource badges
- Email template variants for test mode
Stack: Next.js + Tailwind + shadcn/ui.
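For the `livemode` convention mentioned above, a sketch of both sides may help: the producer stamps every event with the flag, and a customer-side handler branches on it. The event envelope (`id`, `type`, `data`) and the function names are hypothetical; only the `livemode` field comes from the text.

```typescript
type Mode = "live" | "test";

interface WebhookEvent<T> {
  id: string;
  type: string;
  livemode: boolean;
  data: T;
}

// Producer side: the flag is derived from the request mode, never passed in by hand.
function buildWebhookEvent<T>(id: string, type: string, data: T, mode: Mode): WebhookEvent<T> {
  return { id, type, livemode: mode === "live", data };
}

// Customer side: branch on livemode before doing anything irreversible.
function handleInvoicePaid(event: WebhookEvent<{ invoiceId: string }>): string {
  if (!event.livemode) return `test event ${event.id}: skipping fulfillment`;
  return `fulfilling invoice ${event.data.invoiceId}`;
}
```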
## API Documentation for Test Mode
Document test mode in your API docs:
Sections:
- Quick start in test mode: how to get test API keys; basic example
- Test card numbers / test inputs: predictable test values
- Webhook testing: how to set up a webhook URL; tools (ngrok / webhook.site / your inbox) to receive test webhooks; expected payloads
- Switching to live: what to update; common gotchas
- What's different in test mode: emails not delivered; SMS not sent; etc.
- Test data limits: do test accounts have lower rate limits? Storage quotas?
Use a clear visual distinction in code samples: "use sk_test_... for these examples".
For high-volume APIs:
- Provide a CLI tool for test-mode interaction (e.g., `myproduct test-charge` to simulate)
- Provide a Postman / Insomnia collection scoped to test endpoints
Stack: Mintlify / GitBook / your docs platform.
## Webhook Testing Tools
Build webhook testing tools for developers:
- Webhook playground: in dashboard, "Send test webhook" button — manually fire a webhook of any event type to the customer's configured URL with sample payload
- Webhook delivery log: list all webhook attempts with status, response code, latency, retry attempts
- Replay: re-send any past webhook (useful when customer's handler had a bug; they fixed; they want to replay missed events)
- CLI listener: a Stripe-style CLI tool that subscribes to test-mode events and logs to local terminal — eliminates ngrok for some use cases
Implement:
- Dashboard button for manual webhook send
- Persistent delivery log
- Replay action
- Optional: CLI tool
Stack: Next.js + Drizzle + your webhook delivery infrastructure.
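The delivery log and replay action can be sketched as bookkeeping around an injected transport. This is a simplified illustration: the `Transport` here is synchronous and returns only a status code, whereas a real one would be an async HTTP POST, and the log lives in memory rather than in a persistent table. All names are hypothetical.

```typescript
interface DeliveryAttempt {
  id: number;
  eventId: string;
  url: string;
  payload: string;
  statusCode: number;
  replayOf: number | null;
}

// Stand-in transport: a real one would be an async HTTP POST.
type Transport = (url: string, payload: string) => number;

function createDeliveryLog(transport: Transport) {
  const attempts: DeliveryAttempt[] = [];

  function deliver(
    eventId: string,
    url: string,
    payload: string,
    replayOf: number | null = null,
  ): DeliveryAttempt {
    const attempt: DeliveryAttempt = {
      id: attempts.length + 1,
      eventId,
      url,
      payload,
      statusCode: transport(url, payload),
      replayOf,
    };
    attempts.push(attempt);
    return attempt;
  }

  // Replay re-sends the stored payload unchanged and links back to the original attempt.
  function replay(attemptId: number): DeliveryAttempt {
    const original = attempts.find((a) => a.id === attemptId);
    if (original === undefined) throw new Error(`no delivery attempt ${attemptId}`);
    return deliver(original.eventId, original.url, original.payload, original.id);
  }

  return { deliver, replay, list: (): readonly DeliveryAttempt[] => attempts };
}
```

Storing the exact payload with each attempt is what makes replay trustworthy: the customer's fixed handler receives the same bytes the broken one did.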
## Common Pitfalls
**Single API key with a "test mode flag in body".** Easy to accidentally hit production. Always use separate test + live keys with distinct prefixes.
**Test data leaking into live queries.** Forgot to scope a query by mode; live customers see test customers in their dashboard. Use typed helpers; default-deny.
**Test mode emails reaching real recipients.** Forgot to short-circuit the email send for test mode; sent test emails to real customers. Centralize the integration shim.
**Test mode charges reaching real Stripe.** Forgot to use Stripe test keys in your test-mode integration. Strict separation; never share Stripe keys between modes.
**Webhooks not firing in test mode.** Customer can't test their webhook handler; gets to production and discovers bugs. Webhooks SHOULD fire in test mode: the payload content is fake, but the delivery is a real HTTP request.
**No visible mode indicator.** Developer in production thinks they're in test; charges customers real money. Always show mode prominently.
**Mode-switching by URL or query parameter.** Confusing; risky. Mode-switching by API key (immutable per request) is safer.
**No way to seed test data.** New developer in test mode sees an empty product; can't tell what's real. Pre-seed test accounts.
**Test mode rate limits same as production.** Developer iterating fast hits rate limits in test; bad DX. Higher rate limits for test mode.
**Different data models for test vs live.** Means code paths diverge; bugs in one don't surface in the other. Same model; different mode column.
**Test mode "free" without limits.** Some abusers create unlimited test accounts to misuse compute / storage. Rate limit + quota even in test.
**No way to clear test data.** Developer's test mode fills with junk over time; want to start fresh. "Reset test data" button per account.
**Test mode with different TLS / domain / region.** Should be same domain + endpoint; only key differs. Keeps integration code identical.
**Production code path that branches on mode for product behavior.** "If test mode, skip this validation" — creates divergence. Test mode = same product; only side effects differ.
**Switching modes loses state in the dashboard.** Developer toggles to live; their work in progress disappears. Persist UI state per mode where reasonable.
**Mode indicator only on certain pages.** Banner on dashboard but not on invoice detail page. Apply to ALL pages globally.
**No webhook replay.** Customer's handler has a bug; events missed; can't recover. Allow replay of any historical webhook.
**Test mode that costs real money internally.** Calling LLMs / other paid APIs in test mode adds up. Cache test responses or use cheaper models for test.
**Forgetting to namespace logs / observability.** Test traffic and live traffic blended in your logs / Datadog dashboards; signal lost. Tag every span / log with mode.
**Customer-side: integration code that works in test, breaks in live.** Often due to test mode being more permissive. Run integration tests against BOTH modes in CI.
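For the observability pitfall above, one possible shape is a logger factory that stamps the mode on every line. The `createLogger` name and the injected `sink` are hypothetical; in practice the same idea applies to whatever logging or tracing library is already in use.

```typescript
type Mode = "live" | "test";

function createLogger(mode: Mode, sink: (line: string) => void) {
  return (message: string, fields: Record<string, unknown> = {}): void => {
    // `mode` is spread last so a caller-supplied field cannot override it.
    sink(JSON.stringify({ ...fields, message, mode }));
  };
}
```

Creating the logger once per request, from the same mode the middleware resolved, keeps the tag consistent across every line that request emits.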
## Customer-Facing Operations
Build the customer-facing UX for working with test mode:
- Mode toggle: top-right of dashboard; one-click switch between live + test
- API keys page: separate sections for live + test; create / rotate / revoke each
- Test data tools:
  - "Seed test data" button (re-populate with sample data)
  - "Clear all test data" (with confirmation)
- Webhook playground: manually send test webhooks
- Delivery log: see all webhook attempts with retry / replay actions
- Documentation links: contextual "How to test this" links throughout the dashboard
Implementation:
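- Persist the active mode per browser tab (e.g., in sessionStorage, with a cookie holding only the default for new tabs) so multi-tab live + test workflows are possible; a cookie alone is shared across all tabs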
- All dashboard endpoints use the active mode for queries
Stack: Next.js + cookies + Drizzle.
## See Also
- [Public API](./public-api-chat.md) — the API this test mode runs against
- [Developer Portal & API Sandbox](./developer-portal-api-sandbox-chat.md) — broader developer experience
- [API Keys](./api-keys-chat.md) — key issuance, rotation, revocation
- [API Versioning](./api-versioning-chat.md)
- [API Pagination Patterns](./api-pagination-patterns-chat.md)
- [API HTTP Caching](./api-http-caching-chat.md)
- [Webhook Signature Verification](./webhook-signature-verification-chat.md)
- [Outbound Webhooks](./outbound-webhooks-chat.md)
- [Inbound Webhooks](./inbound-webhooks-chat.md)
- [Idempotency Patterns](./idempotency-patterns-chat.md)
- [Rate Limiting & Abuse](./rate-limiting-abuse-chat.md)
- [Quotas, Limits & Plan Enforcement](./quotas-limits-plan-enforcement-chat.md)
- [Logging Strategy / Structured Logs](./logging-strategy-structured-logs-chat.md)
- [Audit Logs](./audit-logs-chat.md)
- [Background Jobs & Queue Management](./background-jobs-queue-management-chat.md)
- [Cron / Scheduled Tasks](./cron-scheduled-tasks-chat.md)
- [Multi-Tenancy](./multi-tenancy-chat.md)
- [Roles & Permissions](./roles-permissions-chat.md)
- [Plan Upgrade, Downgrade & Mid-Cycle Billing Changes](./plan-upgrade-downgrade-billing-changes-chat.md)
- [Account Suspension & Fraud Holds](./account-suspension-fraud-holds-chat.md)
- [In-App Status Banners & System Notifications](./in-app-status-banners-system-notifications-chat.md)
- [Settings & Account Pages](./settings-account-pages-chat.md)
- [Microcopy & Product Copy Systems](./microcopy-product-copy-systems-chat.md)
- [Approval Workflows & Multi-Step Routing](./approval-workflows-multi-step-routing-chat.md)
- [Stripe (VibeReference)](https://viberef.dev/auth-and-payments/stripe.md) — example of best-in-class test mode
- [API Mocking & Mock Data Platforms (VibeReference)](https://viberef.dev/devops-and-tools/api-mocking-mock-data-platforms.md)