Ship Audit Logs That Pass Enterprise Procurement
Audit Logs for Your New SaaS
Goal: Build an audit-log system that captures every meaningful action in your product, exposes it through a clean UI and API, retains it for compliance windows, and survives a security review without rework. Move "do you have audit logs?" from a deal-blocker to a checkbox the buyer ticks in 30 seconds.
Process: Follow this chat pattern with your AI coding tool such as Claude or v0.app. Pay attention to the notes in [brackets] and replace the bracketed text with your own content.
Timeframe: Schema + writer middleware shipped in 2-3 days. Customer-facing UI + API endpoint in week 2. Retention + export + integrations in week 3. SIEM-ready streaming in month 2 if a paying customer asks.
Why Audit Logs Are an Earlier-Stage Need Than Founders Think
Three failure modes hit founders the same way:
- The "we'll add audit logs when an enterprise customer asks" plan. The first enterprise customer asks. The team scrambles to retrofit logging across 18 months of code. They miss half the events, fudge timestamps, and ship a half-baked logs UI in 3 weeks of panic engineering. The customer notices the gaps in the security review and downgrades the contract. Audit logs are 5x cheaper to build from the start than to retrofit.
- Logging too much without structure. The team adds `console.log` everywhere and calls it audit logs. The "logs" are an unsearchable blob, time-skewed across services, and full of internal debug output that customers shouldn't see. The signal-to-noise ratio kills usefulness, and exposing the unfiltered stream to customers is a security incident waiting to happen.
- Mixing audit logs with application logs. App logs (errors, request traces, perf metrics) and audit logs (who did what, when, why) have different lifecycles, retention requirements, query patterns, and access controls. Conflating them means everything has to be stored at the strictest tier, costing 5x more than necessary, while the audit surface ends up worse than if it were purpose-built.
The version that works is structured: define the events you log explicitly, store them in a separate, immutable, queryable store, expose them through a customer-facing UI and API, and design retention/export from day 1.
This guide assumes you have already done Data Trust (audit logs are part of the trust artifact set), have considered Public API (the audit log API is an extension of your public API surface), and have Observability wired up (separate from audit logs but pairs with them).
1. Define the Audit Event Schema
Before any code, define the shape of an audit event. Most teams skip this and end up with inconsistent events that are hell to query later.
You're helping me design the audit log event schema for [your product] at [your-domain.com]. The product is [one-sentence description] with [N paying customers / pre-launch].
Every audit event must have these fields:
1. **event_id** — opaque prefixed ID like "evt_01HXX..." for stable referencing
2. **event_type** — dot-notation, noun.verb_past_tense (e.g., "user.logged_in", "api_key.created", "billing.subscription_changed", "team_member.role_updated")
3. **occurred_at** — ISO 8601 UTC timestamp, sub-second precision
4. **account_id** — the tenant/workspace this event happened in
5. **actor** — who did it:
- actor_type: "user" / "api_key" / "system" / "support_admin"
- actor_id: the user_id, api_key_id, or "system"
- actor_email: cached for display (don't make UI re-resolve)
- actor_ip: source IP
- actor_user_agent: full UA string
6. **target** — what the action affected:
- target_type: e.g., "user", "subscription", "team", "api_key", "document"
- target_id: the affected resource's ID
- target_name: cached human-readable name
7. **action** — what they did, plain English short form (e.g., "Logged in", "Created API key 'production'", "Changed plan from Pro to Business")
8. **metadata** — JSON object with event-type-specific context (the "before" and "after" of state changes, the parameters of an action, etc.)
9. **session_id** — group related events from the same session
10. **request_id** — correlate with application logs / traces if needed
11. **livemode** — boolean, are we in production or test mode (matches Stripe pattern)
Output:
1. The Postgres / your-database schema for the audit_events table with appropriate indices (account_id + occurred_at desc is the primary query pattern)
2. The TypeScript / [your language] type for an AuditEvent
3. A list of 25-30 specific event types I should log for [your product] grouped by category:
- Authentication (logged_in, logged_out, password_changed, mfa_enabled, etc.)
- Account management (workspace_created, member_invited, role_changed, member_removed)
- API access (api_key_created, api_key_revoked, api_key_used)
- Data operations (record_created, record_updated, record_deleted, record_exported)
- Billing (subscription_started, subscription_changed, payment_failed, payment_succeeded)
- Security-sensitive (data_export, account_deletion, settings_changed, oauth_app_authorized)
- Custom to my product (whatever the core domain operations are)
4. The retention policy default (12 months for paid tiers, 30 days for free tier — call this out in the schema)
Sanity check: if my product has fewer than 8 user-visible action types worth logging, the answer might be "audit logs are premature; ship a simple history view first." Tell me if that's the case.
Three principles worth internalizing:
- Past-tense, dot-notation event names. `user.logged_in`, not `userLogin`, not `LOG_IN`. Consistency makes queries possible and matches the convention every modern SaaS converges on (Stripe, Linear, Vercel).
- Cache human-readable values at write time. Storing only `target_id: "usr_01HXX..."` means the UI has to resolve names at read time, which is slow and breaks if the user is later deleted. Cache `target_name: "Jane Smith"` at write — even if Jane is renamed later, the audit log shows the name at the time of the action.
- Don't store secrets in metadata. Tokens, passwords, API key values — never. The metadata field is queryable; if a value lands there, it's now in your audit table forever.
2. Wire Up the Writer Middleware
The audit log is only useful if events actually fire. Bake it into the request layer.
Help me implement the audit-log writer for [your stack — Next.js / SvelteKit / Hono / Express / your framework].
Three writer entry points:
1. **HTTP request middleware** — fires for any state-changing HTTP request (POST, PUT, PATCH, DELETE)
- Extracts actor from auth context
- Extracts target from route params + request body
- Maps route to event_type via a registry (e.g., `POST /v1/api_keys` → `api_key.created`)
- Captures IP, user agent, session_id, request_id
- Writes async after the request handler returns successfully (don't block response)
2. **Internal-service event bus** — fires for events that don't have a 1:1 HTTP request
- Background-job actions ("system.user_archived" after 90 days inactive)
- Webhook receipt handlers ("billing.payment_succeeded" from Stripe webhook)
- Admin-tool actions when support staff performs operations on customer accounts (critical to log; invisible support access is mishandled trust)
3. **Direct call** — `await auditLog.write({...})` for cases where middleware can't capture the right context
- Use sparingly; the middleware path is the primary one
Implementation requirements:
- The writer is async and non-blocking — buffered in-memory and flushed in batches every 1s OR on graceful shutdown
- If the audit-log database is down, fail-closed for security-critical actions (block the action) and fail-open for non-critical ones (write to a fallback queue, retry)
- Idempotency: if the same event is written twice (network retry), dedupe by event_id
- The writer must NEVER log its own writes (infinite loop)
Output:
- The middleware code for [your framework]
- The route → event_type registry as a JSON or TypeScript map
- The flush + retry logic
- The fail-closed / fail-open decision matrix for which event types are which
- Tests for: write-during-request, write-during-background-job, write-during-database-outage
Three traps to flag:
- Don't write audit logs synchronously inline with the request. A 50ms audit-log write doubles your p99 latency. Buffer + flush.
- Always log support-admin actions. When you, as the founder, log into a customer's account to debug, that's auditable. Customers ask. They will trust you more if your logs show "support_admin acted in your account at HH:MM" than if support actions are invisible.
- Make the route → event_type map exhaustive. Routes without a mapping should error in tests, not silently skip. Otherwise the team adds new endpoints that are unlogged for months.
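The buffering, dedupe, and registry requirements above can be sketched framework-agnostically. Everything here is illustrative: the registry entries, the `AuditWriter` class, and the `persist` callback are assumptions to show the shape, not a finished implementation:

```typescript
type AuditEvent = { event_id: string; event_type: string; [k: string]: unknown };

// Route → event_type registry. Unmapped state-changing routes throw,
// so tests fail loudly instead of silently skipping the log.
const routeRegistry: Record<string, string> = {
  "POST /v1/api_keys": "api_key.created",
  "DELETE /v1/api_keys/:id": "api_key.revoked",
  // ...one entry per state-changing route
};

function eventTypeForRoute(method: string, route: string): string {
  const key = `${method} ${route}`;
  const type = routeRegistry[key];
  if (!type) throw new Error(`No audit mapping for ${key}`);
  return type;
}

class AuditWriter {
  private buffer: AuditEvent[] = [];
  private seen = new Set<string>();

  constructor(private persist: (batch: AuditEvent[]) => Promise<void>) {}

  // Non-blocking: callers never await the database from the request path.
  write(event: AuditEvent): void {
    if (this.seen.has(event.event_id)) return; // dedupe network retries
    this.seen.add(event.event_id);
    this.buffer.push(event);
  }

  // Called on a ~1s interval and on graceful shutdown.
  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    await this.persist(batch); // one batched insert, off the request path
  }
}
```

A production version would bound the in-memory dedupe set and add the fail-closed/fail-open branching per event type, but the write/flush split is the core of the non-blocking requirement.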
3. Choose Where Audit Logs Live
Audit logs have different storage requirements than application data. Pick deliberately.
Help me decide the audit-log storage architecture for [your scale — pre-revenue / $0-50K MRR / $50K-500K MRR / $500K+ MRR].
Three options:
**Option A: Same Postgres, separate table**
- audit_events table in the main database
- Append-only by convention (no UPDATE / DELETE rights granted)
- Indexes: (account_id, occurred_at desc), (event_type, occurred_at desc)
- Retention via partitioning by month + dropping old partitions
- Best for: pre-revenue → $50K MRR, simple to operate, no extra infrastructure
**Option B: Dedicated append-only store**
- Separate Postgres database OR a purpose-built logs store (Axiom, ClickHouse, BigQuery)
- Application database stays small; audit logs scale independently
- Better query performance for log-shaped queries
- Best for: $50K-$500K MRR or any product with high-volume audit events (>1M/day)
**Option C: Audit-log SaaS**
- Vendors: WorkOS Audit Logs, Cerbos, Cribl, Vanta integrated audit
- They handle storage, retention, retrieval, SIEM streaming
- Best for: when you'd rather pay than operate
For [your stage], recommend the right choice with rationale. Then output:
1. The schema as it lives in the chosen store
2. The retention configuration (12 months default for paid, 30 days free, with paid extension to 36 months for enterprise)
3. The query patterns the store needs to support efficiently
4. The migration plan if I outgrow the choice (from A to B is the most common path; from B to C is rarer)
A few rules I've watched founders re-learn:
- Same-Postgres is fine until it isn't. The crossover happens around 5-50M audit events total, when your application queries start fighting audit-log queries for shared resources. Until then, Option A is correct.
- Append-only by convention, enforced by permissions. Even if your audit table is regular Postgres, revoke UPDATE and DELETE on it from the application role. Use a separate role for the partition-drop retention job. Sloppy permissions are how audit logs get tampered with.
- Don't roll your own audit-log SaaS unless you've explicitly chosen "audit-log infrastructure" as a pillar of your product. Vendors do it well; reinventing it is a 6-month tax for marginal differentiation.
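For Option A, the append-only-by-permission rule looks roughly like the migration below. This is a sketch assuming Postgres declarative partitioning and two hypothetical roles, `app_rw` (the application) and `audit_retention` (the partition-drop job); adapt names and columns to your schema:

```typescript
// Hypothetical Option-A migration, expressed as the SQL your migration
// tool would run. Role names are illustrative assumptions.
const auditTableMigration = `
  CREATE TABLE IF NOT EXISTS audit_events (
    event_id    text NOT NULL,
    account_id  text NOT NULL,
    event_type  text NOT NULL,
    occurred_at timestamptz NOT NULL,
    payload     jsonb NOT NULL,
    -- Partitioned tables require the partition key in the primary key.
    PRIMARY KEY (event_id, occurred_at)
  ) PARTITION BY RANGE (occurred_at);

  CREATE INDEX IF NOT EXISTS audit_events_account_time
    ON audit_events (account_id, occurred_at DESC);

  -- Append-only enforced by permissions, not convention:
  -- the application role can INSERT and SELECT, never UPDATE or DELETE.
  REVOKE UPDATE, DELETE ON audit_events FROM app_rw;
  GRANT INSERT, SELECT ON audit_events TO app_rw;

  -- Retention runs under a separate role that drops monthly partitions.
  GRANT ALL ON audit_events TO audit_retention;
`;
```

The point is the role split: even a compromised application credential cannot rewrite history, and only the retention job can remove it, one whole partition at a time.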
4. Build the Customer-Facing Audit Log UI
Customers expect to see their own audit log. Build the UI as a first-class feature, not an afterthought.
Design the customer-facing audit log UI for [your product]. The page lives at /settings/audit-log (or /security/activity) and shows the events for the customer's account.
Required components:
1. **Filter bar**:
- Date range (default: last 7 days, presets for 24h / 7d / 30d / custom)
- Actor filter (user dropdown for the workspace's members + "API key" + "System")
- Event type filter (multi-select grouped by category)
- Free-text search across action + target_name
2. **Event list (table or vertical timeline)**:
- Columns: Time | Actor | Action | Target | IP / Source
- Each row expandable to show full metadata as JSON
- Click an actor to filter by that actor
- Click an event_type to filter by that type
- Pagination: cursor-based, 50 events per page
3. **Empty state**:
- "No events match your filter" with a "Clear filters" button
- Don't show "no events ever" empty state to a real account; if the account has zero events, something is broken
4. **Export button**:
- Generates a CSV or JSON of the current filter view
- For paid tiers, allows full-account export (limited by retention)
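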
- Sends an email with download link if the export is large (>10K rows)
5. **Retention notice**:
- Top of the page: "Audit logs retained for 12 months on Pro plan. Older events are not available."
- Clear, not buried in fine print
6. **Anomaly highlights** (paid tier feature):
- Show 1-2 banner-level highlights if the system detects anomalies (e.g., "5 failed login attempts on April 28")
- Don't pretend to do AI anomaly detection; rule-based highlighting is enough for v1
Anti-patterns:
- Defaulting to "all events ever" — slow query, useless view
- Hiding the actor's IP (security teams want it)
- Showing application logs / debug output mixed in with audit events
- Not letting users filter by their own account's API key
Output:
- The page wireframe (text or HTML)
- The query backend (cursor-paginated SQL or API call)
- The expand-to-JSON metadata view component
- The CSV export logic (streamed for large exports)
- The empty state and error states
Three principles:
- Filterability is the feature. A page that shows 1000 events with no filtering is a worse experience than a page that shows 20 well-filtered events. Invest in filter UX before everything else.
- Show the actor's email and IP in the table. Security-conscious customers (the ones asking for audit logs) want this. Hiding it forces them to expand every row, which they won't.
- CSV export is non-negotiable. Every security review asks "can I export my logs?" Yes is the only acceptable answer.
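Cursor-based pagination for the event table can be sketched as a query builder. This is an illustration, not a prescribed backend: the `(occurred_at, event_id)` tuple cursor is an assumption that keeps pages stable even when new events arrive mid-browse, and it matches the `(account_id, occurred_at desc)` index from section 1:

```typescript
interface Cursor {
  occurred_at: string; // ISO timestamp of the last row on the previous page
  event_id: string;    // tie-breaker for identical timestamps
}

function auditPageQuery(accountId: string, cursor?: Cursor, limit = 50) {
  const params: unknown[] = [accountId];
  let where = "account_id = $1";
  if (cursor) {
    params.push(cursor.occurred_at, cursor.event_id);
    // Tuple comparison keeps ordering correct across equal timestamps.
    where += " AND (occurred_at, event_id) < ($2, $3)";
  }
  params.push(limit);
  const sql = `
    SELECT * FROM audit_events
    WHERE ${where}
    ORDER BY occurred_at DESC, event_id DESC
    LIMIT $${params.length}`;
  return { sql, params };
}
```

Offset pagination would drift as events stream in; the tuple cursor is why the spec above says "cursor-based, 50 events per page" rather than page numbers.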
5. Expose Audit Logs Through the API
Customers integrate audit logs into their SIEM, their data warehouse, their internal dashboards. Expose the feed.
Design the audit log API endpoints. Build on top of the existing [Public API](public-api-chat.md) authentication and conventions.
Three endpoints:
**GET /v1/audit_logs**
- Query params: limit (max 100), cursor, occurred_after, occurred_before, event_type[], actor_id, target_id
- Returns: paginated list of audit events
- Auth: account API key with read scope OR personal token of any account member with admin role
- Critical: this endpoint must be permission-checked — non-admin members of a workspace cannot read all events; they can only read events they were the actor of (or where they were the target, depending on policy)
**GET /v1/audit_logs/:event_id**
- Returns the full single event with all metadata
- Same permission model
**POST /v1/audit_logs/exports**
- Triggers an async export of audit events for a date range
- Returns a job_id; customer polls or receives webhook when ready
- Returns a signed URL to download the CSV / JSON
- Handles far larger volumes than the synchronous list endpoint; supports up to 1M events per export
Streaming option (paid tier):
**Event delivery via webhook** — same shape as your normal webhook system per [Public API](public-api-chat.md), but the event_type is "audit.event" and the payload is the full audit event. Customers subscribe by enabling the audit webhook + selecting which event types.
For SIEM integrations (enterprise tier):
- Splunk HTTP Event Collector (HEC) push: customer provides their HEC URL + token; we forward each audit event in real-time
- Datadog Logs API: same pattern
- Generic syslog forwarding (rare in 2026 but enterprise contracts sometimes ask)
Output:
- The OpenAPI spec for the three REST endpoints
- The webhook configuration UI for audit subscriptions
- The SIEM forwarding adapter pattern (add new SIEMs as separate adapters; don't fork the core delivery loop)
- Rate limits on the API (audit log queries can be heavy; cap at 60 req/min per API key)
- Permissions matrix: which roles can read which scope of audit events
A common mistake: making audit logs API-readable by any team member of any role. If "Member" can read all admin actions, you've leaked information. The default should be "Admins read everything; Members read only events they were the actor of."
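The default permission model described above ("Admins read everything; Members read only events they were the actor of") fits in a few lines. A minimal sketch, assuming two roles and a 404-on-cross-account policy; real systems would add API key scopes on top:

```typescript
type Role = "admin" | "member";

interface Viewer {
  account_id: string;
  user_id: string;
  role: Role;
}

interface EventRow {
  account_id: string;
  actor_id: string;
}

function canReadEvent(viewer: Viewer, event: EventRow): boolean {
  // Cross-account reads fail unconditionally; surface as 404, not 403,
  // so the existence of other accounts isn't revealed.
  if (viewer.account_id !== event.account_id) return false;
  if (viewer.role === "admin") return true;
  // Members see only their own actions.
  return event.actor_id === viewer.user_id;
}
```

Centralizing this in one function makes the permissions matrix testable, which section 7's security tests rely on.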
6. Set Retention and Tier Pricing
Retention is a paid-tier feature, not a default for everyone. Get the tiers right.
Help me design the audit-log retention tier model.
Standard tier model:
- **Free tier**: 30 days retention, UI-only access (no API), no export
- **Pro tier**: 12 months retention, UI + API access, manual export
- **Business tier**: 36 months retention, UI + API + webhook streaming, automated exports
- **Enterprise tier**: configurable retention up to 7 years, SIEM integration, dedicated infrastructure if needed
Retention enforcement:
- A daily cron drops audit events older than the account's retention window
- For tier downgrades (Pro → Free), a 90-day grace period before deletion (so customers don't lose data unintentionally)
- For tier upgrades, retention extends going forward (no retroactive — events older than the previous tier's window are gone)
Pricing positioning:
- Audit logs as a tier-gating feature is normal in B2B SaaS in 2026
- Don't gate the existence of audit logs on free tier — show them, just retain less
- Don't charge per-event; flat tier pricing only. Per-event pricing makes customers throttle their own audit fidelity which is a security anti-pattern
Output:
1. The retention enforcement code (the cron job + the soft-delete vs hard-delete strategy)
2. The tier-downgrade flow with grace period
3. The pricing copy for [Pricing Page](pricing-page-chat.md) — how audit logs appear in the tier comparison
4. The customer-facing notice about retention (in the audit UI + on tier-change confirmation)
The most common mistake: gating audit-log existence behind paid tiers entirely. This actively repels enterprise buyers in evaluation. Show audit logs on free; gate retention and integration on paid. The buyer in evaluation sees "yes, audit logs exist" and unblocks the deal; they upgrade for retention later.
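The retention windows and the 90-day downgrade grace can be sketched as a cutoff calculation the daily cron would use. The tier names and day counts mirror the model above; the function shape and the `downgrade` record are illustrative assumptions:

```typescript
const RETENTION_DAYS: Record<string, number> = {
  free: 30,
  pro: 365,
  business: 365 * 3,
};

const DAY_MS = 86_400_000;

// Events with occurred_at older than the returned cutoff are dropped.
function retentionCutoff(
  tier: string,
  downgrade: { previousTier: string; at: Date } | null,
  now = new Date()
): Date {
  let days = RETENTION_DAYS[tier] ?? 30;
  if (downgrade) {
    const graceEnds = new Date(downgrade.at.getTime() + 90 * DAY_MS);
    if (now < graceEnds) {
      // 90-day grace: keep the previous tier's window so customers
      // don't lose months of history overnight after a downgrade.
      days = Math.max(days, RETENTION_DAYS[downgrade.previousTier] ?? days);
    }
  }
  return new Date(now.getTime() - days * DAY_MS);
}
```

The cron then drops whole monthly partitions entirely before the cutoff, rather than deleting row by row.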
7. Test the Audit Log
A logging system that's wrong is worse than no logging system. Test thoroughly.
Design the test suite for the audit log system.
Categories of tests:
1. **Unit: writer correctness**
- Each event_type writes the expected fields
- Metadata serialization handles edge cases (nested objects, Unicode, large payloads)
- Idempotency: writing the same event twice produces one row
- Failure modes: database down, slow flush, oversized payload
2. **Integration: middleware coverage**
- For every state-changing route, assert that the corresponding audit event fires
- Test runs on the routing table — adding a new route without an audit mapping fails the build
- Test that GET requests don't log (or log differently if we choose to log reads)
3. **Integration: cross-system events**
- Background-job events fire correctly
- Webhook-receipt events (e.g., Stripe payment) fire and don't double-fire on retries
- Support-admin actions log with actor_type=support_admin
4. **Security: permissions**
- Member role cannot read events they were not the actor of
- Admin role reads all events for their account
- Cross-account read attempts return 404 (not 403, to avoid revealing account existence)
- API key scope respected: read scope can read; write-only scope cannot
5. **End-to-end: customer-visible flow**
- Sign up → log_in → create_resource → invite_member → all 4 events visible in UI within 2 seconds of action
- Filter by actor returns expected subset
- CSV export generates and downloads correctly
6. **Reliability: failure modes**
- Audit-log database down: critical actions fail-closed; non-critical fail-open with retry
- Audit-log database slow: writes buffer correctly; user-facing latency unchanged
Output:
- The test plan as a markdown checklist
- The CI integration: route-coverage check that runs on every PR
- The synthetic monitoring: production-side test that creates a known event every hour and verifies it appears in the audit log within N seconds
The route-coverage check is the most undersold engineering practice in audit logging. It's 30 lines of code that fails the build if a new endpoint is added without registering an audit event. Without it, audit log coverage decays silently as the team grows.
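That route-coverage check really is small. A minimal sketch, assuming your framework can enumerate its routing table at build time; the route list and registry shape here are illustrative:

```typescript
const STATE_CHANGING = new Set(["POST", "PUT", "PATCH", "DELETE"]);

interface Route {
  method: string;
  path: string;
}

// Returns every state-changing route with no audit-event mapping.
function uncoveredRoutes(
  routes: Route[],
  registry: Record<string, string>
): string[] {
  return routes
    .filter((r) => STATE_CHANGING.has(r.method))
    .map((r) => `${r.method} ${r.path}`)
    .filter((key) => !(key in registry));
}

// Run in CI on every PR: throws (failing the build) if any new
// endpoint shipped without registering an audit event.
function assertFullCoverage(
  routes: Route[],
  registry: Record<string, string>
): void {
  const missing = uncoveredRoutes(routes, registry);
  if (missing.length > 0) {
    throw new Error(`Routes missing audit mapping:\n${missing.join("\n")}`);
  }
}
```

GET routes are skipped by design, matching the "test that GET requests don't log" item above; if you ever choose to log reads, widen `STATE_CHANGING`.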
8. Document for Buyers
Security reviewers expect documentation. Have it ready.
Generate the audit log documentation for security reviews.
The doc lives at /trust/audit-logs (linked from the [Data Trust](data-trust-chat.md) page).
Sections:
**1. What we log**
- The list of event types (auto-generated from the registry; keep in sync with code)
- For each event_type: when it fires, what fields it captures, retention window
**2. What we don't log**
- Read-only operations (or call out that we do, if we do)
- Specific PII redaction policy (e.g., "request bodies are not logged; only field names that changed")
- Internal debug logs (separate from audit logs; not customer-accessible)
**3. Retention**
- Tier-by-tier retention table
- Tier-change behavior with grace period
**4. Access**
- Who in the customer's organization can access audit logs (admin role only by default)
- API access scopes
- SIEM integration paths
**5. Tamper-evidence**
- The technical controls (append-only role, separate retention job role, audit-log integrity checks)
- Whether we hash-chain events (most indie SaaS don't in 2026; if you do, it's a strong signal for security-conscious buyers)
- Backup retention separate from primary store
**6. Compliance mappings**
- Which events satisfy which compliance controls (SOC 2 CC6.x, ISO 27001 A.12.4.x, etc.)
- This section is what passes you through enterprise procurement quickly
**7. Limitations**
- What we don't yet support (e.g., "Real-time SIEM streaming is on the Q3 roadmap; manual export is available today")
- Be honest. Reviewers respect this; pretending a feature exists when it doesn't is a bigger problem than not having it.
Output the documentation page in the same voice as the rest of /trust.
The single highest-leverage section: Compliance mappings. Enterprise buyers' security reviews map controls to evidence. If your audit-log doc says "event 'auth.logged_in' satisfies SOC 2 CC6.1", the reviewer ticks the box and moves on. Without it, they ask, you respond, time passes. Mappings shave 1-2 weeks off enterprise procurement.
What Done Looks Like
By end of week 3 of this work:
- Audit event schema finalized and deployed
- Writer middleware wired into the HTTP layer + background jobs + webhook receipts
- 25-30 event types logged consistently
- Customer-facing UI at /settings/audit-log with filtering, expand, export
- API endpoints for list / get / export
- Retention policy enforced with daily cron
- Documentation linked from /trust
- Route-coverage CI check running on every PR
Within 90 days:
- First enterprise customer cites audit logs as a positive in security review
- Zero "audit log doesn't show this action" support tickets
- Average time-to-discover for security incidents (who did what, when) drops to under 2 minutes from previous "we'd have to ask engineering"
Within 12 months:
- Webhook streaming + at least one SIEM integration shipped (because a paying customer asked)
- Compliance mappings page used in 80%+ of enterprise security reviews
- Audit log retention is a paid-tier driver in your conversion analytics
Common Pitfalls
- Logging everything. A 5MB request body in metadata explodes the log table and makes queries slow. Log diffs (before/after of changed fields), not full payloads.
- Logging secrets. Tokens, passwords, raw API key values — never. They live in the audit table forever.
- Mixing audit and application logs. Different lifecycles, different access controls. Separate them.
- No route-coverage check. New endpoints ship without audit events. Coverage decays silently.
- Treating audit logs as backend-only. Customers expect a UI. Without it, the answer to "do you have audit logs?" is "kind of" — and the buyer is gone.
- Gating audit logs entirely behind paid tiers. Show the existence on free; gate retention and integration. Repels evaluation otherwise.
- No retention enforcement. Logs grow forever, table slows, costs balloon. Cron + partition drops from day 1.
- Synchronous writes inline with requests. Inflates p99 latency. Buffer + flush.
- No support-admin logging. When you (the founder) act in a customer's account, that's auditable. Hiding it erodes trust if discovered.
- Forgetting tier downgrade grace. Customers downgrade and lose 11 months of audit history overnight. Bad press, lost trust. 90-day grace.
Where Audit Logs Plug Into the Rest of the Stack
- Data Trust — the trust page links to the audit log doc; both are part of the same trust artifact set
- Public API — audit logs are exposed through the same API surface
- Customer Support — support team uses audit logs to debug "what did the user do?" tickets
- Incident Response — security incidents are investigated via audit logs
- Status Page — audit log integrity issues surface as a status event
- Pricing Page — retention tier appears in the pricing comparison
- Email Deliverability — auth events feed deliverability anomaly detection
- Observability Providers — observability is for engineering; audit logs are for customers and compliance — keep them separate
- Background Jobs Providers — the daily retention cron typically runs on the same job system
What's Next
Audit logs are a feature that compounds quietly. The team that ships them in week 3 of launch passes enterprise security reviews 12 months later in 30 minutes instead of 3 weeks. The team that defers them until "we need it" pays the retrofit tax with interest — usually at the worst possible time, mid-deal-cycle.
Build the discipline now. The schema decision, the writer middleware, the retention cron — none of these are big projects in week 1. They're 6-month projects in month 18. Pay the small upfront cost; reap the recurring procurement-shortcut benefit for the life of the product.