Set Up Usage-Based Billing for Your AI SaaS

Usage-Based Billing for Your New SaaS

Goal: Add token, credit, or event-metered billing to a Stripe subscription so customers pay for what they actually use. Wire it without breaking your unit economics, without double-charging on retries, and without surprising customers with a $4,000 invoice from a runaway agent.

Process: Follow this chat pattern with your AI coding tool such as Claude or v0.app. Pay attention to the notes in [brackets] and replace the bracketed text with your own content.

Timeframe: 1–2 days for the first metered price live in production. Add per-user caps and alerts in week 2 of launch — before they are needed, not after.

Why Usage-Based for AI

Most SaaS products in 2026 still ship flat-rate pricing because it is simpler to operate. For most products, that is fine. For AI products, it is usually wrong.

Two reasons:

Cost variance is enormous. A power user who runs your tool through an agent loop can do 1,000× the inference of a casual user on the same plan. On flat-rate, you eat the cost on the heavy users and overcharge the light ones — and lose to a competitor that prices fairly.
Buyers expect it. Anthropic, OpenAI, Replicate, and every cloud-AI provider charges by tokens or seconds. Buyers comparing your tool to those providers expect at least the option of metered billing.

The realistic packaging for an AI SaaS in 2026:

Free tier with a hard token cap (the funnel filler). Just enough usage to hit the aha moment, no more.
Pro plan: flat-rate base + included quota + overage. $29/month, 1M tokens included, $5 per additional 1M. Predictable for the customer, captures expansion for you.
Sales-assisted plan: custom for any account that crosses a threshold (typically $500/month spend), with negotiated rates and dedicated terms.

This guide is the technical playbook for the Pro plan's overage billing — the piece most founders fumble.

This is a standalone follow-on to Payment Integration (3.3) — start there for the basic Stripe + Clerk subscription wiring before adding metered billing on top.

1. Decide What You Are Metering

The single most consequential decision. Once you ship, customers will hold you to whatever metric you picked.

I'm building [your product] at [your-domain.com]. The product does [one-sentence description]. The cost driver behind every action is [tokens / model calls / GPU seconds / generated images / something else].

Help me choose what to meter. Evaluate three candidates:

1. **Raw provider tokens** — count input + output tokens passed to/from [Anthropic / OpenAI / Replicate]. Pros: matches actual cost. Cons: leaks model details to customers, harder to abstract over multiple providers.

2. **Custom credits** — your own internal currency. 1 credit = N tokens, normalized across providers. Pros: hides provider details, lets you swap models without invoice changes. Cons: customers must learn what a credit means.

3. **Outcome events** — count successful product actions (one report generated, one design rendered, one email drafted). Pros: aligned with customer value, simplest mental model. Cons: variance per outcome can be huge — a 200-token email and a 200,000-token research brief both count as one event.

For my product specifically, recommend ONE primary metric and explain:
- Why it best matches customer-perceived value
- How I'll convert provider costs into the metric (markup, conversion ratio)
- Whether to expose the underlying tokens to power users in the dashboard or hide them entirely
- The migration path if I ever want to switch metrics

Default recommendation if there's no strong reason: custom credits with a fixed token-per-credit ratio. Easiest to communicate, cleanest to evolve.

For most AI SaaS, custom credits with a posted token ratio are the right answer. They survive provider switches, they normalize across input/output token differences, and they read cleanly on a pricing page.

2. Set Up the Stripe Meter

Stripe's modern primitive is Billing Meters, ingesting events through the Meter Events API. You wire one meter per metric you bill on, attach it to a metered price on a product, and link that price to subscription items.

I have a Stripe account and an existing subscription product for [your product]. I want to add usage-based billing on top of the existing flat-rate Pro plan.

Walk me through, in exact steps:

1. **Create a Billing Meter** named `tokens_used` (or my chosen unit) in the Stripe dashboard. Aggregation = sum, time window = 1 hour, idempotency on event ID.

2. **Create a metered Price** attached to my existing Pro product. Configure it as a graduated tier:
   - First [N] units per month: $0 (the included quota)
   - Beyond [N]: $X per 1,000 units
   Currency: my home currency, with Stripe automatically converting on invoice.

3. **Add a SubscriptionItem** for the metered price to existing Pro subscriptions. Show me the API call to do this and confirm prorated handling for mid-cycle adds.

4. **Webhook setup** — show me which Stripe webhooks I should listen for (`invoice.created`, `invoice.finalized`, `customer.subscription.updated`, `billing.meter.error_report_triggered`). Code the handlers in [Next.js / Express / chosen runtime].

5. **Test mode validation** — exact steps to fire a fake meter event in test mode, see it on the test invoice, verify the math, and confirm no production data is touched.

Use the current Stripe Billing v2 API (2026), not the legacy `usage_records` API. Show actual code, not pseudocode. I want to paste this and have it work.

Critical detail: Stripe deprecated the per-subscription-item usage_records endpoint in favor of Billing Meters. If your AI assistant defaults to the legacy API, push back — Meters scale to high-cardinality event volumes (the AI use case) and usage_records does not.

3. Emit Meter Events From Every Billable Call

This is the part where most implementations break. You need to emit exactly one meter event per billable unit of work, surviving retries, partial failures, and your own caching layer.

Help me wire usage tracking into every place my app calls [Anthropic / OpenAI / a model gateway].

Requirements:

1. **One emission per successful response, never on retry.** If I call the model and it fails with a 5xx, I retry — but I must not emit the meter event until the final successful call returns. Show me the wrapper that enforces this.

2. **Idempotency by event ID.** Every meter event must include a unique idempotency key (e.g., `${userId}:${requestId}:${attemptCount}`). Stripe deduplicates on `identifier`, so emitting the same event ID twice is safe. Show me how to construct this and pass it on every emit.

3. **Cache exclusion.** I have a [Vercel runtime cache / Redis cache / Convex cache] in front of model calls. Cache hits cost me $0, so I must NOT meter them as usage to the customer. Show me where in my call site to short-circuit the meter emission.

4. **Token attribution to the user.** Every meter event carries `customer` (the Stripe customer ID) and `value` (the integer token count). Walk me through how I get the customer ID from my user identity layer ([Clerk / Supabase Auth / Auth.js]) at the call site.

5. **Buffering for resilience.** If Stripe is down for 30 seconds, the in-flight calls should still succeed and the meter events should reach Stripe eventually. Show me a small in-memory queue with retry that doesn't lose events on process restart (Redis-backed or durable equivalent).

6. **End-to-end test.** Wire one test that calls my model wrapper for a known input, verifies the response, and verifies a single meter event landed in Stripe with the right token count. No flaky waits — use Stripe's test mode and explicit assertions.

Output the wrapper code, the meter-emit utility, the cache-exclusion helper, and the test in one drop.

The "exclude cached responses" rule deserves emphasis: charging customers for the cache-hit they paid you nothing to serve is the fastest way to lose trust. Make this an explicit, tested rule in your codebase.

4. Build the Customer-Facing Usage Dashboard

Buyers tolerate metered billing only when they can see what they are using. Without real-time usage visibility, every invoice feels like a surprise charge — even when it is fair.

Build a usage dashboard inside my app at /settings/usage.

Show, for the current billing period:

1. **Big number**: total credits / tokens used this period. Below it: "of [N] included" with a progress bar.
2. **Projected spend**: based on usage so far + days remaining in the period, projected total at month-end. "On track for $X this month — $Y over included quota."
3. **Daily usage chart**: line or bar chart, last 30 days, with the included-quota line plotted as a horizontal reference.
4. **Top-consuming actions**: a table of what's driving usage. "Generated reports: 1.2M tokens, 60% of total." Lets the customer self-diagnose where their spend is going.
5. **Per-user breakdown** (if multi-seat): which team members are driving usage. Useful for admins.
6. **Hard limit setting**: a control where the customer (or admin) can set "stop billable usage at $X" or "stop at N tokens" — implementing the hard limit from Section 5.

Data sources:
- Stripe Billing Meter aggregations API for the period total
- My own database for the daily breakdown (I'm storing every meter event locally too — never trust Stripe as the only source)
- Reconcile the two on every load and warn me in logs if they diverge by more than 1%

Style: minimal, numbers-first. No animation, no marketing copy. Customers checking spend want to see numbers fast.

The dashboard is also where you catch your own metering bugs before customers do. If the local count and the Stripe count diverge, you have a missed event somewhere — find it before invoice day.

5. Cost Protection: Limits, Caps, and Alerts

The single biggest fear customers have about usage-based billing is the runaway invoice. The tools that defuse it:

Add cost-protection controls for [your product].

1. **Per-user soft limit**: customers set a "warn me at" threshold. When they cross it mid-period, I send an in-app notification AND an email. They can keep using the product. Default: 80% of their typical spend.

2. **Per-user hard limit**: customers set a "stop at" threshold. When they cross it, my app refuses further billable actions and shows a clear message: "You've hit your set limit of $X. Raise it in settings to continue, or wait until next period." Default: off (opt-in), because soft surprises are worse than overage charges.

3. **Account-level kill switch**: a setting customers can flip to "block all overage" — they get exactly the included quota, no overage charges possible, even if they manually click usage actions. Useful for finance-controlled accounts.

4. **Per-action cost preview** (for high-cost actions): before the customer clicks "Generate full report" — which I know costs ~50,000 tokens / $X — show the estimated cost in the UI. Lets them opt out before incurring.

5. **My own circuit breaker**: at the platform level, if any single customer crosses [10× / your threshold] their typical daily usage in one hour, freeze their account and ping me on Slack. This catches infinite-loop agents that would otherwise generate $5,000 invoices nobody wants to defend.

6. **Stripe webhook on invoice spikes**: Stripe will send an `invoice.created` event before finalization. Wire a check that flags any invoice >50% larger than the customer's previous one and notifies me before it's sent. Gives me a chance to apply a credit before the charge surprises the customer.

For each control, show me where in the codebase it lives, what data it reads from, and a unit test that proves it triggers correctly. Don't ship without item 5 and item 6 — those are the two that save you from the runaway-agent disaster story.

Item 5 (the circuit breaker) is non-negotiable for an AI SaaS. Two real-world incidents that have happened in 2025–2026: an agent that infinite-looped through a customer account ran up $14,000 in a weekend; a misconfigured prompt template generated 200M tokens before anyone noticed. In both cases, a 10× daily-usage circuit breaker would have stopped the bleeding within an hour.

6. Migrate Existing Customers Without Breaking Trust

If you already have flat-rate paying customers, do not flip their plan to metered without notice. The wrong way to do this loses customers; the right way often increases revenue from heavy users while keeping light users happy.

I have [N] existing customers on flat-rate Pro. I want to migrate them to the new flat + overage Pro plan, but without surprise charges.

Walk me through a safe migration:

1. **Set the included quota** on the new metered Pro plan to a level that covers 95% of existing customers' actual usage based on the last 90 days. Tell me how to query my existing usage data to find that threshold — I want the 95th percentile, not the median.

2. **Communications plan**: a 60-day-ahead email to every existing customer explaining: (a) the new plan, (b) their personal usage from the last 3 months, (c) projected impact under the new plan, (d) a one-click way to stay on legacy flat-rate ("grandfathered for life") if they prefer. Draft the email.

3. **Grandfather option**: technically support keeping legacy customers on the old flat-rate plan indefinitely. The cost of grandfathering 50–200 early customers is small; the trust cost of forcing them off is huge.

4. **Active migration**: for customers who don't opt to grandfather, switch them to the new plan at their next renewal. Stripe handles proration. Confirm.

5. **First-invoice safety net**: for the first invoice on the new plan, show me how to apply a one-time credit if the new total exceeds 110% of their old total. They paid more than expected — refund the difference, keep the relationship.

6. **Visible changelog**: a public page at /pricing/changelog documenting every pricing change with date, reason, and effect on existing customers. Shows future buyers that you don't surprise people.

Migration day checklist: verify all webhooks, run a dry-run on a test customer, monitor invoices for 7 days post-migration. If anything looks off, apply credits proactively.

The grandfather-existing-customers move costs you a small fraction of revenue and buys an enormous amount of community goodwill. Founders who skip it often save $5,000/month and lose a $200,000 LTV cohort.

7. Alternatives Worth Considering

Stripe Billing is the safe default, but it is not the only good option. Two situations where alternatives win:

For my AI SaaS, audit whether Stripe Billing is the right billing layer or whether I should consider one of:

1. **Polar** — modern Stripe alternative, increasingly popular with indie SaaS, simpler API, included merchant of record (handles VAT/sales tax automatically). Best for solo founders selling globally who don't want to deal with tax compliance.

2. **Lemon Squeezy** — also merchant-of-record, mature global payments, weaker on advanced metering than Stripe.

3. **Orb** — usage-billing-native platform, runs on top of Stripe for collection. Best for teams with complex multi-dimensional pricing (tokens × seats × GPU type) where Stripe Meters get awkward.

4. **Metronome** — similar to Orb, more enterprise-leaning, used by several major AI labs themselves.

For my specific case ([describe revenue stage, geography, complexity]), recommend ONE billing layer and explain:
- Whether merchant-of-record matters (am I selling internationally enough that VAT/GST handling is a real cost?)
- Whether my metering is simple enough for Stripe Meters or complex enough for Orb/Metronome
- The migration path if I switch later
- Total all-in cost (Stripe fee + tax fees + my own ops time)

Default recommendation if no strong reason: stay on Stripe Billing. The advanced platforms are worth the complexity only past $50k MRR.

Common Failure Modes

"We charged a customer for retries." Wrap every model call in a single emit-on-final-success guard. Test it explicitly with a deliberately failing first attempt.

"Cache hits showed up on the invoice." The metering layer must run after the cache layer in your call chain, not before. If the request never hit the model, no event ever fires.

"A customer got a $4,000 invoice from an infinite loop." You did not implement the platform-level circuit breaker (Section 5, item 5). Apply a credit immediately, ship the circuit breaker before sundown, write the incident publicly.

"Customers asked why their usage doesn't match the dashboard." You are storing usage events only in Stripe and trusting Stripe as the source of truth. Always store events locally too, reconcile on every dashboard load, and surface any divergence visibly to your team.

"A migration wiped 30% of revenue." You force-migrated existing customers and they churned to a competitor. The grandfather option (Section 6, item 3) is non-optional.

"The pricing page is too confusing." Usage-based pricing tempts founders into ten-line tier matrices. The pricing page should fit on one screen and explain itself in 30 seconds. Two tiers, included quota, posted overage rate. Anything more complex usually loses revenue, not gains it.

"We launched and never raised prices." AI cost-of-goods drops every quarter as model prices fall. If you ship a metered price and never re-tune it, you leave margin on the table. Quarterly review, every quarter.