# AI Features in Your SaaS: Ship LLM Capabilities Without Burning Margins or Trust
Goal: Ship LLM-powered features (chat, summarization, generation, classification) that customers actually use without burning your unit economics, feeding users hallucinated data, or shipping prompts that drift from working to broken without notice. Use a gateway, manage prompts as code, stream responses, set quotas per tier, evaluate quality continuously, and observe production traffic. Avoid the failure modes where founders ship raw OpenAI calls inline (no observability, no failover, no cost control), put system prompts in code with no versioning ("we changed the prompt three weeks ago and now it's bad"), or skip evaluation (you find out about quality regressions from customer support tickets).
Process: Follow this chat pattern with your AI coding tool such as Claude or v0.app. Pay attention to the notes in [brackets] and replace the bracketed text with your own content.
Timeframe: First AI feature behind gateway with streaming + per-tier limits in 2-3 days. Prompt management + observability in week 1. Evaluation + cost dashboards in week 2. Quarterly review baked in.
## Why Most Founder AI Features Are Broken
Three failure modes hit founders the same way:
- **Direct API calls without abstraction.** Founder writes `openai.chat.completions.create(...)` inline in a route handler. Six months later, switching providers requires touching 47 files; observability is non-existent; cost-by-feature is unknown; prompt changes are deploys.
- **Prompts in code with no versioning.** System prompts live as string literals in the codebase. Someone "fixes" the prompt; quality regresses; nobody notices for two weeks; customer trust drops; reverting requires git archaeology.
- **No quotas per tier.** AI calls are unmetered. A free user runs the AI 1,000 times in one weekend; the OpenAI bill triples; founder discovers the next month. Or worse: a single customer scripts the feature into a loop that costs more than their annual subscription overnight.
The version that works is structured: route through an AI gateway, manage prompts as versioned configuration, stream responses for UX, enforce per-tier quotas (per rate-limiting-abuse-chat), evaluate quality before deploys, and observe production traffic with an LLM observability tool.
This guide assumes you have already done Authentication (AI calls are user-scoped), have shipped Multi-Tenant Data Isolation (workspace context for AI calls), have considered LLM Cost Optimization and LLM Quality Monitoring, and have shipped Rate Limiting & Abuse Prevention (AI endpoints are the highest-cost abuse vector).
## 1. Decide What AI Should Do Before Writing Prompts
The first question is product, not technical. Don't ship AI as a generic "chatbot" — pick specific value-creating features.
Help me decide which AI features fit [my product].
The high-value patterns:
**Pattern 1: Replace tedium**
- Auto-categorize support tickets / leads / data rows
- Generate first-draft replies / summaries / titles
- Extract structured data from messy input (emails, PDFs)
- Time saved per use is concrete and measurable
**Pattern 2: Augment expertise**
- Suggest improvements to user-written content (writing assistant)
- Surface non-obvious connections in user data
- Recommend next actions
- Each use feels intelligent if done well
**Pattern 3: Conversational search / Q&A**
- "Ask your data" interface over user content
- Documentation chat
- Per [search-chat](search-chat.md): often paired with hybrid retrieval
**Pattern 4: Structured output / classification**
- Sentiment analysis, intent classification, tagging
- Lower stakes than open-ended generation
- Most cost-effective AI use
**Pattern 5: Generation**
- Image generation, copy generation, code generation
- High value if the output replaces a manual process
- Most expensive per call
**Anti-patterns**:
- **Chatbot for the sake of chatbot** — users don't want to chat with your tool; they want to do work
- **AI features that exist because "AI is in the press"** — if you can't name the value, skip
- **Vague "smart" features** — specificity beats novelty
For my product, ask:
- What's the most tedious task my users do?
- Where do they currently use ChatGPT / Claude as a separate tool?
- What classifications / extractions / summaries would feel like magic?
Output:
1. The top 1-3 AI features with clear user value
2. The "why now" justification per feature
3. The cost-per-use ballpark per feature
4. The metric you'll track (time saved, conversion lift, retention)

The biggest unforced error: shipping a "chatbot" because it's easy. Most users don't want to type to a chatbot; they want a button that does the work. The button + LLM-under-the-hood is more valuable than chat for most product use cases.
## 2. Route Through a Gateway, Not Direct API Calls
A gateway gives you observability, failover, cost tracking, and provider portability. Don't skip it.
Help me design the gateway abstraction.
The pattern:
**Don't**:

```ts
const completion = await openai.chat.completions.create({
  model: 'gpt-5',
  messages: [...]
})
```

**Do**:

```ts
import { generateText } from 'ai'

const { text } = await generateText({
  model: 'anthropic/claude-sonnet-4-6', // routed through Vercel AI Gateway
  system: getPrompt('summarize.system'),
  prompt: userInput,
})
```
Gateway options (per AI Gateways):
- Vercel AI Gateway — bundled with Vercel; provider/model strings
- OpenRouter — multi-provider; model marketplace
- Cloudflare AI Gateway — Cloudflare-stack
- Portkey — full-featured; fallbacks; budgeting
- DIY proxy — own everything; more work
For most indie SaaS in 2026 on Vercel: Vercel AI Gateway with the AI SDK is the default. Use plain "provider/model" strings.
What the gateway gives you:
- Provider failover (OpenAI down → Anthropic)
- Per-feature cost tracking
- Rate limiting at the gateway layer
- Observability (per LLM observability)
- Caching of duplicate prompts
- Easier model swaps
Critical implementation rules:
- Never call provider SDKs directly in product code. Always go through gateway.
- Provider/model strings are the abstraction (e.g., `"anthropic/claude-sonnet-4-6"`). Code doesn't know which provider; product config decides.
- Default to the AI SDK (per ai-sdk) for TypeScript / Node.
- Centralize the model-selection logic: a function `pickModel(featureName, complexity)` that returns the right model string.
Cost-aware routing:
- Cheap models for simple tasks (classification, short summarization): GPT-5-mini, Claude Haiku 4.5, Gemini 2.5 Flash
- Mid-tier for most tasks: GPT-5, Claude Sonnet 4.6
- Top-tier for complex reasoning: Claude Opus 4.7, GPT-5 Pro
- Never default to top-tier for everything — burns money
Don't:
- Hardcode provider names in product code
- Skip the gateway "for now"
- Pick top-tier models for tasks where mid-tier is fine
Output:
- The gateway choice
- The provider/model strings used per feature
- The model-selection logic
- The migration path if currently calling APIs directly
The single biggest engineering lever: **the gateway abstraction.** Once provider/model is a string in config, switching providers is a config change. Without it, switching is a multi-week migration. Pay the small upfront cost.
---
## 3. Manage Prompts as Code (or Configuration)
Prompts in raw string literals scattered across files = unmaintainable. Centralize.
Design prompt management.
The patterns:
**Pattern A: Prompts in code (versioned)**

```ts
// prompts/summarize.ts
export const SUMMARIZE_SYSTEM_PROMPT = `
You are an assistant that summarizes [content type].
- Output 2-3 bullets
- Each bullet under 15 words
- Plain text, no markdown
`.trim()

// usage
const { text } = await generateText({
  model: pickModel('summarize'),
  system: SUMMARIZE_SYSTEM_PROMPT,
  prompt: userInput,
})
```
Pros:
- Version-controlled with code
- Type-safe
- Easy to test
Cons:
- Changes require deploys
- Non-engineers can't edit

**Pattern B: Prompts in observability tool (Langfuse, Braintrust, LangSmith)**

```ts
const prompt = await langfuse.getPrompt('summarize-system')

const { text } = await generateText({
  model: pickModel('summarize'),
  system: prompt.compile({ contentType: 'meeting notes' }),
  prompt: userInput,
})
```
Pros:
- Non-engineers can edit prompts (PMs, content writers)
- Versioning with rollback
- A/B testing prompts in production
- Prompt history visible in observability tool
Cons:
- Network call to fetch prompt (cache aggressively)
- Coupling to observability tool
**Pattern C: Prompts in YAML/JSON config**

```yaml
# prompts.yaml
summarize:
  system: |
    You are an assistant that summarizes...
  model: anthropic/claude-sonnet-4-6
  temperature: 0.3
```
Pros:
- Version-controlled
- Easier for non-engineers to edit (still requires PR)
Cons:
- No live editing
- Less rich than full prompt-management tools
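If you go with Pattern C, validate the config at boot so a malformed prompt file fails loudly rather than mid-request. A minimal sketch, using JSON instead of YAML to stay dependency-free (a YAML file would need a parser such as js-yaml); the `PromptConfig` shape and `loadPromptConfig` name are illustrative, not a library API:

```typescript
// Illustrative prompt-config loader. Shape mirrors the YAML example above.
type PromptConfig = {
  system: string
  model: string
  temperature?: number
}

function loadPromptConfig(raw: string): Record<string, PromptConfig> {
  const parsed = JSON.parse(raw) as Record<string, PromptConfig>
  // Fail fast on malformed entries: a bad config should break at boot
  for (const [name, cfg] of Object.entries(parsed)) {
    if (!cfg.system || !cfg.model) {
      throw new Error(`prompt config "${name}" missing system or model`)
    }
  }
  return parsed
}
```

Load this once at startup and cache the result; the validation loop is what turns "prompt silently empty in production" into a deploy-time failure.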
For most indie SaaS in 2026:
- Start with Pattern A (code)
- Move to Pattern B (Langfuse) once prompts are stable and non-engineers want to iterate
Critical implementation rules:
- Never inline prompts in route handlers mixed with business logic
- Version prompts explicitly (semver or date-based)
- Test every prompt — at minimum a smoke test that asserts a known input produces an expected shape of output
- Document the contract — what the prompt expects as input, what it produces as output
Prompt-engineering basics worth following:
- System prompt sets behavior ("You are X. You do Y. You output Z format.")
- User prompt is the data (the variable input)
- Examples in system prompt (1-3 few-shot examples improve consistency)
- Output format clarity ("Output JSON with keys A, B, C")
- Constraints help ("Do not include URLs.")
- Length specifications ("Each summary under 100 words.")
Don't:
- Mix prompt building with business logic
- Skip prompt versioning (every prompt change is a deployment risk)
- Trust prompts to "just work" — test them
Output:
- The prompt-management approach (A / B / C)
- The prompt catalog (5-10 prompts with names, system prompts, expected outputs)
- The prompt-test suite (assertions per prompt)
- The prompt-versioning convention
The single biggest reliability win: **a snapshot test for each prompt.** Run input X, assert output matches shape Y. When someone changes a prompt and the test fails, they see the regression before customers do. Without it, prompt drift is invisible until support tickets accumulate.
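A snapshot test needs shape assertions to run against. A minimal sketch for the summarize prompt from Pattern A (2-3 bullets, each under 15 words, plain text); `assertSummaryShape` is a hypothetical helper you would call on a live completion in CI:

```typescript
// Returns a list of violations; an empty array means the output matches
// the contract stated in the summarize system prompt.
function assertSummaryShape(output: string): string[] {
  const errors: string[] = []
  const bullets = output
    .split('\n')
    .map(l => l.trim())
    .filter(l => l.startsWith('- '))
  if (bullets.length < 2 || bullets.length > 3) {
    errors.push(`expected 2-3 bullets, got ${bullets.length}`)
  }
  for (const b of bullets) {
    const words = b.slice(2).split(/\s+/).filter(Boolean)
    if (words.length > 15) errors.push(`bullet over 15 words: "${b}"`)
  }
  // Prompt says plain text; flag common markdown markers
  if (/[*_#`]/.test(output)) errors.push('markdown detected in plain-text output')
  return errors
}
```

When a prompt edit makes the model start emitting markdown or a fourth bullet, this fails the test before the change ships.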
---
## 4. Stream Responses for UX
LLM responses are slow. Streaming makes them feel fast. Use it everywhere user-facing.
Design streaming.
The pattern (with Vercel AI SDK):
```ts
// Server route
import { streamText } from 'ai'

export async function POST(req: Request) {
  const { messages } = await req.json()
  const result = streamText({
    model: 'anthropic/claude-sonnet-4-6',
    system: getPrompt('chat.system'),
    messages,
  })
  return result.toUIMessageStreamResponse()
}
```

```tsx
// Client
import { useChat } from 'ai/react'

function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  })
  return (
    <div>
      {messages.map(m => <div key={m.id}>{m.content}</div>)}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
      </form>
    </div>
  )
}
```
Benefits:
- Time-to-first-token is what feels fast (often <500ms)
- Streaming reduces the perceived wait even when total latency is unchanged
- Users see the AI "thinking"
- Can cancel mid-generation
When NOT to stream:
- Structured output where partial JSON is unparseable
- Background jobs where the output goes to DB, not UI
- Very short responses (overhead exceeds benefit)
- Classification calls (small response; not a conversation)
For non-chat features (one-shot generation):
```ts
// Stream a single generation result
const { textStream } = streamText({
  model: 'anthropic/claude-sonnet-4-6',
  prompt: 'Summarize this meeting',
})

for await (const delta of textStream) {
  // Append to UI
}
```
Critical implementation rules:
- Handle stream cancellation (user closes tab; clean up server resources)
- Show a stop button so users can interrupt
- Persist final result on completion (don't lose the generation if the connection drops mid-stream)
- Handle errors gracefully (provider down → fall back to an error message; don't hang forever)
- Set timeouts (max 60-120s for chat; abort and surface error)
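Timeout and cancellation can share one signal. A sketch using `AbortSignal.any` (available in Node 20.3+ and modern browsers); the AI SDK's `generateText`/`streamText` accept an `abortSignal` option you can pass this to. `requestSignal` is an illustrative name:

```typescript
// Aborts when EITHER the user cancels or the timeout fires.
function requestSignal(userAbort: AbortController, timeoutMs = 90_000): AbortSignal {
  return AbortSignal.any([userAbort.signal, AbortSignal.timeout(timeoutMs)])
}

// Sketch of usage (assuming the streamText call shapes shown above):
//   const userAbort = new AbortController()       // wired to a Stop button
//   streamText({ ..., abortSignal: requestSignal(userAbort, 120_000) })
```

Wiring the same `AbortController` to the stop button and to client disconnect handling cleans up server-side work in both cases.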
Cost implications of streaming:
- Streaming uses the same token count as non-streaming
- BUT: you can detect bad responses early and abort (saves tokens)
- And: users can interrupt off-topic responses (saves tokens)
Don't:
- Skip streaming for chat / generation UX (will feel slow)
- Stream when it doesn''t help (background jobs)
- Forget cancellation handling
Output:
- The streaming endpoints
- The client integration
- The cancellation logic
- The error-handling
The biggest perceived-performance win: **streaming.** A 5-second non-streamed response feels broken; the same 5-second streamed response feels engaging. Streaming is required UX for any user-facing AI feature.
---
## 5. Enforce Per-Tier Quotas
AI calls are the most-expensive endpoint class. Quota them per tier (per [rate-limiting-abuse](rate-limiting-abuse-chat.md)).
Design AI quotas.
The pattern:
For each tier, define:
| Limit | Free | Pro | Business | Enterprise |
|---|---|---|---|---|
| AI generations / day | 10 | 500 | 5,000 | custom |
| Tokens / day | 50K | 5M | 50M | custom |
| AI cost cap / day | $0.10 | $5 | $50 | custom |
| Concurrent AI requests | 1 | 5 | 20 | custom |
Calculate from unit economics:
- Per-request cost: tokens × per-token price (varies by model)
- Per-customer monthly cost: per-request × monthly limit
- Subtract from tier revenue: must be positive margin
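The calculation above can be sketched as a small helper. The prices and usage numbers below are hypothetical, not real model pricing; plug in your actual per-token rates:

```typescript
// Monthly AI cost for one customer at full quota usage.
function monthlyAICostUsd(opts: {
  callsPerMonth: number
  avgInputTokens: number
  avgOutputTokens: number
  inputPricePerMTok: number   // $ per 1M input tokens (assumed rate)
  outputPricePerMTok: number  // $ per 1M output tokens (assumed rate)
}): number {
  const perCall =
    (opts.avgInputTokens / 1_000_000) * opts.inputPricePerMTok +
    (opts.avgOutputTokens / 1_000_000) * opts.outputPricePerMTok
  return perCall * opts.callsPerMonth
}

// Example: Pro tier at 500 calls/day ≈ 15,000 calls/month
const cost = monthlyAICostUsd({
  callsPerMonth: 15_000,
  avgInputTokens: 2_000,
  avgOutputTokens: 500,
  inputPricePerMTok: 3,   // hypothetical mid-tier pricing
  outputPricePerMTok: 15,
})
// ≈ $202.50/month at full usage
```

At these assumed numbers, a $29/mo Pro tier is underwater if a customer actually uses the full allowance, which is exactly why the daily cost cap in the table exists: quotas must be set from this calculation, not picked by feel.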
Implementation:
```ts
async function generateWithQuota(workspaceId: string, prompt: string) {
  const usage = await getDailyAIUsage(workspaceId)
  const limit = await getAILimit(workspaceId)
  if (usage.cost >= limit.dailyCostCap) {
    throw new Error('quota_exceeded')
  }
  const result = await generateText({...})
  // Track usage
  await recordAIUsage(workspaceId, result.usage)
  return result
}
```
Quota dimensions worth tracking:
- Per-day call count (simple)
- Per-day token count (more accurate)
- Per-day cost (best aligned to your bill)
- Concurrent in-flight (prevents loops)
Friendly UX when quota hits:
- 80% used: subtle banner ("You've used 80% of your daily AI quota")
- 100%: blocking message ("AI quota reached for today. Upgrade or wait until [time]")
- Don't show internal cost numbers; show "AI requests remaining"
Per-feature quotas:
Some features cost more than others:
- Image generation: 1 image = ~10x text cost; lower per-day limit
- Long generation (full report): higher token cost per call
- Vision (image understanding): higher input token cost
Different features can have different quotas; track per feature.
Kill switch for individual users:
If a single user racks up unusual cost (10x normal in 1 hour):
- Auto-pause AI for that user
- Notify support
- Manual review
Per rate-limiting-abuse-chat: the kill-switch protects against runaway costs.
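The trigger check can be sketched as a pure function; the 10x multiplier and $1 floor are illustrative thresholds, and `shouldPauseUser` is a hypothetical helper name:

```typescript
// Pause a user's AI access when trailing-hour spend spikes far above baseline.
function shouldPauseUser(opts: {
  lastHourCostUsd: number
  baselineHourlyCostUsd: number
  minTriggerUsd?: number // absolute floor so tiny accounts don't trip it
}): boolean {
  const floor = opts.minTriggerUsd ?? 1
  if (opts.lastHourCostUsd < floor) return false
  // No history yet: any spend over the floor is worth a review
  if (opts.baselineHourlyCostUsd === 0) return true
  return opts.lastHourCostUsd >= 10 * opts.baselineHourlyCostUsd
}
```

Run this on the same usage records you write in `recordAIUsage`; a pause flips a flag the quota check reads, so no extra infrastructure is needed.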
Don't:
- Skip quotas (you'll find out when the bill arrives)
- Use a single global quota (per-user matters)
- Hide quota info from customers (transparency builds trust)
Output:
- The per-tier quota table
- The unit-economic calculation
- The quota-enforcement code
- The UX for "approaching" / "exceeded" states
- The kill-switch logic
The single biggest cost-protection: **per-user daily cost cap.** A user looping the AI accidentally racks up $200 in inference; your cap catches at $5. Without it, the bill is yours; with it, the user gets a polite "limit reached" message.
---
## 6. Evaluate Quality Before Deploying Prompt Changes
Prompts can regress invisibly. Run evals.
Design the eval workflow.
The pattern:
Build an eval dataset:
For each AI feature, collect:
- 20-50 example inputs
- Expected outputs (or scoring criteria)
- Edge cases that previously failed
Eval per prompt change:
When prompt is updated:
- Run new prompt against the dataset
- Score each output (per criteria)
- Compare against baseline (current production)
- Block deploy if score regresses
Scoring methods:
- Exact match: works for classification ("category X" expected; "category X" got)
- Semantic similarity: works for summaries (cosine similarity to expected; or LLM-judge)
- LLM-as-judge: another LLM scores the output 1-10 on criteria
- Hand-graded: small datasets where humans score
- Functional tests: "output must be valid JSON with keys A/B/C"
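The functional-test scorer is the easiest to start with. A minimal sketch for the "valid JSON with required keys" criterion; returning 0/1 lets scores average cleanly across a dataset:

```typescript
// Score 1 if output parses as a JSON object containing every required key.
function scoreJsonShape(output: string, requiredKeys: string[]): number {
  try {
    const parsed = JSON.parse(output)
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) return 0
    return requiredKeys.every(k => k in parsed) ? 1 : 0
  } catch {
    return 0
  }
}
```

Average this over the 20-50 eval inputs and you have the baseline number the CI gate compares against.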
Tools (per LLM observability):
- Braintrust — eval-first
- Langfuse — evals included
- LangSmith — evals strong
- Custom — script that runs prompts against dataset
CI integration:
```yaml
# .github/workflows/eval.yml
on:
  pull_request:
    paths:
      - 'prompts/**'   # run only when prompts change
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run eval   # fails the PR if the eval score drops below threshold
```
Critical implementation rules:
- Test before every prompt change. Don't skip "small fixes."
- Maintain the dataset. When customers report bad outputs, add them as eval cases.
- Set quality threshold per feature. Better to fail PRs that regress than to ship.
- Track quality over time. Plot eval scores; spot drift even when individual changes pass.
Don't:
- Skip evals on "minor" prompt changes
- Trust that "it worked in testing" — production data is different
- Use the same eval cases that the prompt was written against (overfitting)
Output:
- The eval dataset structure
- The scoring methods per feature
- The CI workflow
- The quality threshold per feature
- The dataset-update process
The single biggest source of "the AI got worse" complaints: **prompt changes that regressed quality without anyone noticing.** Evals catch these before deploy. Without them, you find out via customer complaints — by then you've damaged trust.
---
## 7. Observe Production Traffic
Per [LLM observability providers](https://www.vibereference.com/ai-development/llm-observability-providers): instrument every AI call.
Design AI observability.
What to log per call:
- Feature name (which AI feature was used)
- User ID + workspace ID (subject)
- Model used
- System prompt name + version
- User prompt (consider PII redaction)
- Response text
- Tokens (input + output)
- Cost
- Latency
- Status (success / error)
- Error message if failed
- User feedback if collected (👍 / 👎)
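The fields above can be pinned down as a typed record plus a pre-persist redaction pass. Field names and the regex below are illustrative; a real PII pipeline needs more than an email mask:

```typescript
// One row per AI call, mirroring the logging list above.
interface AICallLog {
  feature: string
  userId: string
  workspaceId: string
  model: string
  promptName: string
  promptVersion: string
  userPrompt: string        // stored post-redaction
  responseText: string      // stored post-redaction
  inputTokens: number
  outputTokens: number
  costUsd: number
  latencyMs: number
  status: 'success' | 'error'
  errorMessage?: string
  feedback?: 'up' | 'down'
}

// Naive redaction: mask emails and truncate long payloads before persisting.
// Treat this as a floor, not a compliance solution.
function redactForLogging(text: string, maxLen = 2_000): string {
  const masked = text.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email]')
  return masked.length > maxLen ? masked.slice(0, maxLen) + '…[truncated]' : masked
}
```

Keeping the log schema as one typed record also makes the dashboards trivial: every chart in the list above is a group-by over these fields.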
Tools:
- Langfuse / LangSmith / Helicone (per the comparison)
- Or custom OTel pipeline
Dashboards to build:
- Per-feature volume over time
- Per-feature cost over time
- Per-user top consumers
- Per-prompt quality scores (from production user feedback)
- Latency distribution (p50, p95, p99)
- Error rate per feature
Alerts:
- Cost spike (single user or feature uses 10x normal)
- Error rate spike (provider issue or bug)
- Latency spike (something slow)
- Quality drop (if you have automated quality scoring)
The user-feedback layer:
Add 👍 / 👎 to AI outputs:
- Click thumbs-up: log positive feedback with the call ID
- Click thumbs-down: prompt for optional reason; log
- Aggregate over time → quality metric per prompt
- Failed cases → eval dataset addition
Privacy considerations:
- Don't log raw user PII unnecessarily
- Truncate / hash sensitive content
- Honor account-deletion-data-export when users delete
Don't:
- Skip logging "for performance" (the cost is tiny)
- Log to a file system you don't monitor
- Forget to redact PII from logs (privacy compliance)
Output:
- The logging schema
- The observability tool integration
- The dashboard layout
- The user-feedback UI
- The privacy policy update
The single most-actionable production signal: **the 👍 / 👎 ratio per prompt over time.** A new prompt that drops from 85% positive to 65% positive over a week is regressing; investigate. Without user feedback, you're flying blind on quality.
---
## 8. Handle Failures Gracefully
LLM providers go down. Models return junk. Plan for it.
Design failure handling.
The patterns:
Provider outages:
- Primary: Anthropic Claude
- Fallback 1: OpenAI GPT-5
- Fallback 2: Google Gemini
A managed gateway (Vercel AI Gateway, OpenRouter, Portkey) handles failover automatically.
Quality failures (output is malformed):
- Validate output structure before showing to user
- Retry once with same prompt (variance often gives better result)
- Retry with a "be careful about format" instruction
- Fall back to a non-AI default
```ts
async function summarizeWithFallback(input: string) {
  try {
    const result = await generateText({
      model: 'anthropic/claude-sonnet-4-6',
      system: getPrompt('summarize.system'),
      prompt: input,
    })
    if (!isValidSummary(result.text)) {
      // Retry once
      const retry = await generateText({...})
      if (!isValidSummary(retry.text)) {
        // Fall back to non-AI summary
        return truncate(input, 200)
      }
      return retry.text
    }
    return result.text
  } catch (error) {
    // Provider down; fall back
    return truncate(input, 200)
  }
}
```
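The snippet above leaves `isValidSummary` undefined. One possible sketch, rejecting empty output, refusal boilerplate, and outputs that blew far past the expected length; the patterns and thresholds are illustrative:

```typescript
// Cheap validity check run before showing output to the user.
function isValidSummary(text: string): boolean {
  const t = text.trim()
  if (t.length === 0) return false
  // Common refusal openers (illustrative list; tune from your own logs)
  if (/^(I cannot|I can't|I'm sorry|As an AI)/i.test(t)) return false
  // Wildly over the expected length usually means the prompt was ignored
  if (t.length > 1_000) return false
  return true
}
```

Validators like this are feature-specific: a classifier checks the label set, an extractor checks the JSON shape, a summarizer checks length and refusals.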
Latency failures:
- Set timeouts (60s for chat; 30s for one-shot generation)
- Show progress UI during long generations
- Allow cancellation
- After timeout: gracefully fall back
Hallucination handling:
- Some features can detect hallucinations (e.g., extracting data from a doc — verify against source)
- Add citation requirements ("output must include source") when accuracy is critical
- Use retrieval-augmented generation (RAG) when factuality matters
- Display confidence levels when available
The "model can't" cases:
Sometimes the model genuinely can't do what you're asking:
- Model returns "I cannot help with that" → trap and fall back
- Model refuses (safety filter) → log; consider prompt change
Don't:
- Trust LLM output without validation
- Show malformed output to users
- Skip the fallback path
- Write off failures as "the AI just doesn't work today"
Output:
- The validation logic per feature
- The fallback hierarchy
- The timeout policy
- The hallucination-detection approach
The biggest user-trust signal: **graceful degradation when the AI fails.** A user who sees "AI is taking longer than usual; here's a non-AI version while we retry" trusts the product. A user who sees a hung spinner and eventually a 500 error doesn't.
---
## 9. Pick the Right Model for the Job
Top-tier models for everything = burns money. Tier the model selection.
Design model selection.
The pattern:
Cheap / fast (most tasks):
- Claude Haiku 4.5 — extremely fast and cheap; fine for classification, short summaries, extraction
- GPT-5-mini — competitive with Haiku
- Gemini 2.5 Flash — Google's cheap-fast option
Use for: 70-80% of AI tasks in indie SaaS. Most "smart features" don't need top-tier reasoning.
Mid-tier (default):
- Claude Sonnet 4.6 — strong default; good at most tasks
- GPT-5 — equivalent class
- Gemini 2.5 Pro — equivalent
Use for: chat interfaces, longer summaries, content generation, moderately complex reasoning.
Top-tier (specific needs only):
- Claude Opus 4.7 — best reasoning; expensive
- GPT-5 Pro / o1 — equivalent
- Gemini 2.5 Ultra — equivalent
Use for: complex multi-step reasoning, code generation that's actually hard, research-grade tasks.
Specialized:
- Embeddings (text-embedding-3-small / cohere-embed-v3 / voyage-3) for vector search
- Vision (Claude Sonnet 4.6 vision / GPT-5 vision) for image understanding
- Audio (Whisper / Gemini audio) for transcription
- Image gen (Recraft / Flux / DALL-E 3) for image creation
Selection logic in code:
```ts
function pickModel(feature: string, complexity: 'simple' | 'medium' | 'complex' = 'medium') {
  // Complex work gets bumped up a tier regardless of feature default
  if (complexity === 'complex') return 'anthropic/claude-opus-4-7'
  const tiers: Record<string, string> = {
    classify: 'anthropic/claude-haiku-4-5',
    summarize: 'anthropic/claude-sonnet-4-6',
    chat: 'anthropic/claude-sonnet-4-6',
    research: 'anthropic/claude-opus-4-7',
    extract: 'anthropic/claude-haiku-4-5',
  }
  return tiers[feature] ?? 'anthropic/claude-sonnet-4-6'
}
```
A/B test models:
- For each feature, periodically test cheaper model
- If quality matches, switch and save cost
- If quality regresses, stay
- Use evals (per step 6) to verify
Don't:
- Default to top-tier for everything (burns money)
- Use cheap model for tasks requiring complex reasoning (poor quality)
- Hardcode model in product code (use the gateway abstraction)
Output:
- The model-selection function
- The tier-to-feature mapping
- The A/B test plan
- The cost-vs-quality target per feature
The single biggest cost optimization: **using cheap models for simple tasks.** A team using top-tier for classification might spend 10x more than necessary. Run cheaper models against your evals; switch where quality is equivalent. Most teams overspend by 3-5x on model selection alone.
---
## 10. Quarterly Review
AI features rot. Quarterly review keeps them sharp.
Quarterly AI feature review.
Cost:
- Per-feature cost trend
- Per-tier cost vs revenue (margin per AI feature)
- Top users by cost (anomalies?)
- Provider mix (failovers triggered? cost-shifted?)
Quality:
- 👍 / 👎 ratio per prompt over time
- Eval scores per prompt
- Customer-reported AI quality issues
- Prompts that need updates
Performance:
- Latency per feature (p50, p95, p99)
- Streaming reliability (cancellations, errors)
- Provider error rates
Adoption:
- Per-feature usage rate
- Features that nobody uses (kill them)
- Features users want that don''t exist (build them)
Model updates:
- New model releases that could replace current
- Cheaper models for tasks where quality is sufficient
- Specialized models worth piloting
Output:
- Snapshot per feature
- 1-2 prompt improvements
- 1 model change (if cost / quality justifies)
- 1 feature to deprecate or improve
---
## What "Done" Looks Like
A working AI-feature implementation in 2026 has:
- Clear product value per feature (no chatbot-for-the-sake-of-it)
- Gateway abstraction (provider/model strings, not direct SDK calls)
- Versioned prompt management with snapshot tests
- Streaming responses for user-facing features
- Per-tier quotas with kill-switch protection
- Eval workflow blocking regressing PRs
- Production observability with user-feedback signals
- Graceful failure handling with fallbacks
- Tiered model selection (cheap for simple; top-tier only when needed)
- Quarterly review baked into the team rhythm
The hidden cost in AI features isn't the model bill — it's **the trust damage from bad outputs that nobody noticed before customers did**. A team without observability and evals ships prompt regressions without warning. The discipline of "test every prompt; observe every call; fail fast on quality drops" turns AI from a liability into an asset. The infrastructure is the platform; the discipline makes it work.
---
## See Also
- [LLM Cost Optimization](llm-cost-optimization-chat.md) — companion topic
- [LLM Quality Monitoring](llm-quality-monitoring-chat.md) — companion topic
- [Rate Limiting & Abuse](rate-limiting-abuse-chat.md) — AI endpoints are highest-cost abuse vector
- [Multi-Tenant Data Isolation](multi-tenancy-chat.md) — workspace context for AI
- [API Keys & PATs](api-keys-chat.md) — programmatic AI access
- [Audit Logs](audit-logs-chat.md) — high-cost AI events logged
- [PostHog Setup](posthog-setup-chat.md) — track AI feature usage
- [Activation Funnel](activation-funnel-chat.md) — AI features drive activation
- [LLM Observability Providers](https://www.vibereference.com/ai-development/llm-observability-providers) — Langfuse / LangSmith / Helicone
- [AI Gateways](https://www.vibereference.com/cloud-and-hosting/ai-gateways) — gateway choice
- [Vercel AI Gateway](https://www.vibereference.com/cloud-and-hosting/vercel-ai-gateway) — Vercel's offering
- [AI SDK](https://www.vibereference.com/ai-development/ai-sdk) — TS / Node SDK
- [AI SDK Core](https://www.vibereference.com/ai-development/ai-sdk-core) — generateText / streamText
- [Claude](https://www.vibereference.com/ai-models/claude) — Claude model details
- [Vector Databases](https://www.vibereference.com/backend-and-data/vector-databases) — for RAG
- [AI Memory Architecture Decision Framework](https://www.vibereference.com/ai-development/ai-memory-architecture-decision-framework) — for memory features
[⬅️ Growth Overview](README.md)