AI Streaming Chat UI
If you're building a B2B SaaS in 2026 with an LLM-powered chat interface — copilot, support agent, code assistant, customer-facing AI — the chat UI is one of the most visible parts of your product. Users will compare it to ChatGPT, Claude, and Cursor, so the bar is high.
1. Pick chat library / framework
Pick AI chat stack.
Vercel AI SDK (recommended for Next.js):
- @ai-sdk/react: useChat hook
- @ai-sdk/* providers
- Streaming + tool calls + structured output
- Best-in-class for Next.js
- See vercel-ai-sdk skill
Vercel Chat SDK:
- Higher-level: full chat experience
- Multi-platform (Slack, Telegram, Teams, Discord, GitHub, Linear)
- Built on AI SDK
- See vercel:chat-sdk skill
LangChain.js:
- Generic JS LLM framework
- Chat-specific helpers
- Heavier; more options
Custom (DIY):
- fetch + ReadableStream + SSE / WebSocket
- Full control
- Most work
Components:
- shadcn-ui Chat (preview / community)
- assistant-ui (open-source chat components)
- llamaindex Chat UI
For 2026 React stack:
- Vercel AI SDK + assistant-ui OR shadcn-chat for components
- Best balance of control + speed
Output:
1. Stack recommendation
2. Library choices
3. Custom components vs library
4. Bundle size
5. SSR / streaming considerations
The 2026 default for Next.js: Vercel AI SDK + assistant-ui. Streaming, tool calls, attachments, message persistence — all handled.
2. Token streaming — the table-stakes UX
Without streaming, AI chat feels broken in 2026.
Implement token streaming.
Streaming protocols:
Server-Sent Events (SSE):
- HTTP-based; one-way (server → client)
- Works through HTTP/2
- Simple; widely supported
- Vercel AI SDK default
WebSocket:
- Full duplex
- More overhead
- Better for bidirectional (rare for chat)
Client implementation (with AI SDK):
```tsx
import { useChat } from '@ai-sdk/react';

function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  });
  return (
    <>
      {messages.map((m) => (
        <Message key={m.id} role={m.role} content={m.content} />
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
      </form>
    </>
  );
}
```
Server (Next.js Route Handler):
```ts
import { streamText } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: 'anthropic/claude-sonnet-4-6', // via Vercel AI Gateway
    messages,
  });
  return result.toDataStreamResponse();
}
```
Cursor / typing indicator:
- During streaming: show blinking cursor at end of last message
- "▋" character after streaming text
- CSS animation: 1s blink
Smooth scrolling:
- Auto-scroll to bottom as tokens arrive
- Pause if user scrolled up (don't fight them)
- Resume on new message
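The pause-when-the-user-scrolls-up rule reduces to a small predicate. A minimal sketch — the helper name and the 100px threshold are illustrative, not from any library:

```typescript
// Decide whether to follow the stream as new tokens arrive.
interface ScrollState {
  scrollTop: number;    // current scroll offset
  clientHeight: number; // visible viewport height
  scrollHeight: number; // total scrollable height
}

// "Near bottom" means the user is not reading history,
// so it is safe to auto-scroll on each chunk.
function shouldAutoScroll(s: ScrollState, thresholdPx = 100): boolean {
  const distanceFromBottom = s.scrollHeight - (s.scrollTop + s.clientHeight);
  return distanceFromBottom <= thresholdPx;
}
```

Call it in the scroll handler and cache the result; on new tokens, scroll only when it returned true.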
Render markdown progressively:
- See markdown-rendering-sanitization-chat
- Auto-close incomplete syntax (** without close)
- Re-render on each chunk (memo previous messages)
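Auto-closing incomplete syntax can be as simple as balancing markers before each render. A sketch covering only unbalanced ``` and ** — real renderers handle many more cases (nested emphasis, links, tables):

```typescript
// Close dangling markdown so a partially streamed chunk renders cleanly.
function closeDangling(md: string): string {
  let out = md;
  // An odd number of ``` fences means a code block is still open.
  const fences = (out.match(/```/g) ?? []).length;
  if (fences % 2 === 1) out += '\n```';
  // An odd number of ** markers means bold is still open.
  const bolds = (out.match(/\*\*/g) ?? []).length;
  if (bolds % 2 === 1) out += '**';
  return out;
}
```

Run it on the streaming message only; completed messages are already balanced.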
Performance:
- Memoize all but actively-streaming message
- Avoid re-rendering message list on each token
Output:
1. Streaming protocol (SSE recommended)
2. Library setup
3. Cursor / typing indicator
4. Smooth scroll handling
5. Markdown rendering
The blinking-cursor detail: a small touch that signals "actively generating." Without it, users can't tell whether the response is finished. Take the 5 minutes to add it.
3. Message types — beyond plain text
Modern AI chat has many message types.
Render message types.
Text messages:
- User: right-aligned bubble
- Assistant: left-aligned; markdown rendered
- System: special styling (rare to show)
Code blocks:
- Syntax highlighting (Prism / Shiki)
- Language detection
- Copy button (top-right of block)
- Possibly: inline run (Pyodide, etc.)
Tool calls:
- "Calling tool: search('rate limiting')"
- Show with icon + status (running / done / error)
- Collapsible to see args + output
- Useful for transparency
Tool results:
- "Found 5 results"
- Structured display (table, list, etc.)
Function calls (legacy):
- Similar to tool calls
Images:
- User-uploaded: show inline
- AI-generated: render with caption
- Modal on click for fullscreen
Files:
- File card: name + size + type
- Click to download or preview
Charts / visualizations:
- AI-generated chart
- Embedded as image or interactive component
Citations:
- "Source: [Doc Name]" with links
- Hover for excerpt
- See AI customer support agent
Suggestions / quick replies:
- 2-3 button suggestions below assistant message
- "Tell me more" / "Show example" / "Done"
Errors:
- "Failed to generate" with retry button
- Error type badge
Streaming-status:
- "Searching docs..." (during tool call)
- "Generating response..."
For [USE CASE], output:
1. Message types you'll render
2. Per-type component
3. Tool-call rendering pattern
4. Citation display
5. Mobile considerations
The tool-call transparency rule: show users when AI is using a tool. Hidden tool calls feel magical when they work; mysterious when they fail. Transparency builds trust.
4. Message list — virtualization + memoization
Chat history can grow large. Plan performance.
Optimize message list rendering.
Virtualization:
- Don't render off-screen messages
- Use react-virtuoso or react-window
- Especially important for 100+ message threads
Memoization:
- Each message component memoized
- Re-render only the actively-streaming message
- Avoid layout thrash
Smooth scroll:
Anchor scroll to bottom:
- New message → scroll to bottom
- User scrolled up → don't auto-scroll (let them read)
- "New messages" indicator if scrolled up
Scroll restore:
- Returning to chat: restore scroll position
- Or: scroll to bottom
Lazy load history:
- Initial load: latest 50 messages
- Scroll up → fetch older 50
- Cursor pagination
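The cursor-pagination shape above can be sketched in memory — the same logic maps to a `WHERE id < cursor ORDER BY id DESC LIMIT n` query in production. Names and the numeric-id assumption are illustrative:

```typescript
// Cursor pagination over a message array, ascending by id.
interface Msg { id: number; content: string }

function pageBefore(
  all: Msg[],            // full history, ascending by id
  cursor: number | null, // fetch messages older than this id; null = latest page
  limit: number
): { items: Msg[]; nextCursor: number | null } {
  const older = cursor === null ? all : all.filter((m) => m.id < cursor);
  const items = older.slice(-limit); // newest `limit` of what remains
  const nextCursor = older.length > limit ? items[0].id : null;
  return { items, nextCursor };
}
```

The client prepends each older page to the list and passes `nextCursor` on the next scroll-up fetch; `null` means history is exhausted.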
Persistence:
- Messages saved to DB per conversation
- Conversation list in sidebar
- See multi-tenancy
Optimistic UI:
- User sends message → appears immediately
- AI response streams in
- On error: revert + show retry
Output:
1. Virtualization setup
2. Memoization strategy
3. Smooth-scroll behavior
4. Lazy-load history
5. Optimistic update
react-virtuoso is the 2026 default for chat UIs. It handles auto-scroll, bottom anchoring, virtualization, and lazy loading gracefully.
5. Input UX — composing messages
The input matters as much as the output.
Design chat input.
Input behavior:
Auto-resize textarea:
- Grows with content (1-N lines)
- react-textarea-autosize library
- Max-height before scroll
Multi-line:
- Enter sends (default for chat)
- Shift+Enter for newline
- Cmd+Enter alternative
Send button:
- Right side; disabled when empty / loading
- Replaces with stop button during streaming
Stop / cancel:
- Stop streaming response
- AI SDK: stop() function
- User regret: "Wait, that's not right"
Keyboard shortcuts:
- Up arrow: edit last message
- Esc: clear input
Attachments:
- File picker button (paperclip icon)
- Drag-drop onto chat
- Show attachment thumbnails above input
- Remove (X) on attachment
Slash commands (advanced):
- /clear, /summarize, /search
- Show menu when "/" typed
Mentions:
- @user (in team chat)
- @doc / @file (Notion-style references)
Persona / model picker:
- Dropdown to choose model (GPT-4o vs Claude vs custom persona)
- For products with multiple options
Voice input:
- Microphone icon
- Web Speech API or Whisper
Mobile:
- Auto-focus on tap
- Native keyboard
- Send button enlarged for thumb
- File picker uses native sheet
Output:
1. Input component
2. Send / stop logic
3. Attachment handling
4. Keyboard shortcuts
5. Mobile UX
The Cmd+Enter alternative: some products send on Enter and insert a newline on Shift+Enter; others do the opposite, sending only on Cmd+Enter. Power users have strong preferences either way. Pick a default; allow a toggle.
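The send-key behavior, including the toggle, collapses to one predicate. A hypothetical helper — the mode names are illustrative:

```typescript
// Which keystroke sends, given the configured send mode.
type SendMode = 'enter' | 'mod-enter';

function shouldSend(
  key: string,
  shiftKey: boolean,
  modKey: boolean, // metaKey on macOS, ctrlKey elsewhere
  mode: SendMode
): boolean {
  if (key !== 'Enter') return false;
  if (mode === 'enter') return !shiftKey; // Shift+Enter inserts a newline
  return modKey;                          // 'mod-enter': plain Enter inserts a newline
}
```

Wire it into the textarea's keydown handler; when it returns true, prevent default and submit.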
6. Regenerate, edit, and branching
Beyond send-and-receive, modern chat supports:
Implement regenerate / edit / branch.
Regenerate:
- Button below assistant message
- Generates new response with same input
- Replaces current OR appends as variant
- Useful when response is bad
Edit user message:
- Click pencil on user message
- Edit + resubmit
- Re-runs from that point (deletes downstream)
Branching (advanced):
- Edit creates new branch
- Original branch preserved
- UI: dropdown to switch branches
- Used by: ChatGPT, Claude
Implementation:
Server side:
- POST /chat with messages array including edited message
- Streaming response from edit point
- Old branch optionally archived
UI:
- Per-message hover actions (edit / regenerate / copy)
- Branch indicator if multiple variants
- Switch between variants
State management:
- conversation has many branches
- Each branch is a path through messages
- Tree structure or linear with parent_id
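The "linear with parent_id" model means the active branch is just the path from a selected leaf back to the root. A sketch of that reconstruction — field names are illustrative:

```typescript
// Messages stored flat; branching is expressed via parentId links.
interface StoredMsg { id: string; parentId: string | null; content: string }

// Walk parent links from the selected leaf to the root, then reverse,
// yielding the message sequence to send to the model for that branch.
function pathToLeaf(byId: Map<string, StoredMsg>, leafId: string): StoredMsg[] {
  const path: StoredMsg[] = [];
  let cur: StoredMsg | undefined = byId.get(leafId);
  while (cur) {
    path.push(cur);
    cur = cur.parentId === null ? undefined : byId.get(cur.parentId);
  }
  return path.reverse();
}
```

Editing a message creates a sibling (same parentId); switching branches is just picking a different leaf.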
Anti-patterns:
- No regenerate (users stuck with bad response)
- Edit without proper history
- Confusing branching UX
Output:
1. Regenerate implementation
2. Edit-and-resubmit flow
3. Branching (if applicable)
4. State model
5. UI for switching branches
The regenerate pattern is now table-stakes. Every modern chat UI has it. Without it, users feel stuck with bad responses.
7. Attachments + file handling
Users upload files; AI processes them.
Handle file attachments.
Upload UX:
- Drag-drop on chat area
- Paperclip icon for file picker
- Paste image from clipboard
- Show thumbnails before send
File types:
- Images: thumbnail + full view on click
- PDFs: icon + filename + size
- Other: icon + filename + size
Preview:
- Hover over thumbnail: tooltip with name
- Click: modal for fullsize image / PDF preview
- See file-preview-document-viewer-chat
Server processing:
- Image: send to vision LLM (Claude / GPT-4o)
- PDF: extract text first (see document-parsing-ocr)
- Other: embed via RAG
Limits:
- File size (5-50 MB depending on context)
- File type allowlist
- Per-message limit (e.g., 5 attachments)
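The three limits above make a simple client-side gate before upload. A sketch — the policy values and helper name are example assumptions, not product requirements:

```typescript
// Attachment policy: size cap, MIME allowlist, per-message count.
interface AttachmentPolicy {
  maxBytes: number;
  allowedTypes: string[]; // MIME types
  maxPerMessage: number;
}

function validateAttachment(
  file: { size: number; type: string },
  alreadyAttached: number,
  policy: AttachmentPolicy
): { ok: true } | { ok: false; reason: string } {
  if (alreadyAttached >= policy.maxPerMessage)
    return { ok: false, reason: `Max ${policy.maxPerMessage} attachments per message` };
  if (file.size > policy.maxBytes)
    return { ok: false, reason: `File too large (limit ${policy.maxBytes / 1_000_000} MB)` };
  if (!policy.allowedTypes.includes(file.type))
    return { ok: false, reason: `Unsupported file type: ${file.type}` };
  return { ok: true };
}
```

Run the same checks server-side; the client version exists only for fast, clear error messages.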
Inline rendering:
- AI can reference: "I see in the image..."
- Show attachment inline in user message
Errors:
- Upload failed: retry
- File too large: clear error
- Unsupported type: clear message
Multi-file:
- Allow selecting multiple
- Show all thumbnails
- Bulk remove
Mobile:
- Camera capture for images
- File picker for documents
- Native sheet UX
Output:
1. Upload UX
2. File-type handling
3. Preview component
4. Server processing pipeline
5. Mobile considerations
The image-paste-from-clipboard: power users love it. Cmd+V → screenshot pastes into chat. Easy to implement; high satisfaction.
8. Error states + recovery
AI fails. Your UX should fail gracefully.
Handle AI chat errors.
Error types:
Rate limit:
- "Too many requests. Wait 30s."
- Auto-retry after delay
- Show countdown
Model unavailable:
- "Service temporarily unavailable"
- Fallback to alternative model (if configured)
- Retry button
Context too long:
- "Conversation too long; start fresh"
- Suggest: summarize + new chat
- Or auto-truncate older messages (with notice)
Inappropriate content (filter triggered):
- "Can't help with that request"
- Per-policy explanation
- Don't be hostile
Network failure:
- "Network error; retry"
- Auto-retry with backoff
- Cache user message; don't lose
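The auto-retry-with-backoff can be sketched as a delay schedule plus a generic retry wrapper. The base and cap values are illustrative defaults; production code usually adds jitter:

```typescript
// Exponential backoff with a cap: 500ms, 1s, 2s, ... up to 15s.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 15_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry an async operation (e.g. the chat POST) with backoff between attempts.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // out of retries: surface the error
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

Keep the user's message in state across all attempts; only the request retries, never the composition.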
Streaming interrupted:
- "Response cut off"
- Continue / regenerate options
Generic 500:
- Apology + retry
- Log to error monitoring (Sentry)
- Don't expose internal errors
UI patterns:
- Error in-message (replaces "..." indicator)
- Or: toast for transient
- Retry button visible
User input preservation:
- Failure doesn't lose user's message
- Repopulate input or keep visible
Output:
1. Error type taxonomy
2. Per-type handling
3. Retry / fallback logic
4. Input preservation
5. Logging / observability
The "preserve user message on error" rule: if the AI errors, the user's message stays in the input or remains visible. Otherwise they retype the whole thing. Frustrating.
9. Performance + cost
AI chat is expensive. Optimize.
Optimize chat performance + cost.
Server-side:
Stream from start:
- Don't buffer; pipe LLM stream to client
- First-token latency matters
Caching:
- Cache identical prompts (rare for chat)
- Cache tool results (search queries)
- Use Vercel Runtime Cache
Context management:
- Don't send unbounded history
- Truncate old messages OR
- Summarize older context (memory)
- Token-count aware
Tool calls:
- Parallel where possible
- Timeout long-running tools
- Stream partial results if possible
Provider routing:
- Vercel AI Gateway: failover across providers
- Cheap model for simple queries; expensive for complex
Cost optimization:
- Per-user quotas (see quotas-limits-plan-enforcement)
- Track tokens per request
- Alert on cost spikes
Caching at API level:
- Anthropic prompt caching: save 90% on repeated context
- 5-min TTL; ideal for system prompts
- See claude-api skill
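For reference, prompt caching is enabled by marking the system block with `cache_control` in the Messages API request body. A sketch of the body shape only — the model id is a placeholder, and exact field support varies by model, so check the current Anthropic docs:

```typescript
// Build an Anthropic Messages API request with a cacheable system prompt.
function buildRequest(systemPrompt: string, userText: string) {
  return {
    model: 'claude-sonnet-4-5', // placeholder model id
    max_tokens: 1024,
    system: [
      {
        type: 'text',
        text: systemPrompt,
        // Marks this block as cacheable; repeated requests reuse it.
        cache_control: { type: 'ephemeral' },
      },
    ],
    messages: [{ role: 'user', content: userText }],
  };
}
```

The cache keys on the exact prefix, so keep the system prompt byte-identical across requests.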
Frontend:
Bundle size:
- AI SDK + assistant-ui ≈ 60KB
- Reasonable
Memoization:
- Memo all messages except streaming
- useMemo on message list
Render budget:
- 60fps during streaming
- Profile if slow
Network:
- Compression on stream
- HTTP/2 multiplexing
Output:
1. Server streaming setup
2. Context management
3. Provider routing
4. Cost monitoring
5. Frontend perf
The Anthropic prompt caching insight: huge savings on repeated system prompts. If you have a 5K-token system prompt, prompt caching can drop its cost by 90%. Use it.
10. Accessibility
Chat UIs are notoriously inaccessible. Don't let yours be.
Make AI chat accessible.
Required:
Screen reader:
- Each message has role (user / assistant)
- aria-live="polite" on chat region for new messages
- Streaming text: chunk announcements (don't spam SR with every token)
Keyboard:
- Tab through messages
- Send on Enter
- Esc to close
- Arrow keys to navigate messages
Focus:
- After send: focus stays in input
- After AI response: focus stays in input (most users)
- Or: focus on response (some prefer)
Color independence:
- Don't rely on color alone (user vs assistant)
- Use position + role labels
Motion:
- prefers-reduced-motion: disable cursor blink
- Or: keep but slow
Voice (optional):
- Read responses aloud
- Web Speech API or Cloud TTS
Errors:
- Announce via aria-live="assertive"
- Clear actionable message
Test:
- VoiceOver / NVDA
- Keyboard-only
- Lighthouse / axe-core
Common failures:
- Streaming text: SR announces every token (overwhelming)
- No way to skip past long messages
- Focus lost after send
Output:
1. ARIA pattern
2. Keyboard navigation
3. Streaming announcements (debounce)
4. Reduced motion
5. Test plan
The streaming-announcement throttle: announce every 50-200 chars to screen readers, not every token. Otherwise overwhelming.
What Done Looks Like
A v1 AI streaming chat UI for B2B SaaS in 2026:
- Vercel AI SDK + assistant-ui or shadcn-chat
- Token streaming with cursor indicator
- Markdown rendering (sanitized)
- Tool-call display (transparency)
- Code block with syntax highlight + copy
- Auto-resize input with send + stop
- Regenerate + edit user message
- File attachments with previews
- Error states with retry + input preservation
- Message list virtualization
- Smooth scroll (don't fight user)
- Accessibility (ARIA + keyboard + reduced motion)
- Mobile-friendly UX
Add later when product is mature:
- Branching conversations
- Voice input / output
- Slash commands
- @mentions for docs / users
- Multi-model picker
- Citations + sources
- Inline charts / visualizations
- Real-time collaboration
The mistake to avoid: non-streaming responses. Feels broken in 2026; users wait 30s for full response.
The second mistake: no regenerate button. Users stuck with bad responses.
The third mistake: losing user input on error. Frustrating; users retype everything.
See Also
- AI Features Implementation — strategy (companion)
- RAG Implementation — RAG-backed chat
- Markdown Rendering & Sanitization — render LLM output
- LLM Cost Optimization — cost
- LLM Quality Monitoring — quality
- Comments, Threading & @Mentions — adjacent UX
- File Preview & Document Viewer — attachment preview
- File Uploads — upload pipeline
- Empty States, Loading & Error States — error states
- Toast Notifications UI — error toasts
- Real-Time Collaboration — adjacent realtime
- WebSocket / SSE Implementation — streaming protocols
- Performance Optimization — perf
- Quotas, Limits & Plan Enforcement — usage limits
- VibeReference: AI SDK — Vercel AI SDK
- VibeReference: AI SDK UI — UI hooks
- VibeReference: AI SDK Core — Core
- VibeReference: Anthropic Claude — Claude
- VibeReference: OpenAI GPT — GPT
- VibeReference: AI Gateways — AI gateway
- VibeReference: Vercel AI Gateway — Vercel AI Gateway
- VibeReference: AI Customer Support Agents — support-agent integration
- VibeReference: shadcn/ui — components