Real-Time Collaboration: Add Multiplayer Without Reinventing CRDTs
Real-Time Collaboration Strategy for Your New SaaS
Goal: Ship multiplayer collaboration that feels Linear-fast — live cursors, real-time updates, conflict-free edits, presence indicators, and offline tolerance — without spending six months building CRDTs from scratch. Use a managed real-time provider (Liveblocks / Yjs + Hocuspocus / PartyKit / Convex / Supabase Realtime) and focus on product UX. Avoid the failure modes where founders write a custom WebSocket protocol with optimistic-update hacks (six bug categories you've never heard of), build CRDTs from scratch (a year of engineering), or skip collaboration entirely until a competitor ships it (now you're behind).
Process: Follow this chat pattern with your AI coding tool such as Claude or v0.app. Pay attention to the notes in [brackets] and replace the bracketed text with your own content.
Timeframe: Presence + cursors shipped in week 1. Real-time updates on a few entities in week 2. CRDT-backed text editing (if needed) in weeks 3-4. Quarterly review baked in.
## Why Most Founder Real-Time Is Broken
Three failure modes hit founders the same way:
- Polling pretending to be real-time. Founder ships "real-time" by polling the server every 5 seconds. Updates appear with a noticeable delay; the database melts under N users × 12 polls/min; the experience feels broken even though the engineering looks fine.
- Custom WebSocket protocol with last-write-wins. Founder builds bespoke WebSocket events. Two users edit the same field; the second overwrites the first. Cursor positions get lost. Reconnect after disconnect drops 30 seconds of state. Each bug feels small; together they ruin trust.
- Building CRDTs from scratch. Founder reads about Yjs and decides to roll their own. Six months later they have a half-working implementation that's slower than Yjs and missing the ecosystem. The product side stalls.
The version that works is structured: pick a real-time provider that solves the hard parts (websockets, presence, CRDTs, offline), focus engineering on the product surface (which entities are multiplayer, what UI affordances), and treat collaboration as a feature you ship — not infrastructure you build.
This guide assumes you have already done Authentication (collaboration is user-scoped), have shipped Multi-Tenant Data Isolation (workspaces are the collaboration boundary), have shipped Roles & Permissions (RBAC) (collaborators get appropriate access), have considered Database Providers (CRDT or events persist somewhere), and have shipped Audit Logs (collaboration events are auditable).
## 1. Decide What Needs to Be Real-Time
Not everything needs to be multiplayer. Decide deliberately.
Help me decide which surfaces are real-time.
The categories:
**Category 1: True multiplayer (CRDTs needed)**
- Collaborative editing of one document by multiple users simultaneously
- Examples: Google Docs, Linear comment threads, Notion blocks, Figma canvas
- Requires conflict-free merging
- Hardest to implement; biggest UX win where it matters
**Category 2: Real-time updates (last-write-wins acceptable)**
- One user changes; others see the change quickly
- No simultaneous editing of same field
- Examples: Trello card moves, Slack messages appearing, dashboard updates
- Easier; can use simple pub/sub
- Works for most product features
**Category 3: Presence (who's here)**
- Show which users are currently viewing
- Show what they're focused on (which doc, which view)
- Lightweight; ephemeral
- Easy win; high product feel
**Category 4: Live cursors**
- Show other users' mouse positions in canvas / document
- Subset of presence; specific UI
- Useful for design tools, whiteboarding, document editing
**Category 5: Just polling**
- Some "real-time" needs are actually fine with 30s polling
- Examples: notification badges, "new comments" counts
- Cheap; reliable; sometimes the right answer
**For my product, decide per surface**:
For each surface (document edit, dashboard view, comment thread, etc.):
- Which category fits?
- Is real-time actually needed, or would a manual refresh be acceptable?
- What's the user-volume per surface? (1-2 users editing → simple. 50+ → harder.)
**Don't**:
- Make everything multiplayer "for the future": it adds complexity for no value
- Skip presence on collaborative surfaces: it's the cheapest win
- Force CRDTs where last-write-wins is fine
Output:
1. The per-surface categorization
2. The "real-time" decision per surface
3. The phasing plan (presence first, then updates, then CRDTs if needed)
The biggest unforced error: building CRDTs for surfaces that don't need them. A Trello-style board where "user moves card" is a discrete action doesn't need CRDTs. Pub/sub + optimistic UI is enough. CRDTs are for "two users typing in the same paragraph."
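The per-surface decision can be written down as a small function so it is reviewed like any other code. This is a minimal sketch: the `Surface` fields, the 30-second staleness threshold (taken from the polling category above), and the `planSurface` name are illustrative assumptions, not part of any provider's API.

```typescript
// Hypothetical decision aid: maps what you know about a surface to a sync strategy.
type Sync = "crdt" | "pubsub" | "polling";

interface Surface {
  name: string;
  concurrentEditsToSameField: boolean; // e.g. two users typing in one paragraph
  maxStalenessSeconds: number;         // how old may the data be before it hurts?
  collaborative: boolean;              // more than one person routinely has it open
  spatial: boolean;                    // canvas, document, or spreadsheet
}

interface Plan {
  sync: Sync;
  presence: boolean;
  cursors: boolean;
}

function planSurface(s: Surface): Plan {
  const sync: Sync = s.concurrentEditsToSameField
    ? "crdt"                          // Category 1: true multiplayer
    : s.maxStalenessSeconds >= 30
      ? "polling"                     // Category 5: polling is good enough
      : "pubsub";                     // Category 2: last-write-wins updates
  return {
    sync,
    presence: s.collaborative,              // Category 3
    cursors: s.collaborative && s.spatial,  // Category 4
  };
}

const kanban = planSurface({
  name: "kanban board",
  concurrentEditsToSameField: false,
  maxStalenessSeconds: 2,
  collaborative: true,
  spatial: false,
});
console.log(kanban.sync, kanban.presence, kanban.cursors);
```

Running every surface through one function makes the "CRDTs only where two people edit the same field" rule hard to skip by accident.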
## 2. Use a Real-Time Provider
Building from scratch is years of work. Pick a provider; ship in days.
Help me pick the real-time provider.
The leaders in 2026:
**Liveblocks**
- Hosted real-time + CRDT-style storage (its own Storage data types, plus a hosted Yjs backend)
- Strong primitives: presence, cursors, threads, comments, document storage
- $24+/mo Starter; usage-based above
- Ideal for indie SaaS adding collaboration features
- Good React / Next.js / TypeScript support
- Handles infra (websockets, scaling, persistence)
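**Yjs + Hocuspocus / y-sweet**
- OSS CRDT engine (Yjs) + a sync backend (Hocuspocus or y-sweet)
- More work to set up than Liveblocks; more control
- Hocuspocus is a Node.js WebSocket server for Yjs (from the Tiptap team); y-sweet (by Jamsocket, formerly Drifting in Space) is a separate open-source Yjs server with object-storage persistence, not a managed Hocuspocus
- Good for teams comfortable running infra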
**PartyKit (now part of Cloudflare)**
- Cloudflare-based real-time runtime
- Durable Objects underneath
- OSS framework; Cloudflare hosts
- Good for Cloudflare-stack teams
- Lighter than Liveblocks; build more yourself
**Convex**
- Backend-as-a-service with real-time built in
- Subscriptions auto-update React queries
- Strong if you're using Convex as your database
- Less specialized than Liveblocks but more bundled
**Supabase Realtime**
- Postgres + WebSocket-based pub/sub
- Free tier on Supabase
- Lighter than CRDT-native tools — best for "real-time updates" not "true multiplayer"
- Good if you're already on Supabase
**Pusher / Ably / Soketi**
- Pub/sub-as-a-service (last-write-wins style)
- Mature; widely supported
- Good for Category 2 (real-time updates) and Category 3 (presence)
- Soketi is OSS; self-hostable
- Not for true multiplayer (Category 1)
**Decision criteria**:
- Need true multiplayer (collaborative document editing)? → Liveblocks or Yjs/Hocuspocus
- Need real-time updates without conflicts? → Convex / Supabase Realtime / Pusher
- Already on Cloudflare? → PartyKit
- Already on Supabase? → Supabase Realtime
- Already on Convex? → Convex (built in)
- Want OSS / self-host? → Yjs + Hocuspocus, or Soketi
**For most indie SaaS in 2026**:
- Adding collaboration to an existing product: **Liveblocks** (fastest to ship)
- New product on Convex / Supabase: their built-in tools
- Cost-conscious + comfortable with infra: Yjs + Hocuspocus self-hosted
For my product, ask:
- Do I need true multiplayer or just real-time updates?
- What stack am I already on?
- Build vs buy time-cost?
Output:
1. The provider choice with reasoning
2. The fallback if that doesn't work
3. The integration scope (which surfaces use it)
The single biggest engineering-time saver: using a managed CRDT provider like Liveblocks instead of building. Yjs is great as a library but the websocket-server / scaling / persistence layer is real work. Liveblocks handles all that for $24+/mo.
## 3. Ship Presence First
Presence is the cheapest collaborative-feeling win. Deliver it before anything harder.
Design presence.
The pattern:
**What presence shows**:
- Avatar / initials of users currently in the workspace / page
- Optionally: their current focus (which page, which item)
- Stack of avatars when many users (limit display to 3-5; "+N more")
- Optional: status (typing, idle, etc.)
**Implementation (Liveblocks example)**:
```tsx
import { useOthers } from '@liveblocks/react'

// Avatar is your app's own component; `info` is the user metadata set by your auth endpoint.
function Header() {
  const others = useOthers()
  return (
    <div className="flex">
      {/* Show at most five avatars, then a "+N" overflow counter */}
      {others.slice(0, 5).map(({ connectionId, info }) => (
        <Avatar key={connectionId} src={info.avatar} name={info.name} />
      ))}
      {others.length > 5 && <span>+{others.length - 5}</span>}
    </div>
  )
}
```
Presence data shape:
Keep it small. Examples:
- `name`, `avatar`, `email` (display)
- `currentPage` (where they are)
- `currentSelection` (what they've selected, if relevant)
- `isTyping` (boolean)

Don't put more than ~1KB; it's broadcast frequently.
Critical implementation rules:
- Presence is ephemeral. No persistence; gone when user disconnects.
- Heartbeat every 30s to detect zombies; remove disconnected users.
- Throttle updates. Don't broadcast every mouse move (next section); throttle to 30Hz max.
- Privacy: respect who can see whom. Per RBAC: a Viewer shouldn't see Admins' presence on admin-only pages.
UX details:
- Hover an avatar → show name + role + current page
- Click an avatar → jump to where they are
- Subtle online indicator (green dot)
- Don''t over-emphasize — presence is decoration, not navigation
Don't:
- Show every user that's ever connected (only currently online)
- Use presence as authentication signal (it's display, not auth)
- Skip the room concept (presence is per-page or per-document, not global)
Output:
- The presence data shape
- The avatar-stack UI
- The privacy filter
- The heartbeat / disconnect logic
The single biggest "feels modern" UX win: **presence avatars in the header.** Three lines of code; one $24/mo Liveblocks plan; and your product feels like Linear instead of like 2015 SaaS.
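Managed providers run heartbeat and disconnect detection for you. If you self-host, the logic looks roughly like the sketch below. The `PresenceRoom` class and the 60-second timeout (two missed 30-second heartbeats) are assumptions for illustration, and timestamps are passed in explicitly so the behavior is easy to test.

```typescript
// Minimal in-memory presence registry with zombie cleanup (hypothetical names).
interface PresenceEntry<T> {
  data: T;
  lastSeenMs: number;
}

class PresenceRoom<T> {
  private entries = new Map<string, PresenceEntry<T>>();

  constructor(private timeoutMs: number = 60_000) {}

  /** Call on join, on every presence update, and on each heartbeat. */
  touch(connectionId: string, data: T, nowMs: number): void {
    this.entries.set(connectionId, { data, lastSeenMs: nowMs });
  }

  /** Call on a clean disconnect. */
  leave(connectionId: string): void {
    this.entries.delete(connectionId);
  }

  /** Removes connections that stopped sending heartbeats; returns their IDs. */
  sweep(nowMs: number): string[] {
    const removed: string[] = [];
    this.entries.forEach((entry, id) => {
      if (nowMs - entry.lastSeenMs > this.timeoutMs) {
        this.entries.delete(id);
        removed.push(id);
      }
    });
    return removed;
  }

  online(): string[] {
    return Array.from(this.entries.keys());
  }
}

const room = new PresenceRoom<{ name: string }>();
room.touch("conn-1", { name: "Alice" }, Date.now());
console.log(room.online());
```

Run `sweep` on a timer and broadcast a "left" event for each returned ID so other clients drop the avatar.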
---
## 4. Add Live Cursors (Where Useful)
Live cursors are a presence subset. Useful in canvas / document / spreadsheet products. Skip elsewhere.
Design live cursors.
The pattern:
Where cursors make sense:
- Document / text editing
- Whiteboard / canvas / design tools
- Spreadsheets (cell selection)
- Map products (location markers)
Where cursors are noise:
- Standard CRUD lists / tables
- Settings pages
- Nav-heavy apps
Implementation (Liveblocks example):
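```tsx
import { useOthers, useMyPresence } from '@liveblocks/react'

// Cursor is your app's own presentational component (arrow + name label).
// The wrapper is named CursorLayer so it does not shadow and recursively render itself.
function CursorLayer() {
  const others = useOthers()
  const [, updateMyPresence] = useMyPresence()
  return (
    <>
      <div
        onMouseMove={e => {
          updateMyPresence({ cursor: { x: e.clientX, y: e.clientY } })
        }}
        onMouseLeave={() => updateMyPresence({ cursor: null })}
      >
        ...
      </div>
      {others.map(({ connectionId, presence, info }) => (
        presence.cursor && (
          <Cursor
            key={connectionId}
            x={presence.cursor.x}
            y={presence.cursor.y}
            color={info.color}
            name={info.name}
          />
        )
      ))}
    </>
  )
}
```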
Cursor display:
- Colored cursor arrow per user (color from a palette of 8-12 distinguishable colors)
- Name label next to cursor (small, fades after 2s of no movement)
- Smooth interpolation between updates (don't jump pixel-by-pixel; interpolate)
Performance:
- Throttle position updates to ~30Hz
- Use `requestAnimationFrame` for smooth rendering
- Hide cursors that are off-screen (don't render)
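Throttling to ~30Hz can be sketched as below. This is one possible shape, not a provider API: `createCursorThrottle` is a hypothetical helper, and the clock is passed in so the logic is deterministic. Some providers throttle presence updates for you, in which case this layer is unnecessary.

```typescript
type Point = { x: number; y: number };

/** Forwards at most `hz` positions per second and never loses the final one. */
function createCursorThrottle(send: (p: Point) => void, hz: number = 30) {
  const minGapMs = 1000 / hz;
  let lastSentMs = -Infinity;
  let pending: Point | null = null;

  return {
    /** Call from the mouse-move handler. */
    move(p: Point, nowMs: number): void {
      if (nowMs - lastSentMs >= minGapMs) {
        lastSentMs = nowMs;
        pending = null;
        send(p);
      } else {
        pending = p; // keep only the newest position
      }
    },
    /** Call from a timer or animation frame to deliver the trailing position. */
    flush(nowMs: number): void {
      const p = pending;
      if (p !== null && nowMs - lastSentMs >= minGapMs) {
        lastSentMs = nowMs;
        pending = null;
        send(p);
      }
    },
  };
}

const throttle = createCursorThrottle((p) => console.log("broadcast", p));
throttle.move({ x: 10, y: 20 }, Date.now());
```

The trailing `flush` matters: without it, the cursor can stop a few pixels short of where the user actually came to rest.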
Critical implementation rules:
- Smooth, not janky. Interpolation is what makes cursors feel real-time.
- Privacy. Cursors expose where users are looking; respect doc-level permissions.
- Mobile: cursors don't apply; hide on touch devices.
- Identifiable colors. Persistent per user (so the same user always has the same color across sessions).
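One way to keep a user's color stable across sessions is to derive it from the user ID instead of assigning it on connect. The palette values and the `colorForUser` helper below are illustrative assumptions; any deterministic hash works.

```typescript
// Ten distinguishable colors (example values; pick your own palette).
const CURSOR_PALETTE: readonly string[] = [
  "#E57373", "#F06292", "#BA68C8", "#7986CB", "#4FC3F7",
  "#4DB6AC", "#81C784", "#DCE775", "#FFB74D", "#A1887F",
];

/** Same user ID always maps to the same palette entry. */
function colorForUser(userId: string, palette: readonly string[] = CURSOR_PALETTE): string {
  let hash = 0;
  for (let i = 0; i < userId.length; i++) {
    // Simple multiplicative string hash, kept as an unsigned 32-bit integer.
    hash = (hash * 31 + userId.charCodeAt(i)) >>> 0;
  }
  return palette[hash % palette.length];
}

console.log(colorForUser("user_123"));
```

Two users in the same room can still land on the same color. If that matters, resolve collisions per room and fall back to the hash only as the starting choice.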
Don't:
- Add cursors to every page (overkill; distracting)
- Skip the name label (color alone is hard to identify with 5+ users)
- Send raw mouse-move events (throttle is mandatory)
Output:
- The cursor component
- The color palette and assignment
- The throttling logic
- The interpolation between updates
The biggest UX gotcha: **cursors that lag visibly.** Without smooth interpolation, you see them jump every 30ms. With interpolation, they glide. The difference between "feels broken" and "feels magic" is interpolation.
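Interpolation can be as small as one function that moves the rendered cursor a fraction of the way toward the latest received position on every frame. The sketch below uses exponential smoothing; `stepToward` and the 40ms half-life are assumptions to tune, not values from any library.

```typescript
type Point = { x: number; y: number };

/**
 * Moves `current` toward `target`. After `halfLifeMs` the remaining distance
 * is halved, regardless of how the frames are spaced.
 */
function stepToward(current: Point, target: Point, dtMs: number, halfLifeMs: number = 40): Point {
  const t = 1 - Math.pow(0.5, dtMs / halfLifeMs);
  return {
    x: current.x + (target.x - current.x) * t,
    y: current.y + (target.y - current.y) * t,
  };
}

// In the browser, call this from a requestAnimationFrame loop:
//   rendered = stepToward(rendered, latestFromNetwork, msSinceLastFrame)
let rendered: Point = { x: 0, y: 0 };
rendered = stepToward(rendered, { x: 100, y: 0 }, 16);
console.log(rendered);
```

Because the step depends on elapsed time and not on frame count, the cursor glides at the same speed on 60Hz and 120Hz displays.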
---
## 5. Real-Time Updates (Last-Write-Wins)
For non-CRDT surfaces, real-time updates use simple pub/sub. Easy to implement; high value.
Design real-time updates.
The pattern:
The flow:
- User A makes a change (creates a record, edits a field)
- Change is persisted to the database
- Server broadcasts an event over the realtime channel: `{type: 'document.updated', id: '...', fields: {...}}`
- All users in the same room receive the event
- Their UI updates locally (optimistic merge or refetch)
Last-write-wins handling:
When two users edit the same field within a short window:
- Both updates persist; the later one wins
- The earlier user sees their value get replaced
- Show a toast: "Alice changed [field] to [new value]"
This is fine for:
- Card moves on a Trello-style board
- Status changes
- Typed fields where users rarely overlap
This is NOT fine for:
- Long text editing (use CRDTs)
- Free-form canvases (use CRDTs)
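Last-write-wins is easy to get subtly wrong when two writes carry the same timestamp, so a deterministic tie-break helps every client pick the same winner. A minimal sketch follows; the `FieldWrite` shape is hypothetical, and it assumes timestamps are assigned by the server, since client clocks drift.

```typescript
interface FieldWrite<V> {
  value: V;
  updatedAtMs: number; // assumed to be stamped by the server
  writerId: string;
}

/** Later write wins; ties are broken by writerId so all clients agree. */
function lwwMerge<V>(a: FieldWrite<V>, b: FieldWrite<V>): FieldWrite<V> {
  if (a.updatedAtMs !== b.updatedAtMs) return a.updatedAtMs > b.updatedAtMs ? a : b;
  return a.writerId > b.writerId ? a : b;
}

/** Applies a remote write and says whether the local user's value was replaced. */
function applyRemote<V>(
  local: FieldWrite<V>,
  remote: FieldWrite<V>,
  myUserId: string
): { next: FieldWrite<V>; notify: string | null } {
  const next = lwwMerge(local, remote);
  const overwritten =
    next === remote &&
    local.writerId === myUserId &&
    remote.writerId !== myUserId &&
    local.value !== remote.value;
  return {
    next,
    notify: overwritten ? `${remote.writerId} changed this field to ${String(remote.value)}` : null,
  };
}

const result = applyRemote(
  { value: "Todo", updatedAtMs: 100, writerId: "me" },
  { value: "Done", updatedAtMs: 105, writerId: "alice" },
  "me"
);
console.log(result.next.value, result.notify);
```

The `notify` string is what feeds the toast: the earlier writer learns their value was replaced instead of silently losing it.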
Implementation patterns:
With Supabase Realtime:
```tsx
const channel = supabase
  .channel(`workspace:${workspaceId}:documents`)
  .on('postgres_changes',
    // filter assumes a workspace_id column; without it, every row change the user can see arrives
    { event: '*', schema: 'public', table: 'documents', filter: `workspace_id=eq.${workspaceId}` },
    (payload) => updateDocumentLocally(payload.new)) // payload.new is empty on DELETE
  .subscribe()
```
With Convex:
```tsx
const documents = useQuery(api.documents.list, { workspaceId })
// Auto-updates when documents change
```
With Liveblocks:
```tsx
const storage = useStorage(root => root.documents)
// Auto-updates; conflict-free with CRDTs underneath
```
Critical implementation rules:
- Scope channels per workspace. Don''t broadcast workspace A''s events to workspace B.
- Permission-check on receive. Even if event arrives, only render if user has access.
- Optimistic UI. Update locally before server confirms; rollback on error.
- Retry on disconnect. When websocket reconnects, refetch state to catch missed events.
- Throttle high-frequency updates. Batch related events (e.g., 5 quick saves in 1s = 1 broadcast).
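Optimistic UI with rollback is simpler when pending edits are kept as a list that is replayed over the last confirmed server state, so that a rejection just removes one entry. The `OptimisticStore` class below is a hypothetical sketch of that pattern, not a library API.

```typescript
interface Mutation<S> {
  id: string;
  apply: (state: S) => S;
}

class OptimisticStore<S> {
  private pending: Mutation<S>[] = [];

  constructor(private confirmed: S) {}

  /** What the UI renders: confirmed state plus unconfirmed local edits. */
  view(): S {
    return this.pending.reduce((state, m) => m.apply(state), this.confirmed);
  }

  /** Apply locally right away; send the request to the server separately. */
  begin(m: Mutation<S>): void {
    this.pending.push(m);
  }

  /** Server accepted the edit: adopt its state and drop the pending entry. */
  confirm(id: string, serverState: S): void {
    this.confirmed = serverState;
    this.pending = this.pending.filter((m) => m.id !== id);
  }

  /** Server rejected the edit: removing the entry is the rollback. */
  reject(id: string): void {
    this.pending = this.pending.filter((m) => m.id !== id);
  }

  /** A remote event arrived: pending local edits are re-applied on top of it. */
  receive(serverState: S): void {
    this.confirmed = serverState;
  }
}

const store = new OptimisticStore<string[]>(["Write spec"]);
store.begin({ id: "m1", apply: (s) => s.concat("Ship presence") });
console.log(store.view());
```

Compared with "snapshot, mutate, restore the snapshot on error", this avoids wiping out remote changes that arrived while the request was in flight.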
Don't:
- Trust the realtime layer for security (do permission checks server-side too)
- Send full document state every change (send deltas / diffs)
- Skip reconnect handling (websockets drop more than you think)
Output:
- The channel scoping per workspace
- The event schema
- The optimistic-update + rollback logic
- The reconnect / refetch flow
The single most-important post-launch behavior: **graceful reconnect.** Users on bad networks reconnect frequently; the app needs to refetch state on reconnect, not just keep the stale local copy. Without this, users see ghost data after reconnects.
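A reconnect is not the only time events go missing, so it can help to detect gaps directly. The sketch below assumes the server stamps each event in a channel with a monotonically increasing `seq` number. That is an assumption about your event schema, not something the providers above guarantee, and `EventCursor` is a hypothetical name.

```typescript
interface SeqEvent {
  seq: number;
}

type Decision = "apply" | "ignore" | "refetch";

class EventCursor {
  private lastSeq: number;

  constructor(initialSeq: number) {
    this.lastSeq = initialSeq;
  }

  /** "apply" for the next event, "ignore" for duplicates, "refetch" when a gap shows events were missed. */
  accept(e: SeqEvent): Decision {
    if (e.seq <= this.lastSeq) return "ignore";
    if (e.seq === this.lastSeq + 1) {
      this.lastSeq = e.seq;
      return "apply";
    }
    return "refetch";
  }

  /** After a full refetch, resume from the sequence number of the fetched state. */
  resetTo(seq: number): void {
    this.lastSeq = seq;
  }
}

const cursor = new EventCursor(10);
console.log(cursor.accept({ seq: 11 }), cursor.accept({ seq: 14 }));
```

On every websocket reconnect, refetch unconditionally and call `resetTo`; the gap check then covers drops that happen without a visible disconnect.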
---
## 6. Use CRDTs for Collaborative Text/Canvas
Where multiple users edit the same field simultaneously, you need CRDTs. Don't roll your own.
Design CRDT-based editing.
The pattern (using Yjs / Liveblocks):
For text editing:
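```tsx
// Liveblocks + Tiptap (built on ProseMirror)
import { useEditor, EditorContent } from '@tiptap/react'
import StarterKit from '@tiptap/starter-kit'
import { useLiveblocksExtension } from '@liveblocks/react-tiptap'

function CollaborativeEditor() {
  const liveblocks = useLiveblocksExtension()
  const editor = useEditor({
    extensions: [
      // Liveblocks supplies collaborative undo/redo, so turn off Tiptap's own
      // (the option is named `history` in Tiptap v2)
      StarterKit.configure({ history: false }),
      liveblocks,
    ],
  })
  return <EditorContent editor={editor} />
}
```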
For shared object state:
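```tsx
import { useStorage, useMutation } from '@liveblocks/react'
import { nanoid } from 'nanoid'

// Inside a component. Assumes root.todos is a LiveList; useStorage returns a
// read-only snapshot, not a [value, setter] pair, so writes go through useMutation.
const todos = useStorage(root => root.todos)
const addTodo = useMutation(({ storage }, text) => {
  storage.get('todos').push({ id: nanoid(), text, completed: false })
}, [])
```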
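Underneath: the CRDT layer (Yjs in the text editor, Liveblocks Storage for shared objects) makes concurrent operations merge instead of overwriting each other. Concurrent inserts are both kept, concurrent deletions are both applied, and every client converges on the same result.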
Performance considerations:
- CRDTs grow over time (operation history); compact periodically
- Yjs has good built-in compaction
- For very large documents, splitting into smaller documents helps
- Server-side persistence is via a snapshot + delta log
Persistence:
- Liveblocks: storage is automatic; backed by Liveblocks infra
- Yjs + Hocuspocus: persist to Postgres / Redis / S3
- Snapshots every N updates; delta log between snapshots
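The "snapshot every N updates, delta log in between" idea is independent of the CRDT library. Here is a generic in-memory sketch: `SnapshotLog` is a hypothetical class, and in a real backend the snapshot and deltas would live in Postgres, Redis, or S3 and the updates would be the library's binary update format.

```typescript
class SnapshotLog<S, U> {
  private snapshot: S;
  private deltas: U[] = [];

  constructor(
    initial: S,
    private reduce: (state: S, update: U) => S,
    private snapshotEvery: number
  ) {
    this.snapshot = initial;
  }

  /** Append one update; fold the log into a new snapshot every N updates. */
  append(update: U): void {
    this.deltas.push(update);
    if (this.deltas.length >= this.snapshotEvery) {
      this.snapshot = this.deltas.reduce((s, u) => this.reduce(s, u), this.snapshot);
      this.deltas = [];
    }
  }

  /** Current state = snapshot + replay of the deltas recorded since. */
  load(): S {
    return this.deltas.reduce((s, u) => this.reduce(s, u), this.snapshot);
  }

  deltaCount(): number {
    return this.deltas.length;
  }
}

// Toy example: state is a running total, updates are increments.
const log = new SnapshotLog<number, number>(0, (total, n) => total + n, 3);
log.append(1);
log.append(2);
console.log(log.load(), log.deltaCount());
```

Loading a document then costs one snapshot read plus at most N-1 delta replays, which is what keeps open times flat as edit history grows.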
Critical implementation rules:
- Use a real CRDT library. Yjs, Automerge, Loro. Don't implement your own.
- One CRDT document per logical unit (per document, per board). Cross-CRDT operations are complex.
- Authentication on connect. Verify user permissions before joining a room.
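- Audit changes. Per Audit Logs: record who changed what when. A CRDT typically tracks which client produced each operation, not the user identity or wall-clock time, so map clients to users and log those yourself.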
- Test with poor networks. CRDTs handle this gracefully; verify by simulating drops.
Don't:
- Mix CRDT and non-CRDT operations on the same data
- Forget about conflicts in metadata (e.g., title change while body editing)
- Skip the persistence layer (in-memory only = data loss on restart)
Output:
- The CRDT library choice (Yjs / Automerge)
- The schema for the shared document
- The persistence backend
- The auth-on-connect flow
The single most underestimated CRDT decision: **how granular to make the CRDT scope.** One huge CRDT for "the whole project" gets slow; tiny CRDTs per field get hard to manage. Per-document or per-board is usually right.
---
## 7. Handle Offline Gracefully
Real users have flaky networks. Handle disconnects.
Design offline handling.
The pattern:
Optimistic local edits:
When the user edits while offline:
- Save edits locally (in IndexedDB / localStorage / in-memory)
- Mark them as "pending sync"
- Show an "offline" indicator in the UI
On reconnect:
- Replay pending edits to the server
- Server merges via CRDT or last-write-wins
- Local indicators update to "synced"
- If conflicts: show diff UI for user to resolve
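For non-CRDT surfaces, the pending-sync flow above can be sketched as a small queue. `OfflineQueue` is a hypothetical class: here the queue lives in memory, whereas in a real app it would be backed by IndexedDB so edits survive a tab close.

```typescript
type SyncStatus = "synced" | "syncing" | "offline";

interface PendingEdit {
  id: string;
  payload: unknown;
}

class OfflineQueue {
  private pending: PendingEdit[] = [];
  private online = true;

  /** Every edit is queued first, whether or not the network is up. */
  record(edit: PendingEdit): void {
    this.pending.push(edit);
  }

  setOnline(online: boolean): void {
    this.online = online;
  }

  /** Edits to send, oldest first. Empty while offline. */
  toReplay(): PendingEdit[] {
    return this.online ? this.pending.slice() : [];
  }

  /** Remove an edit only after the server has acknowledged it. */
  ack(id: string): void {
    this.pending = this.pending.filter((e) => e.id !== id);
  }

  status(): SyncStatus {
    if (!this.online) return "offline";
    return this.pending.length > 0 ? "syncing" : "synced";
  }

  /** Text for the status indicator. */
  label(): string {
    const n = this.pending.length;
    const changes = `${n} change${n === 1 ? "" : "s"}`;
    switch (this.status()) {
      case "offline":
        return `Working offline. ${changes} pending sync.`;
      case "syncing":
        return `Syncing ${changes}...`;
      case "synced":
        return "All changes saved";
    }
  }
}

const queue = new OfflineQueue();
queue.setOnline(false);
queue.record({ id: "e1", payload: { title: "Draft" } });
console.log(queue.label());
```

Edits leave the queue only on acknowledgement, so a connection that drops mid-replay resends them; the server should treat edit IDs as idempotency keys.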
Status indicators:
- Online: subtle green dot or no indicator
- Offline: yellow / red bar at top — "Working offline. [N] changes pending sync."
- Reconnecting: spinner indicator
- Synced: brief "All changes saved" then fade
With Yjs / Liveblocks:
These libraries handle most offline mechanics:
- Yjs IndexedDB provider persists CRDT state locally
- On reconnect, the CRDT engine merges automatically
- No explicit "replay" code needed
Critical implementation rules:
- Persist CRDT state to IndexedDB for offline tolerance
- Detect online/offline state via `navigator.onLine` + connection events
- Surface state to users so they're not confused
- Test with airplane mode during QA
- Cap offline duration if needed (e.g., 24 hours; after that, force re-fetch)
Don't:
- Pretend everything's real-time when offline
- Drop user edits on disconnect (they'll be furious)
- Skip the visual indicator (silent offline mode is confusing)
Output:
- The offline state detection
- The local persistence layer
- The pending-sync indicator UI
- The conflict-resolution flow (if non-CRDT)
The single biggest user-trust feature: **clear offline indicators.** Users on Wi-Fi that drops every 10 minutes need to know their changes are saved locally and will sync. Without indicators, they assume the worst.
---
## 8. Permissions in Real-Time
Real-time channels need authorization. Don't leak across tenants.
Design real-time permissions.
The pattern:
Channel naming:
Always include workspace_id and resource_id in channel names:
- `workspace:{id}:document:{id}`: document-specific
- `workspace:{id}:board:{id}`: board-specific
- `workspace:{id}:presence`: workspace-level presence
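Building and parsing room IDs in one place keeps the naming convention from drifting and gives the auth endpoint a strict parser to reject malformed names. A minimal sketch, where `roomId`, `parseRoomId`, and the allowed ID character set are assumptions:

```typescript
type RoomRef =
  | { kind: "document" | "board"; workspaceId: string; resourceId: string }
  | { kind: "presence"; workspaceId: string };

// IDs may not contain ":" or anything else that could confuse the parser.
const SAFE_ID = /^[A-Za-z0-9_-]+$/;

function roomId(ref: RoomRef): string {
  return ref.kind === "presence"
    ? `workspace:${ref.workspaceId}:presence`
    : `workspace:${ref.workspaceId}:${ref.kind}:${ref.resourceId}`;
}

/** Returns null for anything that does not match the convention exactly. */
function parseRoomId(room: string): RoomRef | null {
  const parts = room.split(":");
  if (parts.length < 3 || parts.length > 4) return null;
  const [prefix, workspaceId, kind, resourceId] = parts;
  if (prefix !== "workspace" || !SAFE_ID.test(workspaceId)) return null;
  if (parts.length === 3) {
    return kind === "presence" ? { kind, workspaceId } : null;
  }
  if ((kind === "document" || kind === "board") && SAFE_ID.test(resourceId)) {
    return { kind, workspaceId, resourceId };
  }
  return null;
}

console.log(parseRoomId(roomId({ kind: "document", workspaceId: "abc", resourceId: "xyz" })));
```

Parsing is only the first gate: a well-formed room ID still has to pass the membership and resource checks in the auth endpoint.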
Authentication on connect:
When a client connects to a channel:
- Server verifies the user is authenticated
- Server verifies the user belongs to that workspace
- Server verifies the user has access to that resource (per RBAC)
- Connection accepted or rejected
Implementation (Liveblocks example):
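```tsx
// Server-side authentication endpoint (Express-style).
// getWorkspaceMember, getDocument, and canAccess are your app's own helpers.
import { Liveblocks } from '@liveblocks/node'

const liveblocks = new Liveblocks({ secret: process.env.LIVEBLOCKS_SECRET_KEY })

app.post('/api/liveblocks-auth', async (req, res) => {
  const room = req.body.room // e.g., "workspace:abc:document:xyz"
  const [prefix, workspaceId, kind, documentId] = String(room).split(':')
  if (prefix !== 'workspace' || kind !== 'document' || !workspaceId || !documentId) {
    return res.status(400).end()
  }

  // Verify access before issuing any token
  const member = await getWorkspaceMember(workspaceId, req.user.id)
  if (!member) return res.status(403).end()
  const doc = await getDocument(documentId)
  // The document must belong to the workspace named in the room ID
  // (assumes documents carry a workspaceId field)
  if (!doc || doc.workspaceId !== workspaceId || !canAccess(member, doc)) {
    return res.status(403).end()
  }

  const session = liveblocks.prepareSession(req.user.id, {
    userInfo: { name: req.user.name, avatar: req.user.avatar }
  })
  session.allow(room, session.FULL_ACCESS) // use session.READ_ACCESS for view-only roles
  const { body, status } = await session.authorize()
  res.status(status).end(body)
})
```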
Critical implementation rules:
- NEVER trust client-claimed identity. Verify server-side from session.
- NEVER expose room access without auth. Public rooms are fine; private rooms need auth.
- Re-verify on permission changes. If a user is removed from a workspace, kick them from rooms.
- Audit access. Log who joined which rooms when.
- Per-room rate limits. A user joining 1000 rooms in a minute is suspicious.
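A join rate limit can be a sliding window over recent join timestamps per user. The sketch below is in-memory and single-process, with hypothetical names; a multi-server deployment would keep the counters in a shared store such as Redis.

```typescript
class JoinRateLimiter {
  private joins = new Map<string, number[]>();

  constructor(private maxJoins: number, private windowMs: number) {}

  /** Returns true and records the join if the user is under the limit. */
  allow(userId: string, nowMs: number): boolean {
    const recent = (this.joins.get(userId) ?? []).filter((t) => nowMs - t < this.windowMs);
    const ok = recent.length < this.maxJoins;
    if (ok) recent.push(nowMs);
    this.joins.set(userId, recent);
    return ok;
  }
}

// Example policy: at most 30 room joins per user per minute.
const limiter = new JoinRateLimiter(30, 60_000);
console.log(limiter.allow("user_123", Date.now()));
```

Call `allow` at the top of the auth endpoint and respond with 429 when it returns false, so rejected joins never reach the permission queries.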
Don't:
- Use a global "all users in workspace" room (over-broadcasts)
- Skip the auth endpoint (trusting clients = catastrophic)
- Forget about leaving rooms when users navigate away
Output:
- The channel-naming convention
- The auth endpoint
- The permission-check function
- The "kick on permission change" flow
The single biggest privacy bug pattern: **a client that joins another tenant''s room by guessing the ID.** Server-side authorization is non-negotiable. Verify on every room-join request.
---
## 9. Performance and Cost
Real-time scales differently than HTTP. Watch the bills.
Plan for cost.
Liveblocks pricing dimensions:
- Monthly active users (MAU)
- Connection minutes
- Storage (CRDT data)
- Each grows differently from API request count
Pusher / Ably pricing dimensions:
- Concurrent connections
- Messages per month
- Bandwidth
Self-hosted (Yjs + Hocuspocus / Soketi):
- Server costs (CPU + RAM)
- WebSocket connections per server (~10K typical)
- Bandwidth
Cost-saving patterns:
- Disconnect when not actively collaborating. A user who left the tab should not stay connected; expire after 30s of inactivity.
- Batch presence updates. Instead of broadcasting every cursor move, batch into 10-50ms windows.
- Limit room sizes. Hard-cap collaborators per room (e.g., 50 max).
- Compact CRDT history. Yjs supports compaction; run it periodically.
- Cap free-tier features. Free users get presence; paid users get full multiplayer.
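Room-size caps and batching matter because broadcast fan-out grows with the square of the room size: each update from one user is delivered to everyone else. A back-of-the-envelope sketch, assuming every user is actively sending and the server relays every update to every other member (how a provider bills those deliveries varies):

```typescript
/** Deliveries per second in one room when each user broadcasts at the given rate. */
function deliveriesPerSecond(usersInRoom: number, updatesPerUserPerSecond: number): number {
  return usersInRoom * Math.max(0, usersInRoom - 1) * updatesPerUserPerSecond;
}

console.log(deliveriesPerSecond(10, 30)); // 10 users at 30Hz -> 2700
console.log(deliveriesPerSecond(50, 30)); // 50 users at 30Hz -> 73500
console.log(deliveriesPerSecond(50, 10)); // same room batched to 10Hz -> 24500
```

Going from 10 to 50 users multiplies traffic by about 27, while dropping the update rate from 30Hz to 10Hz cuts it to a third, so the two levers are worth applying together.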
Monitoring:
- Active connections per workspace
- Average connection duration
- Messages per connection per minute
- Bandwidth per connection
- CRDT document size growth
Alerts:
- Connection count spike (possible bug or attack)
- Per-workspace bandwidth anomaly
- CRDT document approaching size limit
Don't:
- Skip the cost dashboard (real-time bills surprise)
- Forget to disconnect inactive sessions (zombie connections cost real money)
- Allow unlimited room growth (hard-cap somewhere)
Output:
- The provider pricing model
- The cost-saving config
- The monitoring + alerts
- The free vs paid feature gates
The biggest cost surprise: **zombie connections from users who closed the tab without graceful disconnect.** Without server-side timeouts, these accumulate. Set 30-60s heartbeat detection; close zombies aggressively.
---
## 10. Quarterly Review
Real-time infrastructure rots. Quarterly review keeps it healthy.
Quarterly review.
Performance:
- p50 / p95 / p99 update latency (server → client)
- Connection failure rate
- Reconnect frequency
- CRDT document size distribution
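If your provider dashboard does not report latency percentiles, they are cheap to compute from sampled measurements. The sketch below uses the nearest-rank method; how you collect the samples (for example, server timestamp on send versus client timestamp on receive) is up to you and is not shown.

```typescript
/** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

const latenciesMs = [42, 38, 51, 47, 230, 44, 40, 39, 45, 43];
console.log({
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99),
});
```

Track p95 and p99 alongside p50: a single slow reconnect barely moves the median but shows up immediately in the tail.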
Cost:
- Per-user real-time cost trend
- Per-feature usage (presence vs cursors vs full multiplayer)
- Free vs paid usage ratio
Reliability:
- Outages / incidents
- Data-loss reports (any?)
- Customer complaints about "real-time felt slow"
Feature usage:
- How many users actually collaborate? (vs solo editors)
- Which surfaces use real-time vs ignore it?
- Any surfaces where real-time is enabled but unused?
Output:
- Performance snapshot
- Cost adjustments
- Surface adjustments (turn off real-time where unused)
- 1 improvement to ship
---
## What "Done" Looks Like
A working real-time collaboration system in 2026 has:
- A documented per-surface decision on what's real-time vs not
- A managed real-time provider (Liveblocks / Yjs+Hocuspocus / Convex / Supabase Realtime)
- Presence shipped on collaborative surfaces (avatars in header)
- Live cursors on canvas / document surfaces (where appropriate)
- Real-time updates on non-CRDT surfaces (last-write-wins acceptable)
- CRDT-based editing on text / canvas (using Yjs / equivalent — never DIY)
- Offline tolerance with local persistence and sync indicators
- Permissions enforced at room-join time
- Cost monitoring and aggressive zombie-connection cleanup
- Quarterly review baked into the team rhythm
The hidden cost in real-time isn't the provider bill. It's **engineers building infrastructure when they should build product**. A team that picks Liveblocks and ships multiplayer in 2 weeks beats a team that builds CRDTs from scratch over 6 months. The provider is the platform; your product is the value. Skip the rebuild.
---
## See Also
- [Multi-Tenant Data Isolation](multi-tenancy-chat.md) — collaboration boundaries
- [Roles & Permissions (RBAC)](roles-permissions-chat.md) — who can collaborate
- [Audit Logs](audit-logs-chat.md) — collaboration events logged
- [In-App Notifications](in-app-notifications-chat.md) — notify users of collaborative changes
- [Search](search-chat.md) — index real-time changes
- [File Uploads](file-uploads-chat.md) — collaborative file sharing
- [API Keys & PATs](api-keys-chat.md) — programmatic collaboration via API
- [Database Providers](https://www.vibereference.com/backend-and-data/database-providers) — Convex, Supabase, etc.
- [Convex](https://www.vibereference.com/backend-and-data/convex) — real-time-first backend
- [Vercel Functions](https://www.vibereference.com/cloud-and-hosting/vercel-functions) — for auth endpoints
- [Performance Optimization](performance-optimization-chat.md) — real-time is a performance concern
[⬅️ Growth Overview](README.md)