# Long-Running Operations & Job Status UI: Chat Prompts
Some operations don't fit in a 5-second HTTP request. AI generation that takes 30 seconds. Video transcoding that takes 5 minutes. Bulk import of 100K rows that takes 20 minutes. Migration that takes hours. Report generation. Data export. Web scraping. ML training jobs. Without proper UX, the user clicks "Start" and stares at a spinner that may never end. Or worse, refreshes the page and triggers it again.
Long-running operations need a specific UX pattern: kick off async; track job status; show progress; notify when done; let user navigate away. This is the chat-prompt playbook for shipping that pattern cleanly without your servers melting under polling load.
## When This Belongs
Use long-running ops UX when:
- Operation reliably takes >5 seconds
- Progress is meaningful (per-item, per-step, percentage)
- Users may want to navigate away and come back
- Failures need to be communicated clearly
Don't bother when:
- <5 second operation (just spin and wait)
- Background-only / never user-facing
## Architecture: Sync Request Kicks Off Async Job
Browser → POST /api/jobs/start
Backend creates job record with status='pending'; returns job_id; enqueues to background worker
Browser ← { job_id, estimated_duration_seconds }
Background worker picks up job, processes, updates status as it goes:
- pending → running (with progress 0%)
- running (progress 10%, 25%, 50%, ...)
- running → succeeded OR failed
Browser polls / subscribes to job_id; renders progress UI:
- Status (pending / running / succeeded / failed)
- Progress percentage or step count
- Estimated time remaining
- Cancel button
- View result button (when complete)
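The status lifecycle above is small enough to write down as a transition table. This is a minimal sketch, assuming the four statuses listed here plus the `cancelled` status from the schema below; `canTransition` and `isTerminal` are hypothetical helper names, not part of any framework.

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

// Allowed moves between statuses. Terminal statuses have no outgoing edges.
// Assumption: a job can be cancelled while it is still queued.
const TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  pending: ["running", "cancelled"],
  running: ["succeeded", "failed", "cancelled"],
  succeeded: [],
  failed: [],
  cancelled: [],
};

// True if a job may move directly from `from` to `to`.
function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// True once the job can no longer change (safe point to stop polling).
function isTerminal(status: JobStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```

Checking `canTransition` before every status write is one way to keep a late "succeeded" from overwriting a "cancelled" job.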
## Data Model
I'm building long-running operations infrastructure for my SaaS.
Schema (Drizzle):
```text
jobs:
  id (UUID), user_id, account_id,
  type (e.g. 'csv_import', 'report_generation', 'ai_generation', 'data_export'),
  status (enum: 'pending' | 'running' | 'succeeded' | 'failed' | 'cancelled'),
  progress_pct (int 0-100),
  progress_message (e.g. "Processing row 1234 of 50000"),
  current_step (int), total_steps (int),
  created_at, started_at, completed_at,
  result_url (nullable; signed URL to download / view result),
  error_message (nullable),
  metadata (JSON: input parameters; job-specific data),
  estimated_duration_seconds (initial estimate),
  cancellable (boolean; can user cancel this job?)

job_logs:
  job_id, log_level (info / warn / error), message, timestamp
  -- detailed progress; viewable for debugging
```
Implement:
- The schema
- API: POST /api/jobs/start (creates job + enqueues), GET /api/jobs/:id (status), POST /api/jobs/:id/cancel
- Background worker integration (Inngest / Trigger.dev / Vercel Queues / etc.)
- Helper: `updateJobProgress(jobId, update)`, which workers call periodically with fields such as `progress_pct` and `progress_message`
Stack: Next.js + Drizzle + Vercel Queues / Inngest.
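For orientation, here is a rough sketch of the record the prompt asks for, written as plain TypeScript rather than Drizzle table definitions. The field names follow the schema above; `createPendingJob` and `toStartResponse` are hypothetical helpers, and storing timestamps as epoch milliseconds is an assumption made to keep the example self-contained.

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

// Mirrors the columns in the schema; timestamps are epoch milliseconds here.
interface Job {
  id: string;
  user_id: string;
  account_id: string;
  type: string;
  status: JobStatus;
  progress_pct: number;
  progress_message: string | null;
  current_step: number | null;
  total_steps: number | null;
  created_at: number;
  started_at: number | null;
  completed_at: number | null;
  result_url: string | null;
  error_message: string | null;
  metadata: Record<string, unknown>;
  estimated_duration_seconds: number | null;
  cancellable: boolean;
}

interface StartJobInput {
  id: string;
  user_id: string;
  account_id: string;
  type: string;
  metadata: Record<string, unknown>;
  estimated_duration_seconds: number | null;
  cancellable: boolean;
}

// Build the row that POST /api/jobs/start would insert before enqueueing.
function createPendingJob(input: StartJobInput, now: number): Job {
  return {
    id: input.id,
    user_id: input.user_id,
    account_id: input.account_id,
    type: input.type,
    status: "pending",
    progress_pct: 0,
    progress_message: null,
    current_step: null,
    total_steps: null,
    created_at: now,
    started_at: null,
    completed_at: null,
    result_url: null,
    error_message: null,
    metadata: input.metadata,
    estimated_duration_seconds: input.estimated_duration_seconds,
    cancellable: input.cancellable,
  };
}

// Shape returned to the browser once the job is queued.
function toStartResponse(job: Job): { job_id: string; estimated_duration_seconds: number | null } {
  return { job_id: job.id, estimated_duration_seconds: job.estimated_duration_seconds };
}
```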
## Status Polling vs Server-Sent Events vs WebSocket
For job status updates, three patterns:
Pattern A: Polling
- Browser polls GET /api/jobs/:id every 2-5 seconds
- TanStack Query handles this nicely with `refetchInterval`
- Pros: simple; works behind proxies / firewalls
- Cons: load on server (N polls per N users); slight delay
Pattern B: Server-Sent Events (SSE)
- Browser opens SSE connection to /api/jobs/:id/events
- Server streams updates as they happen
- Pros: real-time; one connection
- Cons: long-lived connections; Vercel Functions have 5-min limit by default
Pattern C: WebSocket
- Bidirectional; can send + receive
- Pros: most flexible
- Cons: more infrastructure; not always justified
Recommended:
- Polling (Pattern A) for jobs <5 minutes
- SSE (Pattern B) for longer jobs or when real-time updates matter
- WebSocket only if you need bidirectional updates (cancellation feedback, etc.)
Implement Pattern A with TanStack Query as default:
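```ts
import { useQuery } from '@tanstack/react-query';

// Call inside a React component; jobId comes from the POST /api/jobs/start response.
const { data: job } = useQuery({
  queryKey: ['job', jobId],
  queryFn: () => fetch(`/api/jobs/${jobId}`).then((r) => r.json()),
  // TanStack Query v5 passes the Query object here; v4 passed `data` as the first argument.
  refetchInterval: (query) => {
    const status = query.state.data?.status;
    if (status === 'succeeded' || status === 'failed' || status === 'cancelled') {
      return false; // stop polling once the job is terminal
    }
    return 3000; // poll every 3 seconds while in flight
  },
});
```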
Stack: Next.js + TanStack Query.
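If you go with Pattern B instead, the server side mostly comes down to writing updates in the `text/event-stream` wire format. This is a minimal sketch of that serialization only, with no HTTP handler; `formatSseEvent` and the `JobEvent` shape are hypothetical names chosen for illustration.

```typescript
interface JobEvent {
  status: string;
  progress_pct: number;
  progress_message?: string;
}

// Serialize one update as a server-sent event:
// optional "id:" line, an "event:" line, a "data:" line, then a blank line.
function formatSseEvent(eventName: string, payload: JobEvent, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) {
    lines.push(`id: ${id}`);
  }
  lines.push(`event: ${eventName}`);
  // JSON.stringify without an indent argument never emits raw newlines,
  // so the payload always fits on a single data line.
  lines.push(`data: ${JSON.stringify(payload)}`);
  return lines.join("\n") + "\n\n";
}
```

On the browser side, `new EventSource(url)` with `addEventListener("progress", handler)` would receive these events, and the handler parses `event.data` as JSON.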
## The Job Status Component
Build a `<JobStatus>` component that renders consistently across all long-running operations:
States:
- pending: spinner + "Queued..." + estimated start time
- running: progress bar + percentage + current step text + estimated time remaining
- succeeded: checkmark + "Complete" + view result button
- failed: red icon + error message + retry button
- cancelled: gray icon + "Cancelled" + restart button
Visual:
- Compact mode: inline progress bar (in tables / cards)
- Expanded mode: full panel with steps, logs, actions
- Toggle expand/collapse
Behaviors:
- Auto-update via TanStack Query
- Cancel button visible if job.cancellable + status='running'
- Auto-redirect / notify when complete (configurable)
- Browser tab title updates: "(50%) Importing customers..."
- Sound notification on complete (optional, opt-in)
Stack: Next.js + Tailwind + shadcn/ui + TanStack Query.
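One way to keep the component consistent is to derive everything it renders from a single pure function, so the compact and expanded modes cannot drift apart. This is a sketch under that assumption; `toJobView`, `tabTitle`, and the `JobView` shape are hypothetical, and the labels are taken from the state list above.

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

interface JobSnapshot {
  status: JobStatus;
  progress_pct: number;
  progress_message: string | null;
  error_message: string | null;
  cancellable: boolean;
}

interface JobView {
  icon: "spinner" | "progress" | "check" | "error" | "cancelled";
  label: string;
  showCancel: boolean;
  action: "view" | "retry" | "restart" | null;
}

// Map a job snapshot to what the component should display.
function toJobView(job: JobSnapshot): JobView {
  switch (job.status) {
    case "pending":
      return { icon: "spinner", label: "Queued...", showCancel: false, action: null };
    case "running":
      return {
        icon: "progress",
        label: job.progress_message ?? `${job.progress_pct}%`,
        showCancel: job.cancellable, // cancel only while running and cancellable
        action: null,
      };
    case "succeeded":
      return { icon: "check", label: "Complete", showCancel: false, action: "view" };
    case "failed":
      return { icon: "error", label: job.error_message ?? "Failed", showCancel: false, action: "retry" };
    case "cancelled":
      return { icon: "cancelled", label: "Cancelled", showCancel: false, action: "restart" };
  }
}

// Browser tab title: progress prefix while running, the page's own title otherwise.
function tabTitle(job: JobSnapshot, baseTitle: string): string {
  if (job.status !== "running") return baseTitle;
  return `(${job.progress_pct}%) ${job.progress_message ?? baseTitle}`;
}
```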
## Progress Tracking Strategies
Different operations track progress differently:
Determinable progress (you know total work):
- Bulk imports: rows processed / total rows
- Multi-step jobs: current step / total steps
- File processing: bytes processed / total bytes
```ts
await updateJobProgress(jobId, {
  progress_pct: Math.round((rowsProcessed / totalRows) * 100),
  progress_message: `Processing row ${rowsProcessed} of ${totalRows}`,
  current_step: 2,
  total_steps: 5,
});
```
Indeterminate progress (you don't know total work):
- AI generation (variable length output)
- Web scraping (unknown number of pages)
- Long DB queries
For indeterminate:
- Show animated progress bar (no percentage)
- Show "elapsed: 2m 14s"
- Periodic status messages ("Still working on it...")
Hybrid: known steps + variable work within each step
- "Step 3 of 5: Generating AI summary..." (with sub-progress within step)
Implement helpers for both modes.
Stack: Next.js + Drizzle.
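The two modes can share a couple of small pure helpers. The names below (`determinateProgress`, `shouldWriteProgress`, `formatElapsed`) are hypothetical. The snippet above uses `Math.round`; this sketch uses `Math.floor` so the bar only reads 100% once the work is actually finished, which is a design choice rather than a requirement.

```typescript
interface ProgressUpdate {
  progress_pct: number;
  progress_message: string;
}

// Determinate mode: derive percentage and message from done / total.
function determinateProgress(done: number, total: number, noun: string): ProgressUpdate {
  const ratio = total > 0 ? Math.min(Math.max(done / total, 0), 1) : 0;
  return {
    progress_pct: Math.floor(ratio * 100),
    progress_message: `Processing ${noun} ${done} of ${total}`,
  };
}

// Skip the database write unless the visible percentage changed, so a
// 50,000-row import produces at most about 100 updates, not 50,000.
function shouldWriteProgress(lastWrittenPct: number, nextPct: number): boolean {
  return nextPct !== lastWrittenPct;
}

// Indeterminate mode: show elapsed time instead of a percentage.
function formatElapsed(elapsedMs: number): string {
  const totalSeconds = Math.floor(elapsedMs / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}
```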
## Estimated Time Remaining
Show "estimated time remaining" — if you can.
Methods:
- Linear extrapolation: time_elapsed × (1 - progress) / progress
- Historical average: based on similar past jobs
- Initial estimate: server-provided at job start; refined as job progresses
Implementation:
- Track historical job duration per type
- For new jobs, use historical median as initial estimate
- Update estimate as job progresses
UX:
- Round to a friendly number, and round up: "About 30 seconds remaining" (not 27.4 seconds)
- Show range for high uncertainty: "1-3 minutes remaining"
- Below 10 seconds: "Almost done"
- Don't show if estimate would be misleading (highly variable jobs)
Stack: Next.js + simple statistics.
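The linear extrapolation and the rounding rules fit in two short functions. This is a sketch, with `estimateRemainingSeconds` and `formatRemaining` as hypothetical names; rounding to the nearest 10 seconds or whole minute is an assumption about what "round up" should mean.

```typescript
// remaining = elapsed × (1 − progress) / progress, with progress in (0, 1].
// Returns null when there is no progress yet, since the estimate is undefined.
function estimateRemainingSeconds(elapsedSeconds: number, progress: number): number | null {
  if (progress <= 0 || progress > 1) return null;
  return (elapsedSeconds * (1 - progress)) / progress;
}

// Turn a raw estimate into user-facing copy.
function formatRemaining(seconds: number | null): string | null {
  if (seconds === null) return null; // show nothing rather than a guess
  if (seconds < 10) return "Almost done";
  const roundedUp = Math.ceil(seconds / 10) * 10; // round up to the next 10 s
  if (roundedUp < 60) return `About ${roundedUp} seconds remaining`;
  const minutes = Math.ceil(seconds / 60); // round up to whole minutes
  return `About ${minutes} minute${minutes === 1 ? "" : "s"} remaining`;
}
```

For example, a job that is 25% done after 60 seconds extrapolates to 60 × 0.75 / 0.25 = 180 seconds, which displays as "About 3 minutes remaining".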
## Cancellation
Allow users to cancel long-running jobs:
Implementation:
- Cancel button in UI (visible if cancellable)
- POST /api/jobs/:id/cancel — sets job.status = 'cancelled' (intent)
- Worker periodically checks cancellation flag; if set, gracefully exits + cleans up
- Job marked as 'cancelled' in DB; UI updates
Cancellation patterns:
- Cooperative: worker checks flag; cleanest
- Hard kill: worker process terminated; risk of partial state / orphaned resources
Recommended: cooperative cancellation. Worker checks every X iterations or steps.
Edge cases:
- Cancellation requested mid-step: complete current step, then exit
- Job already completed when cancellation arrives: ignore cancel
- Multiple cancel clicks: idempotent
Cleanup on cancel:
- Delete partial output files
- Refund any pre-paid resources (e.g., AI generation that hasn't completed)
- Log the cancellation for analytics
Stack: Next.js + your worker framework.
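Cooperative cancellation amounts to a loop that checks a flag between units of work. The sketch below is synchronous to keep it self-contained; a real worker would be async and `isCancelled` would re-read the job row. `runCancellable` and its parameters are hypothetical names.

```typescript
interface RunResult {
  status: "succeeded" | "cancelled";
  processed: number;
}

function runCancellable<T>(
  items: T[],
  handle: (item: T) => void,
  isCancelled: () => boolean, // assumption: cheap enough to call every few items
  checkEvery: number,
  cleanup: () => void,
): RunResult {
  const every = Math.max(1, Math.floor(checkEvery));
  for (let i = 0; i < items.length; i++) {
    // Check between items only, so the current item always completes.
    if (i % every === 0 && isCancelled()) {
      cleanup(); // e.g. delete partial output files
      return { status: "cancelled", processed: i };
    }
    handle(items[i]);
  }
  return { status: "succeeded", processed: items.length };
}
```

A larger `checkEvery` means fewer flag reads but slower reaction to a cancel click, so tune it against how long one item takes.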
## Notification on Completion
Notify users when long jobs complete:
In-app:
- Toast notification when job.status changes to succeeded/failed (if user still on page)
- Notification badge / inbox icon updates
- Page title flicker / browser sound (opt-in)
Out-of-app:
- Email if job took >5 minutes (user likely closed tab)
- Slack/Teams notification (for integration-enabled workspaces)
- Mobile push notification (if applicable)
Implementation:
- Notification preference per user (in-app / out-of-app / both)
- Default: in-app only
- For >5 min jobs, default to in-app + email
Stack: Next.js + your notification infrastructure (Knock / Resend / etc.).
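The default rule above is simple enough to express as one function. This is a sketch that assumes email is the only out-of-app channel and that users without an explicit preference get the duration-based default; `channelsFor` and the `Preference` values are hypothetical.

```typescript
type Channel = "in_app" | "email";
type Preference = "in_app" | "out_of_app" | "both" | "default";

const LONG_JOB_SECONDS = 5 * 60; // the 5-minute threshold from the list above

// Decide where to send the completion notification for one job.
function channelsFor(jobDurationSeconds: number, preference: Preference): Channel[] {
  switch (preference) {
    case "in_app":
      return ["in_app"];
    case "out_of_app":
      return ["email"];
    case "both":
      return ["in_app", "email"];
    default:
      // No explicit preference: in-app only, plus email for long jobs.
      return jobDurationSeconds > LONG_JOB_SECONDS ? ["in_app", "email"] : ["in_app"];
  }
}
```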
## Result Delivery
When a job succeeds, deliver the result:
For data exports:
- result_url points to signed URL (24h expiry) on Vercel Blob / S3
- "Download" button in UI
- Email link to download
For generated content:
- result_url points to viewing URL in your app
- "View result" button
- Optionally show preview inline
For long calculations:
- result stored back in main DB
- "View" navigates to the relevant page
For background tasks (no user-visible result):
- Just success notification; no result URL needed
Stack: Next.js + Vercel Blob + your notification.
## Multi-Job Dashboard
Some users have many concurrent jobs. Build a job dashboard:
/jobs page:
- List all active + recent jobs for current user
- Sort by status (running first, then by completion time)
- Filter by type
- Bulk actions (cancel all running; clear completed)
Per-job row:
- Type icon
- Status (visual + text)
- Progress (mini bar)
- Started time
- Actions (cancel / view / retry)
Auto-refresh:
- Polls /api/jobs?user_id=X every 5s (the server must verify that X matches the authenticated user)
- Updates active job statuses
Use case:
- User starts 5 export jobs; comes back; sees all 5 with status
- Can cancel any; download completed ones
Stack: Next.js + TanStack Query.
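The dashboard ordering can be sketched as a comparator. Placing pending jobs between running and finished ones is an assumption, since the list above only says "running first"; `sortForDashboard` and `JobRow` are hypothetical names.

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

interface JobRow {
  id: string;
  status: JobStatus;
  created_at: number; // epoch ms
  completed_at: number | null;
}

// Lower rank sorts first: running, then queued, then everything finished.
const STATUS_RANK: Record<JobStatus, number> = {
  running: 0,
  pending: 1,
  succeeded: 2,
  failed: 2,
  cancelled: 2,
};

// Returns a new array; the input is left untouched.
function sortForDashboard(jobs: JobRow[]): JobRow[] {
  return jobs.slice().sort((a, b) => {
    const byStatus = STATUS_RANK[a.status] - STATUS_RANK[b.status];
    if (byStatus !== 0) return byStatus;
    // Within a group: most recently completed (or created) first.
    const aTime = a.completed_at ?? a.created_at;
    const bTime = b.completed_at ?? b.created_at;
    return bTime - aTime;
  });
}
```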
## Failure Handling + Retries
Implement failure + retry semantics:
Job failures:
- Worker catches exception; updates job.status = 'failed'; logs error_message
- UI surfaces error clearly with retry button
- Auto-retry for transient errors (network blips, rate limits)
- No auto-retry for hard failures (invalid input, auth errors)
Retry policy per job type:
- Default: max 3 retries with exponential backoff (1s, 4s, 16s)
- AI generation: retry up to 3 times for rate limit / transient errors
- File processing: no retry on parse errors; retry on transient I/O
User-initiated retry:
- "Retry" button on failed job
- Creates new job (don't reuse failed job_id)
- Carries over original parameters
Implementation:
- Worker wraps logic in try/catch with retry decorator
- Distinguish retryable vs non-retryable errors
- Surface retry count in UI ("Retry 2 of 3")
Stack: Next.js + your worker framework (Inngest / Trigger.dev have built-in retries).
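The retry policy above reduces to two decisions: is this error worth retrying, and how long to wait. This is a sketch assuming errors have already been classified into a small set of kinds; `ErrorKind`, `backoffMs`, and `nextAction` are hypothetical names, and frameworks with built-in retries would replace most of this.

```typescript
type ErrorKind = "network" | "rate_limit" | "invalid_input" | "auth" | "parse";

// Transient failures are retried; hard failures are not.
const RETRYABLE: ErrorKind[] = ["network", "rate_limit"];

function isRetryable(kind: ErrorKind): boolean {
  return RETRYABLE.indexOf(kind) !== -1;
}

// Exponential backoff with factor 4: retry 0 waits 1 s, retry 1 waits 4 s, retry 2 waits 16 s.
function backoffMs(retriesSoFar: number, baseMs: number = 1000, factor: number = 4): number {
  return baseMs * Math.pow(factor, retriesSoFar);
}

type NextAction =
  | { action: "retry"; delayMs: number; label: string }
  | { action: "fail" };

// Decide what the worker should do after a failed attempt.
function nextAction(kind: ErrorKind, retriesSoFar: number, maxRetries: number = 3): NextAction {
  if (!isRetryable(kind) || retriesSoFar >= maxRetries) {
    return { action: "fail" };
  }
  return {
    action: "retry",
    delayMs: backoffMs(retriesSoFar),
    label: `Retry ${retriesSoFar + 1} of ${maxRetries}`, // surfaced in the UI
  };
}
```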
## Performance: Polling Load
Polling can hammer your servers if not done well.
For 1000 concurrent users polling every 3 seconds:
- 333 requests/second on /api/jobs/:id
- Each query hits DB
- Adds up
Optimizations:
- Cache responses briefly (1 second cache on completed job statuses)
- Increase poll interval as job ages (start at 1s; grow to 30s for long-running)
- Stop polling on completion (essential)
- Batch poll multiple jobs (one request for all user's jobs)
- Use SSE / WebSocket for high-traffic apps
For smaller apps: don't optimize prematurely. A fixed 3-second `refetchInterval` that stops on completion is usually fine until polling shows up in your server metrics.
Stack: Next.js + TanStack Query + appropriate caching.
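The load arithmetic and the "grow the interval as the job ages" idea are sketched below. Polling at roughly 10% of the job's age is an assumption picked for illustration; only the 1 s and 30 s bounds come from the list above. Both function names are hypothetical.

```typescript
// Steady-state request rate when every user polls one job at a fixed interval.
function requestsPerSecond(concurrentUsers: number, intervalMs: number): number {
  return concurrentUsers / (intervalMs / 1000);
}

// Adaptive interval: start at minMs and back off toward maxMs as the job ages.
function adaptiveIntervalMs(jobAgeMs: number, minMs: number = 1000, maxMs: number = 30000): number {
  const scaled = Math.round(jobAgeMs / 10); // assumption: poll at ~10% of job age
  return Math.min(maxMs, Math.max(minMs, scaled));
}
```

With these numbers a 5-second-old job is polled every second, a 1-minute-old job every 6 seconds, and anything older than 5 minutes every 30 seconds. The result can be returned from a `refetchInterval` callback in place of a constant.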
## Common Pitfalls
**Synchronous request that "should be fast" but isn't.** User clicks; spinner; 30 seconds; timeout. Always plan for async + status UI.
**No cancellation.** User clicks "start"; realizes mistake; can't stop it. Burns resources + frustrates user.
**Hardcoded poll interval.** Polling at 100ms = server load. At 60s = sluggish UX. Tune.
**Polling forever after completion.** Browser keeps polling completed job. Stop polling.
**Page refresh starts new job.** User refreshes; same job restarts. Use idempotency keys + show existing job.
**No error message on failure.** "Something went wrong." Useless. Show specific error + retry path.
**No notification on long jobs.** User starts 30-min job; closes tab; never finds out it finished. Email notification.
**Estimates wildly wrong.** "1 minute remaining" → 30 minutes later still running. Better no estimate than misleading one.
**No cleanup on cancel.** Cancelled job leaves orphaned files / partial DB state. Cleanup hooks.
**Multiple identical jobs running.** User clicks "Start" multiple times. Deduplicate by idempotency key.
**Browser tab not informative.** Job running; tab title is generic. Update with progress: "(45%) Generating report..."
**No max execution time.** Stuck jobs run forever. Set max duration; auto-fail past threshold.
**Logs invisible.** User wants to see what's happening; logs only available in admin UI. Surface user-relevant logs.
**Workers fail silently.** Worker crashes; job stuck in 'running' forever. Heartbeat + zombie detection.
**Result URLs expire too fast.** User comes back tomorrow; download link dead. Reasonable expiry (24-72h) + regeneration option.
**No retry visibility.** Auto-retry happens silently; user doesn't know it took 3 attempts. Show retry counter.
**Cancellation race conditions.** Cancel + complete arrive simultaneously; status confused. Use atomic status transitions.
**Mobile UX broken.** Mobile users navigate away; come back; status confusing. Make sure the status view works on mobile.
**No bulk operations.** User has 50 export jobs; wants to download all. Bulk download or zip option.
**No "view past job" history.** User wants to see what they exported last week. Job history page with retention.
**Result URLs not auth-checked.** Anyone with the URL can download. Check that the requester owns the job before issuing a signed URL, and keep the expiry bounded.
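Two of these pitfalls, duplicate jobs and silently dead workers, come down to a little bookkeeping. The sketch below uses an in-memory array in place of the database; `idempotency_key` and `last_heartbeat_at` are assumed columns that are not in the schema earlier in this guide, and `startOrReuse` and `findZombies` are hypothetical names.

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

interface TrackedJob {
  id: string;
  idempotency_key: string; // assumption: sent by the client with each start request
  status: JobStatus;
  last_heartbeat_at: number; // assumption: epoch ms, refreshed by the worker
}

// Return the in-flight job for this key if there is one; otherwise create it.
// In a real database, enforce this with a unique constraint on the key,
// because check-then-insert can race between two concurrent requests.
function startOrReuse(jobs: TrackedJob[], key: string, newId: string, now: number): TrackedJob {
  for (const job of jobs) {
    const inFlight = job.status === "pending" || job.status === "running";
    if (job.idempotency_key === key && inFlight) return job;
  }
  const created: TrackedJob = { id: newId, idempotency_key: key, status: "pending", last_heartbeat_at: now };
  jobs.push(created);
  return created;
}

// Running jobs whose worker has not checked in within timeoutMs.
// A sweeper would mark these as failed so the UI stops showing "running".
function findZombies(jobs: TrackedJob[], now: number, timeoutMs: number): TrackedJob[] {
  return jobs.filter((job) => job.status === "running" && now - job.last_heartbeat_at > timeoutMs);
}
```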
## See Also
- [Background Jobs & Queue Management](./background-jobs-queue-management-chat.md)
- [Cron / Scheduled Tasks](./cron-scheduled-tasks-chat.md)
- [CSV / Data Export Patterns](./csv-data-export-patterns-chat.md) — uses long-running ops
- [Customer Reports & Scheduled Exports](./customer-reports-scheduled-exports-chat.md)
- [Email Template Implementation](./email-template-implementation-chat.md)
- [Toast Notifications UI](./toast-notifications-ui-chat.md)
- [In-App Notifications](./in-app-notifications-chat.md)
- [In-App Status Banners & System Notifications](./in-app-status-banners-system-notifications-chat.md)
- [Optimistic UI Updates](./optimistic-ui-updates-chat.md)
- [Form Autosave & Draft Persistence](./form-autosave-draft-persistence-chat.md)
- [Audit Logs](./audit-logs-chat.md)
- [Internal Admin Tools](./internal-admin-tools-chat.md)
- [Real-Time Collaboration](./real-time-collaboration-chat.md)
- [WebSocket / SSE Implementation](./websocket-sse-implementation-chat.md)
- [HTTP Retry & Backoff](./http-retry-backoff-chat.md)
- [Idempotency Patterns](./idempotency-patterns-chat.md)
- [Approval Workflows & Multi-Step Routing](./approval-workflows-multi-step-routing-chat.md)
- [Sandbox & Test Mode for SaaS APIs](./sandbox-test-mode-saas-apis-chat.md)
- [Plan Upgrade, Downgrade & Mid-Cycle Billing Changes](./plan-upgrade-downgrade-billing-changes-chat.md)
- [Sub-Account & Parent-Child Org Hierarchy](./sub-account-parent-child-organization-hierarchy-chat.md)
- [In-Product AI Agent Implementation](./in-product-ai-agent-implementation-chat.md) — agents are long-running ops
- [LLM Cost Optimization](./llm-cost-optimization-chat.md)
- [LLM Quality Monitoring](./llm-quality-monitoring-chat.md)
- [AI Streaming Chat UI](./ai-streaming-chat-ui-chat.md)
- [AI Features Implementation](./ai-features-implementation-chat.md)
- [Empty States, Loading & Error States](./empty-states-loading-error-states-chat.md)
- [Microcopy & Product Copy Systems](./microcopy-product-copy-systems-chat.md)
- [Background Jobs Providers (VibeReference)](https://viberef.dev/backend-and-data/background-jobs-providers.md) — Inngest / Trigger.dev / Vercel Queues
- [Vercel Queues (VibeReference)](https://viberef.dev/cloud-and-hosting/vercel-queues.md)
- [Vercel Workflow (VibeReference)](https://viberef.dev/cloud-and-hosting/vercel-workflow.md)
- [Vercel Blob (VibeReference)](https://viberef.dev/cloud-and-hosting/vercel-blob.md)
- [File Storage Providers (VibeReference)](https://viberef.dev/cloud-and-hosting/file-storage-providers.md)
- [Notification Providers (VibeReference)](https://viberef.dev/backend-and-data/notification-providers.md)
- [Realtime / WebSocket Platforms (VibeReference)](https://viberef.dev/backend-and-data/realtime-websocket-platforms.md)