# Long-Running Operations & Job Status UI: Chat Prompts

Some operations don't fit in a 5-second HTTP request. AI generation that takes 30 seconds. Video transcoding that takes 5 minutes. Bulk import of 100K rows that takes 20 minutes. Migration that takes hours. Report generation. Data export. Web scraping. ML training jobs. Without proper UX, the user clicks "Start" and stares at a spinner that may never end. Or worse, refreshes the page and triggers it again.

Long-running operations need a specific UX pattern: kick off async; track job status; show progress; notify when done; let user navigate away. This is the chat-prompt playbook for shipping that pattern cleanly without your servers melting under polling load.

## When This Belongs

Use long-running ops UX when:

- Operation reliably takes >5 seconds
- Progress is meaningful (per-item, per-step, percentage)
- Users may want to navigate away and come back
- Failures need to be communicated clearly

Don't bother when:

- <5 second operation (just spin and wait)
- Background-only / never user-facing

## Architecture: Sync Request Kicks Off Async Job

Browser → POST /api/jobs/start
Backend creates job record with status='pending'; returns job_id; enqueues to background worker
Browser ← { job_id, estimated_duration_seconds }

Background worker picks up job, processes, updates status as it goes:
- pending → running (with progress 0%)
- running (progress 10%, 25%, 50%, ...)
- running → succeeded OR failed

Browser polls / subscribes to job_id; renders progress UI:
- Status (pending / running / succeeded / failed)
- Progress percentage or step count
- Estimated time remaining
- Cancel button
- View result button (when complete)
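
The status flow above is effectively a small state machine. A minimal sketch, assuming hypothetical helpers `canTransition`, `isTerminal`, and `transition`, and assuming the `cancelled` state from the data model below is reachable from both `pending` and `running`. The names and rules are illustrative, not part of any framework:

```typescript
// Hypothetical job-status state machine; names and rules are illustrative.
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

// Allowed next states for each status. Terminal states allow nothing.
const ALLOWED: Record<JobStatus, readonly JobStatus[]> = {
  pending: ["running", "cancelled"],
  running: ["succeeded", "failed", "cancelled"],
  succeeded: [],
  failed: [],
  cancelled: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return ALLOWED[from].indexOf(to) !== -1;
}

function isTerminal(s: JobStatus): boolean {
  return ALLOWED[s].length === 0;
}

// Applies a transition or throws, so a late "succeeded" write
// cannot overwrite a job that was already cancelled.
function transition(current: JobStatus, next: JobStatus): JobStatus {
  if (!canTransition(current, next)) {
    throw new Error(`Illegal job transition: ${current} -> ${next}`);
  }
  return next;
}
```

Routing every status write through one guard like this makes the polling client's stop condition (`isTerminal`) and the worker's write rules share a single definition.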

## Data Model

I'm building long-running operations infrastructure for my SaaS.

Schema (Drizzle):
```sql
jobs:
  id (UUID), user_id, account_id,
  type (e.g. 'csv_import', 'report_generation', 'ai_generation', 'data_export'),
  status (enum: 'pending' | 'running' | 'succeeded' | 'failed' | 'cancelled'),
  progress_pct (int 0-100),
  progress_message (e.g. "Processing row 1234 of 50000"),
  current_step (int), total_steps (int),
  created_at, started_at, completed_at,
  result_url (nullable; signed URL to download / view result),
  error_message (nullable),
  metadata (JSON: input parameters; job-specific data),
  estimated_duration_seconds (initial estimate),
  cancellable (boolean; can user cancel this job?)

job_logs:
  job_id, log_level (info / warn / error), message, timestamp
  -- detailed progress; viewable for debugging
```

Implement:

- The schema
- API: POST /api/jobs/start (creates job + enqueues), GET /api/jobs/:id (status), POST /api/jobs/:id/cancel
- Background worker integration (Inngest / Trigger.dev / Vercel Queues / etc.)
- Helper: updateJobProgress(jobId, { progress_pct, progress_message, current_step, total_steps }); workers call this periodically

Stack: Next.js + Drizzle + Vercel Queues / Inngest.
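
To make the helper's contract concrete, here is a rough sketch of `updateJobProgress` against an in-memory stand-in for the `jobs` table. The `JobRecord` and `ProgressUpdate` types, `jobStore`, and `createJob` are assumptions for illustration. It takes the object-style update used in the Progress Tracking section. A real version would issue a single conditional UPDATE through your ORM instead of mutating an object:

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

// Subset of the jobs schema that matters for progress updates.
interface JobRecord {
  id: string;
  type: string;
  status: JobStatus;
  progress_pct: number; // integer 0-100
  progress_message: string | null;
  current_step: number | null;
  total_steps: number | null;
}

interface ProgressUpdate {
  progress_pct?: number;
  progress_message?: string;
  current_step?: number;
  total_steps?: number;
}

// In-memory stand-in for the jobs table (hypothetical).
const jobStore: Record<string, JobRecord> = Object.create(null);

function createJob(id: string, type: string): JobRecord {
  const job: JobRecord = {
    id, type, status: "pending", progress_pct: 0,
    progress_message: null, current_step: null, total_steps: null,
  };
  jobStore[id] = job;
  return job;
}

// Returns false when the job is missing or not running, so a cancelled
// or finished job is never overwritten by a late progress update.
function updateJobProgress(jobId: string, update: ProgressUpdate): boolean {
  const job = jobStore[jobId];
  if (!job || job.status !== "running") return false;
  if (update.progress_pct !== undefined) {
    // Clamp to 0-100 and never move the bar backwards.
    const pct = Math.max(0, Math.min(100, Math.round(update.progress_pct)));
    job.progress_pct = Math.max(job.progress_pct, pct);
  }
  if (update.progress_message !== undefined) job.progress_message = update.progress_message;
  if (update.current_step !== undefined) job.current_step = update.current_step;
  if (update.total_steps !== undefined) job.total_steps = update.total_steps;
  return true;
}
```

Clamping and the "never backwards" rule live in the helper so individual workers cannot produce a bar that jumps past 100% or rewinds.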


## Status Polling vs Server-Sent Events vs WebSocket

For job status updates, three patterns:

Pattern A: Polling

- Browser polls GET /api/jobs/:id every 2-5 seconds
- TanStack Query handles this nicely with refetchInterval
- Pros: simple; works behind proxies / firewalls
- Cons: server load grows with every active client (each one issues a request per interval); updates lag by up to one interval

Pattern B: Server-Sent Events (SSE)

- Browser opens SSE connection to /api/jobs/:id/events
- Server streams updates as they happen
- Pros: real-time; one connection
- Cons: long-lived connections; Vercel Functions have 5-min limit by default

Pattern C: WebSocket

- Bidirectional; can send + receive
- Pros: most flexible
- Cons: more infrastructure; not always justified

Recommended:

- Polling (Pattern A) for jobs <5 minutes
- SSE (Pattern B) for longer jobs or when real-time updates matter
- WebSocket only if you need bidirectional updates (cancellation feedback, etc.)

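Implement Pattern A with TanStack Query as default:

```typescript
import { useQuery } from '@tanstack/react-query';

const TERMINAL_STATUSES = ['succeeded', 'failed', 'cancelled'];

// Call inside a React component or custom hook. In TanStack Query v5 the
// refetchInterval callback receives the Query object; in v4 its first
// argument is the data itself.
const { data: job } = useQuery({
  queryKey: ['job', jobId],
  queryFn: () => fetch(`/api/jobs/${jobId}`).then((r) => r.json()),
  refetchInterval: (query) => {
    const status = query.state.data?.status;
    if (TERMINAL_STATUSES.includes(status)) return false; // stop polling
    return 3000; // poll every 3 seconds while in flight
  },
});
```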

Stack: Next.js + TanStack Query.


## The Job Status Component

Build a <JobStatus> component that renders consistently across all long-running operations:

States:

- pending: spinner + "Queued..." + estimated start time
- running: progress bar + percentage + current step text + estimated time remaining
- succeeded: checkmark + "Complete" + view result button
- failed: red icon + error message + retry button
- cancelled: gray icon + "Cancelled" + restart button

Visual:

- Compact mode: inline progress bar (in tables / cards)
- Expanded mode: full panel with steps, logs, actions
- Toggle expand/collapse

Behaviors:

- Auto-update via TanStack Query
- Cancel button visible if job.cancellable + status='running'
- Auto-redirect / notify when complete (configurable)
- Browser tab title updates: "(50%) Importing customers..."
- Sound notification on complete (optional, opt-in)

Stack: Next.js + Tailwind + shadcn/ui + TanStack Query.
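
One way to keep the component consistent is to derive everything it renders from a pure function of the job. A minimal sketch, assuming hypothetical `JobView` and `JobDisplay` shapes and `describeJob` / `tabTitle` helpers (none of these come from a library). The JSX would then only map the result onto shadcn/ui pieces:

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

interface JobView {
  status: JobStatus;
  progress_pct: number;
  progress_message: string | null;
  error_message: string | null;
  cancellable: boolean;
}

interface JobDisplay {
  label: string;
  showProgressBar: boolean;
  showCancel: boolean;
  action: "view" | "retry" | "restart" | null;
}

// Maps each of the five states to what the component should render.
function describeJob(job: JobView): JobDisplay {
  switch (job.status) {
    case "pending":
      return { label: "Queued...", showProgressBar: false, showCancel: false, action: null };
    case "running":
      return {
        label: job.progress_message ?? `${job.progress_pct}%`,
        showProgressBar: true,
        showCancel: job.cancellable, // cancel is offered only while running
        action: null,
      };
    case "succeeded":
      return { label: "Complete", showProgressBar: false, showCancel: false, action: "view" };
    case "failed":
      return { label: job.error_message ?? "Failed", showProgressBar: false, showCancel: false, action: "retry" };
    case "cancelled":
      return { label: "Cancelled", showProgressBar: false, showCancel: false, action: "restart" };
  }
}

// Browser tab title: prefix the percentage only while the job is running.
function tabTitle(job: JobView, baseTitle: string): string {
  return job.status === "running" ? `(${job.progress_pct}%) ${baseTitle}` : baseTitle;
}
```

Because both helpers are pure, the compact and expanded modes can share them, and the state table can be unit-tested without rendering anything.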


## Progress Tracking Strategies

Different operations track progress differently:

Determinable progress (you know total work):

Bulk imports: rows processed / total rows
Multi-step jobs: current step / total steps
File processing: bytes processed / total bytes

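```typescript
// Guard against an empty input so the percentage is never NaN.
const pct = totalRows > 0 ? Math.round((rowsProcessed / totalRows) * 100) : 0;

await updateJobProgress(jobId, {
  progress_pct: pct,
  progress_message: `Processing row ${rowsProcessed} of ${totalRows}`,
  current_step: 2, // e.g. step 2 of 5 in a multi-step import
  total_steps: 5,
});
```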

Indeterminate progress (you don't know total work):

AI generation (variable length output)
Web scraping (unknown number of pages)
Long DB queries

For indeterminate:

Show animated progress bar (no percentage)
Show "elapsed: 2m 14s"
Periodic status messages ("Still working on it...")

Hybrid: known steps + variable work within each step

"Step 3 of 5: Generating AI summary..." (with sub-progress within step)

Implement helpers for both modes.

Stack: Next.js + Drizzle.
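
The two modes can share one snapshot shape, with a `null` percentage meaning "indeterminate". A minimal sketch, assuming hypothetical helpers `determinateProgress`, `indeterminateProgress`, and `formatElapsed`. It floors rather than rounds, an illustrative choice so the bar reaches 100 only when all work is done:

```typescript
interface ProgressSnapshot {
  progress_pct: number | null; // null = indeterminate (animated bar, no percentage)
  progress_message: string;
}

// Determinate mode: known total (rows, steps, bytes).
function determinateProgress(done: number, total: number, noun: string): ProgressSnapshot {
  if (total <= 0) {
    // Unknown or empty total: fall back to indeterminate rather than divide by zero.
    return { progress_pct: null, progress_message: `Processing ${noun}s...` };
  }
  const clamped = Math.max(0, Math.min(done, total));
  return {
    progress_pct: Math.floor((clamped / total) * 100),
    progress_message: `Processing ${noun} ${clamped} of ${total}`,
  };
}

// Formats elapsed time as "elapsed: 2m 14s" (or "elapsed: 9s" under a minute).
function formatElapsed(elapsedMs: number): string {
  const totalSeconds = Math.max(0, Math.floor(elapsedMs / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `elapsed: ${minutes}m ${seconds}s` : `elapsed: ${seconds}s`;
}

// Indeterminate mode: no total, so report elapsed time and a reassuring message.
function indeterminateProgress(elapsedMs: number): ProgressSnapshot {
  return {
    progress_pct: null,
    progress_message: `Still working on it... (${formatElapsed(elapsedMs)})`,
  };
}
```

For the hybrid case, a worker could set `current_step` / `total_steps` from its step list and use either helper for the sub-progress message inside the step.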


## Estimated Time Remaining

Show "estimated time remaining" — if you can.

Methods:

- Linear extrapolation: time_elapsed × (1 - progress) / progress, with progress as a fraction between 0 and 1
- Historical average: based on similar past jobs
- Initial estimate: server-provided at job start; refined as job progresses

Implementation:

- Track historical job duration per type
- For new jobs, use historical median as initial estimate
- Update estimate as job progresses

UX:

- Round up to a friendly value: "About 30 seconds remaining" (not "27.4 seconds")
- Show a range when uncertainty is high: "1-3 minutes remaining"
- Below 10 seconds: "Almost done"
- Don't show an estimate that would be misleading (highly variable jobs)

Stack: Next.js + simple statistics.
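
The linear extrapolation and the rounding rules above fit in two small functions. A sketch under stated assumptions: `estimateRemainingSeconds` and `formatRemaining` are hypothetical names, and rounding up to the next 10 seconds (then to whole minutes) is one reasonable reading of "round up", not the only one:

```typescript
// Linear extrapolation: elapsed * (1 - progress) / progress.
// Returns null when there is no progress yet to extrapolate from.
function estimateRemainingSeconds(elapsedSeconds: number, progressPct: number): number | null {
  if (progressPct <= 0 || progressPct > 100) return null;
  const p = progressPct / 100;
  return (elapsedSeconds * (1 - p)) / p;
}

// Turns a raw estimate into user-facing copy.
function formatRemaining(seconds: number | null): string {
  if (seconds === null) return ""; // no estimate: show nothing rather than guess
  if (seconds < 10) return "Almost done";
  const roundedSeconds = Math.ceil(seconds / 10) * 10; // round up to the next 10 s
  if (roundedSeconds < 60) return `About ${roundedSeconds} seconds remaining`;
  const minutes = Math.ceil(seconds / 60);
  return `About ${minutes} minute${minutes === 1 ? "" : "s"} remaining`;
}
```

For example, a job that is 25% done after 30 seconds extrapolates to 30 × 0.75 / 0.25 = 90 seconds, which renders as "About 2 minutes remaining". Early in a job the extrapolation is noisy, so the historical median is usually the safer source until some progress has accumulated.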


## Cancellation

Allow users to cancel long-running jobs:

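Implementation:

- Cancel button in UI (visible if cancellable)
- POST /api/jobs/:id/cancel sets job.status = 'cancelled'; this records intent, and the worker may still be mid-step
- Worker periodically re-reads the status; if it is 'cancelled', it exits gracefully and cleans up
- The job stays 'cancelled' in the DB (the worker must not overwrite it with 'succeeded'); UI updates on the next poll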

Cancellation patterns:

Cooperative: worker checks flag; cleanest
Hard kill: worker process terminated; risk of partial state / orphaned resources

Recommended: cooperative cancellation. Worker checks every X iterations or steps.

Edge cases:

Cancellation requested mid-step: complete current step, then exit
Job already completed when cancellation arrives: ignore cancel
Multiple cancel clicks: idempotent

Cleanup on cancel:

Delete partial output files
Refund any pre-paid resources (e.g., AI generation that hasn't completed)
Log the cancellation for analytics

Stack: Next.js + your worker framework.
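
Cooperative cancellation comes down to a worker loop that re-reads the job's status at a fixed cadence. A minimal in-memory sketch: `statusById`, `readStatus`, `requestCancel`, and `runJob` are hypothetical stand-ins for your table and worker framework, and the final status write would need to be a conditional UPDATE in a real database:

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

// In-memory stand-in for the jobs table, keyed by job id.
const statusById: Record<string, JobStatus> = Object.create(null);

// Stand-in for re-reading the row from the database.
function readStatus(jobId: string): JobStatus | undefined {
  return statusById[jobId];
}

// What POST /api/jobs/:id/cancel might do. Idempotent: repeated clicks and
// cancels that arrive after completion change nothing.
function requestCancel(jobId: string): boolean {
  const current = readStatus(jobId);
  if (current !== "pending" && current !== "running") return false;
  statusById[jobId] = "cancelled";
  return true;
}

// Processes items in order, checking for cancellation every `checkEvery`
// items. `cleanup` runs only when the job was cancelled after it started.
function runJob<T>(
  jobId: string,
  items: T[],
  processItem: (item: T) => void,
  cleanup: () => void,
  checkEvery = 100,
): JobStatus | undefined {
  if (readStatus(jobId) !== "pending") return readStatus(jobId); // cancelled before start
  statusById[jobId] = "running";
  for (let i = 0; i < items.length; i++) {
    if (i % checkEvery === 0 && readStatus(jobId) === "cancelled") {
      cleanup(); // delete partial output, refund, log
      return "cancelled";
    }
    processItem(items[i]); // the current item always completes before exit
  }
  // A cancel may have landed during the last few items.
  if (readStatus(jobId) === "cancelled") {
    cleanup();
    return "cancelled";
  }
  statusById[jobId] = "succeeded";
  return "succeeded";
}
```

A smaller `checkEvery` makes cancellation feel faster at the cost of more status reads, so tune it to how long one item takes.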


## Notification on Completion

Notify users when long jobs complete:

In-app:

Toast notification when job.status changes to succeeded/failed (if user still on page)
Notification badge / inbox icon updates
Page title flicker / browser sound (opt-in)

Out-of-app:

Email if job took >5 minutes (user likely closed tab)
Slack/Teams notification (for integration-enabled workspaces)
Mobile push notification (if applicable)

Implementation:

Notification preference per user (in-app / out-of-app / both)
Default: in-app only
For >5 min jobs, default to in-app + email

Stack: Next.js + your notification infrastructure (Knock / Resend / etc.).
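
The default rules above reduce to a small decision function. A sketch, assuming a hypothetical `channelsFor` helper, a `Preference` type mirroring the three options, and only two channels (toast and email) for brevity. Slack or push would slot in the same way:

```typescript
type Preference = "in_app" | "out_of_app" | "both";
type DeliveryChannel = "toast" | "email";

const LONG_JOB_SECONDS = 5 * 60;

// `preference` is null when the user never chose one; in that case the
// default is in-app only, or in-app + email for jobs longer than 5 minutes.
function channelsFor(durationSeconds: number, preference: Preference | null): DeliveryChannel[] {
  const effective: Preference =
    preference ?? (durationSeconds > LONG_JOB_SECONDS ? "both" : "in_app");
  const channels: DeliveryChannel[] = [];
  if (effective === "in_app" || effective === "both") channels.push("toast");
  if (effective === "out_of_app" || effective === "both") channels.push("email");
  return channels;
}
```

An explicit user preference always wins here. Only the unset case depends on job duration.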


## Result Delivery

When a job succeeds, deliver the result:

For data exports:

result_url points to signed URL (24h expiry) on Vercel Blob / S3
"Download" button in UI
Email link to download

For generated content:

result_url points to viewing URL in your app
"View result" button
Optionally show preview inline

For long calculations:

result stored back in main DB
"View" navigates to the relevant page

For background tasks (no user-visible result):

Just success notification; no result URL needed

Stack: Next.js + Vercel Blob + your notification.


## Multi-Job Dashboard

Some users have many concurrent jobs. Build a job dashboard:

/jobs page:

List all active + recent jobs for current user
Sort by status (running first, then by completion time)
Filter by type
Bulk actions (cancel all running; clear completed)

Per-job row:

Type icon
Status (visual + text)
Progress (mini bar)
Started time
Actions (cancel / view / retry)

Auto-refresh:

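Polls GET /api/jobs every 5s; the server scopes results to the signed-in user from the session rather than trusting a user_id query parameter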
Updates active job statuses

Use case:

User starts 5 export jobs; comes back; sees all 5 with status
Can cancel any; download completed ones

Stack: Next.js + TanStack Query.
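
The dashboard ordering ("running first, then by completion time") can be written as a comparator. A minimal sketch, assuming a hypothetical `JobRow` shape with numeric timestamps and assuming queued jobs should sit between running and finished ones:

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed" | "cancelled";

interface JobRow {
  id: string;
  status: JobStatus;
  created_at: number; // epoch milliseconds
  completed_at: number | null;
}

// Running jobs first, then queued, then everything finished.
function rank(s: JobStatus): number {
  if (s === "running") return 0;
  if (s === "pending") return 1;
  return 2; // succeeded, failed, cancelled
}

// Returns a sorted copy; the input array is left untouched.
function sortForDashboard(jobs: JobRow[]): JobRow[] {
  return jobs.slice().sort((a, b) => {
    const byStatus = rank(a.status) - rank(b.status);
    if (byStatus !== 0) return byStatus;
    // Most recent first: completion time for finished jobs, creation time otherwise.
    const aTime = a.completed_at ?? a.created_at;
    const bTime = b.completed_at ?? b.created_at;
    return bTime - aTime;
  });
}
```

Sorting a copy keeps the function safe to call on cached query data, which should not be mutated in place.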


## Failure Handling + Retries

Implement failure + retry semantics:

Job failures:

Worker catches exception; updates job.status = 'failed'; logs error_message
UI surfaces error clearly with retry button
Auto-retry for transient errors (network blips, rate limits)
No auto-retry for hard failures (invalid input, auth errors)

Retry policy per job type:

Default: max 3 retries with exponential backoff (1s, 4s, 16s)
AI generation: retry up to 3 times for rate limit / transient errors
File processing: no retry on parse errors; retry on transient I/O

User-initiated retry:

"Retry" button on failed job
Creates new job (don't reuse failed job_id)
Carries over original parameters

Implementation:

Worker wraps logic in try/catch with retry decorator
Distinguish retryable vs non-retryable errors
Surface retry count in UI ("Retry 2 of 3")

Stack: Next.js + your worker framework (Inngest / Trigger.dev have built-in retries).
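
If your worker framework does not already handle retries, the policy above can be expressed as a delay function plus a decision function. A sketch with hypothetical names (`backoffDelayMs`, `decideRetry`, `ErrorKind`). The error categories are illustrative, and jitter is left out to keep the arithmetic visible:

```typescript
// Delay before retry number `attempt` (1-based).
// With the defaults this yields 1s, 4s, 16s.
function backoffDelayMs(attempt: number, baseMs = 1000, factor = 4): number {
  return baseMs * Math.pow(factor, attempt - 1);
}

type ErrorKind = "network" | "rate_limit" | "invalid_input" | "auth" | "parse";

// Transient errors are retryable; hard failures are not.
const RETRYABLE: Record<ErrorKind, boolean> = {
  network: true,
  rate_limit: true,
  invalid_input: false,
  auth: false,
  parse: false,
};

interface RetryDecision {
  retry: boolean;
  delayMs: number;
  label: string; // shown in the UI, e.g. "Retry 2 of 3"
}

// `retriesDone` is how many retries have already been attempted.
function decideRetry(kind: ErrorKind, retriesDone: number, maxRetries = 3): RetryDecision {
  if (!RETRYABLE[kind] || retriesDone >= maxRetries) {
    return { retry: false, delayMs: 0, label: "" };
  }
  const next = retriesDone + 1;
  return {
    retry: true,
    delayMs: backoffDelayMs(next),
    label: `Retry ${next} of ${maxRetries}`,
  };
}
```

Writing the `label` into `progress_message` on each attempt is one way to surface the retry count without a schema change.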


## Performance: Polling Load

Polling can hammer your servers if not done well.

For 1000 concurrent users polling every 3 seconds:

333 requests/second on /api/jobs/:id
Each query hits DB
Adds up

Optimizations:

Cache responses briefly (1 second cache on completed job statuses)
Increase poll interval as job ages (start at 1s; grow to 30s for long-running)
Stop polling on completion (essential)
Batch poll multiple jobs (one request for all user's jobs)
Use SSE / WebSocket for high-traffic apps

For smaller apps: don't optimize prematurely. A fixed 3-second interval that stops on completion is usually fine until polling shows up in your request or database metrics.

Stack: Next.js + TanStack Query + appropriate caching.
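
The "grow the interval as the job ages" optimization and the load arithmetic above can be sketched as two pure functions. `pollIntervalMs` and `requestsPerSecond` are hypothetical names, and polling at roughly 10% of the job's age is an illustrative rule, not a recommendation from any library:

```typescript
// Poll roughly every 10% of the job's age, clamped to [minMs, maxMs].
// With the defaults: 1 s for young jobs, reaching 30 s once a job is 5 minutes old.
function pollIntervalMs(jobAgeMs: number, minMs = 1000, maxMs = 30000): number {
  const scaled = Math.round(jobAgeMs / 10);
  return Math.max(minMs, Math.min(maxMs, scaled));
}

// Steady-state request rate when every client polls at the same interval.
function requestsPerSecond(concurrentClients: number, intervalMs: number): number {
  return concurrentClients / (intervalMs / 1000);
}
```

In the polling hook, the `refetchInterval` callback could return `pollIntervalMs(Date.now() - createdAtMs)` instead of a constant, where `createdAtMs` is the job's creation time. At a 30-second interval, the same 1000 clients generate about 33 requests/second instead of 333.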


## Common Pitfalls

**Synchronous request that "should be fast" but isn't.** User clicks; spinner; 30 seconds; timeout. Always plan for async + status UI.

**No cancellation.** User clicks "start"; realizes mistake; can't stop it. Burns resources + frustrates user.

**Hardcoded poll interval.** Polling at 100ms = server load. At 60s = sluggish UX. Tune.

**Polling forever after completion.** Browser keeps polling completed job. Stop polling.

**Page refresh starts new job.** User refreshes; same job restarts. Use idempotency keys + show existing job.

**No error message on failure.** "Something went wrong." Useless. Show specific error + retry path.

**No notification on long jobs.** User starts 30-min job; closes tab; never finds out it finished. Email notification.

**Estimates wildly wrong.** "1 minute remaining" → 30 minutes later still running. Better no estimate than misleading one.

**No cleanup on cancel.** Cancelled job leaves orphaned files / partial DB state. Cleanup hooks.

**Multiple identical jobs running.** User clicks "Start" multiple times. Deduplicate by idempotency key.

**Browser tab not informative.** Job running; tab title is generic. Update with progress: "(45%) Generating report..."

**No max execution time.** Stuck jobs run forever. Set max duration; auto-fail past threshold.

**Logs invisible.** User wants to see what's happening; logs only available in admin UI. Surface user-relevant logs.

**Workers fail silently.** Worker crashes; job stuck in 'running' forever. Heartbeat + zombie detection.

**Result URLs expire too fast.** User comes back tomorrow; download link dead. Reasonable expiry (24-72h) + regeneration option.

**No retry visibility.** Auto-retry happens silently; user doesn't know it took 3 attempts. Show retry counter.

**Cancellation race conditions.** Cancel + complete arrive simultaneously; status confused. Use atomic status transitions.

**Mobile UX broken.** Mobile users navigate away; come back; status confusing. Restore job status on return and test the flow on mobile.

**No bulk operations.** User has 50 export jobs; wants to download all. Bulk download or zip option.

**No "view past job" history.** User wants to see what they exported last week. Job history page with retention.

**Result URLs not auth-checked.** Anyone with the URL can download. Check that the requester owns the job before issuing a short-lived signed URL.
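
Two of the pitfalls above (page refresh and repeated clicks) share one fix: deduplicate job creation by idempotency key. A minimal in-memory sketch, assuming a hypothetical `startJobOnce` helper and a client-generated key per user action. In a real database the guarantee would come from a unique constraint on the key, not from a lookup like this:

```typescript
interface StartedJob {
  job_id: string;
  idempotency_key: string;
}

// In-memory stand-in for a jobs table with a unique idempotency_key column.
const jobsByKey: Record<string, StartedJob> = Object.create(null);
let nextJobNumber = 1;

// Returns the existing job when the same key is seen again, so double-clicks
// and page refreshes attach to the in-flight job instead of starting another.
function startJobOnce(idempotencyKey: string): { job: StartedJob; created: boolean } {
  const existing = jobsByKey[idempotencyKey];
  if (existing) return { job: existing, created: false };
  const job: StartedJob = {
    job_id: `job_${nextJobNumber++}`,
    idempotency_key: idempotencyKey,
  };
  jobsByKey[idempotencyKey] = job;
  return { job, created: true };
}
```

The `created` flag lets the API tell the client whether to show "Started" or "Already running", and the returned `job_id` lets the UI resume the existing status view.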

## See Also

- [Background Jobs & Queue Management](./background-jobs-queue-management-chat.md)
- [Cron / Scheduled Tasks](./cron-scheduled-tasks-chat.md)
- [CSV / Data Export Patterns](./csv-data-export-patterns-chat.md) — uses long-running ops
- [Customer Reports & Scheduled Exports](./customer-reports-scheduled-exports-chat.md)
- [Email Template Implementation](./email-template-implementation-chat.md)
- [Toast Notifications UI](./toast-notifications-ui-chat.md)
- [In-App Notifications](./in-app-notifications-chat.md)
- [In-App Status Banners & System Notifications](./in-app-status-banners-system-notifications-chat.md)
- [Optimistic UI Updates](./optimistic-ui-updates-chat.md)
- [Form Autosave & Draft Persistence](./form-autosave-draft-persistence-chat.md)
- [Audit Logs](./audit-logs-chat.md)
- [Internal Admin Tools](./internal-admin-tools-chat.md)
- [Real-Time Collaboration](./real-time-collaboration-chat.md)
- [WebSocket / SSE Implementation](./websocket-sse-implementation-chat.md)
- [HTTP Retry & Backoff](./http-retry-backoff-chat.md)
- [Idempotency Patterns](./idempotency-patterns-chat.md)
- [Approval Workflows & Multi-Step Routing](./approval-workflows-multi-step-routing-chat.md)
- [Sandbox & Test Mode for SaaS APIs](./sandbox-test-mode-saas-apis-chat.md)
- [Plan Upgrade, Downgrade & Mid-Cycle Billing Changes](./plan-upgrade-downgrade-billing-changes-chat.md)
- [Sub-Account & Parent-Child Org Hierarchy](./sub-account-parent-child-organization-hierarchy-chat.md)
- [In-Product AI Agent Implementation](./in-product-ai-agent-implementation-chat.md) — agents are long-running ops
- [LLM Cost Optimization](./llm-cost-optimization-chat.md)
- [LLM Quality Monitoring](./llm-quality-monitoring-chat.md)
- [AI Streaming Chat UI](./ai-streaming-chat-ui-chat.md)
- [AI Features Implementation](./ai-features-implementation-chat.md)
- [Empty States, Loading & Error States](./empty-states-loading-error-states-chat.md)
- [Microcopy & Product Copy Systems](./microcopy-product-copy-systems-chat.md)
- [Background Jobs Providers (VibeReference)](https://viberef.dev/backend-and-data/background-jobs-providers.md) — Inngest / Trigger.dev / Vercel Queues
- [Vercel Queues (VibeReference)](https://viberef.dev/cloud-and-hosting/vercel-queues.md)
- [Vercel Workflow (VibeReference)](https://viberef.dev/cloud-and-hosting/vercel-workflow.md)
- [Vercel Blob (VibeReference)](https://viberef.dev/cloud-and-hosting/vercel-blob.md)
- [File Storage Providers (VibeReference)](https://viberef.dev/cloud-and-hosting/file-storage-providers.md)
- [Notification Providers (VibeReference)](https://viberef.dev/backend-and-data/notification-providers.md)
- [Realtime / WebSocket Platforms (VibeReference)](https://viberef.dev/backend-and-data/realtime-websocket-platforms.md)