Bulk Operations & Batch Processing: Let Customers Edit 5,000 Things at Once Without Breaking Your Database
Bulk Operations Strategy for Your New SaaS
Goal: Ship bulk operations (multi-select edit, bulk delete, mass tag, batch import-and-process) that let customers act on hundreds or thousands of items at once — with progress UI, partial-success handling, undo capability, and back-end safety so the database doesn't lock during the operation. Avoid the failure modes where founders ship "select all" without backend support (UI spinning forever as 50K rows update synchronously), no progress indicator (customer thinks it's broken), no undo (one accidental "delete all" call ruins the customer's day), or no rate limit (one bulk op blocks every other customer's requests).
Process: Follow this chat pattern with your AI coding tool such as Claude or v0.app. Pay attention to the notes in [brackets] and replace the bracketed text with your own content.
Timeframe: Backend bulk-action API + progress UI shipped in week 1. Partial-success handling + undo in week 2. Rate limits + audit + edge cases in week 3. Quarterly review baked in.
## Why Most Founder Bulk Ops Are Broken
Three failure modes hit founders the same way:
- Sync UI processing. Founder writes "select all → click delete" that calls a single endpoint with 5,000 IDs. Backend processes inline; HTTP timeout at 30 seconds; customer sees error; some rows deleted, others not; state is inconsistent.
- No partial-success handling. Bulk operation fails on row 47 of 5,000 and rolls back everything; customer is angry that 46 valid edits were lost. Or it doesn't roll back, the customer doesn't know which rows succeeded, and they reconcile state by hand.
- No undo. Customer accidentally clicks "Delete all" with the wrong filter applied; loses 5,000 records; ends up in tears at support. No rollback option; angry tweet incoming.
The version that works is structured: async backend processing with progress, partial-success per row with detailed status, undo for destructive operations within a window, rate limiting per workspace, and audit logging for accountability.
This guide assumes you have already done Authentication (bulk ops are user-scoped), have shipped Multi-Tenant Data Isolation (bulk respects workspace boundaries), have shipped Roles & Permissions (RBAC) (some bulk ops are admin-only), have considered Background Jobs Providers (the queue layer), have shipped Audit Logs (bulk ops are high-value events), and have shipped Rate Limiting & Abuse Prevention.
## 1. Decide Which Operations Need Bulk
Not every action needs a bulk version. Decide deliberately.
Help me decide which actions need bulk variants.
The candidates:
**Always need bulk** (most products):
- Delete (multi-select; one of the most-requested)
- Tag / un-tag (batch labeling)
- Move / archive (organizational)
- Status / state changes (mark-as-done; mark-as-shipped; etc.)
- Assignment changes (assign to user X)
**Sometimes need bulk**:
- Export (per [account-deletion-data-export](account-deletion-data-export-chat.md) — bulk export = data takeout)
- Edit fields (multi-select edit one field across rows)
- Apply template / preset
**Rarely need bulk**:
- Create (bulk creation usually = [CSV import](csv-import-chat.md) instead)
- Detailed-edit (bulk edit on most fields creates conflicts)
**Don't need bulk**:
- Per-record operations that have side effects (sending emails — too risky in bulk)
- Workflow transitions that require per-record review
For my product:
- The 5-10 actions that customers do repeatedly
- Which ones would benefit from "do this to N items at once"
- Which ones are too risky for bulk
Output:
1. The bulk-action catalog
2. The risk assessment per action
3. The "skip bulk" list with reasons
The biggest unforced error: shipping bulk on every action because "users want it". Some actions (sending emails to N customers; permanent deletes without undo) are too risky for bulk without serious safeguards. Triage: is the bulk version actually needed, and what safeguards must come with it?
## 2. Process Asynchronously With Progress
The HTTP request that "kicks off" a bulk op shouldn't process the whole thing inline. Use a queue.
Design the async processing pipeline.
The pattern:
**Phase 1: Receive (HTTP handler)**
1. Customer selects N items + clicks bulk action
2. Frontend POSTs `{ action: 'delete', ids: [...] }` to `/api/bulk/operations`
3. Backend validates: user has permission for this action; all IDs in same workspace; quota OK
4. Backend creates a `bulk_operations` record:
- operation_id (UUID)
- user_id, workspace_id
- action type
- target IDs
- status: 'pending'
- total_count, processed_count: 0
5. Backend enqueues a background job (per [background-jobs-providers](https://www.vibereference.com/backend-and-data/background-jobs-providers))
6. Returns: `{ operation_id, status_url }`
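To make Phase 1 concrete, here is a minimal, framework-agnostic sketch of the receive step. The `enqueue` callback and `maxItemsPerOp` parameter are hypothetical stand-ins for your real queue client and quota check, not a specific library's API:

```typescript
import { randomUUID } from "node:crypto";

type BulkRequest = { action: string; ids: string[] };
type BulkOperation = {
  id: string;
  action: string;
  targetIds: string[];
  status: "pending";
  totalCount: number;
  processedCount: number;
};

function receiveBulkRequest(
  req: BulkRequest,
  maxItemsPerOp: number,
  enqueue: (op: BulkOperation) => void,
): { operationId: string; statusUrl: string } {
  // Validate before anything is persisted or enqueued
  if (req.ids.length === 0) throw new Error("empty selection");
  if (req.ids.length > maxItemsPerOp) throw new Error("selection exceeds per-op limit");
  const op: BulkOperation = {
    id: randomUUID(),
    action: req.action,
    targetIds: [...req.ids], // persist the IDs now; never re-run the query later
    status: "pending",
    totalCount: req.ids.length,
    processedCount: 0,
  };
  enqueue(op); // hand off to the background worker immediately
  return { operationId: op.id, statusUrl: `/api/bulk/operations/${op.id}` };
}
```

The key property: the handler returns in milliseconds regardless of how many IDs were selected, because all real work happens in Phase 2.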
**Phase 2: Process (background worker)**
1. Worker picks up the operation
2. Marks status: 'processing'
3. For each ID in the target list:
- Apply the action
- Update per-item status (success / failure / reason)
- Increment processed_count
4. On completion: status: 'completed' (or 'completed_with_errors')
**Phase 3: Notify**
1. Frontend polls `status_url` (or websocket)
2. Shows progress: "Processed 47 of 5,000 (1%)"
3. On completion: success summary or error report
**Storage**:
```sql
CREATE TABLE bulk_operations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id),
workspace_id UUID NOT NULL REFERENCES workspaces(id),
action TEXT NOT NULL, -- 'delete', 'tag', 'archive', etc.
target_count INT NOT NULL,
processed_count INT NOT NULL DEFAULT 0,
succeeded_count INT NOT NULL DEFAULT 0,
failed_count INT NOT NULL DEFAULT 0,
status TEXT NOT NULL DEFAULT 'pending', -- pending / processing / completed / failed / cancelled
parameters JSONB, -- action-specific params (new tag value, etc.)
started_at TIMESTAMP,
completed_at TIMESTAMP,
error_message TEXT,
is_undoable BOOLEAN NOT NULL DEFAULT FALSE,
undo_until TIMESTAMP,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE TABLE bulk_operation_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
operation_id UUID NOT NULL REFERENCES bulk_operations(id) ON DELETE CASCADE,
target_id UUID NOT NULL,
status TEXT NOT NULL DEFAULT 'pending', -- pending / succeeded / failed / skipped
error_message TEXT,
previous_state JSONB, -- for undo
processed_at TIMESTAMP
);
CREATE INDEX idx_bulk_op_items ON bulk_operation_items(operation_id);
```
Critical implementation rules:
- Never block the HTTP request on long-running bulk ops.
- Persist target IDs. Don't re-fetch from a query (state may change mid-op).
- Process in batches within the worker (e.g., 100 at a time; transaction per batch).
- Update progress incrementally. Don't wait until the end to surface state.
- Handle worker death gracefully. Resume from the last processed item.
Don't:
- Run the operation inline in the HTTP request
- Lock the entire dataset (use small transactions)
- Forget partial-success state
Output:
- The bulk_operations schema
- The endpoint code
- The worker code with batching
- The progress-polling endpoint
The biggest single performance win: **batched async processing.** A 5,000-item bulk op processed in 50 batches of 100 each (each in its own transaction) completes in seconds without locking; the same op processed inline in one transaction can lock the table for minutes and fail under load.
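Here is a minimal sketch of that batched worker loop. The `applyAction` and `onProgress` callbacks are hypothetical stand-ins for your real action handler and progress writer; in production each batch would also run inside its own database transaction:

```typescript
type ItemResult = { id: string; ok: boolean; error?: string };

// Split a target list into fixed-size batches
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

async function processBulkOperation(
  targetIds: string[],
  applyAction: (id: string) => Promise<void>,
  onProgress: (processed: number, total: number) => void,
  batchSize = 100,
): Promise<ItemResult[]> {
  const results: ItemResult[] = [];
  for (const batch of chunk(targetIds, batchSize)) {
    // In a real worker, open one transaction per batch here
    for (const id of batch) {
      try {
        await applyAction(id);
        results.push({ id, ok: true }); // per-item success
      } catch (err) {
        results.push({ id, ok: false, error: String(err) }); // per-item failure, op continues
      }
    }
    onProgress(results.length, targetIds.length); // incremental progress after each batch
  }
  return results;
}
```

Note that one failing item records an error and moves on; it never aborts the surrounding batch or the operation.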
---
## 3. Handle Partial Success
Bulk ops can fail on individual items. Don't roll back the whole thing.
Design partial-success handling.
The pattern:
Per-item result tracking:
For each item in the bulk op:
- Apply the action
- If success: mark `bulk_operation_items.status = 'succeeded'`
- If failure: mark `'failed'` + reason
- Increment counters on the parent `bulk_operations` record
Common failure reasons:
- Item already deleted (race with another operation)
- Item lacks required state (e.g., can't archive an active subscription)
- Permission lost during operation (user demoted mid-op)
- External service failure (e.g., search index update failed)
The customer-facing result UI:
After completion:
- "Processed 5,000 items: 4,950 succeeded, 50 failed"
- Expandable: "View failed items" → list with reasons
- "Retry failed" button (re-runs only failed items)
Aggregating failure reasons:
If 50 of 5,000 fail, all with the same reason: surface it as one error type ("50 items couldn't be archived because they're currently active"). Don't list 50 individual errors.
The "all-or-nothing" mode (rare but useful):
Some operations should be atomic:
- "Move all selected items to project X" — if X is full, fail all
- Use a transaction; rollback on first error
- Fewer use cases; document explicitly
Critical implementation rules:
- Default to per-item independence. One failure shouldn't cancel others.
- Surface failures clearly. Customer needs to know what failed and why.
- Make retry-failed possible. Don't make the customer re-select.
- Rate-limit failures. If 1,000 fail in a row, halt and alert (something's wrong).
Don't:
- Roll back successes when one item fails
- Hide failures
- Treat failures as "the operation failed" globally
Output:
- The per-item status tracking
- The aggregation logic
- The result UI with retry-failed
- The "circuit breaker" for cascading failures
The biggest customer-trust factor: **showing exactly which items failed and why.** A customer who can see "47 failed because [reason]" can fix and retry; a customer who just sees "operation failed" has no path forward.
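A sketch of that aggregation step: group failed items by their recorded reason so the result UI shows one line per distinct error rather than thousands of rows. The `ItemOutcome` shape mirrors `bulk_operation_items` but is illustrative:

```typescript
type ItemOutcome = {
  targetId: string;
  status: "succeeded" | "failed";
  errorMessage?: string;
};

// Collapse per-item failures into grouped, human-readable summaries
function summarizeFailures(items: ItemOutcome[]): { message: string; count: number }[] {
  const byReason = new Map<string, number>();
  for (const item of items) {
    if (item.status !== "failed") continue;
    const reason = item.errorMessage ?? "unknown error";
    byReason.set(reason, (byReason.get(reason) ?? 0) + 1);
  }
  // One summary line per distinct reason, most common first
  return [...byReason.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([message, count]) => ({ message, count }));
}
```

The UI renders each summary as "{count} items failed: {message}", with the raw per-item list behind an expandable "View failed items" control.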
---
## 4. Build Undo for Destructive Operations
Bulk delete is the most-feared operation. Make it undoable.
Design undo.
The pattern:
For destructive operations (delete, archive, mass-state-change):
- Store previous state in `bulk_operation_items.previous_state`
- Mark `is_undoable: true` on the `bulk_operations` record
- Set `undo_until` (typically 30 minutes to 24 hours)
Undo UI:
After successful destructive op:
- Banner / toast: "Deleted 5,000 items. [Undo]"
- Persistent until dismissed or undo_until expires
- Click "Undo" → restore items
- Or: visit operation detail page; click "Undo this operation"
Undo execution:
When customer clicks undo:
- Background job processes each item in the operation
- Restores from `previous_state`
- Marks the original op as undone
- Notifies customer when complete
Undo storage:
```jsonc
// previous_state for a delete:
{
  "deleted": false,
  "data": { "title": "...", "content": "...", ... }
}

// previous_state for an archive:
{
  "archived": false,
  "archived_at": null
}

// previous_state for a tag change:
{
  "tags": ["old-tag-1", "old-tag-2"]
}
```
The undo window:
- Short (30 min): low storage cost; risks surprising customers
- Long (7 days): higher storage cost; customer-friendly
- Default: 24 hours
Critical implementation rules:
- Capture previous state BEFORE applying changes. Once changed, you can't recover it otherwise.
- Store enough to fully restore. Don't store just IDs; store the fields needed to rebuild.
- Clean up after the undo window. Stale previous_state consumes storage.
- Audit undo events. Per Audit Logs: when an undo runs, log it.
Edge cases:
- Undo after time has passed: the customer deletes X, time passes, then undoes; restore X as it was at delete time. If a conflicting record was created in the meantime, surface the conflict rather than silently overwriting.
- Cascade restore: delete a project; cascade-deletes 50 tasks. Undo: restore project AND tasks. Track relationship.
- External service involvement: bulk op also called external API (e.g., Stripe). Undo may not be possible there. Document.
For non-undoable ops:
Some ops can't be undone (e.g., already-sent webhooks, already-charged payments). Surface explicitly: "This action cannot be undone. Type 'CONFIRM' to proceed."
Don't:
- Make undo unavailable for destructive ops (always provide unless impossible)
- Use a global undo (per-operation is clearer)
- Forget cascade restoration
Output:
- The undo storage schema
- The undo execution worker
- The undo UI / banner
- The undo-window cleanup job
The single biggest customer-trust feature: **a 30-minute undo window after bulk delete.** A customer who panic-deletes 1,000 items at 11pm clicks undo at 11:01pm; everything restored. Without undo, they file a frantic support ticket and you spend hours restoring from backup.
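The capture-then-restore mechanic can be sketched as follows. The field names (`archived`, `archivedAt`) are illustrative; the point is the ordering, snapshot before mutate:

```typescript
type Item = { id: string; archived: boolean; archivedAt: string | null };

// Snapshot only the fields the action will touch, BEFORE mutating them
function archiveWithSnapshot(item: Item): Partial<Item> {
  const previousState: Partial<Item> = {
    archived: item.archived,
    archivedAt: item.archivedAt,
  };
  item.archived = true;
  item.archivedAt = new Date().toISOString();
  return previousState; // persist this as bulk_operation_items.previous_state
}

// Undo: write the snapshot back over the current state
function undoFromSnapshot(item: Item, previousState: Partial<Item>): void {
  Object.assign(item, previousState);
}
```

In production the snapshot is the JSONB `previous_state` column, and the undo worker replays `undoFromSnapshot` per item, respecting the `undo_until` deadline.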
---
## 5. Rate-Limit and Quota Bulk Ops
One customer running a 100K-item op can starve every other customer. Constrain.
Design rate limits.
The pattern:
Per-workspace concurrency:
- Max N concurrent bulk operations per workspace (typically 1-3)
- New op requested while N already running: queue OR reject with "too many in-flight"
- Prevents one workspace from monopolizing workers
Per-action volume limits:
| Action | Per-tier limit | Why |
|---|---|---|
| Delete | 10K items/day | Protects against accidental mass-delete |
| Tag | 100K items/day | Cheap operation; high tolerance |
| Export | 1 export/hour | Prevents data exfiltration spam |
| Bulk-edit | 50K items/day | Complex operation |
Per-tier scaling:
- Free tier: lower limits (1K items/op max)
- Paid tier: standard limits
- Enterprise: customer-configurable
Worker concurrency:
- Total worker pool: e.g., 10 workers
- Per-customer hard cap: e.g., 2 workers
- Prevents one customer from using all workers
Detection of abusive patterns:
- Spike in bulk-op volume from one customer
- Repeated failures (cascading errors)
- Off-hours volume (3am bulk delete from new account)
Trigger alerts; pause if needed.
Critical implementation rules:
- Document limits publicly. Customers should know.
- Surface limits clearly when hit. "You've hit the daily bulk-delete limit. Try again tomorrow OR contact support to raise it."
- Don't silently throttle. The customer should know.
For very large operations:
- Customer wants to delete 1M items
- Standard limits prevent it in one op
- Solution: enterprise feature; manual review; scheduled execution
- Or: customer can do across multiple days
Don't:
- Allow unlimited bulk ops (DDoS risk)
- Skip per-customer quotas (one bad actor ruins it for all)
- Forget worker concurrency caps
Output:
- The per-action limit table
- The concurrency cap
- The detection rules
- The customer-facing rate-limit messaging
The biggest invisible cost: **one customer's bulk op blocking all other customers.** Without per-customer worker caps, a 100K-item delete starves the worker pool; everyone else's smaller ops queue indefinitely. Per-customer limits prevent this.
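The per-workspace concurrency cap can be sketched as a small gate. This version tracks counts in memory for clarity; a real deployment would back it with Redis or a database row so it survives restarts and works across processes:

```typescript
// Per-workspace gate: at most maxConcurrent bulk ops in flight at once
class BulkOpGate {
  private inFlight = new Map<string, number>();

  constructor(private maxConcurrent = 2) {}

  // Returns false when the workspace is at its cap; caller queues or rejects
  tryStart(workspaceId: string): boolean {
    const current = this.inFlight.get(workspaceId) ?? 0;
    if (current >= this.maxConcurrent) return false;
    this.inFlight.set(workspaceId, current + 1);
    return true;
  }

  // Called when an operation completes, fails, or is cancelled
  finish(workspaceId: string): void {
    const current = this.inFlight.get(workspaceId) ?? 0;
    this.inFlight.set(workspaceId, Math.max(0, current - 1));
  }
}
```

One workspace hitting its cap never affects another workspace's slots, which is exactly the isolation property the section above calls for.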
---
## 6. Surface Progress in the UI
A customer who clicks "delete 5,000" and sees a spinner for 30 seconds thinks the app is broken.
Design the progress UX.
The pattern:
Immediate response:
- Optimistic UI: items removed from view immediately (show "Deleting...")
- OR: blocking modal: "Processing your bulk action..."
Progress indicator:
- Progress bar: "Processed 1,247 of 5,000 (25%)"
- Time estimate: "About 30 seconds remaining"
- Cancellable button: "Cancel" (if supported)
Background-processing pattern:
- Customer can navigate away mid-op
- Banner persists at top: "Bulk operation in progress: 1,247 / 5,000"
- Click banner: open detail / cancel
Real-time updates:
- WebSocket / SSE for sub-second updates
- Or: polling every 1-2 seconds
- Update counter / progress bar
On completion:
- Toast: "5,000 items deleted. [Undo]"
- Or: persistent banner with details
- Failed-items expandable list
Mobile considerations:
- Most bulk ops happen on desktop (better selection UX)
- Mobile: simpler progress UI; allow background completion
Critical implementation rules:
- Feedback within 1 second of click (even just "Starting...").
- Show progress, not just spinner. Spinner = unknown state; progress bar = known state.
- Allow navigation away. Don't trap the customer.
- Clear completion signal. Don't make the customer guess whether it's done.
The cancel option:
For long-running ops:
- Cancel button on progress UI
- Cancel signals worker; worker stops processing remaining items
- Items already processed are not undone (but undo flow available afterward)
- Mark op as cancelled; customer sees "Cancelled at X of Y"
Don't:
- Block the UI for the entire op
- Show only "Loading..."
- Hide failures during progress
Output:
- The progress UI component
- The polling / WebSocket update flow
- The cancel mechanism
- The completion notification
The biggest UX moment: **the first second after the customer clicks bulk action.** If they see immediate feedback (item count appearing in the progress, items disappearing from view), they trust the system. If they see a frozen spinner, they panic.
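The percent and time-remaining figures fall straight out of the counters the worker already updates. A sketch, assuming roughly steady per-item cost (a reasonable approximation for homogeneous batches):

```typescript
// Derive display values from processed_count / total_count and elapsed time
function progressSummary(
  processed: number,
  total: number,
  elapsedMs: number,
): { percent: number; remainingMs: number | null } {
  const percent = total === 0 ? 100 : Math.floor((processed / total) * 100);
  // No estimate until at least one item has completed
  const remainingMs =
    processed === 0 ? null : Math.round((elapsedMs / processed) * (total - processed));
  return { percent, remainingMs };
}
```

The polling endpoint (or WebSocket push) returns these two numbers alongside the raw counts, and the UI renders "Processed 1,247 of 5,000 (24%), about 30 seconds remaining".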
---
## 7. Audit Bulk Operations
Bulk ops are high-leverage; small mistakes affect many records. Audit thoroughly.
Design the audit integration.
Per Audit Logs, log:
Bulk op lifecycle:
- `bulk_operation.started` — who, what action, what targets, count
- `bulk_operation.completed` — success/failure counts
- `bulk_operation.cancelled` — by whom, at what point
- `bulk_operation.undone` — when undo executed
- `bulk_operation.failed` — if globally failed
Per-item events (sample at high volume):
For destructive ops, log per-item:
- `item.bulk_deleted` — record ID + previous state hash
- `item.bulk_archived` — etc.
For 5,000-item ops, that's 5,000 audit entries. Use:
- Sample 1% for non-destructive ops
- Log 100% for destructive ops (cheap insurance)
The "who" and "context":
- Acting user
- Workspace
- IP / user agent
- Operation ID linking back to bulk_operations
- Trigger (UI / API / scheduled)
Customer-facing audit feed:
In /admin/audit (per audit-logs):
- Show bulk operations prominently
- Filterable by action type
- Linkable to operation detail (what was affected)
Compliance / regulator support:
- Audit logs answer "what happened to data X on date Y"
- Especially important for destructive ops
- Retain per account-deletion-data-export: 7-year retention typical for high-value events
Don't:
- Skip audit logging on bulk ops (these are high-leverage events)
- Log raw sensitive data in audit (mask PII)
- Forget to surface customer-visible audit views
Output:
- The audit-event schema for bulk ops
- The sampling strategy
- The customer-facing audit view
- The retention policy
The biggest "we need audit logs" moment: **a customer claims their data was wrongly deleted.** Without audit, you can't investigate. With it: "On [date] at [time], user X ran bulk-delete operation [ID] affecting these 5,000 items. Here's the full record." Audit is your defense and the customer's recourse.
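One way to sketch the sampling rule above: always log destructive actions at 100%, and sample the rest deterministically by hashing the target ID, so a retried operation logs the same items. The destructive-action list and the hash function are illustrative choices, not a prescribed scheme:

```typescript
// Decide whether to emit a per-item audit entry for this bulk-op item
function shouldLogItem(action: string, targetId: string, sampleRate = 0.01): boolean {
  const destructive = new Set(["delete", "archive"]);
  if (destructive.has(action)) return true; // 100% for destructive ops

  // Cheap deterministic 31-polynomial hash of the target ID
  let h = 0;
  for (const ch of targetId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;

  // Sample: the same ID always gets the same decision at a given rate
  return h % 10000 < sampleRate * 10000;
}
```

Deterministic sampling matters here: if the same bulk op is retried, the audit trail stays consistent instead of logging a different random 1% each time.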
---
## 8. Handle Edge Cases
Real bulk ops have weird cases. Plan.
The edge case checklist.
Edge case 1: Customer selects "all" but means "all on this page"
- "Select all" checkbox ambiguous: is this 50 or 5,000?
- Fix: explicit "Select all 5,000" link after page-select
- Or: cap "select all" at visible items; require explicit "Select all matching filter"
Edge case 2: Filter changes mid-selection
- Customer selects items; changes filter; "selected" set is now stale
- Fix: capture IDs at click-time; ignore filter changes
- Or: warn "Filter changed; selection cleared"
Edge case 3: Permissions change during op
- User has permission to delete; mid-op, role changed
- Fix: check permission per-item; skip unauthorized items; report them
Edge case 4: Item modified during op
- Item modified by another user mid-bulk-op
- Fix: optimistic locking with version IDs
- On conflict: skip; report
- Customer sees "47 items skipped (modified by others)"
Edge case 5: Cascade impacts
- Bulk-deleting 100 projects cascades to 10K tasks
- Fix: surface impact in confirmation dialog; "This will affect 10,047 records"
Edge case 6: External service fails during op
- Bulk update of customer records also calls Stripe API
- Stripe down: bulk op partially succeeds (DB updated; Stripe didn't)
- Fix: retry external calls; surface as failure if persistent
Edge case 7: Workspace-wide vs filtered "all"
- "Delete all" without filter = 100K items
- Fix: confirm explicitly with count; require typed confirmation
Edge case 8: Confirmation dialog text
- "Are you sure?" is weak
- Better: "You are about to delete 5,000 items. Type DELETE to confirm."
Edge case 9: API access to bulk ops
- Customers calling bulk endpoints via API
- Same rate limits; require auth scope per api-keys-chat
- Document; provide examples
Edge case 10: Scheduled bulk operations
- "Delete items older than 90 days" weekly
- Different from interactive bulk; less risky if scoped carefully
- Audit + alert on unusual volumes
Output:
- Handling per edge case
- The confirmation-dialog patterns
- The locking strategy
- The cascade-impact warning
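Edge case 4's optimistic-locking fix can be sketched directly, assuming each row carries a version counter incremented on every write (the field name `version` is illustrative):

```typescript
type VersionedItem = { id: string; version: number; archived: boolean };

// Apply the action only if the item is unchanged since the customer selected it
function applyIfUnchanged(
  item: VersionedItem,
  versionAtSelection: number,
  mutate: (item: VersionedItem) => void,
): "succeeded" | "skipped" {
  // Another user modified the item mid-op: skip and report, don't overwrite
  if (item.version !== versionAtSelection) return "skipped";
  mutate(item);
  item.version += 1; // every write bumps the version
  return "succeeded";
}
```

In SQL this is the classic `UPDATE ... WHERE id = $1 AND version = $2` pattern; a zero-row update means "skipped", which feeds the "47 items skipped (modified by others)" message above.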
---
## 9. Test Bulk Operations Carefully
Bulk ops affect many records; bugs are amplified. Test thoroughly.
Design the test suite.
The tests:
Unit tests:
- Per-action handler: applies correctly
- Permission checks
- Rate limits
Integration tests:
- End-to-end: trigger bulk op; verify all items processed correctly
- Partial failure: some items fail; verify others succeed
- Race conditions: items modified mid-op
- Cancellation mid-op
- Undo restores correctly
Load tests:
- 1K item op: timing, memory
- 10K item op: same
- 100K item op: same
- Concurrent ops from multiple workspaces
Failure-mode tests:
- Worker crashes mid-op: resume correctly
- Database connection drops: retry
- External service fails: handles gracefully
Customer-facing tests:
- Manual exploratory: as a real user, click bulk action; verify UI matches expectations
Test data:
- Sample data with realistic volumes
- Edge cases: items with weird states, missing fields
- Multi-workspace scenarios
Don't:
- Skip load tests (you'll find out at scale)
- Forget failure-mode tests
- Test with too-small datasets
Output:
- The test suite
- The load-test scenarios
- The failure-injection tests
- The customer-facing exploratory test plan
---
## 10. Quarterly Review
Bulk ops accumulate edge cases. Quarterly review.
Quarterly review.
Usage metrics:
- Bulk ops triggered per period
- Volume distribution (small / medium / large)
- Most-common actions
- Failure rate per action
Performance:
- Avg processing time per item
- Worker utilization
- Concurrency caps hit?
Customer impact:
- Support tickets about bulk ops
- Undo usage rate (high = customer regret pattern)
- Cancellation rate (high = ops too slow)
Quality:
- Most-common failure reasons
- Items skipped due to race conditions
New actions:
- Customer-requested bulk variants we don''t have
- Action requests that should NOT have bulk
Output:
- Snapshot
- 1-2 fixes
- 1 process improvement
---
## What "Done" Looks Like
A working bulk operations system in 2026 has:
- A bulk-action catalog with risk-assessed scope
- Async processing via background workers (per [background jobs](https://www.vibereference.com/backend-and-data/background-jobs-providers))
- Per-item status tracking with partial-success handling
- Undo for destructive operations within 24-hour window
- Per-workspace rate limits + concurrency caps
- Real-time progress UI (polling or WebSocket)
- Cancellation support during long ops
- Audit logging on every bulk event
- Confirmation dialogs for destructive ops with typed-confirmation
- Edge-case handling (filter-change / permission-change / cascade / external services)
- Test coverage including load tests
- Quarterly review baked in
The hidden cost of weak bulk ops: **a single accidental "delete all" that nukes a customer's data.** Without undo + confirmation + audit, the recovery is hours of restoring from backup AND eroded customer trust forever. The infrastructure (async workers + undo storage + audit) is small; the protection it provides is enormous. Bulk operations are the highest-leverage product surface; build them with the most care.
---
## See Also
- [CSV Import Flows](csv-import-chat.md) — bulk creation flow
- [Account Deletion & Data Export](account-deletion-data-export-chat.md) — bulk export
- [Multi-Tenant Data Isolation](multi-tenancy-chat.md) — workspace boundary
- [Roles & Permissions (RBAC)](roles-permissions-chat.md) — permission per bulk action
- [Audit Logs](audit-logs-chat.md) — high-value events logged
- [Rate Limiting & Abuse Prevention](rate-limiting-abuse-chat.md) — bulk ops are abuse vector
- [API Keys & PATs](api-keys-chat.md) — programmatic bulk access
- [In-App Notifications](in-app-notifications-chat.md) — notify on completion
- [Real-Time Collaboration](real-time-collaboration-chat.md) — bulk ops affect collaborators
- [Background Jobs Providers](https://www.vibereference.com/backend-and-data/background-jobs-providers) — queue layer
- [Database Providers](https://www.vibereference.com/backend-and-data/database-providers) — bulk ops stress the DB
[⬅️ Growth Overview](README.md)