Customer Reports & Scheduled Exports — Chat Prompts
If your B2B SaaS shows customers any data at all, sooner or later they'll ask: "Can I get this as a report?" Then: "Can it be emailed to me weekly?" Then: "Can I export to CSV / Excel for my own analysis?" Then: "Can my CFO get a monthly summary automatically?" The naive shape: "View" page in the app; customer screenshots it. Doesn't scale; loses formatting; not on the cadence they need. The right shape: a customer-facing reports + scheduled-export system that lets users define what data + format they want, schedule delivery (email / Slack / S3 / API), and trust the report will arrive consistently.
Customer reports are different from internal admin dashboards (you build dashboards FOR YOUR TEAM about customers; reports are for CUSTOMERS about THEIR own data). They're also different from embedded analytics (interactive in-product dashboards). Reports are generated artifacts: PDF, CSV, scheduled email digests with embedded charts, data feeds.
This chat walks through implementing real customer reports + scheduled exports: report definition + design, generation pipeline (CSV / PDF / email), scheduling, delivery channels, custom-formatted reports, edge cases (large data, time zones, branding), and the operational realities.
What you're building
- Report definition system (what data, what filters, what format)
- One-shot exports (customer clicks "Export"; gets file)
- Scheduled reports (weekly/monthly/custom recurring)
- Multiple output formats (CSV, Excel, PDF, JSON, email-embed)
- Multiple delivery channels (download / email / Slack / S3 / webhook)
- Template-based PDF reports with branding
- Customer-facing UI for report management
- Operational runbooks for large/slow reports
1. Decide the scope
Help me decide what shape of reporting to ship.
Three increasingly-deep shapes:
LEVEL 1: BASIC EXPORT (the simplest start)
- "Export to CSV" button on tables
- Customer downloads file
- One-shot; no scheduling
- Pros: ships in 1-2 weeks
- Cons: no automation; no PDF; no branding
LEVEL 2: SCHEDULED REPORTS + EMAIL DELIVERY
- Customer defines a report (data + filters)
- Schedule (daily / weekly / monthly)
- Delivered via email (CSV attached or PDF rendered or summary embedded)
- Pros: hands-off automation; high customer value
- Cons: 4-8 weeks; email-rendering challenges; PDF generation infra
LEVEL 3: BRANDED PDF REPORTS + MULTI-CHANNEL DELIVERY
- Pixel-perfect PDF reports with customer branding
- Delivery to Slack / S3 / webhook / custom destination
- Custom report templates per workspace
- Pros: enterprise-grade; high stickiness
- Cons: 12-24 weeks; branding system complex
LEVEL 4: SELF-SERVE REPORT BUILDER
- Customer drags fields, defines filters, picks chart types
- Visual report builder (Looker-Studio-style)
- Pros: ultimate flexibility
- Cons: 24+ weeks; build a query engine; high engineering investment
DEFAULT FOR MOST B2B SaaS:
- Year 1-2: Level 1 (CSV export)
- Year 2+: Level 2 (scheduled + email)
- Year 3+: Level 3 (branded PDF) when enterprise customers demand
- Skip Level 4 (use embedded analytics like Cube + Tremor; not a custom builder)
Output: scope decision; what's in v1.
2. Design the report definition data model
Schema for reports:
reports (
id uuid pk
workspace_id uuid not null
created_by uuid not null
name text not null
description text
type text -- 'tabular', 'summary', 'chart', 'composite'
data_source text -- 'projects', 'deals', 'tickets', etc.
filters jsonb -- query filters
fields jsonb -- which columns to include
group_by jsonb -- aggregation grouping
sort jsonb
output_format text -- 'csv', 'xlsx', 'pdf', 'json', 'html_email'
template_id uuid -- optional PDF template (Level 3)
schedule jsonb -- null (one-shot) | { cron, timezone, next_run_at }
delivery jsonb -- { type: 'email'|'slack'|'webhook'|'s3', config: {...} }
recipients uuid[] -- users to deliver to
status text -- 'draft', 'active', 'paused', 'archived'
last_run_at timestamptz
last_run_status text -- 'success', 'failed', 'partial'
next_run_at timestamptz
created_at timestamptz
updated_at timestamptz
)
Indexes:
- (next_run_at) partial WHERE status = 'active' for the run-due lookup
- (workspace_id, status) for listing a workspace's reports
- (workspace_id, name) for lookup by name
report_runs (
id uuid pk
report_id uuid not null
triggered_at timestamptz
triggered_by text -- 'schedule', 'manual', 'api'
triggered_by_user uuid
started_at timestamptz
completed_at timestamptz
status text -- 'pending', 'running', 'completed', 'failed'
output_url text -- S3 / blob URL with TTL
output_size_bytes bigint
rows_generated bigint
error text
delivery_status jsonb -- per-recipient: { user_id, channel, status, error }
)
INDEX (report_id, triggered_at DESC)
Partial INDEX WHERE status IN ('pending', 'running') for monitoring in-flight runs
Why store filters/fields/group-by as JSONB:
- Flexible; matches your data-model query patterns
- Easy to evolve schema without migration
- Validated at the app layer (JSON schema checks at write time)
Why separate report (definition) from report_runs (execution):
- Report definition is stable + versioned
- Runs are events (history, audit, debug)
- Scaling: trim run history; keep definitions
Implement:
1. Migration for reports + report_runs
2. Filter / field / group-by JSON schemas
3. Validation at write
4. RLS for multi-tenancy
5. Indexes for run-scheduling queries
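A minimal sketch of the write-time validation from items 2-3 above, using Zod; the operator names and limits are assumptions to adapt to your data model:
import { z } from 'zod'

// App-layer validation for the filters / fields / group_by JSONB columns.
// Operator names and limits here are illustrative, not a fixed contract.
const FilterSchema = z.object({
  field: z.string().min(1),
  op: z.enum(['eq', 'neq', 'gt', 'lt', 'in', 'contains', 'within_last_days']),
  value: z.union([z.string(), z.number(), z.boolean(), z.array(z.string())]),
})

const ReportDefinitionSchema = z.object({
  filters: z.array(FilterSchema).max(20),
  fields: z.array(z.string()).min(1).max(100),
  group_by: z.array(z.string()).max(3).optional(),
  sort: z.array(z.object({ field: z.string(), dir: z.enum(['asc', 'desc']) })).optional(),
})

// Reject bad definitions at write time, before a worker ever runs them.
export function validateDefinition(input: unknown) {
  return ReportDefinitionSchema.parse(input) // throws ZodError on invalid input
}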
Output: schema for definition + execution.
3. Build the report generation pipeline
Now build the actual report execution.
Architecture:
1. Trigger (manual / scheduled / API)
2. Worker fetches report definition
3. Worker executes query (against your DB / data warehouse)
4. Worker generates output file (CSV / Excel / PDF)
5. Worker uploads to storage (S3 / blob)
6. Worker delivers via configured channel
7. Worker updates report_runs
Use a durable workflow runtime: Inngest / Trigger.dev / Temporal / Vercel Workflow.
Pseudo-code:
async function executeReport(reportId: string, triggeredBy: TriggerInfo) {
  const report = await db.reports.findById(reportId)
  const run = await db.reportRuns.create({
    report_id: reportId,
    triggered_at: new Date(),
    triggered_by: triggeredBy.type,
    status: 'pending',
  })
  try {
    run.started_at = new Date()
    run.status = 'running'
    await db.reportRuns.update(run)
    // Step 1: Execute query
    const data = await queryData(report)
    // Step 2: Generate output
    const output = await generateOutput(data, report.output_format, report.template_id)
    // Step 3: Upload to storage
    const url = await uploadToBlob(output, {
      filename: `${report.name}-${formatDate(new Date())}.${ext(report.output_format)}`,
      ttl: 30 * DAY, // DAY in ms; signed URL expires after 30 days
      contentType: contentTypeFor(report.output_format),
    })
    run.output_url = url
    run.output_size_bytes = output.size
    run.rows_generated = data.length
    // Step 4: Deliver
    const deliveryResults = await deliverReport(report, run, url)
    run.delivery_status = deliveryResults
    // Step 5: Update status
    run.status = 'completed'
    run.completed_at = new Date()
    await db.reportRuns.update(run)
    // Step 6: Schedule next run (if recurring)
    if (report.schedule) {
      await db.reports.update({ id: reportId, next_run_at: computeNextRun(report.schedule) })
    }
  } catch (err) {
    run.status = 'failed'
    run.error = err instanceof Error ? err.message : String(err)
    await db.reportRuns.update(run)
    throw err // surface to the workflow runtime so its retry policy applies
  }
}
Key concerns:
A. Query timeout
- Reports may need to read millions of rows
- Set per-report timeout (default 5 min; configurable)
- Use streaming queries; don't load all into memory
- For very large: paginate the export across multiple files
B. Memory management
- Stream CSV / Excel / PDF generation
- Don't build entire output in memory for large reports
- Use libraries that support streaming (papaparse for CSV; exceljs for Excel; @react-pdf/renderer or puppeteer for PDF)
C. Worker isolation
- Reports run in background jobs (not request path)
- Per-worker timeout; failure-handling
- Scale workers separately from API
D. Storage + URL TTL
- Upload to S3/blob with signed URL (default 30 days)
- Store URL + run metadata
- Link expires; customer can request re-generation if needed
E. Idempotency
- Use deterministic file names per (report_id, triggered_at)
- If retry: use same filename (overwrite); avoid duplicates
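A sketch of a deterministic naming helper, assuming storage objects are keyed by report and trigger time:
// Deterministic storage key per (report, trigger time): a retried run
// computes the same key and overwrites rather than duplicating the object.
function outputKey(report: { id: string; name: string }, triggeredAt: Date, extension: string): string {
  const slug = report.name.toLowerCase().replace(/[^a-z0-9]+/g, '-')
  const stamp = triggeredAt.toISOString().replace(/[:.]/g, '-')
  return `reports/${report.id}/${stamp}-${slug}.${extension}`
}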
Implement:
1. Query function (streams data per report config)
2. Output generators per format (CSV, Excel, PDF)
3. Delivery dispatchers per channel
4. Workflow runtime integration
5. Error handling + retry
6. Memory + timeout safeguards
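A sketch of the computeNextRun helper used in the pipeline above, assuming the cron-parser package (v4 API) and the schedule JSONB shape from the data model:
import parser from 'cron-parser'

// schedule comes from reports.schedule: { cron, timezone, next_run_at }
function computeNextRun(schedule: { cron: string; timezone: string }): Date {
  const interval = parser.parseExpression(schedule.cron, {
    currentDate: new Date(),
    tz: schedule.timezone, // IANA name, e.g. 'America/New_York'; honors DST
  })
  return interval.next().toDate()
}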
Output: a generation pipeline that scales.
4. Implement output formats
Per-format generation:
CSV:
- Use streaming library (papaparse / fast-csv)
- Headers row from field definitions
- Encode special chars correctly
- Prepend a UTF-8 BOM so Excel detects the encoding
- Stream-write to file or blob
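A minimal streaming sketch with @fast-csv/format; fetchRows is a hypothetical async generator over your report query:
import { createWriteStream } from 'node:fs'
import { format } from '@fast-csv/format'

async function writeCsv(path: string, fetchRows: () => AsyncIterable<Record<string, unknown>>) {
  const file = createWriteStream(path)
  file.write('\ufeff') // UTF-8 BOM so Excel detects the encoding
  const csv = format({ headers: true }) // header row inferred from first object's keys
  csv.pipe(file)
  for await (const row of fetchRows()) {
    // Respect backpressure instead of buffering the whole dataset in memory
    if (!csv.write(row)) await new Promise((r) => csv.once('drain', r))
  }
  csv.end()
  await new Promise((resolve, reject) => file.on('finish', resolve).on('error', reject))
}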
Excel (XLSX):
- Use exceljs (Node) or similar
- Stream-write rows
- Multiple sheets if needed (one per data slice)
- Number formatting (currency, dates)
- Frozen header row
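A streaming sketch with exceljs's WorkbookWriter, which flushes committed rows to disk instead of holding the workbook in memory; the columns are illustrative:
import ExcelJS from 'exceljs'

async function writeXlsx(path: string, rows: AsyncIterable<{ name: string; value: number; created: Date }>) {
  const workbook = new ExcelJS.stream.xlsx.WorkbookWriter({ filename: path })
  const sheet = workbook.addWorksheet('Report')
  sheet.columns = [
    { header: 'Name', key: 'name', width: 30 },
    { header: 'Value', key: 'value', width: 12, style: { numFmt: '#,##0.00' } }, // currency-style
    { header: 'Created', key: 'created', width: 16, style: { numFmt: 'yyyy-mm-dd' } },
  ]
  sheet.views = [{ state: 'frozen', ySplit: 1 }] // frozen header row
  for await (const row of rows) sheet.addRow(row).commit() // flush each row immediately
  sheet.commit()
  await workbook.commit() // finalize the file
}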
PDF:
- Two approaches:
a. HTML → PDF (Puppeteer / @sparticuz/chromium / Browserless)
- Render React components to HTML; print to PDF
- Best for visually-rich reports with branding
- Slower (1-3 seconds per page)
b. Programmatic (PDFKit / @react-pdf/renderer)
- Build PDF instructions directly
- Faster; less visual flexibility
- Pick: HTML→PDF for branded reports; programmatic for table dumps
For HTML→PDF on Vercel:
- Use @sparticuz/chromium with puppeteer-core in Node.js function
- Or: an external rendering service (Browserless / DocRaptor / PDFShift / other hosted HTML-to-PDF APIs)
Sample PDF rendering with Puppeteer:
import puppeteer from 'puppeteer-core'
import chromium from '@sparticuz/chromium'

async function generatePDFReport(report, data) {
  const html = await renderReportHTML(report, data)
  const browser = await puppeteer.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: true,
  })
  try {
    const page = await browser.newPage()
    await page.setContent(html, { waitUntil: 'networkidle0' })
    return await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' },
    })
  } finally {
    await browser.close() // always release Chromium, even if rendering throws
  }
}
JSON:
- Stream-write JSON to file
- Use ndjson for streaming-friendly format
- Best for API consumers / data ingestion
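A tiny NDJSON sketch — one JSON object per line, so consumers can parse the feed record-by-record:
import { createWriteStream } from 'node:fs'

async function writeNdjson(path: string, rows: AsyncIterable<unknown>) {
  const file = createWriteStream(path)
  for await (const row of rows) {
    if (!file.write(JSON.stringify(row) + '\n')) {
      await new Promise((r) => file.once('drain', r)) // honor backpressure
    }
  }
  file.end()
}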
HTML email:
- Rendered template with charts embedded as <img> (hosted on Cloudinary or a chart-image service like QuickChart)
- Inline CSS (use juice or similar)
- Email-client-safe HTML (test in Litmus / Email on Acid)
Implement:
1. CSV generator (streaming)
2. Excel generator (streaming)
3. PDF generator (Puppeteer or programmatic)
4. JSON generator
5. HTML-email generator
6. Format validation (don't try to PDF a 100K-row dataset)
Output: format generators that handle real data.
5. Implement delivery channels
Per-delivery:
EMAIL DELIVERY:
- Use your email provider (Resend / Postmark / SendGrid)
- Subject: "{Report name} - {date}"
- Body: brief summary + download link OR inline content
- Attachments: CSV / PDF (size-limited; some providers cap 10-25MB)
- For large: link only; no attachment
- Per-recipient sending (each user gets their own email)
- Track: bounces, opens, link clicks
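A sketch of the attach-or-link decision, assuming Resend's Node SDK; the sender address and 10 MB cutoff are placeholders:
import { Resend } from 'resend'

const resend = new Resend(process.env.RESEND_API_KEY)
const MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024 // stay below provider caps

async function deliverReportEmail(to: string, reportName: string, file: Buffer, downloadUrl: string) {
  const attachable = file.byteLength <= MAX_ATTACHMENT_BYTES
  await resend.emails.send({
    from: 'reports@example.com', // placeholder sender
    to,
    subject: `${reportName} - ${new Date().toISOString().slice(0, 10)}`,
    html: attachable
      ? `<p>Your report <strong>${reportName}</strong> is attached.</p>`
      : `<p>Your report <strong>${reportName}</strong> is ready. <a href="${downloadUrl}">Download</a> (link expires in 30 days).</p>`,
    // For large files: link only, no attachment
    attachments: attachable ? [{ filename: `${reportName}.csv`, content: file }] : undefined,
  })
}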
SLACK DELIVERY:
- Use Slack Web API (chat.postMessage with file upload)
- Customer connects Slack to their workspace via OAuth
- Channel selected per report
- Format: structured message with chart preview + download link
- Use Slack's "blocks" format for rich rendering
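A sketch using @slack/web-api with the workspace's stored OAuth token; Block Kit supplies the rich layout:
import { WebClient } from '@slack/web-api'

async function deliverToSlack(token: string, channelId: string, reportName: string, downloadUrl: string) {
  const slack = new WebClient(token) // token from the customer's OAuth install
  await slack.chat.postMessage({
    channel: channelId,
    text: `${reportName} is ready`, // plain-text fallback for notifications
    blocks: [
      { type: 'section', text: { type: 'mrkdwn', text: `*${reportName}* is ready.` } },
      {
        type: 'actions',
        elements: [
          {
            type: 'button',
            action_id: 'download_report',
            text: { type: 'plain_text', text: 'Download report' },
            url: downloadUrl,
          },
        ],
      },
    ],
  })
}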
S3 / OBJECT STORAGE DELIVERY:
- Customer provides S3 bucket + IAM role / access keys
- Worker uploads to their bucket
- Path-pattern: /reports/{date}/{report-name}.csv
- For enterprise customers running data pipelines
WEBHOOK DELIVERY:
- Customer provides webhook URL
- Worker POSTs report data + metadata to URL
- Format: JSON with download link OR full data
- Retry with backoff on failure
- Sign payloads (HMAC) so customer can verify
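A signing sketch with Node's crypto; the header names and timestamp-binding scheme are assumptions, not a standard:
import { createHmac } from 'node:crypto'

// Sign the serialized payload so the customer can verify origin and
// reject replays (they recompute the HMAC and check the timestamp skew).
function signWebhookPayload(body: string, secret: string): Record<string, string> {
  const timestamp = Math.floor(Date.now() / 1000).toString()
  const signature = createHmac('sha256', secret)
    .update(`${timestamp}.${body}`) // bind timestamp to body
    .digest('hex')
  return {
    'X-Report-Timestamp': timestamp, // hypothetical header names
    'X-Report-Signature': `sha256=${signature}`,
  }
}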
DOWNLOAD-ONLY:
- No automatic delivery
- Customer triggers + downloads via UI
- Generated URL valid 30 days
API DELIVERY:
- Customer requests via API (GET /api/reports/[id]/runs/[run_id]/data)
- Returns latest run data
- Useful for customer-side automation
DELIVERY CONFIGURATION (per channel):
Email:
{ "type": "email", "config": { "subject_template": "{report.name} - {date}" } }
Slack:
{ "type": "slack", "config": { "channel_id": "C123", "include_summary": true } }
S3:
{ "type": "s3", "config": { "bucket": "...", "path_pattern": "...", "credentials_secret_id": "..." } }
Webhook:
{ "type": "webhook", "config": { "url": "...", "auth_type": "hmac", "secret_id": "..." } }
Implement:
1. Per-channel dispatcher
2. Configuration UI per channel
3. Connection testing (verify Slack OAuth, S3 access, webhook URL)
4. Retry logic per channel
5. Failure notifications to customer
Output: multi-channel delivery.
6. Build the customer-facing UI
Customer-side UI for reports.
Page: /reports
Layout:
- List of reports (grouped: scheduled / one-shot)
- "Create report" button → wizard
- Per-report row: name, schedule, last-run status, next-run-time, actions
Create-report wizard:
Step 1: Choose data source
- Pick from available data: Projects, Deals, Tickets, Users, Custom (if custom fields are enabled)
Step 2: Define filters
- Visual filter builder (similar to saved-views)
- "Status = Open" "Created in last 30 days"
Step 3: Pick fields / columns
- Multi-select from available fields
- Reorder via drag-drop
Step 4: Group by + sort
- Optional aggregation
- "Group by Owner; Sum of Value"
Step 5: Pick output format
- CSV / Excel / PDF / JSON
- Preview if possible
Step 6: Schedule
- "Run once now" OR
- Recurring: Daily / Weekly / Monthly / Custom cron
- Time + timezone
- Day of week (for weekly)
Step 7: Delivery
- Email (pick recipients)
- Slack (pick channel)
- S3 (configure)
- Webhook (configure)
- Download only
Step 8: Confirm
- Summary
- "Run once now" button
- "Save + activate"
Run history per report:
- Timestamp + status
- Download link (if not expired)
- Re-run button
Edit report:
- Same wizard; pre-filled
- Pause / unpause / archive
Implement:
1. Reports list page
2. Create-report wizard
3. Per-step UX (filter builder, field picker, etc.)
4. Run history view
5. Edit / archive flow
6. Connection-setup flows (Slack OAuth, S3 creds, etc.)
Output: a UX customers actually use.
7. Operational + edge cases
Walk me through:
1. Report timeout (large dataset)
- Default timeout 5 min
- For larger: paginate output (multiple files)
- Or: ETL-style approach (export to S3 in chunks)
- Email customer with progress link
2. Email size limit (CSV > 25 MB)
- Don't attach; link only
- Customer downloads via signed URL
- TTL longer for large files
3. Customer's Slack token revoked
- Detection: API returns auth_error
- Notify customer via in-app + email
- Pause report; require re-auth
4. S3 credentials expired / invalid
- Detection: upload error
- Mark delivery failed
- Retry next scheduled run; if persistent: notify
5. Webhook returns non-2xx
- Retry with exponential backoff (3 attempts; 5min, 30min, 2hr)
- After 3 failures: mark failed; notify customer
6. Customer changes timezone
- Pick a rule: reinterpret existing schedules in the user's NEW timezone, or keep them pinned to the old one
- Either way it's confusing; document the behavior clearly
- Recommended: store schedules with an IANA timezone sourced from the user's profile
7. Customer adds 50 reports each running daily
- Quota: max active scheduled reports per workspace per plan tier
- Free: 5; Pro: 25; Enterprise: 100
- Plan-tier check at creation
8. Report fails repeatedly (e.g., bad query)
- After 3 consecutive failures: pause the report; notify the customer (see the sketch after this list)
- Diagnose: query error / data missing / permission
9. Workspace deleted; reports orphaned
- Cascade-delete all reports + runs
- Don't orphan delivery configs
10. Time-zone shifts (DST)
- Schedule in IANA tz; honor DST automatically
- Don't use UTC offsets directly
11. Customer wants historical data
- Default report scope: based on filter (e.g., last 30 days)
- Backfill: customer requests one-shot run for past period
12. Sensitive data in reports (PII)
- Same access controls as in-product
- Don't email PII to non-authorized recipients
- Encrypt at rest in S3 if applicable
13. Audit + compliance
- Log every report run + delivery
- Customer audit-export of report-run history
14. Multi-currency / multi-language
- Per-recipient localization
- Currency from workspace / user settings
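For case 8, a sketch of the auto-pause check run after each failed run; the db helpers and notification call are illustrative:
const MAX_CONSECUTIVE_FAILURES = 3

async function handleRunFailure(reportId: string) {
  // Hypothetical helper: newest-first slice of this report's run history
  const recent = await db.reportRuns.findLatest(reportId, MAX_CONSECUTIVE_FAILURES)
  const allFailed =
    recent.length === MAX_CONSECUTIVE_FAILURES && recent.every((r) => r.status === 'failed')
  if (allFailed) {
    await db.reports.update({ id: reportId, status: 'paused' })
    await notifyWorkspaceAdmins(reportId, 'report_paused_after_repeated_failures') // illustrative
  }
}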
For each: code change + UX impact + ops consideration.
Output: a system that handles real customer load.
8. Recap
What you've built:
- Report definition data model + schema
- Generation pipeline (durable workflow)
- Output formats: CSV, Excel, PDF, JSON, HTML email
- Delivery channels: email, Slack, S3, webhook, download
- Customer-facing wizard + reports list
- Per-tier quotas
- Audit + run-history
- Operational handling (timeouts, retries, failures)
What you're explicitly NOT shipping in v1:
- Visual drag-drop report builder (Level 4; defer; use embedded analytics)
- Custom SQL editor (defer; security concerns)
- Charts in-product (use embedded analytics; see Embedded Analytics Platforms (Reference))
- Customer-uploadable PDF templates (Level 3; defer)
- Reports shared across tenants/customers (anti-pattern)
Ship Level 1 (CSV export) in 1-2 weeks. Add Level 2 (scheduled + email) in 4-8 weeks. Add Level 3 (branded PDF) when enterprise customers ask + pay.
The biggest mistake teams make: building a custom visual report builder. Tons of engineering for narrow value; embedded analytics platforms (Cube + Tremor / Explo / Metabase Embedded) do this better. Use them for interactive analytics; use scheduled reports for delivered artifacts.
The second mistake: PDF generation in request path. PDFs take seconds to render; do in background.
The third mistake: not planning for large data. CSV export of 1M rows blows up memory if you load all at once. Stream-write from day one.
See Also
- CSV Import — sister discipline (import not export)
- PDF Generation In-App — pairs (PDF generation tech)
- Background Jobs & Queue Management — depended-upon
- Email Template Implementation — pairs for HTML email reports
- Email Deliverability — pairs
- Cron / Scheduled Tasks — depended-upon for scheduling
- Outbound Webhooks — pairs for webhook delivery
- Audit Logs — pairs for run auditing
- Customer Analytics Dashboards — adjacent (interactive vs delivered)
- Charts & Data Visualization — adjacent
- Internal Admin Tools — sister category (your team's reports)
- Bulk Operations — adjacent
- File Uploads — pairs for upload of templates
- Embedded Analytics Platforms (Reference) — alternative for interactive analytics
- Account Deletion & Data Export — different scope (GDPR full-account)
- Workspace Branding & Custom Domains — pairs for branded PDF templates
- Multi-Tenancy — workspace boundary
- Roles & Permissions — who can create/edit reports
- Quotas, Limits & Plan Enforcement — per-tier report quotas