# PDF Generation In-App: Invoices, Reports, Contracts Without Drowning in PDF Hell

If you're shipping a SaaS in 2026, your customers will eventually need a PDF: an invoice, monthly report, signed contract, packing slip, certificate, or exported document. PDF generation looks like a simple feature; it isn't. The choices range from "use a headless browser to print HTML" (works but heavy) to "use a PDF library directly" (faster but fragile) to "buy a PDF API" (easiest, but per-document cost adds up). Most indie SaaS teams pick wrong on the first attempt: they ship Puppeteer in production, hit memory leaks, switch to a vendor at $0.01/PDF, then panic when volume reaches 50,000 PDFs a month and the bill hits $500/mo.

A working PDF strategy answers: which approach (browser-render / PDF library / PDF-as-a-service), what content types (invoices / reports / contracts / certificates), how do you template (HTML+CSS / Markdown / DSL), how do you handle page breaks / pagination / headers, how do you handle async generation for large docs, how do you store and deliver, and how do you handle versioning of templates.

This guide is the implementation playbook for PDF generation. Companion to Email Template Implementation, File Uploads, Cron & Scheduled Tasks, and Account Deletion & Data Export.

## Why This Matters

Get the use cases clear first.

Help me categorize PDF needs.

The categories:

**1. Invoices / receipts**
- Generated on payment events
- Brand-consistent
- Itemized line items; tax handling
- 1-2 pages typically
- Need: customer-facing, archivable, acceptable to regulators

**2. Monthly / weekly reports**
- Generated on schedule
- Custom data per customer
- Multi-page; charts; tables
- Often 5-50 pages
- Need: scheduled bulk generation; well-designed for skim-reading

**3. Contracts / quotes / SOWs**
- Generated on action (deal flow)
- Custom per customer + signature blocks
- Page-numbered; legal-grade
- Need: integration with e-signature (DocuSign / Dropbox Sign / Zoho Sign)

**4. Certificates / badges / awards**
- Generated on completion (course / training / membership)
- Personalized
- 1 page typically; design-heavy
- Need: high-quality print; embedded fonts

**5. Exports of data / charts**
- Generated on user action ("Download as PDF")
- Snapshots of dashboards / reports
- Variable size
- Need: matches in-app rendering; not laggy

**6. Compliance / regulatory**
- Generated for audit / legal events
- Strict format requirements
- Sometimes pre-defined templates (tax forms etc.)
- Need: deterministic; archived; tamper-evident

**7. Tickets / passes / labels**
- Event tickets; shipping labels; gift cards
- Often barcode/QR-encoded
- Mobile-first; printer-friendly
- Need: precise sizing; barcode reliability

For my app:
- Which categories
- Volume per category

Output:
1. PDF need inventory
2. Volume estimates
3. Quality requirements

The biggest unforced error: treating all PDF needs as one problem. Invoices need different infra than 50-page reports. Mixing them into one "PDF service" creates a tangled mess. Categorize first.

## The Three Approaches

Help me pick an approach.

The three:

**1. HTML → PDF (browser-rendered)**

Approach: write HTML+CSS; render via Chrome (Puppeteer / Playwright / Chromium); save as PDF.

Pros:
- HTML+CSS is what you know; charts/tables/styling are easy
- Pixel-accurate to web rendering
- Same template for web view + PDF
- Page breaks, headers, footers via CSS print rules

Cons:
- Heavy: 200-500MB Chrome install
- Slow: 2-5s per PDF (browser cold-start)
- Memory leaks under load
- Difficult to deploy on serverless (Lambda zip packages are capped at 250MB unzipped; cold starts brutal)

Tools:
- **Puppeteer** (Chrome headless) — Node.js
- **Playwright** (multi-browser) — Node.js / Python
- **WeasyPrint** (no browser; HTML/CSS engine) — Python
- **Browserless / ScrapingBee** — managed Chrome-as-a-service
- **wkhtmltopdf** — older; deprecated; avoid

Best for: invoice / report / dashboard-export use cases where HTML-fidelity matters

**2. PDF library (programmatic)**

Approach: build PDF directly in code; place text/images at coordinates.

Pros:
- Fast (10ms-100ms per PDF)
- Lightweight; runs on serverless without issues
- Deterministic output

Cons:
- Coordinate-based layout is painful
- Rich layouts (tables, multi-column, charts) require lots of code
- Different mental model than HTML

Tools:
- **PDFKit** (Node.js) — most popular
- **pdf-lib** (Node.js) — modify existing PDFs
- **ReportLab** (Python) — robust; batteries included
- **PyPDF / pypdf** (Python) — read/modify
- **iText** (Java/.NET) — enterprise-grade; AGPL or commercial
- **gofpdf / unipdf** (Go)
- **Maroto** (Go) — higher-level

Best for: invoices, receipts, simple structured docs, high-volume

**3. PDF-as-a-service (managed)**

Approach: pay an API; send template + data; receive PDF.

Pros:
- Zero infrastructure
- Templates often have UI builder
- Fast; reliable; no scaling concerns
- Often includes signing / delivery features

Cons:
- Per-document cost ($0.01-$0.10 typical)
- Vendor lock-in
- Network latency
- Templates may not match your needs

Tools (see VibeReference: pdf-document-generation-tools):
- **PDFMonkey** — modern API
- **DocRaptor** — older HTML-to-PDF API
- **APITemplate** — template-based
- **Carbone** — fast template engine
- **DocuPilot** — workflow-heavy
- **PDFShift** — HTML-to-PDF API
- **Anvil** — webform → PDF; has e-sign

Best for: low volume; quick start; teams without ops capacity
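
To make the per-document cost concrete, here is a minimal arithmetic sketch. The helper names and the $25/mo self-hosting figure are assumptions for illustration, not vendor pricing.

```typescript
// Hypothetical helper: monthly vendor bill for a flat per-document price.
// Amounts are integer cents to avoid floating-point drift.
function monthlyVendorCostCents(pdfsPerMonth: number, centsPerPdf: number): number {
  return Math.round(pdfsPerMonth * centsPerPdf);
}

// Hypothetical helper: monthly volume at which the vendor bill matches a
// fixed self-hosting budget (worker instance, storage, and so on).
function breakEvenVolume(selfHostCentsPerMonth: number, centsPerPdf: number): number {
  if (centsPerPdf <= 0) throw new Error('centsPerPdf must be positive');
  return Math.ceil(selfHostCentsPerMonth / centsPerPdf);
}

const billCents = monthlyVendorCostCents(50_000, 1); // 50,000 PDFs at $0.01 each
console.log(`$${(billCents / 100).toFixed(2)}/mo`); // prints "$500.00/mo"

// Assumed $25/mo worker: the vendor costs the same at 2,500 PDFs/month
console.log(breakEvenVolume(2_500, 1)); // prints 2500
```

Below the break-even volume the vendor is usually the cheaper option once engineering time is counted; above it, self-hosting starts to pay for itself.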

The decision matrix:

| Use case | Volume | Approach |
|---|---|---|
| Invoice (1-2 page) | 100/mo | PDF library (PDFKit) |
| Invoice | 10K/mo | PDF library or batch service |
| Report (10+ page) | 100/mo | HTML → PDF (Puppeteer/Playwright) |
| Report | 10K/mo | HTML → PDF on dedicated infra OR batch service |
| Custom contract w/ e-sign | any | PDF-as-a-service (Anvil / DocuSign Builder) |
| Dashboard export | any | HTML → PDF |
| High-volume tickets | 100K+/mo | PDF library |

For my app:
- Use cases
- Volume
- Ops capacity

Output:
1. Recommended approach per use case
2. Library / vendor pick
3. Migration path if currently wrong

The 2026 default for most indie SaaS: PDFKit (or ReportLab in Python) for invoices, Playwright for reports, vendor for contracts. Don't over-unify; pick per use case.
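
The decision matrix can be written down as a small lookup, which is handy when documenting the choice in code. This is a sketch: the type names and `pickApproach` are hypothetical, and the 10,000/month cutoff is simply the table's row boundary.

```typescript
type UseCase = 'invoice' | 'report' | 'contract' | 'dashboard-export' | 'ticket';
type Approach = 'pdf-library' | 'html-to-pdf' | 'pdf-as-a-service';

interface Recommendation {
  approach: Approach;
  note: string;
}

// Encodes the decision matrix. Volume only changes the note, not the
// approach, which mirrors the table rows.
function pickApproach(useCase: UseCase, pdfsPerMonth: number): Recommendation {
  const highVolume = pdfsPerMonth >= 10_000;
  switch (useCase) {
    case 'invoice':
      return { approach: 'pdf-library', note: highVolume ? 'or a batch service' : 'e.g. PDFKit' };
    case 'ticket':
      return { approach: 'pdf-library', note: 'high volume, fixed layout' };
    case 'report':
      return {
        approach: 'html-to-pdf',
        note: highVolume ? 'dedicated infra or a batch service' : 'Puppeteer/Playwright',
      };
    case 'dashboard-export':
      return { approach: 'html-to-pdf', note: 'matches in-app rendering' };
    case 'contract':
      return { approach: 'pdf-as-a-service', note: 'vendor with e-sign' };
  }
}

console.log(pickApproach('invoice', 100).approach); // prints "pdf-library"
```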

## HTML → PDF: Playwright Pattern

Help me set up Playwright PDF generation.

The pattern (Node.js / Next.js):

```typescript
import { chromium } from 'playwright';

async function generatePDF(html: string): Promise<Buffer> {
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait for fonts, images, and other assets before printing
    await page.setContent(html, { waitUntil: 'networkidle' });

    const pdf = await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' },
      displayHeaderFooter: true,
      headerTemplate: '<div style="font-size: 10px; padding: 0 20mm;">Acme Corp</div>',
      footerTemplate: `<div style="font-size: 10px; padding: 0 20mm;">
        Page <span class="pageNumber"></span> of <span class="totalPages"></span>
      </div>`,
    });

    return pdf;
  } finally {
    // Always close the browser, even if rendering throws
    await browser.close();
  }
}
```

The HTML template:

```html
<!DOCTYPE html>
<html>
<head>
  <style>
    @page { size: A4; margin: 20mm; }
    body { font-family: -apple-system, sans-serif; font-size: 12pt; }
    .invoice-header { display: flex; justify-content: space-between; }
    .line-item { display: grid; grid-template-columns: 1fr 1fr 1fr 1fr; }
    .total { font-weight: bold; border-top: 2px solid black; }
    .page-break { page-break-after: always; }
    @media print {
      .no-print { display: none; }
    }
  </style>
</head>
<body>
  <div class="invoice-header">
    <h1>Invoice #{{invoice_number}}</h1>
    <p>{{date}}</p>
  </div>

  <table>
    <thead><tr><th>Item</th><th>Qty</th><th>Price</th><th>Total</th></tr></thead>
    <tbody>
      {{#each items}}
      <tr><td>{{name}}</td><td>{{qty}}</td><td>{{price}}</td><td>{{total}}</td></tr>
      {{/each}}
    </tbody>
  </table>

  <div class="total">Total: {{grand_total}}</div>
</body>
</html>
```

Deployment considerations:

- Local dev: works great.
- Vercel / Netlify (serverless): Playwright is too heavy for the default function size. Options:
  - `@sparticuz/chromium-min`: slimmed Chromium for Lambda-class environments (including Vercel)
  - Browserless.io: managed Playwright/Puppeteer; $50-200/mo
  - AWS Lambda layer with a Chromium binary
  - Dedicated worker service (Render / Fly / Railway) with full Playwright
- Memory: 512MB-1GB per Chrome instance. Plan capacity.

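Handling concurrency:

Don't launch a new browser per request. Pool browsers (`BrowserPool` here stands in for your own pool or a pooling library; it is not a Playwright API):

```typescript
const browserPool = new BrowserPool({ min: 1, max: 5 });

async function generate(html: string) {
  const browser = await browserPool.acquire();
  try {
    // ... generate ...
  } finally {
    // Always hand the browser back, even when generation throws
    browserPool.release(browser);
  }
}
```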
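
For reference, a pool with `acquire` / `release` semantics can be sketched in a few dozen lines. This is a minimal illustration, not a production pool: `ResourcePool` and `withResource` are hypothetical names, the pool is generic so it runs without a browser, and it omits health checks, idle timeouts, and recycling of crashed browsers.

```typescript
// Minimal async resource pool. T would be a Playwright Browser in practice.
class ResourcePool<T> {
  private idle: T[] = [];
  private waiters: Array<(resource: T) => void> = [];
  private created = 0;
  private create: () => Promise<T>;
  private max: number;

  constructor(create: () => Promise<T>, max: number) {
    this.create = create;
    this.max = max;
  }

  async acquire(): Promise<T> {
    const free = this.idle.pop();
    if (free !== undefined) return free;
    if (this.created < this.max) {
      this.created++; // reserve the slot before awaiting to avoid over-creating
      try {
        return await this.create();
      } catch (err) {
        this.created--;
        throw err;
      }
    }
    // At capacity: wait until release() hands a resource over.
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(resource: T): void {
    const next = this.waiters.shift();
    if (next) next(resource);
    else this.idle.push(resource);
  }
}

// Wrapper that guarantees release, mirroring the try/finally pattern above.
async function withResource<T, R>(
  pool: ResourcePool<T>,
  fn: (resource: T) => Promise<R>,
): Promise<R> {
  const resource = await pool.acquire();
  try {
    return await fn(resource);
  } finally {
    pool.release(resource);
  }
}
```

With Playwright, the factory would be something like `() => chromium.launch({ headless: true })`, and each request would open and close its own page inside `withResource`.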

Async generation pattern:

For reports that take more than 5s to generate:

1. User clicks "Download report"
2. Backend creates job → returns job_id
3. Job runs async (Vercel Workflow / BullMQ / Inngest / SQS)
4. On complete: store in S3 / Vercel Blob; email link OR signal frontend via WebSocket / SSE
5. Frontend polls job status OR receives push

Don't make the user wait synchronously past 10s.
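
The job lifecycle above can be sketched with an in-memory store. This is an illustration only: the `Map` stands in for a real queue, and `createJob`, `getJob`, and the `generate` callback (assumed to render, upload, and return a download URL) are hypothetical names.

```typescript
type JobStatus = 'queued' | 'running' | 'done' | 'failed';

interface PdfJob {
  id: string;
  status: JobStatus;
  url?: string;   // set once the PDF has been stored
  error?: string; // set when generation failed
}

// In-memory stand-in for a real queue (BullMQ, Inngest, SQS, ...).
const jobs = new Map<string, PdfJob>();
let nextId = 1;

// Create the job and return its id immediately.
function createJob(generate: () => Promise<string>): string {
  const id = `job_${nextId++}`;
  const job: PdfJob = { id, status: 'queued' };
  jobs.set(id, job);
  // Run in the background; the caller does not await this.
  void runJob(job, generate);
  return id;
}

async function runJob(job: PdfJob, generate: () => Promise<string>): Promise<void> {
  job.status = 'running';
  try {
    job.url = await generate();
    job.status = 'done';
  } catch (err) {
    job.status = 'failed';
    job.error = err instanceof Error ? err.message : String(err);
  }
}

// What a polling endpoint would return for a given job id.
function getJob(id: string): PdfJob | undefined {
  return jobs.get(id);
}
```

In a serverless deployment the background work has to live in a real queue or workflow service, because the function instance may be frozen as soon as the response is sent.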

For my stack:
- Hosting
- Concurrency expected
- Async infrastructure

Output:
1. Setup
2. Template
3. Deploy plan
4. Async architecture


The mistake most teams make: **synchronous PDF generation of 30-page reports**. Browser takes 10-20s; user's connection times out; everyone's confused. Fire-and-forget; email link.

## PDF Library: PDFKit Pattern

Help me set up PDFKit for invoices.

The pattern (Node.js):

```typescript
import PDFDocument from 'pdfkit';

// Assumed shape; adapt to your own invoice model
interface Invoice {
  number: string;
  date: Date;
  customer: { name: string; address: string };
  items: { name: string; qty: number; price: number }[];
  total: number;
}

// Two-decimal formatting avoids output like $0.30000000000000004
const money = (n: number) => `$${n.toFixed(2)}`;

function generateInvoice(invoice: Invoice): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const doc = new PDFDocument({ size: 'A4', margin: 50 });
    const buffers: Buffer[] = [];

    // PDFKit streams its output; collect the chunks into one Buffer
    doc.on('data', (b) => buffers.push(b));
    doc.on('end', () => resolve(Buffer.concat(buffers)));
    doc.on('error', reject);

    // Header
    doc.fontSize(20).text('Invoice', 50, 50);
    doc.fontSize(10).text(`#${invoice.number}`, 400, 50, { align: 'right' });
    doc.text(invoice.date.toLocaleDateString(), 400, 65, { align: 'right' });

    // Customer block
    doc.moveDown(2);
    doc.text(`Bill to:`, 50, doc.y);
    doc.text(invoice.customer.name, 50, doc.y + 15);
    doc.text(invoice.customer.address, 50, doc.y + 15);

    // Line items table
    const tableTop = doc.y + 40;
    doc.text('Item', 50, tableTop);
    doc.text('Qty', 300, tableTop);
    doc.text('Price', 350, tableTop);
    doc.text('Total', 450, tableTop);

    let y = tableTop + 20;
    for (const item of invoice.items) {
      doc.text(item.name, 50, y);
      doc.text(item.qty.toString(), 300, y);
      doc.text(money(item.price), 350, y);
      doc.text(money(item.qty * item.price), 450, y);
      y += 20;
    }

    // Total
    doc.moveTo(50, y).lineTo(545, y).stroke();
    doc.fontSize(12).text(`Total: ${money(invoice.total)}`, 350, y + 10);

    doc.end();
  });
}
```

The trade-offs:

- 10x faster than Playwright (no browser startup)
- Bundle size: ~500KB vs Playwright's 200MB+
- Layout is coordinate-based, which is painful for complex docs
- Tables / multi-page / page breaks need manual handling

Page breaks:

```typescript
// Start a new page before drawing past the printable area
if (doc.y > 700) {
  doc.addPage();
}
```

PDFKit has a `bufferPages: true` option for going back and adding to earlier pages (e.g. "Page X of Y" in the footer, which needs the total page count that you only know at the end).
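
When rows are placed with a manually tracked `y`, as in the line-item loop, it can help to compute page assignments up front. The sketch below is library-independent; `paginateRows` and its layout numbers are illustrative assumptions.

```typescript
interface PageLayout {
  top: number;       // y where the first row on a page starts
  bottom: number;    // last y at which a row may start
  rowHeight: number;
}

interface PlacedRow<T> {
  row: T;
  page: number; // 0-based page index
  y: number;
}

// Assigns each row a page and a y position, starting a new page when the
// next row would begin below `bottom`.
function paginateRows<T>(rows: T[], layout: PageLayout): PlacedRow<T>[] {
  const placed: PlacedRow<T>[] = [];
  let page = 0;
  let y = layout.top;
  for (const row of rows) {
    if (y > layout.bottom) {
      page++;
      y = layout.top;
    }
    placed.push({ row, page, y });
    y += layout.rowHeight;
  }
  return placed;
}

// 100 rows of 20pt each in a band from y=100 to y=700: 31 rows fit per page
const rows = Array.from({ length: 100 }, (_, i) => i);
const placed = paginateRows(rows, { top: 100, bottom: 700, rowHeight: 20 });
console.log(placed[placed.length - 1].page + 1); // prints 4
```

With PDFKit, the drawing loop would call `doc.addPage()` whenever `page` increases and then draw each row at its computed `y`. Knowing the page count in advance also makes the 100-line-item edge case easy to assert in a test.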

Charts / images:

Embed:

```typescript
doc.image('logo.png', 50, 50, { width: 100 });
```

For dynamic charts: render to PNG (Chart.js with node-canvas; or SVG → PNG) then embed.

Fonts:

Embed custom fonts:

```typescript
doc.font('fonts/Inter-Regular.ttf');
```

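Multi-page footer pattern (requires `bufferPages: true` in the `PDFDocument` options; run it before `doc.end()`):

```typescript
const range = doc.bufferedPageRange(); // { start: 0, count: N }
for (let i = range.start; i < range.start + range.count; i++) {
  doc.switchToPage(i);
  // Text placed below the bottom margin makes PDFKit add a page,
  // so drop the margin while drawing the footer
  doc.page.margins.bottom = 0;
  doc.fontSize(8).text(
    `Page ${i + 1} of ${range.count}`,
    50, doc.page.height - 30,
    { width: 495, align: 'center' }
  );
}
```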

For my use case: [invoice / receipt / etc.]

Output:
1. PDFKit setup
2. Layout code
3. Multi-page handling
4. Test approach


The performance reality: **PDFKit handles 100+ PDFs/second** on modest hardware. For invoice/receipt volume, this is more than enough. Reach for it before reaching for browser-render.

## Templating: Don't Hard-Code Layout

Help me set up templates.

The principle: separate template from data so non-engineers can edit.

The approaches:

**1. Handlebars / Mustache for HTML**

Template (Handlebars syntax):

```html
<h1>Invoice {{number}}</h1>
<p>Bill to: {{customer.name}}</p>
{{#each items}}
  <tr><td>{{name}}</td><td>{{qty}}</td></tr>
{{/each}}
```

Compile + render:

```typescript
import Handlebars from 'handlebars';

const template = Handlebars.compile(templateString);
const html = template({ number: 'INV-001', customer: {...}, items: [...] });
const pdf = await generatePDF(html); // the Playwright helper defined earlier
```

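**2. React / JSX for HTML**

Use React Server Components or `react-dom/server` to render:

```tsx
import { renderToStaticMarkup } from 'react-dom/server';

// Row is your own line-item component
function InvoiceTemplate({ invoice }: { invoice: Invoice }) {
  return (
    <html>
      <body>
        <h1>Invoice {invoice.number}</h1>
        {invoice.items.map(item => <Row key={item.id} item={item} />)}
      </body>
    </html>
  );
}

// renderToStaticMarkup does not emit a doctype, so prepend one
const html = '<!DOCTYPE html>' + renderToStaticMarkup(<InvoiceTemplate invoice={data} />);
```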

**3. Markdown → HTML → PDF**

For reports authored by non-engineers:

- Author writes Markdown
- Engine: marked / unified
- Then HTML → PDF

**4. Database-stored templates**

Store template in DB as Handlebars / MJML / Markdown:

- Customer-account-specific templates
- Versioning per template
- Admin UI to edit

**5. Vendor template UI**

Most PDF-as-a-service vendors have visual builders. Pro: non-technical edits. Con: lock-in to vendor.

Versioning:

When a template changes, a regenerated old invoice should still match the template version it was originally generated under. Pattern:

```sql
CREATE TABLE invoice_templates (
  id UUID NOT NULL,
  version INT NOT NULL,
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (id, version)  -- one row per template version
);

CREATE TABLE invoices (
  id UUID PRIMARY KEY,
  template_id UUID NOT NULL,
  template_version INT NOT NULL,
  generated_pdf_url TEXT,
  -- pin each invoice to the exact template version it was rendered with
  FOREIGN KEY (template_id, template_version)
    REFERENCES invoice_templates (id, version),
  ...
);
```

When regenerating invoice #X, use the template_version pinned to it. New invoices use the latest template.
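
The lookup rule (pinned version for regeneration, latest for new invoices) can be sketched as two small functions. The `TemplateVersion` shape and function names are assumptions for illustration; in practice these would be database queries.

```typescript
interface TemplateVersion {
  templateId: string;
  version: number;
  body: string;
}

// Latest version of a template, used for newly created invoices.
function latestTemplate(all: TemplateVersion[], templateId: string): TemplateVersion {
  const versions = all.filter((t) => t.templateId === templateId);
  if (versions.length === 0) throw new Error(`No template ${templateId}`);
  return versions.reduce((a, b) => (b.version > a.version ? b : a));
}

// Exact pinned version, used when regenerating an existing invoice.
function pinnedTemplate(
  all: TemplateVersion[],
  templateId: string,
  version: number,
): TemplateVersion {
  const found = all.find((t) => t.templateId === templateId && t.version === version);
  if (!found) throw new Error(`Template ${templateId} v${version} not found`);
  return found;
}

const templates: TemplateVersion[] = [
  { templateId: 'invoice', version: 1, body: '<h1>Invoice (2024 layout)</h1>' },
  { templateId: 'invoice', version: 2, body: '<h1>Invoice (2026 layout)</h1>' },
];

console.log(latestTemplate(templates, 'invoice').version);    // prints 2
console.log(pinnedTemplate(templates, 'invoice', 1).version); // prints 1
```

Failing loudly when a pinned version is missing is deliberate: silently falling back to the latest template would reintroduce the drift that pinning is meant to prevent.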

For my app:
- Template content authority (eng / non-eng?)
- Versioning needs

Output:
1. Template engine pick
2. Versioning schema
3. Editing workflow


The discipline: **never hard-code layout in business logic**. Even if template-engine is overkill today, having a separation makes "let's rebrand the invoice" a 5-min change instead of a 5-day deploy.

## Storage and Delivery

Help me handle storage and delivery.

The flow:

Generate:
- PDF blob in memory

Store:
- Upload to S3 / Vercel Blob / Cloudflare R2
- Naming: invoices/{invoice_id}.pdf or tenant_id/...
- Set permissions: private (require signed URL to download)

Deliver:

Option A: Direct download
- Generate signed URL (presigned S3 / Vercel Blob with expiry)
- Return to user; user downloads
- Cache headers: short-lived

Option B: Email attachment
- Generate; attach to email; send
- Watch attachment size limits (Gmail caps messages at 25MB, and base64 encoding inflates attachments by about a third)
- For larger: send link to download

Option C: In-app delivery
- Show in customer's "Documents" tab
- Lazy-load; signed URL on click

Schema:

```sql
CREATE TABLE generated_documents (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users,
  type VARCHAR(50) NOT NULL,  -- 'invoice', 'report', 'contract'
  related_id UUID,  -- e.g. invoice_id this PDF is for
  storage_url TEXT,  -- s3://... or blob://...
  size_bytes BIGINT,
  generated_at TIMESTAMPTZ DEFAULT NOW(),
  expires_at TIMESTAMPTZ  -- for ephemeral docs
);
```

Retention:

- Invoices: 7+ years (a common regulatory baseline; exact periods vary by jurisdiction)
- Reports: per business need; often 3 years
- Contracts: indefinite or per legal team
- Ephemeral exports: 30-90 days
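
A retention policy like the one above maps naturally onto the `expires_at` column. The sketch below assumes the day counts listed here; the names are hypothetical and the values are illustrative, not legal guidance.

```typescript
type DocType = 'invoice' | 'report' | 'contract' | 'export';

// Retention in days; null means keep indefinitely.
const RETENTION_DAYS: Record<DocType, number | null> = {
  invoice: 7 * 365,
  report: 3 * 365,
  contract: null,
  export: 90,
};

const DAY_MS = 24 * 60 * 60 * 1000;

// Value for the expires_at column, or null when the document never expires.
function expiresAt(type: DocType, generatedAt: Date): Date | null {
  const days = RETENTION_DAYS[type];
  return days === null ? null : new Date(generatedAt.getTime() + days * DAY_MS);
}

// What a cleanup job would check before deleting a stored PDF.
function isExpired(type: DocType, generatedAt: Date, now: Date): boolean {
  const expiry = expiresAt(type, generatedAt);
  return expiry !== null && now.getTime() >= expiry.getTime();
}

const generated = new Date('2026-01-01T00:00:00Z');
console.log(expiresAt('export', generated)?.toISOString()); // prints "2026-04-01T00:00:00.000Z"
console.log(expiresAt('contract', generated)); // prints null
```

Computing the expiry at generation time keeps the cleanup job simple: it only needs to delete rows whose `expires_at` is in the past.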

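Tamper-evidence (compliance):

For invoices/contracts, hash and log:

```typescript
import { createHash } from 'node:crypto';

const hash = createHash('sha256').update(pdfBuffer).digest('hex');
await auditLog.insert({ doc_id, hash, generated_at });
```

If the document is re-fetched and its hash differs from the logged one, that is evidence of tampering.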

CDN:

Don't serve PDFs from your origin server. Use a CDN (S3 + CloudFront / Cloudflare R2 + Cloudflare CDN / Vercel Blob auto-CDN).

For my stack: [storage / CDN]

Output:
1. Storage setup
2. Delivery pattern
3. Retention policy
4. Tamper-evidence (if regulated)


The tamper-evidence detail most teams skip: **hashing + logging at generation**. For invoices and contracts in regulated jurisdictions (EU e-invoicing, US tax docs), this isn't optional. Cheap to add at generation; expensive to retrofit.
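
A minimal sketch of hash-at-generation plus later verification, using Node's built-in `crypto` module. The `AuditEntry` shape and function names are assumptions; where the entry is persisted (an append-only audit table, for instance) is left out.

```typescript
import { createHash } from 'node:crypto';

interface AuditEntry {
  docId: string;
  sha256: string;      // hex digest recorded at generation time
  generatedAt: string; // ISO timestamp
}

function fingerprint(pdf: Uint8Array): string {
  return createHash('sha256').update(pdf).digest('hex');
}

// Call once, right after the PDF bytes are produced.
function recordGeneration(docId: string, pdf: Uint8Array, now: Date): AuditEntry {
  return { docId, sha256: fingerprint(pdf), generatedAt: now.toISOString() };
}

// Call whenever the stored document is re-fetched for an audit.
function verify(entry: AuditEntry, pdf: Uint8Array): boolean {
  return entry.sha256 === fingerprint(pdf);
}

const original = Uint8Array.from([0x25, 0x50, 0x44, 0x46]); // "%PDF"
const entry = recordGeneration('inv_001', original, new Date());
const tampered = Uint8Array.from([0x25, 0x50, 0x44, 0x47]);

console.log(verify(entry, original)); // prints true
console.log(verify(entry, tampered)); // prints false
```

The hash only proves the file matches what was logged; the log itself needs to be append-only or otherwise protected for the evidence to hold up.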

## Common PDF Pitfalls

Help me avoid PDF pitfalls.

The 10 mistakes:

**1. Shipping Puppeteer in serverless without size optimization**
- Lambda zip packages are capped at 250MB unzipped; default Puppeteer with Chromium is ~300MB; cold starts are brutal.
- Fix: @sparticuz/chromium-min OR move to a dedicated worker.

**2. Synchronous generation of 50-page reports**
- The request hits the platform timeout (around 60s on Vercel and 30s on Cloudflare, depending on plan) before the PDF is ready.
- Fix: async job; email link.

**3. Memory leaks from unclosed browser instances**
- Each Playwright launch leaks if not closed. Server OOMs.
- Fix: try/finally with browser.close(); pool browsers.

**4. Hardcoded layout in business logic**
- "Update invoice template" = code deploy.
- Fix: template engine (Handlebars / React / Markdown).

**5. No versioning of templates**
- Regenerating an invoice from 2024 with the 2026 template produces a visually different document.
- Fix: pin template version per invoice.

**6. Embedding fonts via web URLs**
- A render environment without internet access falls back to default fonts.
- Fix: bundle fonts locally; embed in PDF.

**7. Unicode / emoji failures**
- Default fonts don't support all Unicode; emoji render as boxes.
- Fix: use a font with broad Unicode coverage (e.g. Noto Sans), plus an emoji font such as Noto Emoji if emoji can appear.

**8. Page break in middle of table row**
- By default the browser may split a row across pages.
- Fix: CSS `break-inside: avoid` (legacy `page-break-inside: avoid`) on rows.

**9. Forgotten print background**
- Backgrounds are stripped by default in browser-print mode.
- Fix: printBackground: true in Playwright page.pdf() options.

**10. PDF size bloat**
- Embedding a 5000×3000 PNG when 800×600 would do.
- Fix: resize images to actual rendered size before embed.
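
For the image-size fix, the target pixel dimensions follow from the rendered size and the desired resolution, since PDF coordinates are in points (72 per inch). A small sketch, where `targetPixelSize` and the 150 DPI default are assumptions:

```typescript
// PDF coordinates are in points: 72 points per inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;
  height: number;
}

// Pixel dimensions needed to render an image at the given size (in points)
// and resolution. 150 DPI is an assumed default; print-heavy documents
// such as certificates may want 300.
function targetPixelSize(widthPt: number, heightPt: number, dpi = 150): PixelSize {
  return {
    width: Math.ceil((widthPt / POINTS_PER_INCH) * dpi),
    height: Math.ceil((heightPt / POINTS_PER_INCH) * dpi),
  };
}

// A logo drawn at 144pt x 72pt (2in x 1in)
console.log(targetPixelSize(144, 72));      // { width: 300, height: 150 }
console.log(targetPixelSize(144, 72, 300)); // { width: 600, height: 300 }
```

Anything much larger than the computed size adds bytes to the PDF without adding visible detail.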

For my app: [PDF use cases]

Output:
1. Top 3 risks
2. Mitigations
3. Tests to add


The single most-painful production bug: **memory leak from un-closed Puppeteer instances under load**. Server runs fine; gradually accumulates browsers; OOMs in week 3. Always close in `finally`; consider browser pooling for >1 PDF/sec.

## Testing PDF Generation

Help me test PDF generation.

The test categories:

**1. Snapshot tests**
- Generate PDF for known fixture
- Hash; compare to stored hash (pin the creation date and other metadata first, or the bytes change on every run)
- Detects unintended changes

**2. Visual regression tests**
- Generate PDF; convert to image (pdf2image in Python, or poppler's pdftoppm)
- Compare to reference image (pixelmatch)
- Catches layout drift

**3. Content tests**
- Extract text from PDF (pdf-parse)
- Assert specific content present
- Useful for: "invoice contains line items"

**4. Page count tests**
- Generate; assert page count for known input
- Catches table-overflow regressions

**5. Edge cases**
- Long customer name (overflow)
- 0 line items (empty state)
- 100 line items (page break)
- Unicode customer name (font handling)
- Very large prices (number formatting)
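
The edge cases above translate directly into a fixture set that every generator can be run against. This sketch assumes prices stored as integer cents; the fixture shape and helper names are hypothetical, and `formatCents` handles non-negative amounts only.

```typescript
interface LineItem {
  name: string;
  qty: number;
  priceCents: number;
}

interface InvoiceFixture {
  label: string;
  customerName: string;
  items: LineItem[];
}

const item = (i: number): LineItem => ({ name: `Item ${i}`, qty: 1, priceCents: 1_000 });

// One fixture per edge case.
const fixtures: InvoiceFixture[] = [
  { label: 'long-name', customerName: 'A'.repeat(200), items: [item(1)] },
  { label: 'empty', customerName: 'Acme', items: [] },
  { label: 'page-break', customerName: 'Acme', items: Array.from({ length: 100 }, (_, i) => item(i + 1)) },
  { label: 'unicode', customerName: '株式会社 Žluťoučký 🚀', items: [item(1)] },
  { label: 'large-price', customerName: 'Acme', items: [{ name: 'Enterprise', qty: 1, priceCents: 123_456_789_012 }] },
];

function totalCents(fixture: InvoiceFixture): number {
  return fixture.items.reduce((sum, it) => sum + it.qty * it.priceCents, 0);
}

// Integer cents to a display string with thousands separators.
function formatCents(cents: number): string {
  const dollars = Math.floor(cents / 100).toString().replace(/\B(?=(\d{3})+(?!\d))/g, ',');
  const rest = (cents % 100).toString().padStart(2, '0');
  return `$${dollars}.${rest}`;
}

console.log(formatCents(totalCents(fixtures[4]))); // prints "$1,234,567,890.12"
```

A test loop would generate a PDF per fixture and apply the content and page-count assertions from categories 3 and 4.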

**6. Performance tests**
- Time to generate (target: <2s for invoice; <30s for report)
- Memory usage at concurrency

**7. Async pipeline tests**
- Job created → job runs → file in S3 → email sent
- End-to-end test

For my pipeline: [tests today]

Output:
1. Test list
2. Tools (pdf-parse, pdf2image, vitest, etc.)
3. CI integration


The single most-useful test: **content extraction + assertion**. "Generated invoice contains the invoice number" is dead-simple but catches 80% of regressions. Don't skip it because it's not visual.

## What Done Looks Like

A working PDF generation setup:
- Approach picked per use case (PDFKit for invoices, Playwright for reports, vendor for contracts)
- Templates separated from business logic; versioned
- Async pipeline for >5s generation
- Storage in S3 / Blob with signed-URL delivery
- Memory-leak-safe (browser close + pooling)
- Tests cover happy path + edge cases (overflow, empty, unicode)
- Tamper-evidence for regulated docs (hash + audit log)
- Retention policy enforced (auto-delete ephemeral; preserve regulatory)
- Performance acceptable: invoice in <500ms; report in <30s
- Monitoring: success rate, avg time, P95 time, queue depth
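
The monitoring numbers in the last item can be derived from per-generation samples. A minimal sketch, assuming each generation records a success flag and a duration; the names are hypothetical and the percentile uses the nearest-rank method.

```typescript
interface GenerationSample {
  ok: boolean;
  durationMs: number;
}

// Nearest-rank percentile over a non-empty list of numbers.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(samples: GenerationSample[]) {
  const durations = samples.map((s) => s.durationMs);
  return {
    successRate: samples.filter((s) => s.ok).length / samples.length,
    avgMs: durations.reduce((a, b) => a + b, 0) / durations.length,
    p95Ms: percentile(durations, 95),
  };
}

// 100 samples taking 1..100 ms, with the five slowest failing
const samples: GenerationSample[] = Array.from({ length: 100 }, (_, i) => ({
  ok: i < 95,
  durationMs: i + 1,
}));
console.log(summarize(samples)); // { successRate: 0.95, avgMs: 50.5, p95Ms: 95 }
```

In production these figures would normally come from a metrics service rather than in-process arrays; the sketch only shows what is being measured.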

The proof you got it right: a customer says "I need my Q3 invoices for accounting"; you generate 90 days of invoices in 30s; the email goes out; the customer's accountant doesn't ask follow-up questions about formatting.

## See Also

- [Email Template Implementation](email-template-implementation-chat.md) — companion content-rendering concern
- [File Uploads](file-uploads-chat.md) — companion to PDF storage
- [Cron & Scheduled Tasks](cron-scheduled-tasks-chat.md) — scheduled report generation
- [Account Deletion & Data Export](account-deletion-data-export-chat.md) — data exports often want PDF format
- [Background Jobs Providers](https://vibereference.dev/backend-and-data/background-jobs-providers) — async generation infrastructure
- [PDF Document Generation Tools](https://vibereference.dev/backend-and-data/pdf-document-generation-tools) — vendor landscape
- [File Storage Providers](https://vibereference.dev/cloud-and-hosting/file-storage-providers) — S3 / Vercel Blob / R2
- [Audit Logs](audit-logs-chat.md) — tamper-evidence on regulated documents
- [Email Deliverability](email-deliverability-chat.md) — emailing invoices via Resend / SES / Postmark
- [Tax & VAT Handling](tax-vat-handling-chat.md) — invoice content compliance