# PDF Generation In-App: Invoices, Reports, Contracts Without Drowning in PDF Hell
If you're shipping a SaaS in 2026, your customers will eventually need a PDF: an invoice, monthly report, signed contract, packing slip, certificate, or exported document. PDF generation looks like a simple feature; it isn't. The choices range from "use a headless browser to print HTML" (works but heavy) to "use a PDF library directly" (faster, but layout is manual and brittle) to "buy a PDF API" (easiest, but per-document cost adds up). Most indie SaaS teams pick wrong on the first attempt: they ship Puppeteer in production, hit memory leaks, switch to a vendor at $0.01/PDF, then panic when the bill hits $500/mo.
A working PDF strategy answers: which approach (browser-render / PDF library / PDF-as-a-service), what content types (invoices / reports / contracts / certificates), how do you template (HTML+CSS / Markdown / DSL), how do you handle page breaks / pagination / headers, how do you handle async generation for large docs, how do you store and deliver, and how do you handle versioning of templates.
This guide is the implementation playbook for PDF generation. Companion to Email Template Implementation, File Uploads, Cron & Scheduled Tasks, and Account Deletion & Data Export.
## Why This Matters
Get the use cases clear first.
Help me categorize PDF needs.
The categories:
**1. Invoices / receipts**
- Generated on payment events
- Brand-consistent
- Itemized line items; tax handling
- 1-2 pages typically
- Need: customer-facing, archivable, regulatorily-acceptable
**2. Monthly / weekly reports**
- Generated on schedule
- Custom data per customer
- Multi-page; charts; tables
- Often 5-50 pages
- Need: scheduled bulk generation; well-designed for skim-reading
**3. Contracts / quotes / SOWs**
- Generated on action (deal flow)
- Custom per customer + signature blocks
- Page-numbered; legal-grade
- Need: integration with e-signature (DocuSign / Dropbox Sign / Zoho)
**4. Certificates / badges / awards**
- Generated on completion (course / training / membership)
- Personalized
- 1 page typically; design-heavy
- Need: high-quality print; embedded fonts
**5. Exports of data / charts**
- Generated on user action ("Download as PDF")
- Snapshots of dashboards / reports
- Variable size
- Need: matches in-app rendering; not laggy
**6. Compliance / regulatory**
- Generated for audit / legal events
- Strict format requirements
- Sometimes pre-defined templates (tax forms etc.)
- Need: deterministic; archived; tamper-evident
**7. Tickets / passes / labels**
- Event tickets; shipping labels; gift cards
- Often barcode/QR-encoded
- Mobile-first; printer-friendly
- Need: precise sizing; barcode reliability
For my app:
- Which categories
- Volume per category
Output:
1. PDF need inventory
2. Volume estimates
3. Quality requirements
The biggest unforced error: treating all PDF needs as one problem. Invoices need different infra than 50-page reports. Mixing them into one "PDF service" creates a tangled mess. Categorize first.
## The Three Approaches
Help me pick an approach.
The three:
**1. HTML → PDF (browser-rendered)**
Approach: write HTML+CSS; render via Chrome (Puppeteer / Playwright / Chromium); save as PDF.
Pros:
- HTML+CSS is what you know; charts/tables/styling are easy
- Pixel-accurate to web rendering
- Same template for web view + PDF
- Page breaks, headers, footers via CSS print rules
Cons:
- Heavy: 200-500MB Chrome install
- Slow: 2-5s per PDF (browser cold-start)
- Memory leaks under load
- Difficult to deploy on serverless (Lambda zip packages are capped at 250MB unzipped, so full Chrome needs a slimmed build or a container image; cold starts brutal)
Tools:
- **Puppeteer** (Chrome headless) — Node.js
- **Playwright** (multi-browser) — Node.js / Python
- **WeasyPrint** (no browser; HTML/CSS engine) — Python
- **Browserless / ScrapingBee** — managed Chrome-as-a-service
- **wkhtmltopdf** — older; deprecated; avoid
Best for: invoice / report / dashboard-export use cases where HTML-fidelity matters
**2. PDF library (programmatic)**
Approach: build PDF directly in code; place text/images at coordinates.
Pros:
- Fast (10ms-100ms per PDF)
- Lightweight; runs on serverless without issues
- Deterministic output
Cons:
- Coordinate-based layout is painful
- Rich layouts (tables, multi-column, charts) require lots of code
- Different mental model than HTML
Tools:
- **PDFKit** (Node.js) — most popular
- **pdf-lib** (Node.js) — modify existing PDFs
- **ReportLab** (Python) — robust; batteries included
- **PyPDF / pypdf** (Python) — read/modify
- **iText** (Java/.NET) — enterprise-grade; AGPL or commercial
- **gofpdf / unipdf** (Go)
- **Maroto** (Go) — higher-level
Best for: invoices, receipts, simple structured docs, high-volume
**3. PDF-as-a-service (managed)**
Approach: pay an API; send template + data; receive PDF.
Pros:
- Zero infrastructure
- Templates often have UI builder
- Fast; reliable; no scaling concerns
- Often includes signing / delivery features
Cons:
- Per-document cost ($0.01-$0.10 typical)
- Vendor lock-in
- Network latency
- Templates may not match your needs
Tools (see VibeReference: pdf-document-generation-tools):
- **PDFMonkey** — modern API
- **DocRaptor** — older HTML-to-PDF API
- **APITemplate** — template-based
- **Carbone** — fast template engine
- **DocuPilot** — workflow-heavy
- **PDFShift** — HTML-to-PDF API
- **Anvil** — webform → PDF; has e-sign
Best for: low volume; quick start; teams without ops capacity
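To make the per-document cost concrete, here is a minimal arithmetic sketch. The function names are hypothetical, and it assumes a flat per-document price with no tiers or minimums, which real vendors may not offer.

```typescript
// Hypothetical helpers for estimating a PDF vendor bill.
// Amounts are integer cents to avoid floating-point drift.

// Monthly bill for a flat per-document price.
function monthlyVendorCostCents(docsPerMonth: number, centsPerDoc: number): number {
  return docsPerMonth * centsPerDoc;
}

// Number of documents at which the bill reaches a given monthly budget.
function volumeAtBudget(budgetCents: number, centsPerDoc: number): number {
  return Math.floor(budgetCents / centsPerDoc);
}

// At $0.01 per PDF, 50,000 documents cost 50,000 cents ($500).
const bill = monthlyVendorCostCents(50000, 1);

// At $0.10 per PDF, a $200/mo budget covers 2,000 documents.
const volume = volumeAtBudget(20000, 10);
```

Running the same numbers against your own volume estimates shows quickly whether a vendor stays cheap or becomes the $500/mo surprise.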
The decision matrix:
| Use case | Volume | Approach |
|---|---|---|
| Invoice (1-2 page) | 100/mo | PDF library (PDFKit) |
| Invoice | 10K/mo | PDF library or batch service |
| Report (10+ page) | 100/mo | HTML → PDF (Puppeteer/Playwright) |
| Report | 10K/mo | HTML → PDF on dedicated infra OR batch service |
| Custom contract w/ e-sign | any | PDF-as-a-service (Anvil / DocuSign Builder) |
| Dashboard export | any | HTML → PDF |
| High-volume tickets | 100K+/mo | PDF library |
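The matrix can also be encoded as a small lookup, which is handy if the choice is made in code (for example, routing jobs to different workers). This is only a sketch: the type names and the 10,000/month cut-off are assumptions, since the table gives data points rather than thresholds.

```typescript
type UseCase = 'invoice' | 'report' | 'contract' | 'dashboard-export' | 'ticket';
type Approach = 'pdf-library' | 'html-to-pdf' | 'pdf-as-a-service';

interface Recommendation {
  approach: Approach;
  note?: string;
}

// Assumed cut-off between the table's "100/mo" and "10K/mo" rows.
const HIGH_VOLUME_PER_MONTH = 10000;

function recommendApproach(useCase: UseCase, docsPerMonth: number): Recommendation {
  const highVolume = docsPerMonth >= HIGH_VOLUME_PER_MONTH;
  switch (useCase) {
    case 'invoice':
      return highVolume
        ? { approach: 'pdf-library', note: 'or a batch service' }
        : { approach: 'pdf-library' };
    case 'report':
      return highVolume
        ? { approach: 'html-to-pdf', note: 'on dedicated infra, or a batch service' }
        : { approach: 'html-to-pdf' };
    case 'contract':
      return { approach: 'pdf-as-a-service' };
    case 'dashboard-export':
      return { approach: 'html-to-pdf' };
    case 'ticket':
      return { approach: 'pdf-library' };
  }
}
```

For example, `recommendApproach('report', 100)` yields the HTML → PDF approach, matching the third row of the table.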
For my app:
- Use cases
- Volume
- Ops capacity
Output:
1. Recommended approach per use case
2. Library / vendor pick
3. Migration path if currently wrong
The 2026 default for most indie SaaS: PDFKit (or ReportLab in Python) for invoices, Playwright for reports, vendor for contracts. Don't over-unify; pick per use case.
## HTML → PDF: Playwright Pattern
Help me set up Playwright PDF generation.
The pattern (Node.js / Next.js):
```typescript
import { chromium } from 'playwright';

// Render an HTML string to an A4 PDF with a header and a page-numbered footer.
async function generatePDF(html: string): Promise<Buffer> {
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until images, fonts, and stylesheets have finished loading.
    await page.setContent(html, { waitUntil: 'networkidle' });
    const pdf = await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' },
      displayHeaderFooter: true,
      headerTemplate: '<div style="font-size: 10px; padding: 0 20mm;">Acme Corp</div>',
      // Chromium fills in elements carrying the pageNumber / totalPages classes.
      footerTemplate: `<div style="font-size: 10px; padding: 0 20mm;">
        Page <span class="pageNumber"></span> of <span class="totalPages"></span>
      </div>`,
    });
    return pdf;
  } finally {
    // Always close the browser, even when rendering throws.
    await browser.close();
  }
}
```
The HTML template:
```html
<!DOCTYPE html>
<html>
<head>
  <style>
    @page { size: A4; margin: 20mm; }
    body { font-family: -apple-system, sans-serif; font-size: 12pt; }
    .invoice-header { display: flex; justify-content: space-between; }
    .line-item { display: grid; grid-template-columns: 1fr 1fr 1fr 1fr; }
    .total { font-weight: bold; border-top: 2px solid black; }
    .page-break { page-break-after: always; }
    @media print {
      .no-print { display: none; }
    }
  </style>
</head>
<body>
  <div class="invoice-header">
    <h1>Invoice #{{invoice_number}}</h1>
    <p>{{date}}</p>
  </div>
  <table>
    <thead><tr><th>Item</th><th>Qty</th><th>Price</th><th>Total</th></tr></thead>
    <tbody>
      {{#each items}}
      <tr><td>{{name}}</td><td>{{qty}}</td><td>{{price}}</td><td>{{total}}</td></tr>
      {{/each}}
    </tbody>
  </table>
  <div class="total">Total: {{grand_total}}</div>
</body>
</html>
```
Deployment considerations:
Local dev: works great.
Vercel / Netlify (serverless): Playwright too heavy for default function size. Solutions:
- Vercel: use @sparticuz/chromium-min — slimmed Chromium for Lambda-class environments
- Browserless.io — managed Playwright/Puppeteer; $50-200/mo
- AWS Lambda layer with Chromium binary
- Dedicated worker service (Render / Fly / Railway) with full Playwright
Memory: 512MB-1GB per Chrome instance. Plan capacity.
Handling concurrency:
Don't launch a new browser per request. Pool browsers:
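```typescript
// BrowserPool is a placeholder, not a Playwright export: supply your own pool
// or wrap a general-purpose library such as generic-pool.
const browserPool = new BrowserPool({ min: 1, max: 5 });

async function generate(html: string) {
  const browser = await browserPool.acquire();
  try {
    // ... generate ...
  } finally {
    // Hand the browser back to the pool instead of closing it.
    browserPool.release(browser);
  }
}
```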
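Since the pool used above is not something Playwright ships, here is one way such a pool could look. This is a minimal sketch under stated assumptions: the `Pool` class and its method names are hypothetical, it is generic over the pooled resource so it can be exercised without a browser, and it omits health checks, idle timeouts, and shutdown.

```typescript
// Minimal bounded pool: creates up to `max` resources lazily and
// makes further callers wait until one is released.
class Pool<T> {
  private readonly create: () => Promise<T>;
  private readonly max: number;
  private idle: T[] = [];
  private waiters: Array<(resource: T) => void> = [];
  private created = 0;

  constructor(create: () => Promise<T>, max: number) {
    this.create = create;
    this.max = max;
  }

  async acquire(): Promise<T> {
    const ready = this.idle.pop();
    if (ready !== undefined) return ready;
    if (this.created < this.max) {
      this.created++;
      try {
        return await this.create();
      } catch (err) {
        this.created--; // creation failed, free the slot again
        throw err;
      }
    }
    // At capacity: resolve once release() hands a resource over.
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(resource: T): void {
    const next = this.waiters.shift();
    if (next) next(resource);
    else this.idle.push(resource);
  }
}
```

With Playwright this would be constructed as `new Pool(() => chromium.launch({ headless: true }), 5)`. A production pool would also want to discard browsers that have crashed and recycle them after some number of renders.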
Async generation pattern:
For reports that take more than 5s to generate:
- User clicks "Download report"
- Backend creates job → returns job_id
- Job runs async (Vercel Workflow / BullMQ / Inngest / SQS)
- On complete: store in S3 / Vercel Blob; email link OR signal frontend via WebSocket / SSE
- Frontend polls job status OR receives push
Don't make the user wait synchronously past 10s.
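The job lifecycle in the steps above can be sketched as follows. This is an illustration only: `JobStore` is a hypothetical in-memory stand-in for a real job table or queue (BullMQ, Inngest, SQS and similar each have their own APIs), and `generate` is assumed to upload the PDF and resolve with a download URL.

```typescript
type JobStatus = 'queued' | 'running' | 'done' | 'failed';

interface PdfJob {
  id: string;
  status: JobStatus;
  downloadUrl?: string;
  error?: string;
}

// In-memory stand-in for a jobs table (hypothetical, for illustration).
class JobStore {
  private jobs = new Map<string, PdfJob>();
  private nextId = 1;

  // Called by the request handler; the id is returned to the client.
  create(): PdfJob {
    const job: PdfJob = { id: `job_${this.nextId++}`, status: 'queued' };
    this.jobs.set(job.id, job);
    return job;
  }

  // Called by the polling endpoint.
  get(id: string): PdfJob | undefined {
    return this.jobs.get(id);
  }

  update(id: string, patch: Partial<PdfJob>): void {
    const job = this.jobs.get(id);
    if (job) Object.assign(job, patch);
  }
}

// Worker side: run the slow generation and record the outcome either way.
async function runJob(
  store: JobStore,
  id: string,
  generate: () => Promise<string>,
): Promise<void> {
  store.update(id, { status: 'running' });
  try {
    const downloadUrl = await generate();
    store.update(id, { status: 'done', downloadUrl });
  } catch (err) {
    store.update(id, { status: 'failed', error: String(err) });
  }
}
```

The frontend polls `get(id)` until the status is `done` or `failed`; recording failures explicitly keeps a crashed render from looking like a job that is still running.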
For my stack:
- Hosting
- Concurrency expected
- Async infrastructure
Output:
- Setup
- Template
- Deploy plan
- Async architecture
The mistake most teams make: **synchronous PDF generation of 30-page reports**. Browser takes 10-20s; user's connection times out; everyone's confused. Fire-and-forget; email link.
## PDF Library: PDFKit Pattern
Help me set up PDFKit for invoices.
The pattern (Node.js):
```typescript
import PDFDocument from 'pdfkit';

// Assumed invoice shape; adapt to your own model.
interface Invoice {
  number: string;
  date: Date;
  customer: { name: string; address: string };
  items: { name: string; qty: number; price: number }[];
  total: number;
}

// PDFKit streams its output, so collect the chunks and resolve on 'end'.
function generateInvoice(invoice: Invoice): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const doc = new PDFDocument({ size: 'A4', margin: 50 });
    const buffers: Buffer[] = [];
    doc.on('data', (b: Buffer) => buffers.push(b));
    doc.on('end', () => resolve(Buffer.concat(buffers)));
    doc.on('error', reject);

    // Header
    doc.fontSize(20).text('Invoice', 50, 50);
    doc.fontSize(10).text(`#${invoice.number}`, 400, 50, { align: 'right' });
    doc.text(invoice.date.toLocaleDateString(), 400, 65, { align: 'right' });

    // Customer block
    doc.moveDown(2);
    doc.text(`Bill to:`, 50, doc.y);
    doc.text(invoice.customer.name, 50, doc.y + 15);
    doc.text(invoice.customer.address, 50, doc.y + 15);

    // Line items table
    const tableTop = doc.y + 40;
    doc.text('Item', 50, tableTop);
    doc.text('Qty', 300, tableTop);
    doc.text('Price', 350, tableTop);
    doc.text('Total', 450, tableTop);
    let y = tableTop + 20;
    for (const item of invoice.items) {
      doc.text(item.name, 50, y);
      doc.text(item.qty.toString(), 300, y);
      // toFixed(2) avoids float artifacts such as 0.30000000000000004.
      doc.text(`$${item.price.toFixed(2)}`, 350, y);
      doc.text(`$${(item.qty * item.price).toFixed(2)}`, 450, y);
      y += 20;
    }

    // Total
    doc.moveTo(50, y).lineTo(545, y).stroke();
    doc.fontSize(12).text(`Total: $${invoice.total.toFixed(2)}`, 350, y + 10);

    doc.end();
  });
}
```
The trade-offs:
- 10x faster than Playwright (no browser startup)
- Bundle size: ~500KB vs Playwright's 200MB+
- Layout is coordinate-based — pain for complex docs
- Tables / multi-page / page breaks need manual handling
Page breaks:
```typescript
if (doc.y > 700) {
  doc.addPage();
}
```
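When rows are placed at explicit coordinates, as in the invoice table, the same check has to run against your own `y` counter. Here is a sketch of that bookkeeping as a pure function, so it can be unit-tested without PDFKit. The names and the fixed row height are assumptions.

```typescript
interface PageLayout {
  top: number;       // y of the first row on every page
  bottom: number;    // rows must end at or above this y
  rowHeight: number; // fixed height per row
}

interface RowPlacement {
  page: number; // 0-based page index
  y: number;    // y coordinate of the row on that page
}

// Decide which page and y position each row lands on.
function paginateRows(rowCount: number, layout: PageLayout): RowPlacement[] {
  const placements: RowPlacement[] = [];
  let page = 0;
  let y = layout.top;
  for (let i = 0; i < rowCount; i++) {
    // Start a new page when this row would cross the bottom limit.
    if (y + layout.rowHeight > layout.bottom) {
      page++;
      y = layout.top;
    }
    placements.push({ page, y });
    y += layout.rowHeight;
  }
  return placements;
}

// 100 rows of 20pt between y=50 and y=700: 32 rows per page, 4 pages.
const placements = paginateRows(100, { top: 50, bottom: 700, rowHeight: 20 });
```

In the drawing loop you would call `doc.addPage()` whenever a placement's `page` is higher than the previous row's, then draw at the returned `y`.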
PDFKit has a `bufferPages: true` option that lets you go back and add content to earlier pages (e.g. a "Page X of Y" footer, which needs the total page count that is only known at the end).
Charts / images:
Embed:
```typescript
doc.image('logo.png', 50, 50, { width: 100 });
```
For dynamic charts: render to PNG (Chart.js with node-canvas; or SVG → PNG) then embed.
Fonts:
Embed custom fonts:
```typescript
doc.font('fonts/Inter-Regular.ttf');
```
Multi-page footer pattern:
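```typescript
// Requires `new PDFDocument({ bufferPages: true })`; run this before doc.end().
const range = doc.bufferedPageRange(); // { start: 0, count: N }
for (let i = range.start; i < range.start + range.count; i++) {
  doc.switchToPage(i);
  // Text placed below the bottom margin can trigger an automatic new page,
  // so drop the margin before writing the footer.
  doc.page.margins.bottom = 0;
  doc.fontSize(8).text(
    `Page ${i + 1} of ${range.count}`,
    50, doc.page.height - 30,
    { width: 495, align: 'center' }
  );
}
```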
For my use case: [invoice / receipt / etc.]
Output:
- PDFKit setup
- Layout code
- Multi-page handling
- Test approach
The performance reality: **PDFKit can generate on the order of 100 simple PDFs per second** on modest hardware (benchmark with your own templates). For invoice/receipt volume, this is more than enough. Reach for it before reaching for browser-render.
## Templating: Don't Hard-Code Layout
Help me set up templates.
The principle: separate template from data so non-engineers can edit.
The approaches:
1. Handlebars / Mustache for HTML
Template (Handlebars syntax):
```html
<h1>Invoice {{number}}</h1>
<p>Bill to: {{customer.name}}</p>
{{#each items}}
<tr><td>{{name}}</td><td>{{qty}}</td></tr>
{{/each}}
```
Compile + render:
```typescript
import Handlebars from 'handlebars';

const template = Handlebars.compile(templateString);
const html = template({ number: 'INV-001', customer: {...}, items: [...] });
// generatePDF is the Playwright helper from the HTML → PDF section.
const pdf = await generatePDF(html);
```
2. React / JSX for HTML
Use React Server Components or react-dom/server to render:
```tsx
import { renderToStaticMarkup } from 'react-dom/server';

// Row is your own line-item component (not shown).
function InvoiceTemplate({ invoice }: { invoice: Invoice }) {
  return (
    <html>
      <body>
        <h1>Invoice {invoice.number}</h1>
        {invoice.items.map(item => <Row key={item.id} item={item} />)}
      </body>
    </html>
  );
}

const html = renderToStaticMarkup(<InvoiceTemplate invoice={data} />);
```
3. Markdown → HTML → PDF
For reports authored by non-engineers:
- Author writes Markdown
- Engine: marked / unified
- Then HTML → PDF
4. Database-stored templates
Store template in DB as Handlebars / MJML / Markdown:
- Customer-account-specific templates
- Versioning per template
- Admin UI to edit
5. Vendor template UI
Most PDF-as-a-service vendors have visual builders. Pro: non-technical edits. Con: lock-in to vendor.
Versioning:
When a template changes, regenerating an old invoice should still use the template version it was originally generated with. Pattern:
```sql
CREATE TABLE invoice_templates (
  id UUID NOT NULL,
  version INT NOT NULL,
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (id, version) -- one row per version of each template
);
CREATE TABLE invoices (
  id UUID PRIMARY KEY,
  template_id UUID NOT NULL,
  template_version INT NOT NULL,
  generated_pdf_url TEXT,
  -- ... other invoice columns ...
  FOREIGN KEY (template_id, template_version)
    REFERENCES invoice_templates (id, version)
);
```
When regenerating invoice #X, use template_version pinned to it. New invoices use latest template.
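The lookup rule (pinned version for regeneration, latest version for new invoices) can be sketched as a single function. The `TemplateVersion` shape and `resolveTemplate` name are hypothetical; in practice this would be a database query rather than an in-memory filter.

```typescript
interface TemplateVersion {
  templateId: string;
  version: number;
  body: string;
}

// Returns the pinned version when one is given, otherwise the latest.
function resolveTemplate(
  templates: TemplateVersion[],
  templateId: string,
  pinnedVersion?: number,
): TemplateVersion {
  const candidates = templates.filter((t) => t.templateId === templateId);
  const match =
    pinnedVersion === undefined
      ? candidates.reduce<TemplateVersion | undefined>(
          (best, t) => (best === undefined || t.version > best.version ? t : best),
          undefined,
        )
      : candidates.find((t) => t.version === pinnedVersion);
  if (!match) {
    throw new Error(`No template ${templateId} for version ${pinnedVersion ?? 'latest'}`);
  }
  return match;
}

const templates: TemplateVersion[] = [
  { templateId: 'invoice', version: 1, body: '<h1>Invoice v1</h1>' },
  { templateId: 'invoice', version: 2, body: '<h1>Invoice v2</h1>' },
];

const forNewInvoice = resolveTemplate(templates, 'invoice');    // version 2
const forOldInvoice = resolveTemplate(templates, 'invoice', 1); // version 1
```

Throwing on a missing pinned version is deliberate: silently falling back to the latest template would reintroduce the drift that versioning is meant to prevent.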
For my app:
- Template content authority (eng / non-eng?)
- Versioning needs
Output:
- Template engine pick
- Versioning schema
- Editing workflow
The discipline: **never hard-code layout in business logic**. Even if template-engine is overkill today, having a separation makes "let's rebrand the invoice" a 5-min change instead of a 5-day deploy.
## Storage and Delivery
Help me handle storage and delivery.
The flow:
Generate:
- PDF blob in memory
Store:
- Upload to S3 / Vercel Blob / Cloudflare R2
- Naming: `invoices/{invoice_id}.pdf` or `tenant_id/...`
- Set permissions: private (require signed URL to download)
Deliver:
Option A: Direct download
- Generate signed URL (presigned S3 / Vercel Blob with expiry)
- Return to user; user downloads
- Cache headers: short-lived
Option B: Email attachment
- Generate; attach to email; send
- Watch attachment size limits (they vary by provider, commonly in the 10-25MB range, and base64 encoding inflates the file by about a third)
- For larger: send link to download
Option C: In-app delivery
- Show in customer's "Documents" tab
- Lazy-load; signed URL on click
Schema:
```sql
CREATE TABLE generated_documents (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users,
  type VARCHAR(50) NOT NULL,  -- 'invoice', 'report', 'contract'
  related_id UUID,            -- e.g. invoice_id this PDF is for
  storage_url TEXT,           -- s3://... or blob://...
  size_bytes BIGINT,
  generated_at TIMESTAMPTZ DEFAULT NOW(),
  expires_at TIMESTAMPTZ      -- for ephemeral docs
);
```
Retention:
- Invoices: 7+ years is a common baseline (the required period varies by jurisdiction; check local tax rules)
- Reports: per business need; often 3 years
- Contracts: indefinite or per legal team
- Ephemeral exports: 30-90 days
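A retention policy like this can be expressed as data and used to fill the `expires_at` column at generation time. This is a sketch: the durations are placeholders drawn from the list above (with 90 days for ephemeral exports), and the names are hypothetical. Your legal or finance team should set the real values.

```typescript
type DocType = 'invoice' | 'report' | 'contract' | 'export';

const DAY_MS = 24 * 60 * 60 * 1000;

// Placeholder retention periods in days; null means "keep indefinitely".
// Years are approximated as 365 days.
const RETENTION_DAYS: Record<DocType, number | null> = {
  invoice: 7 * 365,
  report: 3 * 365,
  contract: null,
  export: 90,
};

// Compute the expiry timestamp for a document, or null if it never expires.
function expiresAt(type: DocType, generatedAt: Date): Date | null {
  const days = RETENTION_DAYS[type];
  return days === null ? null : new Date(generatedAt.getTime() + days * DAY_MS);
}

const exportExpiry = expiresAt('export', new Date('2026-01-01T00:00:00Z'));
const contractExpiry = expiresAt('contract', new Date('2026-01-01T00:00:00Z'));
```

A scheduled cleanup job can then delete rows whose `expires_at` is in the past and leave rows with a null expiry untouched.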
Tamper-evidence (compliance):
For invoices/contracts, hash and log:
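```typescript
import { createHash } from 'node:crypto';

// Hex-encoded SHA-256 of the generated PDF bytes.
const hash = createHash('sha256').update(pdfBuffer).digest('hex');
await auditLog.insert({ doc_id, hash, generated_at });
```
If the document is later re-fetched and its hash differs from the logged value, that is evidence of tampering.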
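The re-fetch check can be sketched with Node's built-in `crypto` module. The helper names and the `AuditEntry` shape are hypothetical; only `createHash` is a real API here.

```typescript
import { createHash } from 'node:crypto';

// Hex-encoded SHA-256 digest of the document bytes.
function sha256Hex(data: Uint8Array): string {
  return createHash('sha256').update(data).digest('hex');
}

// Hypothetical audit-log row recorded at generation time.
interface AuditEntry {
  docId: string;
  hash: string;
}

// True when the stored bytes still match what was logged at generation.
function isUntampered(data: Uint8Array, entry: AuditEntry): boolean {
  return sha256Hex(data) === entry.hash;
}

const original = new TextEncoder().encode('invoice bytes');
const entry: AuditEntry = { docId: 'doc_1', hash: sha256Hex(original) };

const stillIntact = isUntampered(original, entry); // true
const modified = isUntampered(new TextEncoder().encode('invoice bytes!'), entry); // false
```

The check is only as trustworthy as the audit log itself, so the log should live somewhere the document store's writers cannot modify.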
CDN:
Don't serve PDFs from your origin server. Use CDN (S3 + CloudFront / Cloudflare R2 + Cloudflare CDN / Vercel Blob auto-CDN).
For my stack: [storage / CDN]
Output:
- Storage setup
- Delivery pattern
- Retention policy
- Tamper-evidence (if regulated)
The tamper-evidence detail most teams skip: **hashing + logging at generation**. For invoices and contracts in regulated jurisdictions (EU e-invoicing, US tax docs), being able to demonstrate document integrity is frequently a requirement, not a nice-to-have. Cheap to add at generation; expensive to retrofit.
## Common PDF Pitfalls
Help me avoid PDF pitfalls.
The 10 mistakes:
1. **Shipping Puppeteer in serverless without size optimization.** Lambda zip packages are capped at 250MB unzipped (container images at 10GB); default Puppeteer with Chromium is ~300MB; cold starts are brutal. Fix: @sparticuz/chromium-min OR move to a dedicated worker.
2. **Synchronous generation of 50-page reports.** The request hits the platform's timeout before the PDF is ready (limits vary by platform and plan, often in the 10-60s range). Fix: async job; email link.
3. **Memory leaks from unclosed browser instances.** Each Playwright launch leaks if not closed. Server OOMs. Fix: try/finally with browser.close(); pool browsers.
4. **Hardcoded layout in business logic.** "Update invoice template" = code deploy. Fix: template engine (Handlebars / React / Markdown).
5. **No versioning of templates.** Re-generating an invoice from 2024 with the 2026 template = visually different. Fix: pin template version per invoice.
6. **Embedding fonts via web URLs.** Browser without internet = missing fonts. Fix: bundle fonts locally; embed in PDF.
7. **Unicode / emoji failures.** Default fonts don't support all Unicode; emoji renders as boxes. Fix: use a Unicode-coverage font (e.g. Noto Sans).
8. **Page break in the middle of a table row.** By default the browser may split a row across pages. Fix: CSS `page-break-inside: avoid` (or the newer `break-inside: avoid`) on rows.
9. **Forgotten print background.** Backgrounds are stripped by default in browser-print mode. Fix: `printBackground: true` in Playwright `page.pdf()` options.
10. **PDF size bloat.** Embedding a 5000×3000 PNG when 800×600 would do. Fix: resize images to actual rendered size before embed.
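For the last pitfall, the target dimensions can be computed before handing the image to a resizer. The sketch below covers only the arithmetic (fit inside a bounding box, keep the aspect ratio, never upscale). The function name is hypothetical, and the actual resize would be done by an image library of your choice.

```typescript
interface Size {
  width: number;
  height: number;
}

// Largest size that fits inside maxWidth x maxHeight while keeping the
// aspect ratio. Images already smaller than the box are left unchanged.
function fitWithin(source: Size, maxWidth: number, maxHeight: number): Size {
  const scale = Math.min(maxWidth / source.width, maxHeight / source.height, 1);
  return {
    width: Math.round(source.width * scale),
    height: Math.round(source.height * scale),
  };
}

// A 5000x3000 source fitted into an 800x600 box becomes 800x480.
const resized = fitWithin({ width: 5000, height: 3000 }, 800, 600);
```

If the PDF is meant for print, the bounding box should be derived from the printed size and target DPI rather than from screen pixels.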
For my app: [PDF use cases]
Output:
- Top 3 risks
- Mitigations
- Tests to add
The single most-painful production bug: **memory leak from un-closed Puppeteer instances under load**. Server runs fine; gradually accumulates browsers; OOMs in week 3. Always close in `finally`; consider browser pooling for >1 PDF/sec.
## Testing PDF Generation
Help me test PDF generation.
The test categories:
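1. Snapshot tests
- Generate PDF for known fixture
- Hash; compare to stored hash
- Pin volatile metadata first (PDF generators typically embed a creation timestamp), or the hash changes on every run
- Detects unintended changes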
2. Visual regression tests
- Generate PDF; convert to image (pdf2image)
- Compare to reference image (pixelmatch)
- Catches layout drift
3. Content tests
- Extract text from PDF (pdf-parse)
- Assert specific content present
- Useful for: "invoice contains line items"
4. Page count tests
- Generate; assert page count for known input
- Catches table-overflow regressions
5. Edge cases
- Long customer name (overflow)
- 0 line items (empty state)
- 100 line items (page break)
- Unicode customer name (font handling)
- Very large prices (number formatting)
6. Performance tests
- Time to generate (target: <2s for invoice; <30s for report)
- Memory usage at concurrency
7. Async pipeline tests
- Job created → job runs → file in S3 → email sent
- End-to-end test
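The content test (category 3) can be reduced to a small pure helper once the text has been extracted. In this sketch the extraction step is assumed to happen elsewhere (for example with pdf-parse). The `InvoiceFixture` shape and the function name are hypothetical.

```typescript
interface InvoiceFixture {
  number: string;
  total: string;
  items: { name: string }[];
}

// Returns the expected strings that do NOT appear in the extracted PDF text.
// An empty result means the content check passed.
function missingInvoiceContent(pdfText: string, invoice: InvoiceFixture): string[] {
  // Text extraction often changes whitespace, so collapse it before matching.
  const normalized = pdfText.replace(/\s+/g, ' ');
  const expected = [invoice.number, invoice.total, ...invoice.items.map((i) => i.name)];
  return expected.filter((value) => !normalized.includes(value));
}

const extracted = 'Invoice\n#INV-001\nWidget   2   $10.00\nTotal: $20.00';
const missing = missingInvoiceContent(extracted, {
  number: 'INV-001',
  total: '$20.00',
  items: [{ name: 'Widget' }, { name: 'Gadget' }],
});
// missing is ['Gadget'] because that line item never made it into the PDF.
```

Returning the list of missing strings, rather than a boolean, makes the test failure message say exactly which content disappeared.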
For my pipeline: [tests today]
Output:
- Test list
- Tools (pdf-parse, pdf2image, vitest, etc.)
- CI integration
The single most-useful test: **content extraction + assertion**. "Generated invoice contains the invoice number" is dead-simple but catches a large share of regressions. Don't skip it because it's not visual.
## What Done Looks Like
A working PDF generation setup:
- Approach picked per use case (PDFKit for invoices, Playwright for reports, vendor for contracts)
- Templates separated from business logic; versioned
- Async pipeline for >5s generation
- Storage in S3 / Blob with signed-URL delivery
- Memory-leak-safe (browser close + pooling)
- Tests cover happy path + edge cases (overflow, empty, unicode)
- Tamper-evidence for regulated docs (hash + audit log)
- Retention policy enforced (auto-delete ephemeral; preserve regulatory)
- Performance acceptable: invoice in <500ms; report in <30s
- Monitoring: success rate, avg time, P95 time, queue depth
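For the P95 metric in the monitoring item, a nearest-rank percentile over recorded generation times is usually enough. This is a sketch with a hypothetical function name. Most metrics backends compute percentiles for you, so this mainly helps in tests or ad-hoc scripts.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Generation times in milliseconds for 20 PDFs: 100, 200, ..., 2000.
const durations = Array.from({ length: 20 }, (_, i) => (i + 1) * 100);
const p95 = percentile(durations, 95); // 1900
const median = percentile(durations, 50); // 1000
```

Tracking P95 alongside the average matters here because a handful of very large reports can hide behind a healthy-looking mean.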
The proof you got it right: a customer says "I need my Q3 invoices for accounting"; you generate 90 days of invoices in 30s and email them over; the customer's accountant doesn't ask follow-up questions about formatting.
## See Also
- [Email Template Implementation](email-template-implementation-chat.md) — companion content-rendering concern
- [File Uploads](file-uploads-chat.md) — companion to PDF storage
- [Cron & Scheduled Tasks](cron-scheduled-tasks-chat.md) — scheduled report generation
- [Account Deletion & Data Export](account-deletion-data-export-chat.md) — data exports often want PDF format
- [Background Jobs Providers](https://vibereference.dev/backend-and-data/background-jobs-providers) — async generation infrastructure
- [PDF Document Generation Tools](https://vibereference.dev/backend-and-data/pdf-document-generation-tools) — vendor landscape
- [File Storage Providers](https://vibereference.dev/cloud-and-hosting/file-storage-providers) — S3 / Vercel Blob / R2
- [Audit Logs](audit-logs-chat.md) — tamper-evidence on regulated documents
- [Email Deliverability](email-deliverability-chat.md) — emailing invoices via Resend / SES / Postmark
- [Tax & VAT Handling](tax-vat-handling-chat.md) — invoice content compliance