Markdown Rendering & HTML Sanitization
If you're displaying user-authored content in your B2B SaaS — comments, notes, descriptions, AI chat responses, knowledge-base articles, customer support replies — you need to render markdown / rich text safely. The naive approach: dangerouslySetInnerHTML={{ __html: userContent }} and pray. The structured approach: pick a markdown renderer (react-markdown / remark / marked), sanitize the output (DOMPurify / sanitize-html), allowlist specific HTML elements + attributes, prevent XSS, render code blocks with syntax highlighting, and handle edge cases (tables, embedded images, links). Get this wrong and you ship XSS vulnerabilities.
1. Decide rendering pipeline
Pick a markdown rendering pipeline.
React stack:
react-markdown (recommended default):
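- Renders markdown straight to React elements (no dangerouslySetInnerHTML)
- Plugin ecosystem (remark + rehype)
- Safe by default: raw HTML in the source is escaped rather than rendered unless you add rehype-raw; pair that with rehype-sanitize
- Widely used in React projects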
marked:
- Generic JS markdown parser → HTML string
- Pair with DOMPurify for sanitization
- Used in non-React contexts
markdown-it:
- Configurable parser
- Plugin ecosystem
- Pair with DOMPurify
mdx (markdown + JSX):
- For static content (docs, blogs)
- Build-time processing
- Don't use for user input (MDX compiles to executable JavaScript, so untrusted MDX is arbitrary code execution)
remark / rehype (plumbing):
- Lower-level toolkit
- Plugins for syntax highlighting, math, footnotes, GFM tables, etc.
- Used by react-markdown and many docs frameworks
Pipeline:
- Source: markdown string
- Parse: remark (markdown → mdast)
- Transform: remark plugins (GFM, math, etc.)
- Convert: remark-rehype (mdast → hast)
- Transform HTML: rehype plugins (sanitize, syntax highlight, etc.)
- Output: React tree (react-markdown) OR HTML string (rehype-stringify)
Recommendation for 2026 React:
- User-generated content → react-markdown + remark-gfm + rehype-sanitize
- Static docs → MDX + Next.js / Astro
- Server-side rendering → unified pipeline server-side, hydrate client-side
Output:
1. Recommended pipeline for [USE CASE]
2. Plugin choices
3. Bundle-size estimate
4. Security posture (sanitize before render)
5. SSR compatibility
The 2026 default for B2B SaaS displaying user content: react-markdown + remark-gfm + rehype-sanitize. Three packages; ~50KB; covers 95% of needs.
2. Sanitize HTML — defense against XSS
The single most-important rule: never trust user-submitted HTML.
Sanitize HTML output.
Threat model:
- User submits markdown that converts to HTML
- HTML may contain <script>, <iframe>, on* attributes (onerror, onclick), javascript: URLs
- Without sanitization: XSS — attacker JS runs in victim's browser
Sanitization libraries:
DOMPurify (recommended):
- Mature; maintained; broadly used
- Works in browser + Node (with jsdom)
- Allowlist-based by default
- ~50KB
sanitize-html (Node):
- Server-side sanitization
- Deep configuration
- ~30KB
rehype-sanitize:
- Within the unified pipeline
- Schema-based allowlist
- Integrated with react-markdown
Server-vs-client:
- Best practice: sanitize on both sides (defense in depth)
- Server: sanitize before storing OR before sending
- Client: sanitize before rendering (catches anything that slipped through)
Allowlist approach:
- DEFAULT-DENY (safer)
- Explicitly allow: safe elements (<p>, <strong>, <em>, <ul>, <ol>, <li>, <h1> through <h6>, <a>, <code>, <pre>, <blockquote>, <table>, <img> with restricted src)
- Explicitly allow: safe attributes (class, id, href, src, alt, title); prefix user-supplied id values so they cannot clobber your own element ids
- Block: <script>, <iframe>, <object>, <embed>, <style>
- Block: on* event handlers (onclick, onerror, etc.)
- Block: javascript: URLs in href / src
URL filtering:
- Allow: http://, https://, mailto:, relative paths
- Block: javascript:, data: (most), vbscript:
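The URL rule above can be sketched as a small default-deny check. This is a minimal sketch, assuming it runs on every href and src after the markdown has been parsed (so character entities are already decoded). The name `isSafeUrl` and the protocol list are illustrative and not part of any library; rehype-sanitize and DOMPurify ship their own protocol checks, so treat this as a model of what they do rather than a replacement.

```typescript
// Protocols a user-supplied link or image may use. Everything else is denied.
const ALLOWED_PROTOCOLS = new Set(["http", "https", "mailto"]);

function isSafeUrl(raw: string): boolean {
  // Browsers ignore tabs and newlines inside URLs and trim leading control
  // characters, so "java\tscript:" still runs. Strip all control characters
  // and spaces (a superset of what browsers strip) before inspecting the scheme.
  const cleaned = raw.replace(/[\u0000-\u0020\u007f]/g, "");

  // A scheme is a letter followed by letters, digits, "+", "." or "-", then ":".
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(cleaned);

  // No scheme means a relative path, "#fragment" or "?query": allowed.
  if (match === null) return true;

  // Any scheme that is not explicitly allowlisted is rejected.
  return ALLOWED_PROTOCOLS.has(match[1].toLowerCase());
}
```

The check is deliberately conservative: anything that looks like a scheme and is not on the list is rejected, including all data: URLs.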
Edge cases:
- SVG: allow but sanitize (SVG can contain script)
- Embedded images (data: URLs): block by default; allow if intentional
- HTML in markdown: pass through only after sanitization (in react-markdown, rehype-raw followed by rehype-sanitize)
Output:
1. Sanitization library + config
2. Allowlist (elements + attributes + URLs)
3. Server + client sanitization strategy
4. Test cases (XSS payloads to verify)
5. Audit cadence
The defense-in-depth rule: sanitize server-side before storing, sanitize client-side before rendering. If one layer fails, the other catches it.
3. Allowlist tables, code blocks, embeds — common requests
Users want more than basic markdown. Plan extensions.
Configure markdown extensions.
GitHub Flavored Markdown (GFM) extras:
- Tables (| col | col |)
- Strikethrough (~~text~~)
- Task lists (- [ ] item)
- Autolinks (http://...)
- Footnotes
Plugin: remark-gfm (one line; covers all)
Code blocks with syntax highlighting:
Options:
- Prism.js (popular; ~6KB + per-language)
- highlight.js (popular; ~30KB minified)
- Shiki (uses the same TextMate grammars and themes as VS Code; high quality; larger bundle but server-renderable)
Recommendation:
- Server-side highlight at build time (Shiki) → no client-side cost
- Or client-side with Prism (smaller bundle)
- Lazy-load language packs (don't bundle all)
Math expressions:
- KaTeX (fast; client-side render)
- MathJax (more features; larger)
- Plugin: remark-math + rehype-katex
Diagrams:
- Mermaid (flowcharts, sequence, gantt)
- Plugin: remark-mermaid
Embed support:
- YouTube / Vimeo / Twitter / Loom embeds
- Server-side: parse URLs; render as <iframe> with allowlist + sandbox
- Beware: iframes are XSS surface; only embed allowlisted domains
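The embed allowlist described above can be sketched as a function that maps a pasted URL to a known embed URL or refuses. This is a minimal sketch: the function name `toEmbedSrc`, the choice of providers, and the assumption that YouTube video IDs are 11 characters of letters, digits, "-" and "_" are illustrative. The user's URL is never used as the iframe src directly; only the extracted ID is.

```typescript
// Hypothetical allowlist: map a pasted video URL to a known-safe embed URL, or null.
function toEmbedSrc(rawUrl: string): string | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not an absolute URL
  }
  if (url.protocol !== "https:") return null;

  // Compare the exact hostname, so "youtube.com.evil.example" does not match.
  const host = url.hostname.replace(/^www\./, "");

  if (host === "youtube.com" && url.pathname === "/watch") {
    const id = url.searchParams.get("v") ?? "";
    return /^[\w-]{11}$/.test(id) ? `https://www.youtube-nocookie.com/embed/${id}` : null;
  }
  if (host === "youtu.be") {
    const id = url.pathname.slice(1);
    return /^[\w-]{11}$/.test(id) ? `https://www.youtube-nocookie.com/embed/${id}` : null;
  }
  if (host === "vimeo.com") {
    const id = url.pathname.slice(1);
    return /^\d+$/.test(id) ? `https://player.vimeo.com/video/${id}` : null;
  }
  return null; // default-deny: unknown hosts are never embedded
}
```

When the function returns null, render the URL as an ordinary link instead of an iframe.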
Custom components (when MDX or react-markdown):
- Override h1, h2, code, etc. with custom React components
- Insert <Callout>, <Tabs>, <CodeBlock> with own logic
Anti-patterns:
- Allow arbitrary HTML in markdown (defeats sanitization)
- Allow user-supplied iframe src (open redirect / XSS)
- Bundle all syntax-highlight languages (massive)
Output:
1. Extensions to enable for [USE CASE]
2. Plugin chain (remark-gfm + remark-math + remark-mermaid as needed)
3. Syntax-highlight choice + bundle strategy
4. Embed allowlist (which domains are OK)
5. Custom components for branded touch
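The server-side syntax highlighting trend: run Shiki on the server (at build time for static content, at save or render time for user content) → output styled HTML → no client-side JS for highlighting. Big bundle savings; the trade-off is extra build or server time.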
4. Render markdown safely with react-markdown
Implement safe markdown rendering with react-markdown.
Install:
- npm install react-markdown remark-gfm rehype-sanitize rehype-highlight
Basic usage:
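    import ReactMarkdown from 'react-markdown';
    import remarkGfm from 'remark-gfm';
    import rehypeSanitize from 'rehype-sanitize';
    import rehypeHighlight from 'rehype-highlight';

    // Defined once at module scope so the arrays keep a stable identity across renders.
    const remarkPlugins = [remarkGfm];
    const rehypePlugins = [
      rehypeSanitize,  // SECURITY: sanitize the untrusted tree first
      rehypeHighlight, // trusted plugin; runs after, so its classes are not stripped
    ];

    export function SafeMarkdown({ userContent }) {
      return (
        <ReactMarkdown remarkPlugins={remarkPlugins} rehypePlugins={rehypePlugins}>
          {userContent}
        </ReactMarkdown>
      );
    }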
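Custom components (override defaults):
    <ReactMarkdown
      components={{
        // Open user-content links in a new tab, isolated from window.opener.
        a: ({ node, children, ...props }) => (
          <a {...props} target="_blank" rel="noopener noreferrer">
            {children}
          </a>
        ),
        // react-markdown v9 removed the `inline` prop. Fenced blocks that declare a
        // language carry a `language-*` class, so use that to pick the block renderer.
        code: ({ node, className, children, ...props }) =>
          /language-/.test(className || '') ? (
            <CodeBlock className={className}>{children}</CodeBlock>
          ) : (
            <code className={className} {...props}>{children}</code>
          ),
        // CodeBlock and Image are your app's own components.
        img: ({ src, alt }) => <Image src={src} alt={alt} loading="lazy" />,
      }}
    >
      {userContent}
    </ReactMarkdown>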
Custom sanitize schema (rehype-sanitize):
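    import ReactMarkdown from 'react-markdown';
    // rehype-sanitize re-exports defaultSchema, so no extra dependency is needed.
    import rehypeSanitize, { defaultSchema } from 'rehype-sanitize';

    // Start from the default allowlist and extend only what you need.
    const schema = {
      ...defaultSchema,
      attributes: {
        ...defaultSchema.attributes,
        // Keep class names on <code> (e.g. language-js) for syntax highlighting.
        code: [...(defaultSchema.attributes?.code || []), 'className'],
      },
    };

    <ReactMarkdown rehypePlugins={[[rehypeSanitize, schema]]}>
      {userContent}
    </ReactMarkdown>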
Output:
1. Install commands
2. Basic safe-render component
3. Custom components for links / code / images
4. Sanitize schema customization
5. Test XSS payloads
The link-target gotcha: by default, rendered <a> tags don't have target="_blank" + rel="noopener noreferrer". User-content links should open in a new tab and be isolated from window.opener for security.
5. Performance — render large markdown efficiently
Large markdown documents (knowledge base articles, docs, AI chat outputs) can slow render.
Optimize markdown rendering performance.
Bundle size:
- react-markdown: ~25KB
- remark-gfm: ~10KB
- rehype-sanitize: ~8KB
- rehype-highlight: ~30KB + language packs
- Total: ~75KB+ for full pipeline
Lazy load:
- Don't bundle all languages for syntax highlighting (highlight.js core ~30KB; with all langs ~600KB)
- Load specific languages on demand
- Or use Shiki server-side at build time
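Loading languages on demand usually comes down to a registry of loader functions plus a cache of in-flight promises, so each language is fetched at most once. The sketch below is library-agnostic; `createLazyRegistry` and the loader map are hypothetical names. With highlight.js, a loader would typically be a dynamic import such as `() => import("highlight.js/lib/languages/typescript")`.

```typescript
type Loader<T> = () => Promise<T>;

// Returns a load(name) function that starts each loader at most once.
function createLazyRegistry<T>(loaders: Record<string, Loader<T>>) {
  const inFlight = new Map<string, Promise<T>>();

  return function load(name: string): Promise<T> | undefined {
    const cached = inFlight.get(name);
    if (cached !== undefined) return cached;

    // Own-property check so names like "constructor" are not treated as loaders.
    const known = Object.prototype.hasOwnProperty.call(loaders, name);
    if (!known) return undefined; // unknown language: caller falls back to plain text

    // Forget failed loads so a later call can retry.
    const pending = loaders[name]().catch((err) => {
      inFlight.delete(name);
      throw err;
    });
    inFlight.set(name, pending);
    return pending;
  };
}
```

Because the promise itself is cached, ten code blocks in the same language that render at once share a single network request.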
Rendering large docs:
- 10K+ word docs slow react-markdown
- Solutions:
- Virtualize sections (react-virtuoso)
- Server-render and hydrate (Next.js / RSC)
- Cache rendered HTML (key by content hash)
AI chat streaming:
- LLM tokens stream in
- Partial markdown rendering
- Re-render on each chunk (or debounce)
- Performance: memo on stable parts; only re-render last paragraph
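One way to "only re-render the last paragraph" is to split the buffer into finished blocks and one live block, then render each finished block through a memoized component. The sketch below is a simplification under stated assumptions: it treats a blank line outside a code fence as a block boundary, which is wrong for constructs that span blank lines (loose lists, for example). `splitStreamingBlocks` is a hypothetical helper name.

```typescript
// Splits a streaming buffer into blocks that are finished and the one still growing.
function splitStreamingBlocks(buffer: string): { stable: string[]; live: string } {
  const stable: string[] = [];
  let current: string[] = [];
  let inFence = false;

  for (const line of buffer.split("\n")) {
    // A ``` line opens or closes a fenced code block.
    if (/^ {0,3}```/.test(line)) inFence = !inFence;

    if (!inFence && line.trim() === "") {
      // Blank line outside a fence: the block collected so far is complete.
      if (current.length > 0) {
        stable.push(current.join("\n"));
        current = [];
      }
    } else {
      current.push(line);
    }
  }
  return { stable, live: current.join("\n") };
}
```

Stable blocks keep the same string value from chunk to chunk, so a memoized block component keyed by index skips them and only the live block is re-parsed.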
Memoization:
- React.memo on ReactMarkdown wrapper
- useMemo on plugin array (otherwise the array identity changes every render and React.memo never skips)
- Cache key: content + plugin config
For [USE CASE]:
1. Bundle budget
2. Lazy-load strategy
3. SSR vs CSR
4. Memoization plan
5. Streaming-render strategy (for AI chat)
The plugin-instance memoization gotcha: <ReactMarkdown remarkPlugins={[remarkGfm]}> creates a new array on every render, so a React.memo wrapper sees changed props each time and the markdown is re-parsed. Memoize the array or define it outside the component.
6. Streaming markdown — AI chat use case
LLM responses arrive as streaming tokens. Render progressively without flickering.
Render streaming markdown.
Pattern:
- LLM streams tokens (SSE / WebSocket)
- Buffer accumulates: "The answer..." then "The answer is..." then "The answer is 42."
- Re-render markdown on each chunk
Challenges:
- Incomplete markdown ("**bold without close" mid-stream)
- Code blocks open at start, close later
- Lists / tables span multiple lines
Solutions:
Auto-close incomplete syntax (recommended):
- Detect open ** or open ``` mid-stream
- Append closing tokens before parsing
- Library: streaming-markdown or DIY
Buffer until stable boundaries:
- Wait for newline before parsing previous line
- Adds latency; reduces flicker
Render-on-debounce:
- Debounce 50-100ms
- Reduce re-render churn
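The DIY version of the auto-close approach above can be sketched as a function that runs on the buffer before every parse. This is a minimal sketch, assuming only three constructs matter (fenced code, inline code, and ** bold). `autoCloseMarkdown` is a hypothetical name, and a real implementation would also need to handle links, italics, and tables.

```typescript
// DIY sketch: append closers for the constructs most likely to be cut off mid-stream.
function autoCloseMarkdown(partial: string): string {
  const lines = partial.split("\n");
  let inFence = false;
  let tailStart = 0; // index of the first line of the block still being written

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    if (/^ {0,3}```/.test(line)) {
      inFence = !inFence;
      tailStart = i + 1;
    } else if (!inFence && line.trim() === "") {
      tailStart = i + 1;
    }
  }

  // Unterminated code fence: close it and stop, since nothing inside is inline markup.
  if (inFence) return partial + (partial.endsWith("\n") ? "" : "\n") + "```";

  const tail = lines.slice(tailStart).join("\n");
  // Drop completed inline code spans so their contents are not counted below.
  const withoutCode = tail.replace(/`[^`\n]*`/g, "");

  // A leftover backtick means an inline code span is still open.
  if (withoutCode.includes("`")) return partial + "`";

  const boldMarkers = (withoutCode.match(/\*\*/g) ?? []).length;
  if (boldMarkers % 2 === 1) {
    // A marker with no text after it yet would render as literal asterisks: hide it.
    return partial.endsWith("**") ? partial.slice(0, -2) : partial + "**";
  }
  return partial;
}
```

The closers are added only to the copy handed to the parser. The stored buffer stays untouched, so the real closing tokens replace the synthetic ones when they arrive.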
Visual cues:
- Cursor at end of stream (▋)
- Disabled state during stream
- "Thinking..." placeholder
Anti-patterns:
- Flash of unparsed markdown (raw asterisks visible)
- Layout jump as content streams in
- Re-rendering entire doc on each token (slow)
For AI chat:
- Use streaming-markdown library
- Memoize message components (only re-render the actively-streaming message)
- Show cursor during stream
- Smooth scroll to bottom
Output:
1. Streaming-markdown library or DIY
2. Auto-close strategy
3. Debounce config
4. Visual cursor / state
5. Test cases (interrupted streams, network errors)
The 2026 standard for AI chat: streaming-markdown library + memoized message list + cursor indicator. This is the progressive-render pattern familiar from ChatGPT / Claude / Perplexity-style UIs.
7. Edit + preview — split or toggle
For markdown input (not just rendering), users want to see preview.
Implement markdown edit + preview.
Patterns:
Pattern A: Side-by-side (split)
- Edit left; preview right
- Real-time update
- Used by: HackMD, StackEdit, many docs editors
- Best for: power users; wide screens
Pattern B: Tab toggle
- "Write" / "Preview" tabs
- One at a time
- Used by: GitHub, GitLab, mobile-first comment boxes
- Best for: narrow screens; non-technical users
Pattern C: Inline rendering (WYSIWYG-feel)
- As-you-type styling without separate preview
- Used by: Notion, Linear (build your own with TipTap / Lexical)
- Best for: end-users; highest polish
- Note: this is rich-text editor territory, not pure markdown
Implementation:
Pattern A (split):
- Two columns; sync scroll position
- Debounced render (50-100ms)
- Memoize render component
Pattern B (tab):
- Tab state in component
- Cache rendered HTML to avoid re-render on tab switch
Pattern C (inline):
- Use TipTap or Lexical (rich-text editors)
- See rich-text-editor-implementation-chat for details
For [USE CASE]:
- Power user / docs-heavy → split
- General user / mobile → tab toggle
- Non-technical end user → inline (rich-text editor)
Output:
1. Pattern recommendation
2. Component implementation
3. Performance considerations
4. Mobile fallback
5. Keyboard shortcuts (Cmd+Enter to submit; Tab for indent)
The "split panes look professional" trap: split panes are great for engineers. For non-technical users, side-by-side is intimidating. Tab toggle or inline rendering wins for mass adoption.
8. Markdown for AI chat — safe LLM output rendering
LLMs sometimes output unsafe markdown. Sanitize.
Render LLM markdown output safely.
Threats:
- LLM hallucinates malicious links (rare but possible)
- LLM outputs HTML that bypasses markdown
- User prompts LLM to output XSS payload as test
Defenses:
- Same sanitization as user content (DOMPurify / rehype-sanitize)
- Allowlist: standard markdown elements; no script / iframe
- Treat LLM output as untrusted input
- Block javascript: URLs
Trust levels:
- High-trust internal LLM (your own model): can be slightly looser
- Public-LLM-via-API (OpenAI, Anthropic): treat as untrusted
- User-provided LLM output: definitely untrusted
LLM-specific extensions:
- Tool calls / structured output: render as cards (not markdown)
- Citations / sources: render as links with allowlist
- Images: only allow if from your generation pipeline (DALL-E / Midjourney URLs allowed)
Performance:
- Render-on-stream (see above)
- Cache rendered output (by message id)
Output:
1. Sanitization same as user content
2. LLM-specific allowlist (citations, images)
3. Structured output rendering (tool calls)
4. Streaming integration
5. Test cases (prompt injection → markdown output)
The prompt-injection-via-markdown attack: user prompts LLM to "output the following HTML." LLM dutifully outputs <script>alert(1)</script>. Sanitize.
9. Storage format — markdown vs HTML vs AST
Where you store affects what flexibility you have.
Decide storage format for markdown content.
Option 1: Store markdown source
- Pros: human-readable; small; portable; can re-render with different config
- Cons: render at every read (CPU); inconsistency if config changes
- Best for: most B2B SaaS
Option 2: Store rendered HTML
- Pros: faster reads (no parse)
- Cons: stale if rules change; larger storage; harder to edit
- Best for: archival; static publishing
Option 3: Store both (markdown + cached HTML)
- Pros: fast reads + flexible re-render
- Cons: more storage; need invalidation
- Best for: high-traffic content
Option 4: Store AST (mdast / hast JSON)
- Pros: programmatic transformation; easy traversal
- Cons: complex; not portable
- Best for: editors building custom transformations
Recommendation:
- B2B SaaS user content: store markdown source; render on read with cache
- High-traffic blog / docs: store HTML; rebuild on config change
- Rich-text editors (TipTap / Lexical): store editor JSON (AST-like)
Cache strategies:
- Redis / KV cache: key by content hash
- TanStack Query: client-side cache
- Server-render once + hydrate
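For Option 3 (markdown plus cached HTML), the invalidation rule can be as small as a renderer version stored next to the cached HTML. The sketch below assumes a hypothetical row shape and a `rendererVersion` number that you bump whenever the plugin list or sanitize schema changes; none of these names come from a library.

```typescript
// Option 3 row shape: markdown is the source of truth, HTML is a disposable cache.
interface ContentRow {
  markdown: string;
  cachedHtml: string | null;
  cachedWithVersion: number | null;
}

// Returns cached HTML when it was produced by the current renderer configuration;
// otherwise re-renders and returns the updated row for the caller to write back.
function readHtml(
  row: ContentRow,
  render: (markdown: string) => string,
  rendererVersion: number,
): { html: string; row: ContentRow; rerendered: boolean } {
  if (row.cachedHtml !== null && row.cachedWithVersion === rendererVersion) {
    return { html: row.cachedHtml, row, rerendered: false };
  }
  const html = render(row.markdown);
  return {
    html,
    row: { ...row, cachedHtml: html, cachedWithVersion: rendererVersion },
    rerendered: true,
  };
}

// Editing the markdown drops the cache so the next read re-renders.
function updateMarkdown(row: ContentRow, markdown: string): ContentRow {
  return { ...row, markdown, cachedHtml: null, cachedWithVersion: null };
}
```

Tying the cache to a version number means a tightened sanitize schema takes effect on the next read of every row, with no bulk migration.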
Output:
1. Recommendation for [USE CASE]
2. Schema (column types, sizes)
3. Caching strategy
4. Invalidation rules
5. Migration path if format changes
The simplest pattern: store markdown; render on read; cache for hot paths. Don't over-engineer.
10. Test XSS — verify the pipeline
Test markdown rendering for XSS.
Test payloads (must NOT execute):
Basic:
- <script>alert(1)</script>
- <img src=x onerror=alert(1)>
- <svg onload=alert(1)>
- <iframe src=javascript:alert(1)>
Markdown-encoded:
- [click me](javascript:alert(1))
- ![image](javascript:alert(1))
- [link](data:text/html,<script>alert(1)</script>)
HTML in markdown:
- <details><summary>open</summary><script>alert(1)</script></details>
- <a href="javascript:alert(1)">click</a>
Mutation XSS:
- <noscript><p title="</noscript><img src=x onerror=alert(1)>">
- Polyglot payloads (XSS that survives multiple parsers)
Test approach:
- Unit tests: feed payload → assert sanitized output (no alert)
- Storybook: visual regression
- Manual: paste in dev environment; check console
- Bug bounty: incentivize external testing
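The "feed payload, assert sanitized output" step can be sketched as a small harness that takes any sanitize function and reports which payloads survive. The regex checks here are a heuristic smoke test, not a sanitizer, and they will miss obfuscated variants. `findUnsafeMarkup` and `auditSanitizer` are hypothetical helper names. In a real suite you would pass your actual pipeline as `sanitize` and ideally inspect the parsed DOM rather than the string.

```typescript
// Raw-HTML payloads from the list above. None of them may survive sanitization.
const XSS_PAYLOADS: string[] = [
  "<script>alert(1)</script>",
  "<img src=x onerror=alert(1)>",
  "<svg onload=alert(1)>",
  "<iframe src=javascript:alert(1)>",
  '<a href="javascript:alert(1)">click</a>',
];

// Heuristic smoke check: returns the reasons a rendered HTML string looks unsafe.
function findUnsafeMarkup(html: string): string[] {
  const reasons: string[] = [];
  if (/<\s*(script|iframe|object|embed|style)\b/i.test(html)) {
    reasons.push("blocked element");
  }
  if (/<[^>]*\son\w+\s*=/i.test(html)) {
    reasons.push("event-handler attribute");
  }
  if (/<[^>]*\s(?:href|src)\s*=\s*["']?\s*(?:javascript|vbscript|data):/i.test(html)) {
    reasons.push("dangerous URL scheme");
  }
  return reasons;
}

// Runs every payload through the pipeline under test and collects failures.
function auditSanitizer(sanitize: (input: string) => string): string[] {
  const failures: string[] = [];
  for (const payload of XSS_PAYLOADS) {
    const reasons = findUnsafeMarkup(sanitize(payload));
    if (reasons.length > 0) failures.push(`${payload} -> ${reasons.join(", ")}`);
  }
  return failures;
}
```

A CI test then asserts that `auditSanitizer(yourPipeline)` returns an empty array, and the payload list grows each time a new bypass is reported.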
Tools:
- DOMPurify test suite (mature; battle-tested)
- OWASP XSS cheat sheet (payload library)
- jest + @testing-library
Audit:
- Manual review every config change
- Automated tests in CI
- Quarterly security review
Output:
1. Test payload library (10-30 cases minimum)
2. Unit tests
3. Visual regression
4. Pen-test process (annually)
5. Bug bounty / report channel
The only-test-the-happy-path failure: tests that pass simple markdown but never test XSS. Always include malicious-input tests.
What Done Looks Like
A v1 markdown rendering system for B2B SaaS in 2026:
- react-markdown + remark-gfm + rehype-sanitize pipeline
- Sanitization on server (before storing) AND client (before rendering)
- Allowlist: elements + attributes + URLs
- Custom link rendering (target="_blank" rel="noopener noreferrer")
- Code blocks with syntax highlighting (Prism / Shiki)
- GFM tables + task lists + autolinks
- XSS test suite (10+ payloads)
- Bundle size budget (<100KB gzipped)
- Edit + preview UI (split / tab / inline based on user)
- Streaming-markdown for AI chat (if applicable)
- Storage as markdown source; render-on-read
Add later when product is mature:
- MDX for docs / static content
- Math expressions (KaTeX)
- Mermaid diagrams
- Embed support (YouTube, Twitter)
- Custom callouts / components
- Localization (RTL languages)
The mistake to avoid: using dangerouslySetInnerHTML without sanitization. Direct path to XSS.
The second mistake: trusting LLM output. Sanitize same as user input.
The third mistake: bundling all syntax-highlight languages. Lazy-load or server-render with Shiki.
See Also
- Rich-Text Editor Implementation — for input (companion guide)
- Comments, Threading & @Mentions — comments use markdown render
- AI Features Implementation — AI chat output rendering
- Email Template Implementation — email markdown / HTML
- Content Moderation Pipeline — moderate markdown content
- Schema Validation Zod — input validation (paired with sanitization)
- Performance Optimization — bundle size + lazy load
- Internationalization — localized content
- Captcha & Bot Protection — adjacent abuse defense
- VibeReference: Markdown — markdown overview
- VibeReference: Components — UI primitives
- VibeReference: AI SDK Core — AI SDK for streaming
- LaunchWeek: Documentation Strategy — docs site markdown
- LaunchWeek: Blog Posts with AI — blog markdown content