
# Graceful Shutdown & Zero-Downtime Deployments: Don't Drop Requests Mid-Deploy


If you ship code multiple times a week in 2026, every deploy is a potential customer-facing outage if you handle shutdown wrong. The naive flow: kill the old process; start the new one. The fallout: in-flight requests fail; long-running jobs die mid-execution; database connections leak; webhook deliveries drop; users see 502s. Most indie SaaS gets a free pass on this because their hosting platform (Vercel / Render / Railway / Fly) handles graceful shutdown automatically. That lasts until you outgrow them into self-managed Kubernetes / ECS / VM-based deploys, where shutdown is your problem. Even on managed platforms, long-running jobs and database transactions need explicit handling.

A working shutdown strategy answers: how does the process learn it's about to die (SIGTERM / preStop hooks), what should it do (stop accepting new work; drain in-flight; close connections cleanly), how long do you have (typically 30s; tunable), how do you handle long-running jobs (don't run them in your web process), how do you signal the load balancer (fail the readiness check before you stop serving), and how do you test (chaos / kill -9 in staging).

This guide is the implementation playbook for graceful shutdown + zero-downtime deploys. Companion to Incident Response, Multi-region Deployment, Backups & Disaster Recovery, Service Level Agreements, and HTTP Retry & Backoff.

## Why Graceful Shutdown Matters

Get the failure modes clear first.

Help me understand shutdown failures.

The 8 failure modes:

**1. In-flight HTTP requests dropped**
User clicks; request lands; deploy starts; process killed; user sees 502.
Direct impact: customer-facing errors during every deploy.

**2. Long-running jobs killed mid-execution**
Background job processing 10K rows; deploy at row 5K; job dies.
Resume needs idempotency or work redone.

**3. Database transactions left open**
Process killed mid-transaction; locks persist on server until timeout.
Brief degradation; queries blocked.

**4. WebSocket / SSE connections dropped without notice**
Client expects "connection closing" message; gets RST.
Reconnect logic might thrash.

**5. Webhook deliveries lost**
Outbound webhook in-flight; process killed; webhook fails to deliver.
Downstream miss; manual recovery.

**6. Cache writes truncated**
Redis pipeline mid-flush; killed; partial writes.
Corrupted cache state.

**7. File uploads interrupted**
Multipart upload streaming; killed; corrupted partial in storage.

**8. Logs / metrics buffered but not flushed**
Last 30s of activity invisible; debugging gaps.

The root cause for most: **process gets SIGKILL without SIGTERM grace period**.

For my system:
- Deploy frequency
- Long-running operations
- Customer-facing failure mode

Output:
1. Top failure mode
2. Risk profile
3. Mitigation priority

The biggest unforced error: assuming "rolling deploy" = "zero downtime". Rolling deploys cycle pods one at a time, but each pod transition needs graceful shutdown to avoid dropped requests. Without graceful shutdown, you just drop fewer requests per deploy.

## The Shutdown Lifecycle

Help me understand the shutdown sequence.

The 5 phases (Kubernetes example; same concepts everywhere):

**Phase 1: Pod marked for termination**

K8s API: pod scheduled for delete.
Endpoints controller: removes pod from service load-balancer.
But: this takes ~1-2s to propagate.

**Phase 2: preStop hook runs (optional; bounded by the termination grace period)**

K8s sends preStop command to container.
Common preStop: `sleep 5` to wait for endpoint removal to propagate.
Container can use this to start shutdown coordination.

**Phase 3: SIGTERM sent**

K8s sends SIGTERM to PID 1.
This is your "start shutting down" signal.

**Phase 4: terminationGracePeriodSeconds (default 30s)**

Process has this much time to:
- Stop accepting new connections
- Drain in-flight requests
- Close DB / Redis / queue connections
- Flush logs / metrics
- Exit cleanly

**Phase 5: SIGKILL sent**

If process still running, K8s sends SIGKILL.
No more grace; immediate kill.

**The same pattern in other environments**:

- **Docker Compose**: SIGTERM → 10s → SIGKILL
- **systemd**: SIGTERM → TimeoutStopSec → SIGKILL
- **AWS ECS**: stopTimeout configurable
- **Vercel Functions**: lifecycle hooks for graceful shutdown (supported with Fluid Compute)
- **Heroku**: SIGTERM → 30s → SIGKILL
- **Render**: SIGTERM → 30s → SIGKILL

For my platform: [pick]

Output:
1. Phase mapping
2. Configuration knobs
3. Defaults vs your needs

The detail most teams miss: endpoint removal is not instant. If you stop accepting connections the moment SIGTERM arrives, some requests still get routed your way. You need a preStop sleep OR a readiness-check flip to drain traffic first.

## The Universal Shutdown Pattern

Help me write a graceful shutdown handler.

The Node.js / Express pattern:

```typescript
import express from 'express';
import { createServer } from 'http';

const app = express();

let isShuttingDown = false;

app.get('/health', (req, res) => res.send('ok'));
app.get('/ready', (req, res) =>
  isShuttingDown ? res.status(503).send('shutting down') : res.send('ready')
);

const server = createServer(app);
server.listen(3000);

async function gracefulShutdown(signal: string) {
  if (isShuttingDown) return; // Idempotent: ignore repeated signals
  isShuttingDown = true;
  
  console.log(`Received ${signal}, starting graceful shutdown`);
  
  // Escape hatch: force-exit if draining takes longer than the grace period allows
  const timeout = setTimeout(() => {
    console.error('Forced shutdown after 25s');
    process.exit(1);
  }, 25_000);

  try {
    // 1. Stop accepting new connections and
    // 2. wait for in-flight requests to finish (server.close does both)
    await new Promise<void>((resolve, reject) =>
      server.close((err) => (err ? reject(err) : resolve()))
    );
    console.log('HTTP server closed');

    // 3. Drain background work (drainQueues / db / redis / flush* are app-specific)
    await drainQueues();

    // 4. Close DB / Redis / external connections
    await db.disconnect();
    await redis.quit();

    // 5. Flush observability (logs, metrics, traces)
    await flushLogsAndMetrics();

    clearTimeout(timeout);
    console.log('Graceful shutdown complete');
    process.exit(0);
  } catch (err) {
    console.error('Shutdown error', err);
    process.exit(1);
  }
}

process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
```

Python / FastAPI pattern:

```python
from fastapi import FastAPI

app = FastAPI()
shutting_down = False

@app.on_event("shutdown")
async def shutdown_event():
    global shutting_down
    shutting_down = True
    print("Graceful shutdown starting")

    # Drain queues (app-specific helper)
    await drain_queues()

    # Close DB (app-specific client)
    await db.disconnect()

    print("Shutdown complete")
```

uvicorn runs FastAPI and handles SIGTERM by default: it stops accepting connections, waits for in-flight requests, then fires the shutdown event.

Go pattern:

```go
// Assumes server is an *http.Server configured elsewhere.
ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
defer cancel()

// Run the server; ListenAndServe returns http.ErrServerClosed after Shutdown
go func() {
  if err := server.ListenAndServe(); err != http.ErrServerClosed {
    log.Fatal(err)
  }
}()

// Block until SIGTERM / SIGINT arrives
<-ctx.Done()
log.Println("Shutting down")

// Give in-flight requests up to 25s to finish
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 25*time.Second)
defer shutdownCancel()

if err := server.Shutdown(shutdownCtx); err != nil {
  log.Printf("Shutdown error: %v", err)
}
```

For my stack: [language]

Output:

  1. Shutdown handler
  2. Order of operations
  3. Test cases

The order matters: **stop accepting → drain in-flight → close external → exit**. Get it backwards and you create errors: requests fail against connections you already closed, or the process exits while requests are still in flight.

## The Health Check Dance

Help me coordinate health checks with shutdown.

The two health checks (Kubernetes terminology; concept universal):

Liveness probe ("is process alive?"):

- Failed → restart pod
- Should: simple check; fast
- Don't: include DB checks (DB hiccup ≠ process broken)

Readiness probe ("can it serve traffic?"):

- Failed → remove from load balancer; don't kill
- Should: check DB / Redis / dependencies if their failure means we can't serve
- Use: signal "I'm shutting down; stop sending me traffic"

The shutdown sequence using readiness:

```typescript
let isReady = true;
let isShuttingDown = false;

// Small helper so the drain delay below is awaitable
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

app.get('/healthz', (req, res) => res.status(200).send('alive')); // Liveness
app.get('/readyz', (req, res) =>
  isReady && !isShuttingDown ? res.status(200).send('ready') : res.status(503).send('not ready')
);

async function gracefulShutdown() {
  isShuttingDown = true;

  // Wait 5-15 seconds for the load balancer to notice readiness=false
  // and stop sending traffic
  await sleep(10_000);

  // Now drain
  server.close();
  await drainQueues();
  // ...
}
```

Why the wait:

Load balancer health-checks every N seconds (typically 5-10s). After readiness flips to 503, LB needs 1-2 cycles to remove pod. Sleep 10s before stopping HTTP server gives LB time to drain.

Without wait: requests still routed; you stop accepting; users see 502.

K8s preStop hook (handles this):

```yaml
spec:
  terminationGracePeriodSeconds: 60  # 10s sleep + 50s for actual drain
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]
```

K8s runs preStop; THEN sends SIGTERM. By that time, LB has stopped sending traffic.

For my deployment: [config]

Output:

  1. Liveness vs readiness
  2. PreStop / health-check coordination
  3. Grace-period config

The mistake to fix: **using liveness for "DB is up"**. DB blip → liveness fails → pod restarts → new pod → DB still down → liveness fails → restart loop. Use readiness for dependencies; liveness for "process responsive."
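
A minimal sketch of that split, assuming an Express app and a `db` client with a cheap `query` method (both stand-ins for whatever your stack uses): liveness stays shallow, readiness owns the dependency checks.

```typescript
// Liveness: "is the process responsive?" — no dependencies
app.get('/healthz', (_req, res) => res.status(200).send('alive'));

// Readiness: "can I usefully serve traffic right now?" — dependencies + shutdown flag
app.get('/readyz', async (_req, res) => {
  if (isShuttingDown) return res.status(503).send('shutting down');
  try {
    await db.query('SELECT 1'); // cheap dependency ping
    res.status(200).send('ready');
  } catch {
    res.status(503).send('dependency unavailable'); // drop from LB; don't restart
  }
});
```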

## Long-Running Jobs: Don't Run Them In-Process

Help me think about long-running work.

The trap: long-running operations in your web process.

Examples:

- Generate 50-page PDF report (30s)
- Process CSV with 100K rows (5 min)
- Train ML model (hours)

If user requests these via HTTP:

- Web request blocks for 30s+ (timeout risk)
- Deploy mid-request kills the work
- Hard to retry / resume

The solution: separate web from worker:

- Web tier: handles HTTP; never does long work
- Worker tier: processes background jobs from a queue; can take hours

Stack:

- Vercel Workflow / Inngest / BullMQ / Sidekiq / Celery — job queues
- Redis / Postgres / SQS — queue backend
- Separate worker process — runs jobs

Pattern (a minimal enqueue sketch follows the list):

  1. User requests "generate report"
  2. Web creates job record; enqueues; returns 200 + job_id
  3. Worker picks up; processes (can take hours)
  4. Web polls job status OR worker pushes via WebSocket / SSE / email
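
A minimal sketch of steps 1-2, assuming BullMQ for the queue and a hypothetical `db.insertJob` helper for the job record (swap in your own queue and persistence):

```typescript
import { Queue } from 'bullmq';

// Hypothetical names: the point is "record + enqueue + return immediately"
const reportQueue = new Queue('reports', { connection: { host: 'localhost', port: 6379 } });

app.post('/reports', async (req, res) => {
  // 1. Create a job record the user can poll later
  const job = await db.insertJob({ status: 'queued', params: req.body });

  // 2. Hand the heavy work to the worker tier and return right away
  await reportQueue.add('generate-report', { jobId: job.id, ...req.body });

  res.json({ jobId: job.id }); // client polls GET /reports/:jobId (or gets a push)
});
```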

Graceful shutdown for workers:

Workers need their own shutdown handling.

```typescript
import { Worker } from 'bullmq';

const worker = new Worker('reports', async (job) => {
  await processReport(job.data);
});

process.on('SIGTERM', async () => {
  // Stop picking up new jobs
  await worker.pause();

  // Wait for the job currently in progress, then disconnect
  await worker.close();

  process.exit(0);
});
```

Idempotency required:

If job killed mid-execution, retried later: must be safe to redo.

Pattern (sketched below):

- Job records progress (e.g. "processed rows 1-5K")
- On restart, resumes from last checkpoint
- Or: job is fully idempotent (running twice = same result)
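
A minimal checkpointing sketch, assuming hypothetical `db.getCheckpoint` / `db.setCheckpoint` helpers and upsert-style writes so re-running a batch is harmless:

```typescript
// Process a CSV in batches, persisting progress after each batch.
async function processCsvJob(job: { id: string; rows: string[][] }) {
  const batchSize = 1000;
  const checkpoint = await db.getCheckpoint(job.id); // 0 on first attempt

  for (let start = checkpoint; start < job.rows.length; start += batchSize) {
    const batch = job.rows.slice(start, start + batchSize);
    await upsertRows(batch); // upsert: replaying a batch after a crash is safe

    // A retry after SIGKILL resumes here instead of starting over
    await db.setCheckpoint(job.id, start + batch.length);
  }
}
```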

Visibility timeout:

When worker picks up job, queue marks "in-flight" with timeout. If worker dies, queue re-delivers after timeout to another worker.

Set it longer than your longest realistic job; otherwise the queue re-delivers while the first worker is still running and the job runs twice in parallel (double-processing).

For my workloads:

- Long-running operations
- Current handling
- Migration plan

Output:

  1. Web vs worker split
  2. Queue tool pick
  3. Idempotency design

The discipline: **anything that runs > 5 seconds should be a background job**. Web process kept lean = deploys fast + safe. Background workers handle the heavy lifting with their own shutdown story.

## Database Connection Handling

Help me handle DB during shutdown.

The pattern:

```typescript
import { Pool } from 'pg';

const pool = new Pool({
  max: 20,
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// During graceful shutdown:
async function closeDb() {
  console.log('Draining DB pool');
  await pool.end(); // Waits for in-flight queries to finish
  console.log('DB pool closed');
}
```

`pool.end()` with pg:

- Stops accepting new connection requests
- Waits for in-flight queries
- Closes connections cleanly

Most ORMs / pools have a similar `.end()` / `.disconnect()` / `.close()`.

Transactions:

If a request is mid-transaction during shutdown:

- The request handler should complete naturally (you stopped accepting; in-flight finishes)
- The transaction commits or rolls back as the code dictates

DON'T abort transactions externally; let them finish or roll back per their code path.
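
For context, a typical request-scoped transaction (standard pg usage; the route, tables, and `req.userId` are made up for illustration). Because shutdown only stops *new* requests, this handler runs to its own COMMIT or ROLLBACK:

```typescript
app.post('/orders', async (req, res) => {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    const order = await client.query(
      'INSERT INTO orders (user_id, total) VALUES ($1, $2) RETURNING id',
      [req.userId, req.body.total]
    );
    await client.query('UPDATE inventory SET qty = qty - 1 WHERE sku = $1', [req.body.sku]);
    await client.query('COMMIT');
    res.json({ orderId: order.rows[0].id });
  } catch (err) {
    await client.query('ROLLBACK'); // shutdown never needs to do this for you
    res.status(500).send('order failed');
  } finally {
    client.release(); // return the connection so pool.end() can drain cleanly
  }
});
```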

Connection pool sizing:

Plan pool size against shutdown:

- 20 connections in pool, 5 in-flight requests
- Shutdown drains 5; closes 15 idle = clean
- 20 in-flight: 30s grace might not finish all; some abort

Rough budget: pool size × max query duration should fit within the grace period.

Read replicas:

If using read-replicas: close those connections too. Easy to forget.

For my stack: [pool config]

Output:

  1. Shutdown integration
  2. Pool sizing
  3. Transaction discipline

The non-obvious bug: **forgetting to close Redis / queue / search / external connections**. App opens Postgres + Redis + Elasticsearch + queue connections. Shutdown closes Postgres only. Other connections linger; resource leak.
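
A minimal close-everything sketch, assuming pg, ioredis, and a BullMQ queue (the client names are stand-ins for whatever your app actually opens):

```typescript
// One place that closes every external connection, so shutdown can't forget one.
// Promise.allSettled: a slow or failing close doesn't block the others.
async function closeExternalConnections() {
  const results = await Promise.allSettled([
    pool.end(),          // Postgres pool (pg)
    redis.quit(),        // Redis client (ioredis)
    reportQueue.close(), // BullMQ queue / worker
  ]);

  for (const r of results) {
    if (r.status === 'rejected') console.error('Close failed', r.reason);
  }
}
```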

## Zero-Downtime Deployment Strategies

Help me pick a deployment strategy.

The 4 main strategies:

**1. Rolling Update** (default for K8s; most platforms)

Replace pods one at a time:

- maxSurge: 25% (extra pods during deploy)
- maxUnavailable: 25% (allowed down)
- Old pods drained gracefully; new pods start

Pros: simple; no infra duplication; the default.
Cons: brief mixed-version state (some users on new; some on old).

**2. Blue-Green**

Two full environments:

- Blue (current prod) serving traffic
- Green (new version) deployed; tested
- Switch DNS / load balancer to green
- Keep blue running for instant rollback

Pros: instant rollback; clean cutover.
Cons: 2x infra cost during deploy; mixed-version not allowed (or LB needs sticky sessions).

**3. Canary**

Roll out to N% of traffic; observe; expand:

- 5% on new version
- Check error rates; latency
- 25%; 50%; 100% as confidence builds
- Or roll back

Pros: catches bugs at low blast radius.
Cons: longer deploy; more complex routing.

**4. Feature Flags**

Deploy code with the feature off; turn it on for a subset (gating sketch below):

- Code shipped to 100% of pods
- Feature gated by user / tenant / random %
- No deploy = no risk for the new feature

Pros: decouple deploy from release; gradual rollout.
Cons: code complexity (gating); flag debt.
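
A minimal gating sketch with a hypothetical `flags.isEnabled` helper (LaunchDarkly, Statsig, and GrowthBook each expose their own SDK call for this check):

```typescript
// The new code path ships dark; the flag decides who actually sees it.
app.get('/dashboard', async (req, res) => {
  const useNewDashboard = await flags.isEnabled('new-dashboard', { userId: req.userId });

  if (useNewDashboard) {
    return res.json(await buildNewDashboard(req.userId)); // behind the flag
  }
  return res.json(await buildLegacyDashboard(req.userId)); // current behavior
});
```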

The 2026 default for indie SaaS: rolling deploys + feature flags (LaunchDarkly / Statsig / GrowthBook).

Vercel: Rolling Releases (GA June 2025) — built-in canary.

Database migration coordination:

Each deployment may include schema changes. Order:

  1. Backwards-compatible migration first (add column; don't remove)
  2. Deploy code that uses new schema
  3. After full rollout: cleanup migration (drop old columns)

Never: ship code that requires the new schema before the migration has run. Never: drop columns in the same deploy as the code change.

Tools: dbmate, Atlas, sqitch handle this.

For my deploys:

- Strategy today
- Frequency
- Risk tolerance

Output:

  1. Strategy pick
  2. Migration discipline
  3. Feature-flag tool

The combination that wins: **rolling deploys + DB-backwards-compatible migrations + feature flags**. Code rolls out safely; schema changes don't break in-flight; risky features hidden behind flags. Most outage-causing deploy issues prevented.

## Testing Shutdown

Help me test shutdown discipline.

The test types:

1. Local SIGTERM test:

```bash
# Start your app
npm start

# In another terminal:
kill -TERM $(pgrep node)

# Watch logs: did it shut down gracefully?
```

Verify (a self-test sketch follows the list):

- "Graceful shutdown starting" log
- In-flight request completed
- DB connection closed log
- Process exited 0
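
A rough self-test sketch (run as an ES module on Node 18+; the built server path and the deliberately slow `/slow` route are assumptions): fire a request, SIGTERM the server mid-flight, and check that the response still arrives.

```typescript
import { spawn } from 'node:child_process';
import { setTimeout as delay } from 'node:timers/promises';

// Boot the app (path is hypothetical); give it a second to start listening
const server = spawn('node', ['dist/server.js'], { stdio: 'inherit' });
await delay(1000);

const inflight = fetch('http://localhost:3000/slow'); // a request that takes a few seconds
await delay(500);
server.kill('SIGTERM'); // simulate the deploy: terminate mid-request

try {
  const res = await inflight;
  console.log(res.status === 200 ? 'PASS: in-flight request completed' : `FAIL: got ${res.status}`);
} catch {
  console.log('FAIL: request dropped');
}
```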

2. Staging chaos test:

Use chaos-monkey-style tools:

- kube-monkey — randomly kill pods
- chaos-mesh — schedule failure scenarios
- gremlin — managed chaos engineering
- manual — `kubectl delete pod` during traffic

Verify:

- Customer-facing requests succeed during pod terminations
- Error rate doesn't spike
- Traces show clean handoffs

3. Long-running job test:

Start a 5-min background job; deploy mid-job.

Verify:

  • Job pauses gracefully OR redelivers to other worker
  • No data corruption
  • User notified of restart if applicable

4. Connection-leak test:

Run `lsof -i` after deploy; check for stuck connections to DB / Redis / etc. Should be 0 or normal pool count.

5. Deploy frequency test:

Deploy 10 times in 1 hour during real traffic. Watch error rates; latency; user reports.

Should: invisible to customers.

For my CI/CD:

- Tests today
- Gaps

Output:

  1. Test list
  2. Tools
  3. Cadence

The single most-useful test: **deploy during real traffic in staging**. Cron a deploy every 30 minutes; run synthetic traffic; observe. If errors spike during deploy, fix before production.

## Common Shutdown Mistakes

Help me avoid shutdown mistakes.

The 10 mistakes:

**1. No SIGTERM handler**
Process killed instantly; in-flight requests dropped.

**2. SIGTERM handler that exits immediately**
Stops accepting; doesn't drain.

**3. terminationGracePeriodSeconds < drain time**
Process gets SIGKILL'd before draining is done.

**4. No preStop / readiness coordination**
Endpoint removal lags; traffic routed to dying pod.

**5. Long-running jobs in web process**
Deploys kill jobs mid-execution.

**6. DB connections leaked**
Pool exhaustion on next deploy.

**7. Schema migrations in same deploy as breaking code change**
Rollback impossible if migration applied.

**8. No feature flags for risky changes**
All-or-nothing release; can't gate / partial-rollback.

**9. Health-check-as-deep-check (liveness checking DB)**
DB blip → restart loop.

**10. No chaos testing**
Shutdown bugs surface in prod, not staging.

For my system: [risks]

Output:

  1. Top 3 risks
  2. Mitigations
  3. Tests to add

The single most-painful mistake: **deploying a code change that requires NEW schema in same deploy as the migration that creates it**. Race: migration runs on one pod; another pod still on old code; queries fail; deployment rolls back; data state confused. Always: migration first, deploy second.

## What Done Looks Like

A working shutdown + deploy practice delivers:
- SIGTERM handler in every web / worker process
- Graceful shutdown sequence: stop accepting → drain → close external → exit
- Readiness check that flips to 503 during shutdown
- preStop hook (or equivalent) that gives LB time to drain
- terminationGracePeriodSeconds tuned to your drain time (typically 30-60s)
- Long-running jobs run in workers, not web
- Background-job idempotency
- DB-backwards-compatible migrations; cleanup migrations later
- Feature flags for risky changes
- Rolling deploys (or canary) standard
- Chaos test in staging
- Deploy frequency: multiple per day; invisible to customers

The proof you got it right: a customer browsing your app during a deploy notices nothing. A 10-minute background job survives a mid-execution deploy. Error rate during deploys matches steady-state error rate (not a spike).

## See Also

- [Incident Response](incident-response-chat.md) — companion reliability layer
- [Multi-region Deployment](multi-region-deployment-chat.md) — companion deployment scope
- [Backups & Disaster Recovery](backups-disaster-recovery-chat.md) — adjacent reliability
- [Service Level Agreements](service-level-agreements-chat.md) — what shutdown feeds
- [HTTP Retry & Backoff](http-retry-backoff-chat.md) — handles deploy hiccups upstream
- [Idempotency Patterns](idempotency-patterns-chat.md) — required for safe job retries
- [Database Migrations](database-migrations-chat.md) — coordinated with deploys
- [Logging Strategy & Structured Logs](logging-strategy-structured-logs-chat.md) — flush during shutdown
- [Preview Environments](preview-environments-chat.md) — shutdown matters in previews too
- [Feature Flags](feature-flags-chat.md) — decouple deploy from release
- [VibeReference: CI/CD Providers](https://vibereference.dev/devops-and-tools/cicd-providers) — deploy infra
- [VibeReference: Vercel](https://vibereference.dev/cloud-and-hosting/vercel) — Rolling Releases context
- [VibeReference: Cloud and Hosting](https://vibereference.dev/cloud-and-hosting) — broader hosting context