# Customer Health Scoring: Predict Churn and Expansion Before They Happen
**Goal:** Build a composite customer health score that predicts which accounts are growing, stable, or at risk — not as a vanity metric, but as the operational layer that triggers expansion conversations, save plays, and customer-success outreach. Avoid the failure mode where founders spend a quarter building a sophisticated scoring model that nobody on the team trusts or acts on.
**Process:** Follow this chat pattern with your AI coding tool, such as Claude or v0.app. Pay attention to the notes in [brackets] and replace the bracketed text with your own content.
**Timeframe:** v1 score with 5-7 signals shipped in 1-2 weeks. Operational rhythm (weekly review, action triggers) embedded in week 3. Quarterly recalibration baked into the calendar from launch onward.
## Why Most Founder Customer Health Scores Are Useless
Three failure modes hit founders the same way:
- The "machine learning health score" trap. Founder reads about Gainsight or Catalyst's ML-driven scoring, builds a 30-signal composite model with weighted regression. The model has 8% explanatory power because the team has only 80 customers — far below the volume needed for ML to outperform a simple rule-based scoring. The model is a science project; nobody acts on it.
- The dashboard that nobody opens. Health score lives in a custom-built dashboard. The CS / sales team checks it once a week if they remember. By month 3, it's a static page nobody visits. Health scores must be embedded in the workflow (the daily / weekly cadence the team already runs); standalone dashboards die.
- Score-without-action. The team computes scores but never wires them to specific plays. "This account is at risk" surfaces; nothing happens. "This account is expanding" surfaces; nothing happens. A score with no action is decoration. Every score-tier must have a clear, automatic action — or it's wasted.
The version that works is structured: simple rule-based scoring with 5-7 signals, computed weekly into a single number per account, segmented into 3-5 health tiers, with each tier triggering specific actions wired into the team's existing workflow.
This guide assumes you have already done [PostHog Setup](posthog-setup-chat.md) (you have the underlying event data), have completed [Activation Funnel Diagnosis](activation-funnel-chat.md) (you know which behaviors predict success), and have shipped [Reduce Churn](reduce-churn-chat.md) (the reactive save flow; this guide is the proactive companion).
## The Five Signals That Actually Predict Health
Most health scores are over-engineered. The signals that matter are simpler than founders expect.
You're helping me design the customer health score for [your product] at [your-domain.com]. The product is [one-sentence description] with [N paying customers].
The 5 signal categories that actually predict customer health:
**1. Engagement velocity**: are they using the product more or less than expected?
- Sessions per week vs the customer's own historical baseline (NOT vs absolute thresholds)
- Active users in the workspace vs the seat count purchased
- Feature breadth: are they using 1 feature or 5?
- Trend: is engagement increasing, stable, or declining over the last 30/60/90 days?
**2. Outcome attainment**: are they getting what they're paying for?
- Are they hitting the activation event regularly (per [Activation Funnel](activation-funnel-chat.md))?
- Are they completing the workflows that map to their stated use case?
- For products with measurable outcomes (revenue, conversions, errors caught): are these outcomes growing?
**3. Account growth signals**: is the account expanding or shrinking?
- Are seat counts growing within the workspace (positive expansion signal)?
- Are usage metrics approaching tier limits (positive expansion signal)?
- Is multiple-stakeholder engagement happening (multiple users active = stickiness)?
**4. Direct sentiment signals**: what are they actually telling you?
- Last NPS / CSAT score (per [Customer Feedback Surveys](customer-feedback-surveys-chat.md))
- Support ticket sentiment (positive / neutral / negative)
- Recent positive interactions (testimonials, referrals, social mentions)
**5. Risk signals**: any specific red flags?
- Failed payments
- Cancel-page visits
- Champion departure (the original buyer left their company)
- Open critical support tickets unresolved
- Visiting competitor pages (if you can detect via referral / search behavior)
For each signal, output:
- The specific event / data source that captures it
- The PostHog query or SQL that aggregates it
- The default weight in the composite score (start with rough weights; refine over 6 months)
- The threshold / score-band mapping (e.g., "Sessions per week >= 5 = +20 points; <2 = -10 points")
Sanity check: more than 7 signals in v1 is a sign of over-engineering. Cap at 7; refine before adding more.
Three principles I've watched founders re-learn:
- Trend matters more than absolute level. A customer who logs in 3x per week consistently is healthier than one who logged in 20x last week and twice this week. Watch the slope, not the snapshot.
- Use the customer's own baseline, not industry averages. "5 sessions per week" is great for one customer and concerning for another. Compare to their own trailing 4-8 week average.
- Sentiment + behavior beats either alone. A customer who talks happy but barely uses the product is a churn risk. A customer who uses heavily but complains in every support ticket is a different risk. Combining catches both.
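To make the baseline principle concrete, here is a minimal TypeScript sketch of the ratio-vs-own-baseline idea (the function name and sample data are illustrative, not from any real product):

```typescript
// Score engagement trend against the customer's own trailing baseline,
// not an absolute threshold. weeklySessions is ordered oldest -> newest.
function engagementRatio(weeklySessions: number[]): number {
  const avg = (xs: number[]) =>
    xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;
  const recent = avg(weeklySessions.slice(-4));        // last ~30 days
  const baseline = avg(weeklySessions.slice(-12, -4)); // trailing 8-week baseline
  if (baseline === 0) return 1; // no baseline yet: treat as stable, not at-risk
  return recent / baseline;
}

// A ratio near 1.0 is stable; well below 1.0 is declining engagement.
console.log(engagementRatio([5, 6, 5, 5, 6, 5, 4, 5, 5, 3, 2, 2])); // ≈ 0.59: declining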
## 1. Compute the Score Simply (Don't Use ML)
Below 500 customers, ML-based scoring underperforms rule-based scoring. Keep it simple.
Help me implement the rule-based health-score formula.
The model: linear combination of weighted signals, normalized to a 0-100 scale.
Pseudocode:
```
function computeHealthScore(account_id):
  signals = {
    'engagement_velocity': scoreEngagementVelocity(account_id),  // 0-100
    'outcome_attainment': scoreOutcomeAttainment(account_id),    // 0-100
    'account_growth': scoreAccountGrowth(account_id),            // 0-100
    'sentiment': scoreSentiment(account_id),                     // 0-100
    'risk_flags': scoreRiskFlags(account_id),                    // 0-100, lower = more risk
  }

  weights = {
    'engagement_velocity': 0.30,
    'outcome_attainment': 0.30,
    'account_growth': 0.15,
    'sentiment': 0.15,
    'risk_flags': 0.10,
  }

  score = sum(signals[k] * weights[k] for k in signals)
  return clamp(score, 0, 100)
```
Each signal score is 0-100; composite is the weighted average.
For each signal-scoring function, the implementation pattern:
**scoreEngagementVelocity(account_id)**:
- Get sessions/week last 30 days
- Get sessions/week trailing 90-day baseline
- Ratio = recent / baseline
- Score: 80 if ratio >= 1.0; 60 if 0.7-1.0; 40 if 0.5-0.7; 20 if 0.3-0.5; 0 if <0.3
**scoreOutcomeAttainment(account_id)**:
- Did they hit the activation event in the last 30 days? (yes = 50, no = 0)
- Number of workflows completed last 30 days vs trailing baseline (use the same ratio approach)
- Combine: average of the activation score and the workflow-ratio score
**scoreAccountGrowth(account_id)**:
- Seat count change last 30 days (gained = 80, stable = 50, lost = 30)
- Usage approaching tier limits = bonus points
- Combine
**scoreSentiment(account_id)**:
- Last NPS or CSAT score (mapped to 0-100)
- Support tickets last 30 days: 0 tickets = 50; 1-2 neutral = 30; 3+ or any negative = 0
- Combine
**scoreRiskFlags(account_id)**:
- Failed payment = -30
- Cancel-page visit last 30 days = -50
- Champion departure = -20
- Open critical ticket unresolved = -20
- Combine: 100 minus the sum of flag penalties, floored at 0
Tier mapping:
- 80-100: **Healthy** — expansion candidate
- 60-79: **Stable** — maintain
- 40-59: **At Risk** — proactive attention
- 0-39: **Churning** — save play active
Output:
1. The computeHealthScore function code
2. The 5 signal-scoring functions with explicit thresholds
3. The weights table (start with my recommendations; you'll tune over 6 months)
4. The tier-mapping rules
5. The data sources required for each signal (PostHog events, Stripe data, support tool API, NPS responses)
Two principles:
- Start with intuition-based weights; tune from data. The first version of weights should reflect your beliefs about what drives health. After 6 months of customer history, you'll see which signals were predictive and rebalance.
- Linear is fine. Multi-factor models, decision trees, neural nets — all add complexity without lift below 500 customers. Stay linear until the data justifies more.
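For reference, a TypeScript sketch of the composite plus one signal function (weights, thresholds, and tier boundaries mirror the defaults above; this is a starting point to tune, not a definitive implementation):

```typescript
type SignalScores = {
  engagement_velocity: number; // 0-100
  outcome_attainment: number;  // 0-100
  account_growth: number;      // 0-100
  sentiment: number;           // 0-100
  risk_flags: number;          // 0-100, lower = more risk
};

// Intuition-based starting weights; rebalance after quarterly recalibration.
const WEIGHTS: Record<keyof SignalScores, number> = {
  engagement_velocity: 0.30,
  outcome_attainment: 0.30,
  account_growth: 0.15,
  sentiment: 0.15,
  risk_flags: 0.10,
};

type Tier = 'healthy' | 'stable' | 'at_risk' | 'churning';

function computeHealthScore(signals: SignalScores): { score: number; tier: Tier } {
  const raw = (Object.keys(WEIGHTS) as (keyof SignalScores)[])
    .reduce((sum, k) => sum + signals[k] * WEIGHTS[k], 0);
  const score = Math.round(Math.min(100, Math.max(0, raw)));
  const tier: Tier =
    score >= 80 ? 'healthy' : score >= 60 ? 'stable' : score >= 40 ? 'at_risk' : 'churning';
  return { score, tier };
}

// The engagement-velocity thresholds from above, applied to a baseline ratio.
function scoreEngagementVelocity(ratio: number): number {
  if (ratio >= 1.0) return 80;
  if (ratio >= 0.7) return 60;
  if (ratio >= 0.5) return 40;
  if (ratio >= 0.3) return 20;
  return 0;
}
```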
## 2. Run the Score Computation Weekly, Not Real-Time
Health scores aren't market data; they don't need real-time updates.
Design the score-computation pipeline.
The pattern:
**Weekly batch job** (every Monday morning):
- For every active paying account, compute the health score
- Store: account_id, computed_at, score, tier, signal_breakdown (for explainability)
- Per [Background Jobs Providers](https://www.vibereference.com/backend-and-data/background-jobs-providers): run on Inngest / Trigger.dev / your scheduler
**Schema**:
```sql
CREATE TABLE customer_health_scores (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
account_id UUID NOT NULL REFERENCES accounts(id),
computed_at TIMESTAMP NOT NULL DEFAULT NOW(),
score INT NOT NULL, -- 0-100
tier TEXT NOT NULL, -- 'healthy' / 'stable' / 'at_risk' / 'churning'
prior_score INT, -- score from prior week
prior_tier TEXT,
delta INT, -- score - prior_score
signals JSONB NOT NULL, -- breakdown of signal scores for explainability
trigger_action TEXT -- which action was triggered, if any
);
CREATE INDEX idx_health_scores_account_computed ON customer_health_scores(account_id, computed_at DESC);
CREATE INDEX idx_health_scores_tier_computed ON customer_health_scores(tier, computed_at DESC);
```
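The sweep itself can be a few lines. A sketch, where `getActiveAccounts`, `computeHealthScore`, and `saveScore` are hypothetical stubs standing in for your data access, and the Monday cron wiring depends on your job runner:

```typescript
interface Account { id: string }

// Hypothetical data-access and scoring helpers (not real library calls).
declare function getActiveAccounts(): Promise<Account[]>;
declare function computeHealthScore(
  accountId: string,
): Promise<{ score: number; tier: string; signals: Record<string, number> }>;
declare function saveScore(row: object): Promise<void>;

// Weekly sweep: score every active paying account and persist one row each.
// Schedule as a Monday-morning cron in Inngest / Trigger.dev / your scheduler.
async function runWeeklyHealthSweep(): Promise<void> {
  for (const account of await getActiveAccounts()) {
    const { score, tier, signals } = await computeHealthScore(account.id);
    await saveScore({ account_id: account.id, computed_at: new Date(), score, tier, signals });
  }
}
```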
Why weekly, not real-time:
- Health is a slow-moving signal; daily noise produces false positives
- Weekly batch is cheap to compute (one sweep per account)
- Weekly review fits human cadence; nobody reviews health daily
- Action triggers fire once per change, not on every data update
The change-detection logic:
- Compare this week's tier vs last week's
- If tier changed: trigger the action associated with the new tier
- If tier same but score moved >10 points: log the move; flag for review
- If tier same and score similar: no action
Output:
- The weekly batch job code
- The schema migration
- The change-detection logic
- The action-triggering wiring (covered in next section)
- The PostHog event firing on tier change for downstream analytics
The biggest mistake: **computing scores in real-time.** Daily noise creates false-positive tier transitions; the team gets paged on changes that mean nothing; trust in the system erodes. Weekly is the right cadence.
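A minimal TypeScript sketch of that change-detection step, assuming the job can read this week's and last week's rows from the table above:

```typescript
interface HealthRow {
  account_id: string;
  score: number;
  tier: string;
}

type ChangeAction =
  | { kind: 'tier_change'; from: string; to: string }
  | { kind: 'score_move'; delta: number }
  | { kind: 'none' };

// Compare this week's computed row to last week's and decide what to trigger.
function detectChange(current: HealthRow, prior: HealthRow | null): ChangeAction {
  if (!prior) return { kind: 'none' }; // first score for this account
  if (current.tier !== prior.tier) {
    return { kind: 'tier_change', from: prior.tier, to: current.tier };
  }
  const delta = current.score - prior.score;
  if (Math.abs(delta) > 10) return { kind: 'score_move', delta }; // log and flag for review
  return { kind: 'none' };
}
```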
---
## 3. Wire Tier Transitions to Specific Actions
A score without action is decoration. Every tier transition triggers a specific play.
Design the tier-to-action mapping.
For each tier-transition pattern, define:
Healthy → Healthy (sustained): action = expansion target
- Add to expansion-prospects list
- Quarterly: founder or CS does the QBR (per Land and Expand)
- In-product: surface premium-feature CTAs respectfully
Stable → Healthy (improvement): action = celebrate, then expand
- Send a "you're crushing it" email with specific data
- Add to expansion-prospects list
Stable → Stable (sustained): action = no action
- Monitor; let the relationship continue
- Don't introduce friction
Healthy → Stable (degradation): action = soft check-in
- Light-touch outreach: "Noticed your usage shifted — anything we can help with?"
- Don't pitch; ask
- Dig for the cause
Stable → At Risk: action = founder-aware
- Surface to weekly review
- Investigate the cause (read recent support tickets, look at usage patterns)
- Light-touch outreach: ask the customer what changed
- Don't deploy save-play yet; might be a temporary dip
At Risk → At Risk (sustained 2+ weeks): action = save-play activation
- Per Reduce Churn: targeted save sequence
- Founder reach-out for high-tier accounts
- Investigate root cause
At Risk → Churning: action = full save-play
- Immediate founder outreach for any account paying $200+/mo
- All save-flow tactics activated
- Document the cause if they churn anyway (for product / pricing learnings)
Any tier → Churning (sudden): action = high-priority intervention
- This is unusual; usually means a specific incident (failed payment, support failure, competitor switch)
- Founder gets paged
- 24-hour response time
Churning → improved tier: action = save success — celebrate
- Save was successful
- Document what worked
- Add the customer to potential reference / case study list
Output:
- The full transition matrix with associated action
- The action playbooks for each (templates, scripts, owners)
- The team-routing: who handles each action (founder / CS / sales / support)
- The escalation path for high-priority transitions
- The action-tracking: every triggered action is logged so we can review effectiveness
The single most useful action: **the founder reach-out for any high-tier account moving At Risk → Churning.** A 15-minute conversation between founder and customer at the moment of risk saves more revenue than any automated sequence.
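One way to encode the matrix is a plain lookup keyed by transition, which the batch job consults after change detection. A TypeScript sketch; the action names are placeholders for your own playbooks, and sustained-state actions (e.g., At Risk for 2+ weeks) are handled separately by counting consecutive weeks in a tier:

```typescript
type Tier = 'healthy' | 'stable' | 'at_risk' | 'churning';

// Keyed as "<priorTier>-><newTier>". Only transitions with an entry fire a play.
const TRANSITION_ACTIONS: Record<string, string> = {
  'stable->healthy': 'celebrate_then_expand',
  'healthy->stable': 'soft_check_in',
  'stable->at_risk': 'founder_aware_review',
  'at_risk->churning': 'full_save_play',
  'healthy->churning': 'high_priority_intervention',
  'stable->churning': 'high_priority_intervention',
  'churning->at_risk': 'save_success_review',
  'churning->stable': 'save_success_review',
  'churning->healthy': 'save_success_review',
};

function actionFor(prior: Tier, next: Tier): string | null {
  return TRANSITION_ACTIONS[`${prior}->${next}`] ?? null;
}
```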
---
## 4. Embed the Health Score in the Team's Existing Workflow
Standalone health-score dashboards die. Embed in the daily / weekly tools the team already uses.
Help me embed the health score into the team's existing workflow.
The integration points:
1. Customer support tool (Plain / Help Scout / Intercom / per Customer Support Tools):
- Display health-score badge on every customer's profile in the support tool
- Color-coded: green (healthy), yellow (stable), orange (at risk), red (churning)
- When the support agent opens a ticket from an at-risk customer, they see context immediately
- Implementation: webhook from your app to the support tool's API; or use the support tool's custom-attributes feature
2. CRM / sales tool (HubSpot / Salesforce / Attio):
- Sync health score as a custom field
- Trigger alerts on tier changes for the account owner
- Use in segmentation: "all healthy accounts with 12+ months tenure" = expansion-target list
3. Slack / team chat:
- Weekly digest in a #customer-health channel: "5 new at-risk accounts this week; 3 expansion opportunities"
- On-demand notification: tier changes for the team's top-20 accounts
- Don't over-notify; a weekly digest beats real-time alerts
4. Founder dashboard (per Customer Analytics Dashboards):
- Health-tier distribution across the customer base
- This-week tier transitions
- Top 10 at-risk accounts ranked by ARR
- Top 10 expansion candidates ranked by upside
5. Internal admin tools (per Internal Admin Tools):
- Search any customer; see their current health score + 12-week trend
- Drill into the signal breakdown for explainability
- See the action history for the account
6. Team weekly meeting:
- 15-minute review at the start of every week
- Walk through the at-risk and churning lists
- Assign owners for outreach
- Review prior-week actions: did the save play work?
Anti-patterns:
- Health-score-only dashboard (dies)
- Real-time alerts on every change (alert fatigue)
- Score visible to customers (don't expose internal scoring; can be misinterpreted)
- Score that's not explainable (black-box ML scores erode team trust)
Output:
- The 6 integration points with implementation specifics
- The data flow: PostHog / app → health-score table → integrations
- The weekly meeting agenda template
- The Slack digest format
The single highest-leverage embed: **the health-tier badge in the support tool.** When a support agent opens a ticket from an at-risk customer, they treat the conversation differently. That single piece of context across hundreds of support tickets per quarter is the difference between losing accounts and saving them.
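For the Slack digest, a minimal sketch posting to a Slack incoming webhook (the webhook URL and the digest counts are assumptions you'd wire up yourself; incoming webhooks accept a simple JSON `text` payload):

```typescript
// Post the weekly health digest to a Slack channel via an incoming webhook.
// SLACK_WEBHOOK_URL is assumed to be configured in your environment.
async function postHealthDigest(newAtRisk: number, expansionCandidates: number): Promise<void> {
  const text =
    `:chart_with_downwards_trend: ${newAtRisk} new at-risk accounts this week\n` +
    `:chart_with_upwards_trend: ${expansionCandidates} expansion opportunities`;
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
}
```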
---
## 5. Make the Score Explainable
Black-box scores erode team trust. Explainability builds it.
Design the score-explainability surface.
Whenever a team member opens a customer's health score, they see:
The headline:
- "Customer X: Health 67 (Stable, +5 vs last week)"
- Color-coded tier
- Direction indicator
The signal breakdown:
- Engagement Velocity: 75/100 (sessions stable; +5% vs trailing baseline)
- Outcome Attainment: 80/100 (hit activation 4/4 weeks; workflow completion strong)
- Account Growth: 50/100 (no seat changes; usage 60% of tier limit)
- Sentiment: 90/100 (NPS 9 last quarter; no negative tickets)
- Risk Flags: 100/100 (no flags)
- Composite: weighted average = 77.5 → 78 (rounded)
The 12-week trend:
- Line chart showing score over time
- Annotations for tier transitions
- Mark significant events (NPS response, support ticket spike, etc.)
The action history:
- "2 weeks ago: founder check-in (saved from At Risk)"
- "5 weeks ago: NPS response collected"
- "8 weeks ago: 2 new seats added"
The "why am I seeing this customer right now?":
- If transitioning between tiers: "Moved from Stable → At Risk because: Sentiment dropped from 90 to 40 (last week's negative ticket)."
- If unchanged: "No tier change; reviewed weekly per cadence."
Why explainability matters:
- Team members trust the score when they understand it
- Bad scores get challenged ("the score says risky but I just talked to them and they're fine") — investigate the discrepancy; sometimes the data is wrong, sometimes the human's read is wrong
- Customer-facing decisions benefit from understanding (if you're going to call a customer about churn risk, you need to know what the model thinks the risk is)
Output:
- The customer-detail page UI
- The signal-breakdown visualization
- The trend-chart spec
- The "why" explanation generator
- The action-history display
The discipline that builds team trust: **let team members challenge the score.** When the score says "at risk" and the salesperson knows the customer just expanded, that's a real signal. The model is wrong; investigate which signal is mismeasured. Treat the team's knowledge as a debugging tool for the model.
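A sketch of the "why" generator: diff this week's signal breakdown against last week's and name the biggest movers (TypeScript; the 10-point threshold and the sentence templates are placeholders, not a prescribed format):

```typescript
type Signals = Record<string, number>; // signal name -> 0-100 score

// Explain a tier change by naming the signals that moved the most.
function explainChange(
  current: Signals,
  prior: Signals,
  fromTier: string,
  toTier: string,
): string {
  const moves = Object.keys(current)
    .map((k) => ({ signal: k, delta: current[k] - (prior[k] ?? current[k]) }))
    .filter((m) => Math.abs(m.delta) >= 10)
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta));
  if (moves.length === 0) {
    return `Moved from ${fromTier} to ${toTier}; no single signal dominated.`;
  }
  const top = moves
    .map((m) => `${m.signal} ${m.delta > 0 ? 'rose' : 'dropped'} by ${Math.abs(m.delta)}`)
    .join('; ');
  return `Moved from ${fromTier} to ${toTier} because: ${top}.`;
}
```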
---
## 6. Recalibrate Quarterly
The score's predictive power decays as your product, customer base, and market evolve. Recalibrate.
Design the quarterly recalibration process.
Every 90 days:
Step 1: Pull churn cohort
- Customers who churned in the last 90 days
- Look up their health score 30 days before they churned
- Ideal: most should have been "At Risk" or "Churning" tier — predictive power
- Bad: many were "Healthy" 30 days before churn — model is missing signals
Step 2: Pull expansion cohort
- Customers who upgraded / added seats / expanded ARR in the last 90 days
- Look up their health score 30 days before expansion
- Ideal: most should have been "Healthy" — predictive power for expansion
- Bad: expansion happening across all tiers randomly — model is blind to growth
Step 3: Pull false-positive cohort
- Customers in "At Risk" or "Churning" 30 days ago who are still active and healthy now
- These are accounts where the model overcorrected
- Investigate what flagged them incorrectly — adjust the signal
Step 4: Adjust weights
- Signals that strongly predict outcomes get higher weight
- Signals that produce false positives get lower weight
- Document the adjustment with rationale
Step 5: Add / retire signals
- New signals worth adding (something the team consistently uses to assess health that's not in the model)
- Old signals to retire (signals that haven't been predictive in 6+ months)
Step 6: Communicate to team
- The calibration changed weights / signals
- Explain in a 1-page summary
- Re-run scores against the prior 30 days to see how the new model would have ranked accounts
Quarterly review meeting (30 minutes):
- Founder + CS + sales + product
- Walk through the cohort analysis
- Decide on adjustments
- Commit to next quarter's signals + weights
Output:
- The cohort analysis SQL / queries
- The recalibration process checklist
- The 1-page summary template for team communication
- The decision-log for tracking weight changes over time
The discipline that prevents drift: **document every weight change.** A year in, you can see the model's evolution and which signals proved most predictive — informing future iteration.
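A sketch of the Step 1 churn lookback in TypeScript, assuming score history and churn events fit in memory (at indie-SaaS volumes they will):

```typescript
interface ScoreRecord { account_id: string; computed_at: Date; tier: string }
interface ChurnEvent  { account_id: string; churned_at: Date }

// For each churned account, find the tier it held ~30 days before churning,
// then tally the distribution. A predictive model puts most churners in
// 'at_risk' or 'churning' at that lookback point.
function churnLookback(history: ScoreRecord[], churns: ChurnEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const churn of churns) {
    const lookback = new Date(churn.churned_at.getTime() - 30 * 24 * 3600 * 1000);
    const priorScores = history
      .filter((r) => r.account_id === churn.account_id && r.computed_at <= lookback)
      .sort((a, b) => b.computed_at.getTime() - a.computed_at.getTime());
    const tier = priorScores[0]?.tier ?? 'unknown';
    counts[tier] = (counts[tier] ?? 0) + 1;
  }
  return counts; // e.g. { at_risk: 6, churning: 3, healthy: 1 }
}
```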
---
## 7. Avoid the Black-Box Trap
ML-based scoring sounds sophisticated but kills team trust below scale.
Resist the temptation to upgrade to ML scoring prematurely.
When ML scoring is the right move:
- 1,000+ customers (sample size sufficient)
- 200+ historical churn / expansion events (target variable rich enough)
- Dedicated ML / data team that can maintain it
- Explainability solved (use SHAP values, partial dependence plots — black box not acceptable)
When ML scoring is NOT the right move (i.e., almost every indie SaaS in 2026):
- Below 500 customers
- Score is reviewed by humans who need to trust it
- No dedicated data person to own the model
- The team's intuition is more valuable than weak statistical signal
Stay rule-based until:
- The rule-based score has been refined through 4+ quarterly recalibrations
- You have 1,000+ customers with rich churn / expansion history
- You have specific predictions ML can make that rules genuinely can't
Even then, ML should augment rule-based, not replace it. The simplest pattern:
- Rules-based score is the primary
- ML model provides "additional signal" — its own score
- Both are visible to the team
- Discrepancies are investigated, not auto-resolved
Output:
- The "ML readiness" checklist
- The phased upgrade plan (rules → rules + ML augmentation → ML primary if/when warranted)
The single most-useful position: **stay rule-based until the data forces you not to.** Most indie SaaS founders never need to upgrade beyond rule-based.
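If you ever reach the augmentation phase, the side-by-side pattern can be as small as storing both scores and flagging disagreement for human review (a sketch; field names and the 20-point threshold are assumptions):

```typescript
interface DualScore {
  account_id: string;
  ruleScore: number; // 0-100, primary
  mlScore: number;   // 0-100, advisory signal
}

// Surface accounts where the two models disagree enough to be worth a look.
// Discrepancies are investigated by a human, never auto-resolved.
function flagDiscrepancies(scores: DualScore[], threshold = 20): DualScore[] {
  return scores.filter((s) => Math.abs(s.ruleScore - s.mlScore) >= threshold);
}
```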
---
## What Done Looks Like
By end of week 3 of building customer health scoring:
1. **5-7 signals** defined with PostHog / SQL queries
2. **Composite score formula** computing weekly per account
3. **4-tier system** (Healthy / Stable / At Risk / Churning) with clear boundaries
4. **Action playbooks** for each tier transition
5. **Embedded badges** in support tool + CRM
6. **Weekly review meeting** running with founder + CS
Within 90 days:
- Tier transitions triggering actions reliably
- Save plays from At Risk transitions documented; success rate measured
- Expansion plays from Healthy transitions producing 1-3 expansion conversations
- 1 quarterly recalibration completed with documented weight changes
Within 12 months:
- Health-score predictive power: At Risk customers 30 days out churn at 5-10x the rate of Healthy customers
- Expansion: Healthy customers expand at 3-5x the rate of Stable customers
- Team trust in the model is high; adjustments based on cohort analysis, not gut
- Health score is a primary input to founder's weekly customer-priority decisions
---
## Common Pitfalls
- **Over-engineering with ML below 500 customers.** Rule-based wins.
- **Real-time computation.** Weekly is the right cadence; daily creates noise.
- **Standalone dashboard.** Embed in existing tools (support, CRM, founder dashboard) or it dies.
- **Score without action.** Every tier-transition must trigger a specific play.
- **Black-box score.** Team needs to see the breakdown to trust it.
- **No quarterly recalibration.** Predictive power decays as product / customer base evolves.
- **Showing scores to customers.** Don't. They can be misinterpreted; might erode trust.
- **Too many signals in v1.** 5-7 is the sweet spot; 15-20 is over-engineering.
- **Using absolute thresholds, not customer baselines.** "5 sessions/week" punishes light users; ratios vs trailing baseline are the right model.
- **Ignoring sentiment signals.** Behavior + sentiment together; either alone misses cases.
---
## Where Customer Health Scoring Plugs Into the Rest of the Stack
- [Reduce Churn](reduce-churn-chat.md) — the reactive save-flow companion; this guide is the proactive layer
- [Land and Expand](https://www.launchweek.ai/convert/expansion-revenue) — health score drives expansion prioritization
- [PostHog Setup](posthog-setup-chat.md) — feeds the engagement / outcome signals
- [Activation Funnel](activation-funnel-chat.md) — activation event is a primary signal
- [Customer Feedback Surveys](customer-feedback-surveys-chat.md) — NPS/CSAT feeds sentiment signal
- [Customer Support Tools](https://www.vibereference.com/product-and-design/customer-support-tools) — embed health-score badge
- [Internal Admin Tools](internal-admin-tools-chat.md) — health score visible per customer in admin UI
- [Customer Analytics Dashboards](customer-analytics-dashboards-chat.md) — different audience (this is internal; that is customer-facing)
- [Multi-Tenant Data Isolation](multi-tenancy-chat.md) — score is per-account, scoped by tenant
- [Background Jobs Providers](https://www.vibereference.com/backend-and-data/background-jobs-providers) — weekly batch runs here
---
## What's Next
Customer health scoring is the operational layer that turns "we should keep an eye on accounts" into "we know which 8 accounts to focus on this week and what to do about each." The team that ships rule-based scoring in week 3 of launch builds a customer-success motion that scales; the team that defers it operates on intuition until it doesn't scale.
Build the discipline now. The 5 signals, the weekly batch, the tier-to-action mapping, the embedded badges — none are individually big projects. Together they're the difference between reactive customer-success (responding to fires) and proactive customer-success (preventing them).
---
[⬅️ Growth Overview](README.md)