# Customer Health Scoring: Predict Churn and Expansion Before They Happen
**Goal:** Build a composite customer health score that predicts which accounts are growing, stable, or at risk — not as a vanity metric, but as the operational layer that triggers expansion conversations, save plays, and customer-success outreach. Avoid the failure mode where founders spend a quarter building a sophisticated scoring model that nobody on the team trusts or acts on.
**Process:** Follow this chat pattern with your AI coding tool, such as Claude or v0.app. Pay attention to the notes in [brackets] and replace the bracketed text with your own content.
**Timeframe:** v1 score with 5-7 signals shipped in 1-2 weeks. Operational rhythm (weekly review, action triggers) embedded in week 3. Quarterly recalibration baked into the calendar from launch onward.
## Why Most Founder Customer Health Scores Are Useless
Three failure modes hit founders the same way:
- The "machine learning health score" trap. Founder reads about Gainsight or Catalyst's ML-driven scoring, builds a 30-signal composite model with weighted regression. The model has 8% explanatory power because the team has only 80 customers — far below the volume needed for ML to outperform a simple rule-based scoring. The model is a science project; nobody acts on it.
- The dashboard that nobody opens. Health score lives in a custom-built dashboard. The CS / sales team checks it once a week if they remember. By month 3, it's a static page nobody visits. Health scores must be embedded in the workflow (the daily / weekly cadence the team already runs); standalone dashboards die.
- Score-without-action. The team computes scores but never wires them to specific plays. "This account is at risk" surfaces; nothing happens. "This account is expanding" surfaces; nothing happens. A score with no action is decoration. Every score-tier must have a clear, automatic action — or it's wasted.
The version that works is structured: simple rule-based scoring with 5-7 signals, computed weekly into a single number per account, segmented into 3-5 health tiers, with each tier triggering specific actions wired into the team's existing workflow.
This guide assumes you have already done [PostHog Setup](posthog-setup-chat.md) (you have the underlying event data), have completed [Activation Funnel Diagnosis](activation-funnel-chat.md) (you know which behaviors predict success), and have shipped [Reduce Churn](reduce-churn-chat.md) (the reactive save flow; this guide is the proactive companion).
## The Five Signals That Actually Predict Health
Most health scores are over-engineered. The signals that matter are simpler than founders expect.
You're helping me design the customer health score for [your product] at [your-domain.com]. The product is [one-sentence description] with [N paying customers].
The 5 signal categories that actually predict customer health:
**1. Engagement velocity**: are they using the product more or less than expected?
- Sessions per week vs the customer's own historical baseline (NOT vs absolute thresholds)
- Active users in the workspace vs the seat count purchased
- Feature breadth: are they using 1 feature or 5?
- Trend: is engagement increasing, stable, or declining over the last 30/60/90 days?
**2. Outcome attainment**: are they getting what they're paying for?
- Are they hitting the activation event regularly (per [Activation Funnel](activation-funnel-chat.md))?
- Are they completing the workflows that map to their stated use case?
- For products with measurable outcomes (revenue, conversions, errors caught): are these outcomes growing?
**3. Account growth signals**: is the account expanding or shrinking?
- Are seat counts growing within the workspace (positive expansion signal)?
- Are usage metrics approaching tier limits (positive expansion signal)?
- Is multiple-stakeholder engagement happening (multiple users active = stickiness)?
**4. Direct sentiment signals**: what are they actually telling you?
- Last NPS / CSAT score (per [Customer Feedback Surveys](customer-feedback-surveys-chat.md))
- Support ticket sentiment (positive / neutral / negative)
- Recent positive interactions (testimonials, referrals, social mentions)
**5. Risk signals**: any specific red flags?
- Failed payments
- Cancel-page visits
- Champion departure (the original buyer left their company)
- Open critical support tickets unresolved
- Visiting competitor pages (if you can detect via referral / search behavior)
For each signal, output:
- The specific event / data source that captures it
- The PostHog query or SQL that aggregates it
- The default weight in the composite score (start with rough weights; refine over 6 months)
- The threshold / score-band mapping (e.g., "Sessions per week >= 5 = +20 points; <2 = -10 points")
Sanity check: more than 7 signals in v1 is a sign of over-engineering. Cap at 7; refine before adding more.
Three principles I've watched founders re-learn:
- Trend matters more than absolute level. A customer who logs in 3x per week consistently is healthier than one who logged in 20x last week and twice this week. Watch the slope, not the snapshot.
- Use the customer's own baseline, not industry averages. "5 sessions per week" is great for one customer and concerning for another. Compare to their own trailing 4-8 week average.
- Sentiment + behavior beats either alone. A customer who talks happy but barely uses the product is a churn risk. A customer who uses heavily but complains in every support ticket is a different risk. Combining catches both.
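To make the baseline principle concrete, here is a minimal TypeScript sketch of the ratio-vs-own-baseline idea (the function name and sample data are illustrative, not from any real product):

```typescript
// Score engagement trend against the customer's own trailing baseline,
// not an absolute threshold. weeklySessions is ordered oldest -> newest.
function engagementRatio(weeklySessions: number[]): number {
  const avg = (xs: number[]) =>
    xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0;
  const recent = avg(weeklySessions.slice(-4));        // last ~30 days
  const baseline = avg(weeklySessions.slice(-12, -4)); // trailing 8-week baseline
  if (baseline === 0) return 1; // no baseline yet: treat as stable, not at-risk
  return recent / baseline;
}

// A ratio near 1.0 is stable; well below 1.0 is declining engagement.
console.log(engagementRatio([5, 6, 5, 5, 6, 5, 4, 5, 5, 3, 2, 2])); // ≈ 0.59: declining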
## 1. Compute the Score Simply (Don't Use ML)
Below 500 customers, ML-based scoring underperforms rule-based scoring. Keep it simple.
Help me implement the rule-based health-score formula.
The model: linear combination of weighted signals, normalized to a 0-100 scale.
Pseudocode:
```
function computeHealthScore(account_id):
  signals = {
    'engagement_velocity': scoreEngagementVelocity(account_id),  // 0-100
    'outcome_attainment': scoreOutcomeAttainment(account_id),    // 0-100
    'account_growth': scoreAccountGrowth(account_id),            // 0-100
    'sentiment': scoreSentiment(account_id),                     // 0-100
    'risk_flags': scoreRiskFlags(account_id),                    // 0-100, lower = more risk
  }

  weights = {
    'engagement_velocity': 0.30,
    'outcome_attainment': 0.30,
    'account_growth': 0.15,
    'sentiment': 0.15,
    'risk_flags': 0.10,
  }

  score = sum(signals[k] * weights[k] for k in signals)
  return clamp(score, 0, 100)
```
Each signal score is 0-100; composite is the weighted average.
For each signal-scoring function, the implementation pattern:
**scoreEngagementVelocity(account_id)**:
- Get sessions/week last 30 days
- Get sessions/week trailing 90-day baseline
- Ratio = recent / baseline
- Score: 80 if ratio >= 1.0; 60 if 0.7-1.0; 40 if 0.5-0.7; 20 if 0.3-0.5; 0 if <0.3
**scoreOutcomeAttainment(account_id)**:
- Did they hit the activation event in the last 30 days? (yes = 50, no = 0)
- Number of workflows completed last 30 days vs trailing baseline (use the same ratio approach)
- Combine: average of the activation score and the workflow-ratio score
**scoreAccountGrowth(account_id)**:
- Seat count change last 30 days (gained = 80, stable = 50, lost = 30)
- Usage approaching tier limits = bonus points
- Combine
**scoreSentiment(account_id)**:
- Last NPS or CSAT score (mapped to 0-100)
- Support tickets last 30 days: 0 tickets = 50; 1-2 neutral = 30; 3+ or any negative = 0
- Combine
**scoreRiskFlags(account_id)**:
- Failed payment = -30
- Cancel-page visit last 30 days = -50
- Champion departure = -20
- Open critical ticket unresolved = -20
- Combine: 100 minus the sum of flag penalties, floored at 0
Tier mapping:
- 80-100: **Healthy** — expansion candidate
- 60-79: **Stable** — maintain
- 40-59: **At Risk** — proactive attention
- 0-39: **Churning** — save play active
Output:
1. The computeHealthScore function code
2. The 5 signal-scoring functions with explicit thresholds
3. The weights table (start with my recommendations; you'll tune over 6 months)
4. The tier-mapping rules
5. The data sources required for each signal (PostHog events, Stripe data, support tool API, NPS responses)
Two principles:
- Start with intuition-based weights; tune from data. The first version of weights should reflect your beliefs about what drives health. After 6 months of customer history, you'll see which signals were predictive and rebalance.
- Linear is fine. Multi-factor models, decision trees, neural nets — all add complexity without lift below 500 customers. Stay linear until the data justifies more.
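For reference, a TypeScript sketch of the composite plus one signal function (weights, thresholds, and tier boundaries mirror the defaults above; this is a starting point to tune, not a definitive implementation):

```typescript
type SignalScores = {
  engagement_velocity: number; // 0-100
  outcome_attainment: number;  // 0-100
  account_growth: number;      // 0-100
  sentiment: number;           // 0-100
  risk_flags: number;          // 0-100, lower = more risk
};

// Intuition-based starting weights; rebalance after quarterly recalibration.
const WEIGHTS: Record<keyof SignalScores, number> = {
  engagement_velocity: 0.30,
  outcome_attainment: 0.30,
  account_growth: 0.15,
  sentiment: 0.15,
  risk_flags: 0.10,
};

type Tier = 'healthy' | 'stable' | 'at_risk' | 'churning';

function computeHealthScore(signals: SignalScores): { score: number; tier: Tier } {
  const raw = (Object.keys(WEIGHTS) as (keyof SignalScores)[])
    .reduce((sum, k) => sum + signals[k] * WEIGHTS[k], 0);
  const score = Math.round(Math.min(100, Math.max(0, raw)));
  const tier: Tier =
    score >= 80 ? 'healthy' : score >= 60 ? 'stable' : score >= 40 ? 'at_risk' : 'churning';
  return { score, tier };
}

// The engagement-velocity thresholds from above, applied to a baseline ratio.
function scoreEngagementVelocity(ratio: number): number {
  if (ratio >= 1.0) return 80;
  if (ratio >= 0.7) return 60;
  if (ratio >= 0.5) return 40;
  if (ratio >= 0.3) return 20;
  return 0;
}
```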
## 2. Run the Score Computation Weekly, Not Real-Time
Health scores aren't market data; they don't need real-time updates.
Design the score-computation pipeline.
The pattern:
**Weekly batch job** (every Monday morning):
- For every active paying account, compute the health score
- Store: account_id, computed_at, score, tier, signal_breakdown (for explainability)
- Per [Background Jobs Providers](https://www.vibereference.com/backend-and-data/background-jobs-providers): run on Inngest / Trigger.dev / your scheduler
**Schema**:
```sql
CREATE TABLE customer_health_scores (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
account_id UUID NOT NULL REFERENCES accounts(id),
computed_at TIMESTAMP NOT NULL DEFAULT NOW(),
score INT NOT NULL, -- 0-100
tier TEXT NOT NULL, -- 'healthy' / 'stable' / 'at_risk' / 'churning'
prior_score INT, -- score from prior week
prior_tier TEXT,
delta INT, -- score - prior_score
signals JSONB NOT NULL, -- breakdown of signal scores for explainability
trigger_action TEXT -- which action was triggered, if any
);
CREATE INDEX idx_health_scores_account_computed ON customer_health_scores(account_id, computed_at DESC);
CREATE INDEX idx_health_scores_tier_computed ON customer_health_scores(tier, computed_at DESC);
```
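The sweep itself can be a few lines. A sketch, where `getActiveAccounts`, `computeHealthScore`, and `saveScore` are hypothetical stubs standing in for your data access, and the Monday cron wiring depends on your job runner:

```typescript
interface Account { id: string }

// Hypothetical data-access and scoring helpers (not real library calls).
declare function getActiveAccounts(): Promise<Account[]>;
declare function computeHealthScore(
  accountId: string,
): Promise<{ score: number; tier: string; signals: Record<string, number> }>;
declare function saveScore(row: object): Promise<void>;

// Weekly sweep: score every active paying account and persist one row each.
// Schedule as a Monday-morning cron in Inngest / Trigger.dev / your scheduler.
async function runWeeklyHealthSweep(): Promise<void> {
  for (const account of await getActiveAccounts()) {
    const { score, tier, signals } = await computeHealthScore(account.id);
    await saveScore({ account_id: account.id, computed_at: new Date(), score, tier, signals });
  }
}
```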
Why weekly, not real-time:
- Health is a slow-moving signal; daily noise produces false positives
- Weekly batch is cheap to compute (one sweep per account)
- Weekly review fits human cadence; nobody reviews health daily
- Action triggers fire once per change, not on every data update
The change-detection logic:
- Compare this week's tier vs last week's
- If tier changed: trigger the action associated with the new tier
- If tier same but score moved >10 points: log the move; flag for review
- If tier same and score similar: no action
Output:
- The weekly batch job code
- The schema migration
- The change-detection logic
- The action-triggering wiring (covered in next section)
- The PostHog event firing on tier change for downstream analytics
The biggest mistake: **computing scores in real-time.** Daily noise creates false-positive tier transitions; the team gets paged on changes that mean nothing; trust in the system erodes. Weekly is the right cadence.
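A minimal TypeScript sketch of that change-detection step, assuming the job can read this week's and last week's rows from the table above:

```typescript
interface HealthRow {
  account_id: string;
  score: number;
  tier: string;
}

type ChangeAction =
  | { kind: 'tier_change'; from: string; to: string }
  | { kind: 'score_move'; delta: number }
  | { kind: 'none' };

// Compare this week's computed row to last week's and decide what to trigger.
function detectChange(current: HealthRow, prior: HealthRow | null): ChangeAction {
  if (!prior) return { kind: 'none' }; // first score for this account
  if (current.tier !== prior.tier) {
    return { kind: 'tier_change', from: prior.tier, to: current.tier };
  }
  const delta = current.score - prior.score;
  if (Math.abs(delta) > 10) return { kind: 'score_move', delta }; // log and flag for review
  return { kind: 'none' };
}
```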
---
## 3. Wire Tier Transitions to Specific Actions
A score without action is decoration. Every tier transition triggers a specific play.
Design the tier-to-action mapping.
For each tier-transition pattern, define:
Healthy → Healthy (sustained): action = expansion target
- Add to expansion-prospects list
- Quarterly: founder or CS does the QBR (per Land and Expand)
- In-product: surface premium-feature CTAs respectfully
Stable → Healthy (improvement): action = celebrate, then expand
- Send a "you're crushing it" email with specific data
- Add to expansion-prospects list
Stable → Stable (sustained): action = no action
- Monitor; let the relationship continue
- Don't introduce friction
Healthy → Stable (degradation): action = soft check-in
- Light-touch outreach: "Noticed your usage shifted — anything we can help with?"
- Don't pitch; ask
- Dig for the cause
Stable → At Risk: action = founder-aware
- Surface to weekly review
- Investigate the cause (read recent support tickets, look at usage patterns)
- Light-touch outreach: ask the customer what changed
- Don't deploy save-play yet; might be a temporary dip
At Risk → At Risk (sustained 2+ weeks): action = save-play activation
- Per Reduce Churn: targeted save sequence
- Founder reach-out for high-tier accounts
- Investigate root cause
At Risk → Churning: action = full save-play
- Immediate founder outreach for any account paying $200+/mo
- All save-flow tactics activated
- Document the cause if they churn anyway (for product / pricing learnings)
Any tier → Churning (sudden): action = high-priority intervention
- This is unusual; usually means a specific incident (failed payment, support failure, competitor switch)
- Founder gets paged
- 24-hour response time
Churning → improved tier: action = save success — celebrate
- Save was successful
- Document what worked
- Add the customer to potential reference / case study list
Output:
- The full transition matrix with associated action
- The action playbooks for each (templates, scripts, owners)
- The team-routing: who handles each action (founder / CS / sales / support)
- The escalation path for high-priority transitions
- The action-tracking: every triggered action is logged so we can review effectiveness
The single most useful action: **the founder reach-out for any high-tier account moving At Risk → Churning.** A 15-minute conversation between founder and customer at the moment of risk saves more revenue than any automated sequence.
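One way to encode the matrix is a plain lookup keyed by transition, which the batch job consults after change detection. A TypeScript sketch; the action names are placeholders for your own playbooks, and sustained-state actions (e.g., At Risk for 2+ weeks) are handled separately by counting consecutive weeks in a tier:

```typescript
type Tier = 'healthy' | 'stable' | 'at_risk' | 'churning';

// Keyed as "<priorTier>-><newTier>". Only transitions with an entry fire a play.
const TRANSITION_ACTIONS: Record<string, string> = {
  'stable->healthy': 'celebrate_then_expand',
  'healthy->stable': 'soft_check_in',
  'stable->at_risk': 'founder_aware_review',
  'at_risk->churning': 'full_save_play',
  'healthy->churning': 'high_priority_intervention',
  'stable->churning': 'high_priority_intervention',
  'churning->at_risk': 'save_success_review',
  'churning->stable': 'save_success_review',
  'churning->healthy': 'save_success_review',
};

function actionFor(prior: Tier, next: Tier): string | null {
  return TRANSITION_ACTIONS[`${prior}->${next}`] ?? null;
}
```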
---
## 4. Embed the Health Score in the Team's Existing Workflow
Standalone health-score dashboards die. Embed in the daily / weekly tools the team already uses.
Help me embed the health score into the team's existing workflow.
The integration points:
1. Customer support tool (Plain / Help Scout / Intercom / per Customer Support Tools):
- Display health-score badge on every customer's profile in the support tool
- Color-coded: green (healthy), yellow (stable), orange (at risk), red (churning)
- When the support agent opens a ticket from an at-risk customer, they see context immediately
- Implementation: webhook from your app to the support tool's API; or use the support tool's custom-attributes feature
2. CRM / sales tool (HubSpot / Salesforce / Attio):
- Sync health score as a custom field
- Trigger alerts on tier changes for the account owner
- Use in segmentation: "all healthy accounts with 12+ months tenure" = expansion-target list
3. Slack / team chat:
- Weekly digest in a #customer-health channel: "5 new at-risk accounts this week; 3 expansion opportunities"
- On-demand notification: tier changes for the team's top-20 accounts
- Don't over-notify; a weekly digest beats real-time alerts
4. Founder dashboard (per Customer Analytics Dashboards):
- Health-tier distribution across the customer base
- This-week tier transitions
- Top 10 at-risk accounts ranked by ARR
- Top 10 expansion candidates ranked by upside
5. Internal admin tools (per Internal Admin Tools):
- Search any customer; see their current health score + 12-week trend
- Drill into the signal breakdown for explainability
- See the action history for the account
6. Team weekly meeting:
- 15-minute review at the start of every week
- Walk through the at-risk and churning lists
- Assign owners for outreach
- Review prior-week actions: did the save play work?
Anti-patterns:
- Health-score-only dashboard (dies)
- Real-time alerts on every change (alert fatigue)
- Score visible to customers (don't expose internal scoring; can be misinterpreted)
- Score that's not explainable (black-box ML scores erode team trust)
Output:
- The 6 integration points with implementation specifics
- The data flow: PostHog / app → health-score table → integrations
- The weekly meeting agenda template
- The Slack digest format
The single highest-leverage embed: **the health-tier badge in the support tool.** When a support agent opens a ticket from an at-risk customer, they treat the conversation differently. That single piece of context across hundreds of support tickets per quarter is the difference between losing accounts and saving them.
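For the Slack digest, a minimal sketch posting to a Slack incoming webhook (the webhook URL and the digest counts are assumptions you'd wire up yourself; incoming webhooks accept a simple JSON `text` payload):

```typescript
// Post the weekly health digest to a Slack channel via an incoming webhook.
// SLACK_WEBHOOK_URL is assumed to be configured in your environment.
async function postHealthDigest(newAtRisk: number, expansionCandidates: number): Promise<void> {
  const text =
    `:chart_with_downwards_trend: ${newAtRisk} new at-risk accounts this week\n` +
    `:chart_with_upwards_trend: ${expansionCandidates} expansion opportunities`;
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
}
```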
---
## 5. Make the Score Explainable
Black-box scores erode team trust. Explainability builds it.
Design the score-explainability surface.
Whenever a team member opens a customer's health score, they see:
The headline:
- "Customer X: Health 67 (Stable, +5 vs last week)"
- Color-coded tier
- Direction indicator
The signal breakdown:
- Engagement Velocity: 75/100 (sessions stable; +5% vs trailing baseline)
- Outcome Attainment: 80/100 (hit activation 4/4 weeks; workflow completion strong)
- Account Growth: 50/100 (no seat changes; usage 60% of tier limit)
- Sentiment: 90/100 (NPS 9 last quarter; no negative tickets)
- Risk Flags: 100/100 (no flags)
- Composite: weighted average = 77.5 → 78 (rounded)
The 12-week trend:
- Line chart showing score over time
- Annotations for tier transitions
- Mark significant events (NPS response, support ticket spike, etc.)
The action history:
- "2 weeks ago: founder check-in (saved from At Risk)"
- "5 weeks ago: NPS response collected"
- "8 weeks ago: 2 new seats added"
The "why am I seeing this customer right now?":
- If transitioning between tiers: "Moved from Stable → At Risk because: Sentiment dropped from 90 to 40 (last week's negative ticket)."
- If unchanged: "No tier change; reviewed weekly per cadence."
Why explainability matters:
- Team members trust the score when they understand it
- Bad scores get challenged ("the score says risky but I just talked to them and they're fine") — investigate the discrepancy; sometimes the data is wrong, sometimes the human's read is wrong
- Customer-facing decisions benefit from understanding (if you're going to call a customer about churn risk, you need to know what the model thinks the risk is)
Output:
- The customer-detail page UI
- The signal-breakdown visualization
- The trend-chart spec
- The "why" explanation generator
- The action-history display
The discipline that builds team trust: **let team members challenge the score.** When the score says "at risk" and the salesperson knows the customer just expanded, that's a real signal. The model is wrong; investigate which signal is mismeasured. Treat the team's knowledge as a debugging tool for the model.
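A sketch of the "why" generator: diff this week's signal breakdown against last week's and name the biggest movers (TypeScript; the 10-point threshold and the sentence templates are placeholders, not a prescribed format):

```typescript
type Signals = Record<string, number>; // signal name -> 0-100 score

// Explain a tier change by naming the signals that moved the most.
function explainChange(
  current: Signals,
  prior: Signals,
  fromTier: string,
  toTier: string,
): string {
  const moves = Object.keys(current)
    .map((k) => ({ signal: k, delta: current[k] - (prior[k] ?? current[k]) }))
    .filter((m) => Math.abs(m.delta) >= 10)
    .sort((a, b) => Math.abs(b.delta) - Math.abs(a.delta));
  if (moves.length === 0) {
    return `Moved from ${fromTier} to ${toTier}; no single signal dominated.`;
  }
  const top = moves
    .map((m) => `${m.signal} ${m.delta > 0 ? 'rose' : 'dropped'} by ${Math.abs(m.delta)}`)
    .join('; ');
  return `Moved from ${fromTier} to ${toTier} because: ${top}.`;
}
```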
---
## 6. Recalibrate Quarterly
The score's predictive power decays as your product, customer base, and market evolve. Recalibrate.
Design the quarterly recalibration process.
Every 90 days:
Step 1: Pull churn cohort
- Customers who churned in the last 90 days
- Look up their health score 30 days before they churned
- Ideal: most should have been "At Risk" or "Churning" tier — predictive power
- Bad: many were "Healthy" 30 days before churn — model is missing signals
Step 2: Pull expansion cohort
- Customers who upgraded / added seats / expanded ARR in the last 90 days
- Look up their health score 30 days before expansion
- Ideal: most should have been "Healthy" — predictive power for expansion
- Bad: expansion happening across all tiers randomly — model is blind to growth
Step 3: Pull false-positive cohort
- Customers in "At Risk" or "Churning" 30 days ago who are still active and healthy now
- These are accounts where the model overcorrected
- Investigate what flagged them incorrectly — adjust the signal
Step 4: Adjust weights
- Signals that strongly predict outcomes get higher weight
- Signals that produce false positives get lower weight
- Document the adjustment with rationale
Step 5: Add / retire signals
- New signals worth adding (something the team consistently uses to assess health that's not in the model)
- Old signals to retire (signals that haven't been predictive in 6+ months)
Step 6: Communicate to team
- The calibration changed weights / signals
- Explain in a 1-page summary
- Re-run scores against the prior 30 days to see how the new model would have ranked accounts
Quarterly review meeting (30 minutes):
- Founder + CS + sales + product
- Walk through the cohort analysis
- Decide on adjustments
- Commit to next quarter's signals + weights
Output:
- The cohort analysis SQL / queries
- The recalibration process checklist
- The 1-page summary template for team communication
- The decision-log for tracking weight changes over time
The discipline that prevents drift: **document every weight change.** A year in, you can see the model's evolution and which signals proved most predictive — informing future iteration.
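A sketch of the Step 1 churn lookback in TypeScript, assuming score history and churn events fit in memory (at indie-SaaS volumes they will):

```typescript
interface ScoreRecord { account_id: string; computed_at: Date; tier: string }
interface ChurnEvent  { account_id: string; churned_at: Date }

// For each churned account, find the tier it held ~30 days before churning,
// then tally the distribution. A predictive model puts most churners in
// 'at_risk' or 'churning' at that lookback point.
function churnLookback(history: ScoreRecord[], churns: ChurnEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const churn of churns) {
    const lookback = new Date(churn.churned_at.getTime() - 30 * 24 * 3600 * 1000);
    const priorScores = history
      .filter((r) => r.account_id === churn.account_id && r.computed_at <= lookback)
      .sort((a, b) => b.computed_at.getTime() - a.computed_at.getTime());
    const tier = priorScores[0]?.tier ?? 'unknown';
    counts[tier] = (counts[tier] ?? 0) + 1;
  }
  return counts; // e.g. { at_risk: 6, churning: 3, healthy: 1 }
}
```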
---
## 7. Avoid the Black-Box Trap
ML-based scoring sounds sophisticated but kills team trust below scale.
Resist the temptation to upgrade to ML scoring prematurely.
When ML scoring is the right move:
- 1,000+ customers (sample size sufficient)
- 200+ historical churn / expansion events (target variable rich enough)
- Dedicated ML / data team that can maintain it
- Explainability solved (use SHAP values, partial dependence plots — black box not acceptable)
When ML scoring is NOT the right move (i.e., almost every indie SaaS in 2026):
- Below 500 customers
- Score is reviewed by humans who need to trust it
- No dedicated data person to own the model
- The team's intuition is more valuable than weak statistical signal
Stay rule-based until:
- The rule-based score has been refined through 4+ quarterly recalibrations
- You have 1,000+ customers with rich churn / expansion history
- You have specific predictions ML can make that rules genuinely can't
Even then, ML should augment rule-based, not replace it. The simplest pattern:
- Rules-based score is the primary
- ML model provides "additional signal" — its own score
- Both are visible to the team
- Discrepancies are investigated, not auto-resolved
Output:
- The "ML readiness" checklist
- The phased upgrade plan (rules → rules + ML augmentation → ML primary if/when warranted)
The single most-useful position: **stay rule-based until the data forces you not to.** Most indie SaaS founders never need to upgrade beyond rule-based.
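If you ever reach the augmentation phase, the side-by-side pattern can be as small as storing both scores and flagging disagreement for human review (a sketch; field names and the 20-point threshold are assumptions):

```typescript
interface DualScore {
  account_id: string;
  ruleScore: number; // 0-100, primary
  mlScore: number;   // 0-100, advisory signal
}

// Surface accounts where the two models disagree enough to be worth a look.
// Discrepancies are investigated by a human, never auto-resolved.
function flagDiscrepancies(scores: DualScore[], threshold = 20): DualScore[] {
  return scores.filter((s) => Math.abs(s.ruleScore - s.mlScore) >= threshold);
}
```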
---
## What Done Looks Like
By end of week 3 of building customer health scoring:
1. **5-7 signals** defined with PostHog / SQL queries
2. **Composite score formula** computing weekly per account
3. **4-tier system** (Healthy / Stable / At Risk / Churning) with clear boundaries
4. **Action playbooks** for each tier transition
5. **Embedded badges** in support tool + CRM
6. **Weekly review meeting** running with founder + CS
Within 90 days:
- Tier transitions triggering actions reliably
- Save plays from At Risk transitions documented; success rate measured
- Expansion plays from Healthy transitions producing 1-3 expansion conversations
- 1 quarterly recalibration completed with documented weight changes
Within 12 months:
- Health-score predictive power: At Risk customers 30 days out churn at 5-10x the rate of Healthy customers
- Expansion: Healthy customers expand at 3-5x the rate of Stable customers
- Team trust in the model is high; adjustments based on cohort analysis, not gut
- Health score is a primary input to founder's weekly customer-priority decisions
---
## Common Pitfalls
- **Over-engineering with ML below 500 customers.** Rule-based wins.
- **Real-time computation.** Weekly is the right cadence; daily creates noise.
- **Standalone dashboard.** Embed in existing tools (support, CRM, founder dashboard) or it dies.
- **Score without action.** Every tier-transition must trigger a specific play.
- **Black-box score.** Team needs to see the breakdown to trust it.
- **No quarterly recalibration.** Predictive power decays as product / customer base evolves.
- **Showing scores to customers.** Don't. They can be misinterpreted; might erode trust.
- **Too many signals in v1.** 5-7 is the sweet spot; 15-20 is over-engineering.
- **Using absolute thresholds, not customer baselines.** "5 sessions/week" punishes light users; ratios vs trailing baseline are the right model.
- **Ignoring sentiment signals.** Behavior + sentiment together; either alone misses cases.
---
## Where Customer Health Scoring Plugs Into the Rest of the Stack
- [Reduce Churn](reduce-churn-chat.md) — the reactive save-flow companion; this guide is the proactive layer
- [Land and Expand](https://www.launchweek.ai/convert/expansion-revenue) — health score drives expansion prioritization
- [PostHog Setup](posthog-setup-chat.md) — feeds the engagement / outcome signals
- [Activation Funnel](activation-funnel-chat.md) — activation event is a primary signal
- [Customer Feedback Surveys](customer-feedback-surveys-chat.md) — NPS/CSAT feeds sentiment signal
- [Customer Support Tools](https://www.vibereference.com/product-and-design/customer-support-tools) — embed health-score badge
- [Internal Admin Tools](internal-admin-tools-chat.md) — health score visible per customer in admin UI
- [Customer Analytics Dashboards](customer-analytics-dashboards-chat.md) — different audience (this is internal; that is customer-facing)
- [Multi-Tenant Data Isolation](multi-tenancy-chat.md) — score is per-account, scoped by tenant
- [Background Jobs Providers](https://www.vibereference.com/backend-and-data/background-jobs-providers) — weekly batch runs here
---
## What's Next
Customer health scoring is the operational layer that turns "we should keep an eye on accounts" into "we know which 8 accounts to focus on this week and what to do about each." The team that ships rule-based scoring in week 3 of launch builds a customer-success motion that scales; the team that defers it operates on intuition until it doesn't scale.
Build the discipline now. The 5 signals, the weekly batch, the tier-to-action mapping, the embedded badges — none are individually big projects. Together they're the difference between reactive customer-success (responding to fires) and proactive customer-success (preventing them).
---
[⬅️ Growth Overview](README.md)