gotnerfed

The score is the rigorous thing. The jury is the unusual thing. Skip straight to the worked example or read top to bottom.

01 · THE AI JURY

Three frontier models, grading us in public.

For every receipt at major severity or above, three independent LLMs from three different vendors read the change diff, the sources, and the impact context. Each issues a private verdict - harmful, neutral, beneficial, or unclear - plus a 0-100 score and a short rationale. Verdicts are sealed, published, and immutable.

JUROR · 01 · claude-haiku-4-5 · Anthropic
JUROR · 02 · gpt-5-nano · OpenAI
JUROR · 03 · gemini-2-5-flash · Google
WHAT THE JURY DOES · AND DOES NOT
● DOES
  • Render a public verdict per receipt, signed by model
  • Provide a short rationale you can argue with
  • Get Brier-scored against subsequent reality - accuracy public
  • Disagree with each other; disagreement is the data
● DOES NOT
  • Influence the deterministic Nerf Score in any way
  • Decide which receipts get filed (humans + scanner do that)
  • Get rerun if a model “changes its mind” later - verdicts are sealed
  • Speak for gotnerfed - they speak for themselves

Why three models from three vendors? Because no single LLM can be trusted to grade plan changes from its own parent company without conflict. When Anthropic gets nerfed, claude-haiku-4-5 still votes - and we log how often it votes differently from the other two.
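A minimal sketch of how that dissent could be counted, written in TypeScript to match lib/score.ts. The `JuryVerdict` type, its field names, and `dissentRate()` are hypothetical; the page does not publish the jury's schema. Here a juror "dissents" when the other two agree with each other and it does not.

```typescript
// Hypothetical shapes: the page names the four verdict labels and the 0-100
// score, but these type and field names are illustrative, not gotnerfed's schema.
type Verdict = "harmful" | "neutral" | "beneficial" | "unclear";

interface JuryVerdict {
  model: string;    // e.g. "claude-haiku-4-5"
  verdict: Verdict;
  score: number;    // 0-100
}

// Fraction of receipts on which `model` voted against an agreeing pair of
// other jurors. Receipts where the other two split are skipped: there is no
// shared position to dissent from.
function dissentRate(receipts: JuryVerdict[][], model: string): number {
  let counted = 0;
  let dissents = 0;
  for (const jury of receipts) {
    const mine = jury.find((v) => v.model === model);
    const others = jury.filter((v) => v.model !== model);
    if (mine === undefined || others.length !== 2) continue;
    const [a, b] = others;
    if (a.verdict !== b.verdict) continue; // other two split: no majority
    counted += 1;
    if (mine.verdict !== a.verdict) dissents += 1;
  }
  return counted === 0 ? 0 : dissents / counted;
}

// Made-up juries for illustration only.
const juries: JuryVerdict[][] = [
  [
    { model: "claude-haiku-4-5", verdict: "harmful", score: 80 },
    { model: "gpt-5-nano", verdict: "harmful", score: 85 },
    { model: "gemini-2-5-flash", verdict: "harmful", score: 78 },
  ],
  [
    { model: "claude-haiku-4-5", verdict: "neutral", score: 45 },
    { model: "gpt-5-nano", verdict: "harmful", score: 70 },
    { model: "gemini-2-5-flash", verdict: "harmful", score: 72 },
  ],
];
console.log(dissentRate(juries, "claude-haiku-4-5")); // 0.5
```

Skipping split juries keeps the rate about breaking from consensus; a stricter variant could also count receipts where all three disagree.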

02 · PER-EVENT NERF SCORE

Score = kind_weight × severity_multiplier

Each receipt is classified by kind (rate-limit cut, billing model shift, etc.) and severity (critical / major / minor / info). Multiply the two, clamp to 0-100. Every coefficient is in lib/score.ts on GitHub.

KIND WEIGHTS · HIGHER = MORE USER HARM · NEGATIVE = POSITIVE CHANGE
CHANGE KIND           WEIGHT   WHY
tier-removed             +28   Bought-thing is gone (Claude Code, Apr 2026)
billing-model-shift      +26   Surprise-bill territory (Cursor, Jun 2025)
rate-limit-cut           +24   Same price, less throughput
free-tier-nerf           +22   Funnel collapse, trust hit
model-swap               +20   Silent quality regression
feature-gated            +18   Used to have it, now upsell
price-increase           +16   Direct cost increase
tos-change               +12   Slow-burn rights creep
feature-ungated           -6   Positive: gated thing now free
tier-added                -4   Positive (unless it cannibalizes an existing tier)
rate-limit-raise          -8   Positive: more throughput
price-decrease           -10   Positive: direct cost reduction

03 · SEVERITY MULTIPLIERS

How bad is bad?

SEVERITY   MULTIPLIER   MEANING
critical   × 2.6        Materially reduces paid-user value.
major      × 1.8        Substantive user-facing change.
minor      × 1.0        Clarification or copy edit.
info       × 0.5        Additive without removing existing value.

Higher multipliers mean more harm; the multiplier scales the kind weight before the 0–100 clamp.

04 · WORKED EXAMPLE · CLAUDE CODE · APR 2026

One real receipt, walked through the rubric.

On April 21, 2026, Anthropic removed Claude Code from the $20 Pro tier, requiring Max ($100+/mo) for continued use. The receipt's Nerf Score came out at 73. Here's the math.

STEP-BY-STEP · DETERMINISTIC · NO LLM IN THIS LOOP
KIND         tier-removed      +28        bought-thing is gone
×
SEVERITY     critical          × 2.6      paid-user value lost
=
RAW          28 × 2.6          72.8       unclamped product
NERF SCORE   clamp · round     73 / 100   critical band
lib/score.ts · eventScore()
export function eventScore(e: Pick<ChangelogEntry, "kind" | "severity">): number {
  // 1. Look up the kind weight (signed integer, may be negative for positive changes)
  const base = KIND_WEIGHTS[e.kind] ?? 8;
  // 2. Look up the severity multiplier (critical = 2.6, info = 0.5, etc.)
  const mult = SEVERITY_MULTIPLIER[e.severity] ?? 1;
  // 3. Multiply, clamp to [-30, 100], max with 0 → only harm goes public
  const raw = base * mult;
  const score = Math.max(-30, Math.min(100, raw));
  return Math.max(0, Math.round(score));
}
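The listing above depends on `KIND_WEIGHTS`, `SEVERITY_MULTIPLIER`, and `ChangelogEntry`, which live outside the excerpt. The sketch below fills them in from the tables on this page so the function compiles on its own. The `Kind` and `Severity` unions and the `ChangelogEntry` shape are hypothetical stand-ins, and the minus signs on the four positive-change kinds follow the table's "NEGATIVE = POSITIVE CHANGE" legend rather than a confirmed copy of lib/score.ts.

```typescript
// Hypothetical stand-ins for the types lib/score.ts actually uses.
type Kind =
  | "tier-removed" | "billing-model-shift" | "rate-limit-cut" | "free-tier-nerf"
  | "model-swap" | "feature-gated" | "price-increase" | "tos-change"
  | "feature-ungated" | "tier-added" | "rate-limit-raise" | "price-decrease";
type Severity = "critical" | "major" | "minor" | "info";

interface ChangelogEntry {
  kind: Kind;
  severity: Severity;
}

// Values copied from the KIND WEIGHTS table; negative = positive change (assumed signs).
const KIND_WEIGHTS: Record<Kind, number> = {
  "tier-removed": 28, "billing-model-shift": 26, "rate-limit-cut": 24,
  "free-tier-nerf": 22, "model-swap": 20, "feature-gated": 18,
  "price-increase": 16, "tos-change": 12, "feature-ungated": -6,
  "tier-added": -4, "rate-limit-raise": -8, "price-decrease": -10,
};

// Values copied from the SEVERITY MULTIPLIERS table.
const SEVERITY_MULTIPLIER: Record<Severity, number> = {
  critical: 2.6, major: 1.8, minor: 1.0, info: 0.5,
};

function eventScore(e: Pick<ChangelogEntry, "kind" | "severity">): number {
  const base = KIND_WEIGHTS[e.kind] ?? 8;            // same fallback as the listing
  const mult = SEVERITY_MULTIPLIER[e.severity] ?? 1;
  const raw = base * mult;
  const score = Math.max(-30, Math.min(100, raw));   // clamp to [-30, 100]
  return Math.max(0, Math.round(score));             // floor at 0: positive changes publish as 0
}

// Replays the worked example: 28 × 2.6 = 72.8, rounded to 73.
console.log(eventScore({ kind: "tier-removed", severity: "critical" }));   // 73
console.log(eventScore({ kind: "price-decrease", severity: "critical" })); // 0
```

With the weights as listed, the largest possible event score is 28 × 2.6 = 72.8, which rounds to 73, so the 100 ceiling is headroom rather than an active constraint. Likewise the most negative raw product is -10 × 2.6 = -26, above the -30 floor, and the final max with 0 zeroes every positive change.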

The jury verdicts on this receipt - 88 / 92 / 89, all harmful, average 90 - are a separate signal entirely. They don’t enter the score. If you disagree with the kind classification, file a PR. If you disagree with the jury, you can wait - they get Brier-scored against subsequent reality.
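The page doesn't spell out the Brier setup. One plausible sketch, assuming a juror's 0-100 score is read as its probability that the change is harmful and each receipt later resolves to harmful or not: the Brier score is the mean squared gap between forecast and outcome. `ResolvedVerdict` and `brierScore()` are hypothetical names, and the history below is made-up data.

```typescript
// Hypothetical: score/100 is treated as P(harmful); `harmful` is the
// outcome resolved after the fact. Neither mapping is confirmed by the page.
interface ResolvedVerdict {
  score: number;    // juror's 0-100 score
  harmful: boolean; // resolved outcome
}

// Mean squared error between forecast probability and outcome:
// 0 is perfect, 1 is maximally wrong, always guessing 0.5 gives 0.25.
function brierScore(history: ResolvedVerdict[]): number {
  if (history.length === 0) return NaN;
  let total = 0;
  for (const { score, harmful } of history) {
    const p = score / 100;
    const o = harmful ? 1 : 0;
    total += (p - o) ** 2;
  }
  return total / history.length;
}

// Illustrative only: three made-up resolved verdicts for one juror.
const history: ResolvedVerdict[] = [
  { score: 90, harmful: true },  // (0.9 - 1)^2 = 0.01
  { score: 20, harmful: false }, // (0.2 - 0)^2 = 0.04
  { score: 70, harmful: false }, // (0.7 - 0)^2 = 0.49
];
console.log(brierScore(history).toFixed(2)); // "0.18"
```

Lower is better, and a confident wrong call (the 70 above) costs far more than a hedged one, which is the point of scoring jurors this way.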

05 · PER-VENDOR NERF INDEX

Recency-weighted aggregate, 0-100.

Aggregate of every receipt for a vendor, weighted by recency. Half-life is 180 days, so a year-old nerf still counts, at roughly a quarter weight. The weighted sum is mapped onto 0-100 via a logistic squash, so a single critical nerf doesn't max out the index and repeated nerfs stack with diminishing returns.

Figure: RECENCY WEIGHT · 0.5^(days/180), plotted from 0 d to 720 d (half-life at 180 d, ¼ weight at 360 d).
HALF-LIFE · 180 days
A 6-month-old receipt counts half as much as today's.
QUARTER-WEIGHT · 360 days
A year-old receipt still counts - at 25% of its original score.
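A sketch of the recency-weighted index, assuming it sums per-event Nerf Scores weighted by 0.5^(days/180) and squashes the sum with a standard logistic curve. The 180-day half-life comes from this page; `LOGISTIC_MIDPOINT` and `LOGISTIC_STEEPNESS` are hypothetical placeholders, since the actual squash parameters are not published here, and `vendorIndex()` is an illustrative name.

```typescript
const HALF_LIFE_DAYS = 180;      // from this page
const LOGISTIC_MIDPOINT = 100;   // hypothetical: weighted sum that maps to 50
const LOGISTIC_STEEPNESS = 0.04; // hypothetical: how quickly the curve saturates

interface ScoredReceipt {
  score: number;   // per-event Nerf Score, 0-100
  ageDays: number; // days since the change
}

// 1.0 today, 0.5 at 180 days, 0.25 at 360 days.
function recencyWeight(ageDays: number): number {
  return 0.5 ** (ageDays / HALF_LIFE_DAYS);
}

// Recency-weighted sum of event scores, squashed into (0, 100) and rounded.
function vendorIndex(receipts: ScoredReceipt[]): number {
  const weighted = receipts.reduce(
    (sum, r) => sum + r.score * recencyWeight(r.ageDays),
    0,
  );
  const squashed =
    100 / (1 + Math.exp(-LOGISTIC_STEEPNESS * (weighted - LOGISTIC_MIDPOINT)));
  return Math.round(squashed);
}

const fresh: ScoredReceipt = { score: 73, ageDays: 0 };
console.log(vendorIndex([fresh]));        // 25
console.log(vendorIndex([fresh, fresh])); // 86
```

With these made-up constants, one fresh critical receipt (score 73) puts a vendor at 25, a second pushes it to 86, and a third to 99; the curve approaches 100 but never reaches it. An empty history lands at 2 rather than 0, because a logistic never touches its floor.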
INDEX BANDS · WHAT THE NUMBER MEANS
RANGE    LABEL         INTERPRETATION
0–4      no receipts   Vendor has no logged plan changes - either too new to track or unusually stable
5–24     clean         A few minor edits, no material harm to users
25–49    on watch      Pattern of cuts emerging - worth watching, not yet escape-velocity
50–74    shrinking     Repeated material cuts - users feel the squeeze every renewal
75–100   predator      Hostile to paid users - migration recommended

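Mapping an index onto these labels is a plain range lookup. A minimal sketch, assuming the index is already rounded to an integer (the table's ranges only leave no gaps for whole numbers); `indexBand()` and `IndexBand` are hypothetical names, while the cut points are the table's.

```typescript
type IndexBand = "no receipts" | "clean" | "on watch" | "shrinking" | "predator";

// Cut points taken from the INDEX BANDS table; expects an integer in 0-100.
function indexBand(index: number): IndexBand {
  if (index < 5) return "no receipts";
  if (index < 25) return "clean";
  if (index < 50) return "on watch";
  if (index < 75) return "shrinking";
  return "predator";
}

console.log(indexBand(73)); // "shrinking"
```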
06 · PREDICTED IMPACT

How we estimate user impact and financial damage.

Every receipt with a delta gets two predicted-impact numbers: an estimate of the percentage of paid users affected, and a 12-month financial-damage range per affected user. Both are heuristic - derived from kind + severity + the breadth of the affected tier - and explicitly labeled as estimates, not measurements.

Heuristic confidence is graded high / medium / low based on how much the receipt’s kind constrains the impact range. A tier removal at critical severity has a well-bounded financial impact (the cost difference between tiers); a model swap has a fuzzier one (depends on how much you used the swapped model). Confidence pills on each receipt make this explicit.
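For the tier-removal case, the bounded part of the estimate is plain arithmetic: the monthly price gap between the removed tier and the cheapest tier that still includes the feature, times 12. A minimal sketch; `annualTierGapUsd()` is a hypothetical helper rather than the page's heuristic, and it covers only the per-user dollar figure, not the share of paid users affected.

```typescript
// Hypothetical helper: 12-month cost of being pushed from one tier to a
// pricier one that still includes the removed feature. Returns 0 if the
// new tier is not more expensive.
function annualTierGapUsd(oldMonthlyUsd: number, newMonthlyUsd: number): number {
  return Math.max(0, newMonthlyUsd - oldMonthlyUsd) * 12;
}

// Claude Code, Apr 2026: Pro at $20/mo, Max from $100/mo.
// Max is listed as "$100+", so this is the floor of the range, not its top.
console.log(annualTierGapUsd(20, 100)); // 960
```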