# Forecasts & accuracy
Probabilistic 30/60/90-day risk forecasts per AI vendor. We publish them here. We Brier-score them against subsequent reality. We publish that too.
A model that forecasts and never grades itself is a guru. A model that forecasts and grades itself in public is a forecaster. We are the latter.
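Brier scoring is just the mean squared difference between a forecast probability and the binary outcome (1 if the change happened, 0 if it did not). A minimal sketch, assuming a simplified resolved-forecast shape rather than our actual schema:

```typescript
// Brier score: mean squared error between forecast probabilities and
// binary outcomes. Lower is better; always guessing 50% scores 0.25.
interface Resolved {
  p: number;        // forecast probability, 0..1
  happened: boolean // did the predicted change land in the window?
}

function brier(resolved: Resolved[]): number {
  const sum = resolved.reduce(
    (acc, r) => acc + (r.p - (r.happened ? 1 : 0)) ** 2,
    0
  );
  return sum / resolved.length;
}
```

A perfect call (`p = 1`, happened) scores 0; a confident miss (`p = 0.9`, didn't happen) scores 0.81, which is why publishing the score keeps us honest.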
## Live forecasts (sorted by 30-day risk)
| Vendor | Band | 30d | 60d | 90d | Most likely | Severity |
|---|---|---|---|---|---|---|
| GitHub Copilot | on watch | 73% | 93% | 98% | billing-model-shift | major |
| ChatGPT | on watch | 67% | 89% | 96% | billing-model-shift | major |
| Cursor | clean | 59% | 83% | 93% | billing-model-shift | critical |
| Claude (Anthropic) | on watch | 31% | 52% | 67% | feature-gated | major |
| Claude Code | shrinking | 25% | 44% | 58% | tier-removed | critical |
## Public scoreboard

| Metric | Value |
|---|---|
| Forecasts published | 20 |
| Forecasts resolved | 0 (awaiting outcomes) |
| Brier score | — (needs ≥10 resolutions) |
| Calibration | — (needs ≥30 resolutions) |
We've just published our first 20 forecasts (one per tracked vendor). As prediction windows close, we'll grade each forecast against what actually happened. The board updates automatically. No cherry-picking. We will publish the wrong calls in red.
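Calibration, once we have enough resolutions, means bucketing forecasts by predicted probability and checking that each bucket's realized hit rate matches its mean prediction. A sketch under assumed names (this is not the live scoreboard code):

```typescript
// Calibration check: bucket resolved forecasts by predicted probability,
// then compare each bucket's mean prediction to its realized hit rate.
// A calibrated forecaster's ~70% bucket resolves "yes" ~70% of the time.
interface CalPoint {
  p: number;        // forecast probability, 0..1
  happened: boolean // outcome
}

function calibration(resolved: CalPoint[], buckets = 10) {
  const bins = Array.from({ length: buckets }, () => ({ n: 0, pSum: 0, hits: 0 }));
  for (const r of resolved) {
    const i = Math.min(buckets - 1, Math.floor(r.p * buckets));
    bins[i].n += 1;
    bins[i].pSum += r.p;
    bins[i].hits += r.happened ? 1 : 0;
  }
  return bins
    .filter(b => b.n > 0)
    .map(b => ({ predicted: b.pSum / b.n, observed: b.hits / b.n, n: b.n }));
}
```

With only a handful of resolutions each bucket is nearly empty and the observed rates are noise, which is why the board demands ≥30 resolutions before showing a calibration number.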
## How forecasts work
- Probability of any tracked change landing in the next 30/60/90 days, per vendor.
- v0 model: exponential rate from observed cadence, with a recency multiplier (×1.3 if the last receipt was within 30 days).
- Probabilities are capped at 95/97/98% for the 30/60/90-day windows, so we never claim certainty no matter how dense a vendor's cadence gets.
- Likely change kind = mode of observed kinds. Expected severity = severity-weighted average.
- v1 model (planned): gradient boost over funding events, exec hires, ToS commit velocity, GitHub repo activity, hiring spikes.
- Source code: lib/forecast.ts. PRs welcome.
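The v0 model above can be sketched in a few lines. The names, the receipt shape, and the minimum-data guard here are illustrative assumptions; the real implementation lives in `lib/forecast.ts`:

```typescript
// Sketch of the v0 model: estimate a daily change rate from the observed
// cadence of receipts, boost it when the last receipt is recent, then
// apply the exponential "at least one change in the window" formula
// with hard per-horizon caps.
interface Receipt { date: Date }

const DAY_MS = 86_400_000;
const CAPS: Record<number, number> = { 30: 0.95, 60: 0.97, 90: 0.98 };

function forecast(
  receipts: Receipt[],
  horizonDays: 30 | 60 | 90,
  now = new Date()
): number {
  if (receipts.length < 2) return 0; // not enough cadence data (assumed fallback)
  const times = receipts.map(r => r.date.getTime()).sort((a, b) => a - b);

  // Mean gap between observed changes, in days -> rate in changes/day.
  const spanDays = (times[times.length - 1] - times[0]) / DAY_MS;
  const meanGapDays = spanDays / (times.length - 1);
  let rate = 1 / meanGapDays;

  // Recency multiplier: 1.3 if the last receipt landed within 30 days.
  const daysSinceLast = (now.getTime() - times[times.length - 1]) / DAY_MS;
  if (daysSinceLast <= 30) rate *= 1.3;

  // Exponential model: P(at least one change within the horizon).
  const p = 1 - Math.exp(-rate * horizonDays);
  return Math.min(p, CAPS[horizonDays]);
}
```

For example, a vendor with receipts roughly every 30 days and a fresh receipt last week comes out around 73% at 30 days, which is the neighborhood the table above sits in for the most active vendors.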