Measurement Changes Behavior
When you measure something, you change it. People optimize for what's measured. Metrics become targets. The measure decouples from what it was measuring. This is Goodhart's Law—and understanding it is essential for anyone who manages, governs, or evaluates.
"When a measure becomes a target, it ceases to be a good measure." — Charles Goodhart (paraphrased)
This single insight explains enormous amounts of organizational, educational, scientific, and social dysfunction.
The Mechanism
Step 1: You care about X (learning, health, productivity, quality).
Step 2: X is hard to measure directly. You find a proxy M that correlates with X.
Step 3: You incentivize M (rewards, punishments, attention).
Step 4: People optimize for M.
Step 5: Optimization for M breaks the correlation with X. M goes up; X doesn't.
Step 6: You're now rewarding something that doesn't track what you cared about.
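The decoupling in steps 4–6 can be sketched with a toy simulation. All numbers here are illustrative assumptions: an agent splits a fixed effort budget between genuine work, which raises both the true goal X and the proxy M, and gaming, which raises only M (and does so more cheaply). Once M is incentivized, the optimizer shifts all effort toward gaming.

```python
# Toy model of Goodhart's Law. Parameters are illustrative, not empirical.

def outcomes(genuine: float, gaming: float) -> tuple[float, float]:
    """Return (X, M): the true goal X responds only to genuine work;
    the proxy M responds to both, and more cheaply to gaming."""
    x = 1.0 * genuine
    m = 1.0 * genuine + 2.0 * gaming  # gaming is the cheaper way to raise M
    return x, m

BUDGET = 10.0

# Before M is a target: all effort goes to genuine work.
x_before, m_before = outcomes(BUDGET, 0.0)

# After M becomes the target: the optimizer picks the split maximizing M.
(x_after, m_after), gaming_effort = max(
    ((outcomes(BUDGET - g, g), g) for g in [i * 0.5 for i in range(21)]),
    key=lambda t: t[0][1],  # maximize the proxy M, ignore X
)

print(f"before: X={x_before:.1f} M={m_before:.1f}")
print(f"after:  X={x_after:.1f} M={m_after:.1f} (gaming={gaming_effort:.1f})")
# M doubles while X collapses to zero: the proxy has decoupled.
```

The point of the sketch is step 5 made concrete: the correlation between M and X held only while nobody was optimizing M.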
Examples
Education
Care about: learning. Measure: test scores. Result: teaching to the test, cramming, narrowed curriculum. Test scores can rise while learning stays flat or declines.
Academia
Care about: knowledge creation. Measure: publications, citations. Result: salami-slicing papers, citation rings, file-drawer problem. Metric gaming flourishes; breakthrough thinking doesn't necessarily follow.
Healthcare
Care about: health outcomes. Measure: procedures performed, patients seen. Result: over-treatment, rushed appointments, avoiding complex cases. Metrics look good; patients don't get healthier.
Software Development
Care about: value delivered. Measure: lines of code, story points, velocity. Result: verbose code, inflated estimates, gaming sprint metrics. Metrics go up; product quality doesn't.
Social Media
Care about: user value. Measure: engagement, time-on-site. Result: outrage optimization, addiction loops, content that engages but doesn't satisfy. Engagement skyrockets; wellbeing declines.
Why It's Unavoidable
Goodhart effects aren't bugs in implementation—they're inherent to measurement-based management.
You can't directly observe what you care about. If you could, you wouldn't need proxies. But proxies can be gamed. And when you incentivize them, they will be.
The only question is severity. Some proxies are more robust than others. Some incentive structures create weaker gaming pressure. But the dynamic is always present.
Mitigation Strategies
Multiple Metrics
It's harder to game several metrics simultaneously than one. Multi-dimensional evaluation is more robust. But complexity increases.
Rotate Metrics
Change what's measured periodically. Prevents entrenchment of gaming strategies. But loses comparability over time.
Measure Outcomes, Not Proxies
Get closer to what you actually care about. Direct outcomes are harder to game than indirect proxies. But they're often lagging and hard to attribute.
Accept Judgment
Some evaluation requires human judgment rather than metrics. Costly and inconsistent, but less gameable. Appropriate for complex, high-stakes assessments.
Lower Stakes
Use metrics for information rather than high-powered incentives. Lower stakes mean weaker gaming pressure, and the data can still inform decisions.
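The multiple-metrics strategy can be sketched with a hypothetical agent (all numbers invented for illustration) that splits 10 units of effort between genuine work and gaming. Gaming inflates metric m1 but degrades a second metric m2, so requiring a floor on m2 caps how much gaming pays off.

```python
# Illustrative sketch: gaming a single metric vs. a metric pair.

def metrics(genuine: float, gaming: float) -> tuple[float, float]:
    m1 = genuine + 2.0 * gaming  # gaming inflates m1 cheaply
    m2 = genuine - 1.0 * gaming  # gaming shows up as damage in m2
    return m1, m2

splits = [(10.0 - g, g) for g in range(0, 11)]  # (genuine, gaming) options

# Single metric: optimize m1 alone -> all effort goes to gaming.
single = max(splits, key=lambda s: metrics(*s)[0])

# Metric pair: optimize m1 subject to a floor on m2 -> gaming is capped.
paired = max((s for s in splits if metrics(*s)[1] >= 8.0),
             key=lambda s: metrics(*s)[0])

print("single metric -> gaming effort:", single[1])  # 10
print("metric pair   -> gaming effort:", paired[1])  # 1
```

Note the sketch also shows the cost the section names: the constrained optimum depends on where the m2 floor is set, which is one more parameter to choose and defend.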
The Meta-Problem
Attempts to fix Goodhart effects often create new ones.
You add metrics to prevent gaming → people game the combination.
You add monitoring → people game the monitoring.
You add judgment → people game the judges.
This doesn't mean all attempts are futile—some measurement systems work better than others. But it means there's no clean solution. Every system will be gamed. The question is how badly.
Implications
- Don't worship metrics. They're tools, not truth. All metrics are gameable.
- Watch for decoupling. When metrics go up but underlying reality doesn't improve, you're seeing Goodhart.
- Design for robustness. Multiple metrics, outcome focus, judgment integration.
- Expect gaming. It's not deviance—it's response to incentives. Build assuming it happens.
How I Decoded This
Synthesized from: economics (Goodhart's original context was monetary policy), organizational behavior, education research, science studies. Cross-verified: identical pattern appears across every domain where metrics drive incentives. The mechanism is universal.
— Decoded by DECODER