Chimera readability score 55 out of 100, Graduate reading level.

If you’re reading this blog post, you’re likely familiar with the pull toward more metrics. As organizations grow, so does the list of things people want to measure. Different metrics matter for different teams, and everyone has Metrics FOMO: the worry that leaving one out could keep us from our Next Big Insight.
At Discord, this happened with our Default Metric List: a set of metrics that are automatically included in every experiment. Over time, that default list grew as teams added metrics they cared about, while few were removed. We took a step back and asked if we might be better off measuring less.
To data teams, suggesting we measure less feels like heresy. “Our job is to measure! Why would we, the organization’s shrewdest pattern finders, knowingly leave data on the table?” The encounter below might look familiar:
This urge is real, but having too many metrics brings a new set of issues. Beyond higher compute costs and a harder time navigating experiment readouts, having more metrics highlights an inherent tradeoff:
- Leaving p-values as-is risks too many false positives. For example, if you have 100 metrics and set a 5% p-value threshold for statistical significance, you should expect about 5 of them to come out statistically significant by random chance alone, even if none of them truly changed.
- Adjusting p-values with a multiple hypothesis correction yields fewer false positives, but worse recall in detecting real changes. Here, "recall" is the proportion of real changes that we actually detect (the true positive rate).
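The trade-off in the two bullets above can be seen in a small simulation. This is a minimal sketch under made-up assumptions, not Discord's actual setup: 100 independent metrics, 10 of which truly moved by 3 standard errors, each tested two-sided at a 5% threshold. It compares no correction against two standard corrections, Bonferroni and Benjamini-Hochberg, picked here for illustration; this section does not say which correction Discord used.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used to turn z-scores into p-values


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def bonferroni(pvals: list[float], alpha: float) -> list[bool]:
    """Reject H0 when p < alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]


def benjamini_hochberg(pvals: list[float], alpha: float) -> list[bool]:
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= k * alpha / m (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def simulate(n_metrics=100, n_real=10, effect_z=3.0, alpha=0.05, n_sims=2000, seed=0):
    """Hypothetical setup: the first n_real metrics truly moved (mean z = effect_z),
    the rest are unchanged (mean z = 0). Returns average false positives per
    experiment and recall for each decision rule."""
    rng = random.Random(seed)
    methods = {
        "uncorrected": lambda p: [x < alpha for x in p],
        "bonferroni": lambda p: bonferroni(p, alpha),
        "benjamini_hochberg": lambda p: benjamini_hochberg(p, alpha),
    }
    is_real = [i < n_real for i in range(n_metrics)]
    false_pos = {name: 0 for name in methods}
    true_pos = {name: 0 for name in methods}
    for _ in range(n_sims):
        z = [rng.gauss(effect_z if real else 0.0, 1.0) for real in is_real]
        pvals = [two_sided_p(x) for x in z]
        for name, rule in methods.items():
            for real, rejected in zip(is_real, rule(pvals)):
                if rejected and real:
                    true_pos[name] += 1
                elif rejected:
                    false_pos[name] += 1
    return {
        name: {
            "avg_false_positives": false_pos[name] / n_sims,
            "recall": true_pos[name] / (n_real * n_sims),
        }
        for name in methods
    }


if __name__ == "__main__":
    for name, stats in simulate().items():
        print(f"{name:>20}: {stats['avg_false_positives']:.2f} false positives, "
              f"recall {stats['recall']:.2f}")
```

With no correction, expect roughly alpha times the number of unchanged metrics (here 0.05 × 90 ≈ 4.5) false positives per experiment. Bonferroni nearly eliminates them but flags far fewer of the real changes, and Benjamini-Hochberg lands in between on both counts.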
In this article, we walk through how we addressed this issue and show that there is no One Fancy Statistical Method™️ that escapes the trade-off. The best solution is to use fewer, high-quality metrics that capture distinct concepts.
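One way to see why fewer metrics helps: without correction, the chance of at least one false positive, 1 − (1 − α)^m, grows quickly with the number of metrics m, while a Bonferroni-style correction tightens each metric's threshold to α/m and so loses power as m grows. The sketch below assumes independent metrics and a hypothetical true effect of 3 standard errors; Bonferroni stands in for whichever correction a team actually uses.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def uncorrected_fwer(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among m independent,
    truly unchanged metrics when each is tested at alpha with no correction."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_power(m: int, effect_z: float, alpha: float = 0.05) -> float:
    """Probability that a metric whose true effect is effect_z standard errors
    is flagged by a two-sided test at the Bonferroni threshold alpha / m."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / (2.0 * m))
    # Observed Z ~ N(effect_z, 1); reject when |Z| > z_crit.
    return (1.0 - STD_NORMAL.cdf(z_crit - effect_z)) + STD_NORMAL.cdf(-z_crit - effect_z)


if __name__ == "__main__":
    for m in (5, 20, 100):
        print(f"m={m:>3}  P(>=1 false positive, uncorrected)={uncorrected_fwer(m):.2f}  "
              f"Bonferroni power at 3 SE={bonferroni_power(m, 3.0):.2f}")
```

Under these assumptions, cutting the list from 100 metrics to 5 roughly doubles the chance of catching a 3-standard-error change at a Bonferroni-corrected threshold, while keeping the corrected false positive rate at the same 5%.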

Facts Only

* Organizations experience a pull toward measuring more metrics.
* Teams often worry that omitting a metric will prevent reaching a "Next Big Insight" (Metrics FOMO).
* Discord's experimentation system automatically includes a Default Metric List in every experiment; the list grew over time as teams added metrics, while few were removed.
* Having many metrics increases compute costs.
* More metrics make navigating experiment readouts harder.
* Leaving p-values as-is can lead to too many false positives across a large number of metrics.
* Adjusting p-values with multiple hypothesis correction can reduce false positives but decrease recall in detecting real changes.
* The suggested solution is to use fewer, high-quality metrics capturing distinct concepts.

Executive Summary

Organizations face a tension between the desire to measure numerous metrics and the practical costs that come with them. Metrics FOMO drives teams to keep adding to the list, which increases compute costs and makes experiment readouts harder to navigate. A long list also forces a statistical trade-off: standard p-value thresholds produce more false positives as metrics are added, while multiple hypothesis corrections reduce those false positives at the cost of recall. The article argues that no single statistical method escapes this trade-off; the better approach is to measure fewer, high-quality metrics that capture distinct concepts. The shift aims to make insights more reliable by favoring conceptual distinctness over the sheer number of metrics.

Full Take

The narrative describes a systemic conflict between organizational pressure for comprehensive measurement and the statistical trade-offs that pressure creates. The argument positions an overgrown metric list not as a neutral description of the data, but as an active impediment to accurate insight. The system implicitly rewards quantity over conceptual quality, framing the desire for more metrics as a pursuit of knowledge rather than a management challenge. This structure lets the fear of missing out (FOMO) justify an inefficient process.
The core pattern is the sacrifice of statistical fidelity for data volume. The concern over false positives and recall highlights a critical failure in the current paradigm: statistical methods are treated as plug-and-play tools rather than components that need careful calibration to the goals of the analysis. The suggestion to use fewer, high-quality metrics is a call to shift from an engineering focus on data aggregation to a principled focus on conceptual distillation.
The implication is that the true cost of more measurement is not just computational but epistemic: it sacrifices the precision required to detect true change for the sake of tracking more, often superficial, indicators. Hoping that some "One Fancy Statistical Method" will resolve the trade-off ignores the need to define what constitutes a meaningful, distinct concept before applying any statistical correction. The question remains: if a smaller set of high-quality metrics yields more reliable insight, how can organizational mandates prioritize insight integrity over sheer metric count?

Sentinel — Human

Confidence

The text exhibits a strong, opinionated voice rooted in practical experience, skillfully blending organizational anxiety with precise statistical concepts, indicating human authorship.

Signals Detected
low severity: Sentence length variance is natural and varied; the flow reflects conversational argumentation rather than uniform rhythm.
low severity: The argument flows logically from a problem statement (FOMO) to specific statistical challenges (p-values) to a proposed solution (fewer, high-quality metrics). The tone is specific and engaged.
low severity: The text uses specific examples (Discord metrics) and addresses a specific, technical domain (statistical methodology) with appropriate depth, suggesting specialized knowledge or direct experience.
low severity: No overtly fabricated claims or suspiciously smooth attribution. The text references known statistical concepts and frames a real organizational dilemma.
Human Indicators
Use of direct, experiential framing ('our journey,' 'we took a step back').
The introduction of nuanced statistical trade-offs (recall vs. false positives) grounded in practical organizational consequences.
The voice contains a specific, internal tension between organizational drive and statistical rigor.