The Multiple-Comparisons Trap
Test 1,000 strategies and a few look great by pure luck. Why a low p-value is not enough.
The multiple-comparisons trap is one of the most underappreciated ways quant traders fool themselves: if you test enough strategies, some will look brilliant purely by chance, even with no real edge at all. It’s overfitting’s sneaky cousin, hiding in the search process itself.
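To put rough numbers on this, here is a back-of-the-envelope calculation. It assumes N independent tests of strategies that have no edge, each judged at significance level α; both symbols and the independence assumption are introduced here for illustration only.

```latex
P(\text{at least one false positive}) = 1 - (1 - \alpha)^{N},
\qquad
\mathbb{E}[\text{number of false positives}] = \alpha N
```

With α = 0.05, just 14 tests already give about a 51% chance of at least one fluke, and 1,000 tests produce about 50 "significant" strategies on average. Real strategy variants are usually correlated, so treat these figures as an order-of-magnitude guide rather than an exact count.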
- The trap — test enough strategies and some look great by luck alone; you then keep the lucky winner and forget the rest.
- Why p-values mislead — “significant” for one test is meaningless for the best of thousands (you selected for luck).
- Data-mining danger — brute-forcing millions of combinations reliably finds flukes that fail live.
- The fix — fewer tests (hypothesis-first), correct for the number of trials, and validate the winner out-of-sample.
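The points above can be sketched with a small simulation. This is a minimal illustration, not a production backtest. Every "strategy" is pure Gaussian noise with zero mean, the p-value uses a normal approximation to the t-test, and the function names, the 252-day sample, and the 1% daily volatility are all assumptions made for the example. The correction shown is Bonferroni (divide the threshold by the number of trials), one simple and conservative way to account for trial count.

```python
import math
import random
import statistics


def two_sided_p_value(returns):
    """Approximate two-sided p-value for 'mean return is zero'.

    Uses a normal approximation to the t-test, which is reasonable
    for a few hundred observations.
    """
    n = len(returns)
    t_stat = statistics.fmean(returns) / (statistics.stdev(returns) / math.sqrt(n))
    return math.erfc(abs(t_stat) / math.sqrt(2))


def simulate_no_edge_search(n_strategies=1000, n_days=252, alpha=0.05, seed=0):
    """Test many zero-edge strategies and count how many look 'significant'."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_strategies):
        # Hypothetical strategy: daily returns are noise with no edge at all.
        daily_returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        p_values.append(two_sided_p_value(daily_returns))

    return {
        # Judged one at a time, as if each were the only test ever run.
        "naive_hits": sum(p < alpha for p in p_values),
        # Bonferroni: the threshold shrinks with the number of trials.
        "bonferroni_hits": sum(p < alpha / n_strategies for p in p_values),
        "best_p": min(p_values),
    }


result = simulate_no_edge_search()
print(result)
```

Under these assumptions the naive count should land somewhere around 50 out of 1,000, while the corrected count should be at or near zero. The best p-value will look impressive on its own, which is exactly the selection effect described above.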
Isn’t running lots of backtests how you find good strategies?
Exploration is fine, but *unconstrained* searching invites the multiple-comparisons trap — the more you try, the more luck contaminates your “best.” Far better to start from a *hypothesis* with an economic reason, test a *small* number of variations, and reserve out-of-sample data to validate. If you must search broadly, account for the number of trials and treat any winner with heavy skepticism.
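A rough sketch of the out-of-sample step, under stated assumptions: the variants are again pure noise, the data is split in half, and selection uses only the first half. The helper names, the 252-day annualization factor, and the 1,000-variant search are hypothetical choices for illustration. A hypothesis-first workflow would test far fewer variants.

```python
import math
import random
import statistics

TRADING_DAYS = 252  # assumed annualization factor


def annualized_sharpe(returns):
    """Annualized Sharpe ratio of daily returns, with a zero risk-free rate."""
    return statistics.fmean(returns) / statistics.stdev(returns) * math.sqrt(TRADING_DAYS)


def pick_and_validate(n_variants=1000, n_days=504, seed=1):
    """Select the best variant in-sample, then score it once out-of-sample."""
    rng = random.Random(seed)
    split = n_days // 2
    best = None
    for variant_id in range(n_variants):
        # Hypothetical variant: noise returns with no edge.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        in_sample, out_of_sample = returns[:split], returns[split:]
        score = annualized_sharpe(in_sample)
        # Selection looks only at in-sample data; the held-out half
        # is scored for the current leader but never influences the choice.
        if best is None or score > best["in_sample_sharpe"]:
            best = {
                "variant_id": variant_id,
                "in_sample_sharpe": score,
                "out_of_sample_sharpe": annualized_sharpe(out_of_sample),
            }
    return best


winner = pick_and_validate()
print(winner)
```

Because the winner was chosen for its in-sample luck, its in-sample Sharpe is inflated by construction, while the held-out Sharpe is an honest draw that will typically sit much closer to zero. The held-out data only stays honest if it is used once: re-running the search until the out-of-sample number looks good turns it into another in-sample test.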