In-Sample vs Out-of-Sample

Intermediate · 7 min read

Build on one slice of history, test on a slice you never saw. The discipline that catches self-deception.

The single most important discipline in backtesting is splitting your history into two parts: *in-sample* data, which you use to build and tune the strategy, and *out-of-sample* data, which you set aside and test on only after the strategy is finalised, so that the strategy has never seen it.

Out-of-sample testing is your defence against fooling yourself — it’s the trading equivalent of grading a student on questions they didn’t see while studying. If you build and judge a strategy on the same data, of course it looks great: you (knowingly or not) shaped it to fit that exact history. The honest question is never “does it fit the past I built it on?” but “does it work on data it has never seen?” A strategy that shines in-sample but falls apart out-of-sample was overfitted — it memorised noise, not signal. This split is what separates a discovery from a delusion. The iron rule: touch the out-of-sample data only once, at the very end — every time you peek and re-tweak, you contaminate it, turning your “test” back into more fitting.

In-sample — the data you build, tune and optimise the strategy on; results here are expected to look good.
Out-of-sample — held-back data the strategy never saw; the only honest test of whether the edge is real.
The iron rule — test out-of-sample once, at the end; repeated peeking-and-tweaking contaminates it into in-sample.
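As a minimal sketch of the split described above, the snippet below cuts a price history at a single date. The function name `split_by_date`, the cutoff date and the price series are all hypothetical and purely illustrative; a real backtest would load actual market data.

```python
from datetime import date


def split_by_date(rows, cutoff):
    """Split (date, value) rows chronologically.

    Rows dated strictly before `cutoff` are in-sample;
    rows on or after `cutoff` are out-of-sample.
    """
    in_sample = [row for row in rows if row[0] < cutoff]
    out_of_sample = [row for row in rows if row[0] >= cutoff]
    return in_sample, out_of_sample


# Hypothetical year-end prices for 2010 through 2023 (made-up numbers).
prices = [
    (date(year, 12, 31), 100.0 + 5.0 * i)
    for i, year in enumerate(range(2010, 2024))
]

# Build and tune on everything before 2019; hold back the rest.
in_sample, out_of_sample = split_by_date(prices, cutoff=date(2019, 1, 1))
print(len(in_sample), len(out_of_sample))
```

The split is chronological rather than random, so the held-back slice always lies after the data the strategy was built on.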

Example: You build a strategy on 2010–2018 data (in-sample) and it returns 22% CAGR. The real test: run it untouched on 2019–2023 (out-of-sample). If it still returns ~18%, you likely have a genuine edge. If it collapses to −5%, you overfitted 2010–2018’s noise. Only the data you didn’t build on could tell you.
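The CAGR figures in this example can be reproduced with the standard compound-growth formula, CAGR = (end / start)^(1 / years) − 1. The sketch below assumes hypothetical starting and ending equity values, chosen only so that the results match the 22%, 18% and −5% figures quoted above; it does not model any real strategy.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two equity values."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical equity curves, constructed to match the example's numbers.
in_sample_cagr = cagr(100.0, 100.0 * 1.22 ** 9, years=9)   # 2010-2018
oos_survives = cagr(100.0, 100.0 * 1.18 ** 5, years=5)     # 2019-2023, edge holds
oos_collapses = cagr(100.0, 100.0 * 0.95 ** 5, years=5)    # 2019-2023, overfitted

print(f"in-sample: {in_sample_cagr:.1%}")
print(f"out-of-sample (edge holds): {oos_survives:.1%}")
print(f"out-of-sample (overfitted): {oos_collapses:.1%}")
```

Comparing the two out-of-sample figures against the in-sample one is the whole test: some decay is normal, while a change of sign suggests the in-sample result was fitted noise.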

Key takeaway: Split history into in-sample (build/tune) and out-of-sample (test only, never seen). A real edge survives on data it never saw; one that shines in-sample but dies out-of-sample was overfitted. Touch the out-of-sample set once, at the end; peeking and re-tweaking destroys its value.

FAQs

What if my strategy fails out-of-sample — can I just adjust it?

If you re-tune based on out-of-sample results, that data is now *in-sample* (you’ve fitted to it), and you no longer have an honest test. The disciplined response is to go back to the drawing board with a *new* hypothesis and reserve a *fresh* untouched slice, or to use walk-forward analysis (covered in a later module), which formalises repeated honest testing. Don’t quietly fit to your “test” set.
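One way to make the touch-once rule hard to break by accident is to wrap the held-back data in an object that refuses a second evaluation. This is only a sketch of the idea: the class `OneShotHoldout`, the helper `total_return` and the sample returns are hypothetical names and numbers introduced for illustration, not part of any library.

```python
class OneShotHoldout:
    """Hold back out-of-sample data and allow exactly one evaluation."""

    def __init__(self, data):
        self._data = list(data)
        self._used = False

    def evaluate(self, strategy):
        # A second call would mean peeking again, so refuse it.
        if self._used:
            raise RuntimeError(
                "out-of-sample data already used; reserve a fresh slice"
            )
        self._used = True
        return strategy(self._data)


def total_return(returns):
    """Hypothetical stand-in for a strategy: compound per-period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


holdout = OneShotHoldout([0.02, -0.01, 0.03])  # made-up period returns
first_result = holdout.evaluate(total_return)

try:
    holdout.evaluate(total_return)  # a re-test after tweaking
    second_allowed = True
except RuntimeError:
    second_allowed = False

print(first_result, second_allowed)
```

A guard like this cannot stop you from reloading the raw file, but it makes re-testing a deliberate act rather than a habit.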