Data Quality & Adjustments

Intermediate · 7 min read

Splits, dividends, bonus issues and bad ticks — why dirty data quietly ruins backtests.

A backtest is only as trustworthy as the data underneath it — and raw market data is full of traps that silently corrupt results. “Garbage in, garbage out” applies brutally here, because dirty data often produces plausible-looking (but fake) results rather than obvious errors.

The most dangerous data problems are the ones that look like real price moves but aren’t, and corporate actions are the biggest culprit. When a stock does a 2-for-1 split or a 1:1 bonus, its price halves overnight. It did not fall; there are simply twice as many shares. If your data isn’t adjusted, your backtest sees a fake −50% “crash,” triggers stops, and books an imaginary loss. Dividends cause the same problem on a smaller scale, because the price drops by roughly the dividend on the ex-date. Consolidations (reverse splits) cause the mirror image, a sudden jump. You must use adjusted price data that smooths these mechanical changes out.

Add to that bad ticks (erroneous prints), missing data, and survivorship-biased datasets that quietly dropped delisted companies, and you have several ways for clean-looking numbers to be quietly fictional. Auditing your data (adjusted, complete, point-in-time, and including the dead companies) is unglamorous, but it decides whether your backtest is real.
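To make “adjusted data” concrete, here is a minimal Python sketch of one common convention, multiplicative back-adjustment: every price before an ex-date is scaled so it is comparable with prices after it, while the latest prices stay unchanged. The function name `back_adjust`, the `(date, close)` input format and the two event dictionaries are assumptions for illustration, not any data vendor’s API. Splits, bonuses and consolidations are described by how many new shares each old share becomes. Dividends use the factor 1 − D / (previous close); leaving them out gives a price-only series, which is exactly the total-return vs price-only choice you need to be explicit about.

```python
from datetime import date

def back_adjust(bars, share_multipliers=None, dividends=None):
    """Back-adjust closes so older prices are comparable with the latest ones.

    bars: list of (date, close), sorted oldest first (assumed format).
    share_multipliers: {ex_date: new shares per old share}, e.g. 2.0 for a
        1:1 bonus or 2-for-1 split, 0.1 for a 1-for-10 consolidation.
    dividends: {ex_date: cash dividend per share}. Leave empty for a
        price-only series; fill it for a total-return-style series.
    Ex-dates are assumed to appear in `bars`; events on other dates are ignored.
    """
    share_multipliers = share_multipliers or {}
    dividends = dividends or {}
    factor = 1.0  # combined adjustment for every event after the current bar
    adjusted = []
    for i in range(len(bars) - 1, -1, -1):  # walk newest to oldest
        day, close = bars[i]
        adjusted.append((day, close * factor))
        # An event on ex-date `day` affects every bar strictly before it.
        if day in share_multipliers:
            factor /= share_multipliers[day]
        if day in dividends and i > 0:
            prev_close = bars[i - 1][1]  # raw close on the day before the ex-date
            factor *= 1 - dividends[day] / prev_close
    adjusted.reverse()
    return adjusted

# Hypothetical history around a 1:1 bonus (twice the shares, half the price).
bars = [
    (date(2024, 1, 1), 2000.0),
    (date(2024, 1, 2), 2010.0),
    (date(2024, 1, 3), 1000.0),  # ex-date of the bonus
    (date(2024, 1, 4), 1005.0),
]
adjusted = back_adjust(bars, share_multipliers={date(2024, 1, 3): 2.0})
print([close for _, close in adjusted])  # [1000.0, 1005.0, 1000.0, 1005.0]
```

On the adjusted series, the ex-date return is 1,000 / 1,005 − 1, about −0.5%, instead of the roughly −50% an unadjusted series shows. Walking backwards lets a single running factor accumulate every later event, so several corporate actions in one history compound correctly.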

Corporate actions — splits, bonuses and consolidations mechanically change price; use adjusted data or your test sees phantom crashes/spikes.
Dividends — a price-only series drops by roughly the dividend on each ex-date, while a total-return series adds it back; the two give different results, so be consistent and explicit about which you use.
Bad ticks & gaps — erroneous prints and missing bars can trigger fake signals; clean and sanity-check the data.
Survivorship — datasets that exclude delisted/dead companies inflate results (covered in the bias module); include the graveyard.
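The bad-tick and gap checks can start as simple heuristics. Below is a minimal Python sketch under stated assumptions: a “bad tick” is taken to be either an impossible price (zero or negative) or a one-bar spike that sits more than a threshold away from both neighbours in the same direction (20% here, an arbitrary illustrative value), and gaps are found by comparing bar dates against a trading calendar you supply, for example one built from the exchange’s holiday list. The function names are hypothetical.

```python
from datetime import date

def find_suspect_bars(closes, threshold=0.2):
    """Indices of likely bad ticks: non-positive prices, plus one-bar spikes
    that are more than `threshold` away from BOTH neighbours in the same
    direction (i.e. the price jumps and immediately reverts)."""
    suspects = {i for i, c in enumerate(closes) if c <= 0}  # impossible prices
    for i in range(1, len(closes) - 1):
        prev, cur, nxt = closes[i - 1], closes[i], closes[i + 1]
        if min(prev, cur, nxt) <= 0:
            continue  # already flagged above; ratios would divide by zero
        spike_up = cur / prev - 1 > threshold and cur / nxt - 1 > threshold
        spike_down = prev / cur - 1 > threshold and nxt / cur - 1 > threshold
        if spike_up or spike_down:
            suspects.add(i)
    return sorted(suspects)

def find_gaps(bar_dates, calendar):
    """Trading-calendar dates that have no bar in the data (missing data)."""
    present = set(bar_dates)
    return [d for d in calendar if d not in present]

# Hypothetical closes with an erroneous print (1010.0) at index 2.
print(find_suspect_bars([100.0, 101.0, 1010.0, 102.0, 103.0]))  # [2]

# Hypothetical four-day calendar; the bar for 3 Jan is missing.
calendar = [date(2024, 1, d) for d in (1, 2, 3, 4)]
bar_dates = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
print(find_gaps(bar_dates, calendar))  # [datetime.date(2024, 1, 3)]
```

The spike rule deliberately ignores level shifts: an unadjusted 1:1 bonus (2,010 → 1,000 → 1,005) is not flagged, because the price does not revert. That case belongs to corporate-action adjustment, so large non-reverting moves are better cross-checked against a corporate-actions list than deleted. In general, flagged bars are candidates for review, not automatic deletions.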

Example: A stock at ₹2,000 does a 1:1 bonus and opens at ₹1,000. Unadjusted, your backtest reads a −50% single-day collapse, fires every stop, and records a catastrophic loss that never happened: each shareholder now owns twice as many shares at half the price, so the value of their holding is unchanged. With bonus-adjusted data, pre-bonus prices are scaled down to match (₹2,000 becomes ₹1,000), the series is continuous, and the “crash” vanishes. The data, not the strategy, was the problem.

Key takeaway: Dirty data produces plausible-but-fake results. Use adjusted prices (so splits/bonuses/dividends don’t look like real moves), clean bad ticks and gaps, be explicit about total-return vs price data, and include delisted companies. Auditing data quality is unglamorous but decides whether a backtest is real.

FAQs

Where do data problems most often hide?

In corporate-action adjustments (phantom crashes from splits/bonuses), in survivorship-biased datasets (silently missing dead companies), and in subtle point-in-time errors (fundamentals restated after the fact, so the backtest “knows” figures that were not available on the trading date). These rarely throw obvious errors; they produce *believable* numbers, which is exactly why they’re dangerous. Always verify that your data source handles adjustments and includes delisted names.
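Including delisted names only helps if the backtest also asks which stocks were actually tradable on each date. A minimal Python sketch, assuming a hypothetical security master that stores a listing date and an optional delisting date per ticker (the function name, tickers and dates below are invented for illustration):

```python
from datetime import date

def universe_on(as_of, securities):
    """Tickers that were listed and not yet delisted on `as_of`.

    securities: {ticker: (listed_on, delisted_on or None)}. A stock is treated
    as tradable from its listing date up to, but not including, its delisting
    date (an assumed convention; check how your data source defines it).
    """
    return sorted(
        ticker
        for ticker, (listed_on, delisted_on) in securities.items()
        if listed_on <= as_of and (delisted_on is None or as_of < delisted_on)
    )

# Hypothetical security master, including a company that was later delisted.
securities = {
    "AAA": (date(2010, 1, 4), None),
    "BBB": (date(2012, 3, 1), date(2020, 6, 30)),  # delisted: keep it in the data
    "CCC": (date(2021, 8, 2), None),
}

print(universe_on(date(2019, 1, 1), securities))  # ['AAA', 'BBB']
print(universe_on(date(2022, 1, 1), securities))  # ['AAA', 'CCC']
```

A survivorship-biased dataset would simply have no `BBB` row, so a 2019 backtest would never hold it and never take its loss. The same as-of idea applies to fundamentals: a figure should only be usable once the date it was actually published has passed.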