WealthJot.ai

Data Quality & Adjustments

Intermediate · 7 min read

Splits, dividends, bonus issues and bad ticks — why dirty data quietly ruins backtests.

A backtest is only as trustworthy as the data underneath it, and raw market data is full of traps that silently corrupt results. “Garbage in, garbage out” applies brutally here, because dirty data often produces plausible-looking (but fake) results rather than obvious errors.

The most dangerous data problems are the ones that look like real price moves but aren’t, and corporate actions are the biggest culprit. When a stock does a 2-for-1 split or a 1:1 bonus, its price halves overnight, not because it fell, but because there are now twice as many shares. If your data isn’t adjusted, your backtest sees a fake −50% “crash,” triggers stops, and books an imaginary loss. Other bonus ratios and consolidations (reverse splits) cause the same kind of artificial jump, and on a smaller scale so does the ex-dividend price drop. You must use adjusted price data that rescales history to remove these mechanical changes.

Add to that bad ticks (erroneous prints), missing data, and survivorship-biased datasets (which quietly drop delisted companies), and you have several ways for clean-looking numbers to be quietly fictional. Auditing your data (adjusted, complete, point-in-time, including the dead companies) is unglamorous, but it decides whether your backtest is real.
  • Corporate actions: splits, bonuses and consolidations mechanically change price; use adjusted data or your test sees phantom crashes/spikes.
  • Dividends: total-return vs price-only data changes results; be consistent and explicit about which you use.
  • Bad ticks & gaps: erroneous prints and missing bars can trigger fake signals; clean and sanity-check the data.
  • Survivorship: datasets that exclude delisted/dead companies inflate results (covered in the bias module); include the graveyard.
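The bad-tick and gap checks in the list above can be sketched roughly as follows. This is a minimal illustration, not a production cleaner: the function names, the 5-bar window, the 20% deviation threshold and the sample prices are all assumptions chosen for the example, and the gap check treats every weekday as a trading day, so exchange holidays would show up as false positives.

```python
from datetime import date, timedelta
from statistics import median


def flag_bad_ticks(closes, window=5, max_dev=0.20):
    """Return indices whose close deviates by more than max_dev from the
    median of the neighbouring bars (the bar itself is excluded).
    window and max_dev are illustrative defaults, not recommended values."""
    flagged = []
    for i, price in enumerate(closes):
        lo = max(0, i - window)
        hi = min(len(closes), i + window + 1)
        neighbours = closes[lo:i] + closes[i + 1:hi]
        if not neighbours:
            continue
        ref = median(neighbours)
        if ref > 0 and abs(price / ref - 1) > max_dev:
            flagged.append(i)
    return flagged


def find_missing_weekdays(dates):
    """Return weekdays between the first and last date that have no bar.
    Assumption: every weekday is a trading day. A real check would use
    the exchange's trading calendar instead."""
    have = set(dates)
    missing = []
    day, end = min(dates), max(dates)
    while day <= end:
        if day.weekday() < 5 and day not in have:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Hypothetical series with one misplaced-decimal print at index 3.
closes = [100.0, 101.0, 99.5, 10.05, 100.5, 102.0, 101.5]
bad_tick_indices = flag_bad_ticks(closes)

# Hypothetical dates with Wednesday 3 January 2024 missing.
dates = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4), date(2024, 1, 5)]
missing_days = find_missing_weekdays(dates)
```

A flagged bar is a candidate for review, not automatic deletion: a genuine large move or an unadjusted corporate action would trip the same threshold.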
Example: A stock at ₹2,000 does a 1:1 bonus and opens at ₹1,000. Unadjusted, your backtest reads a −50% single-day collapse, fires every stop, and records a catastrophic loss that never happened; shareholders were unaffected because they now hold twice as many shares. With split/bonus-adjusted data, the series shows continuity and the “crash” vanishes. The data, not the strategy, was the problem.
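The arithmetic behind that example can be sketched as a simple back-adjustment: every price before the ex-date is divided by the share multiplier (2 for a 1:1 bonus). The function names and the extra prices around the ₹2,000 and ₹1,000 bars are assumptions added for illustration, and closes are used instead of opens to keep the series short.

```python
def back_adjust(closes, actions):
    """Back-adjust raw closes for splits and bonus issues.

    closes  : raw closing prices in date order
    actions : {ex_index: multiplier}, where ex_index is the first bar that
              trades after the action and multiplier is new shares per old
              share (2.0 for a 1:1 bonus or a 2-for-1 split)
    """
    adjusted = list(closes)
    for ex_index, multiplier in actions.items():
        # Rescale everything before the ex-date onto the new share base.
        for i in range(ex_index):
            adjusted[i] /= multiplier
    return adjusted


def daily_returns(prices):
    """Simple close-to-close returns."""
    return [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]


# Hypothetical closes: 1:1 bonus takes effect on the third bar.
raw = [1980.0, 2000.0, 1000.0, 1010.0]
actions = {2: 2.0}

adjusted = back_adjust(raw, actions)   # [990.0, 1000.0, 1000.0, 1010.0]
raw_returns = daily_returns(raw)       # second return is -0.5: the phantom crash
adj_returns = daily_returns(adjusted)  # second return is 0.0: continuity
```

On the raw series a stop placed anywhere above ₹1,000 would fire on the ex-date; on the adjusted series the same bar shows no move at all.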
Key takeaway: Dirty data produces plausible-but-fake results. Use adjusted prices (so splits/bonuses/dividends don’t look like real moves), clean bad ticks and gaps, be explicit about total-return vs price data, and include delisted companies. Auditing data quality is unglamorous but decides whether a backtest is real.
FAQs
Where do data problems most often hide?

In corporate-action adjustments (phantom crashes from splits/bonuses), in survivorship-biased datasets (silently missing dead companies), and in subtle point-in-time errors (restated fundamentals). These rarely throw obvious errors — they produce *believable* numbers, which is exactly why they’re dangerous. Always verify your data source handles adjustments and includes delisted names.
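One way to check that a dataset "includes delisted names" is to build the tradable universe as of a past date and compare it with a survivors-only view. The sketch below assumes a hypothetical listings table with listing and delisting dates; the `Listing` type, the `universe_on` function and the three symbols are invented for illustration.

```python
from datetime import date
from typing import NamedTuple, Optional


class Listing(NamedTuple):
    symbol: str
    listed: date
    delisted: Optional[date]  # None means the stock is still trading


def universe_on(listings, as_of):
    """Symbols that were actually tradable on as_of, dead companies included."""
    return sorted(
        item.symbol
        for item in listings
        if item.listed <= as_of and (item.delisted is None or as_of < item.delisted)
    )


# Hypothetical listings: BRAVO was delisted in 2019, CHARLIE listed in 2020.
listings = [
    Listing("ALPHA", date(2010, 1, 4), None),
    Listing("BRAVO", date(2012, 6, 1), date(2019, 3, 15)),
    Listing("CHARLIE", date(2020, 9, 1), None),
]

as_of = date(2015, 1, 1)
point_in_time = universe_on(listings, as_of)

# What a survivorship-biased dataset would show for the same date.
survivors_only = sorted(
    item.symbol
    for item in listings
    if item.delisted is None and item.listed <= as_of
)
```

Here the point-in-time universe for 2015 contains ALPHA and BRAVO, while the survivors-only view contains just ALPHA. If a data source cannot reproduce the first list, its backtests never see the companies that later failed.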