Features: Garbage In, Garbage Out

Advanced · 7 min read

The model is only as good as what you feed it. Building features that carry real signal.

In ML, “features” are the input variables you feed the model (momentum, valuation ratios, volatility, etc.). Feature engineering, the work of choosing and constructing good features, is where most of the real value is created, far more than in the choice of algorithm.

The iron law is garbage in, garbage out: a model can only find signal that is actually present in its inputs. No algorithm, however sophisticated, conjures predictive power from features that don’t contain it. This flips the beginner’s instinct: people obsess over the model (deep learning! gradient boosting!) when the leverage is overwhelmingly in the features.

Great features encode real economic or behavioural signal (the factors from Module 6 make excellent features) and are constructed with point-in-time discipline, so that no look-ahead leaks the future into a feature. Bad features (noisy, redundant, or subtly forward-looking) lead to a model that is either useless or, worse, looks good in a backtest and then fails live. The practical wisdom: spend your effort building a small set of meaningful, leak-free features grounded in why markets move, and a simple model will usually beat a fancy model fed mediocre inputs. The model is a lens; features are the light. No light, no picture.

Features matter most — value lives in what you feed the model, far more than the algorithm choice.
Encode real signal — good features capture economic/behavioural drivers (factors make great features).
Point-in-time discipline — features must use only data available at the time (no look-ahead leakage, Module 3).
Less is more — a few meaningful, leak-free features beat hundreds of noisy/redundant ones (and reduce overfitting).
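Point-in-time discipline can be made concrete with a small sketch. The function below is hypothetical (the name `momentum_feature` and its `lookback` parameter are illustrative and not taken from any library), and it assumes a plain list of daily closing prices ordered oldest first. The feature for day t is built only from closes up to day t-1, so it could have been computed before day t began.

```python
def momentum_feature(prices, lookback):
    """Trailing return known at the close of day t-1, for use on day t.

    prices: daily closing prices, oldest first (an assumed input format).
    Returns a list aligned with prices; None where history is too short.
    """
    feature = [None] * len(prices)
    for t in range(len(prices)):
        start = t - 1 - lookback
        # Only closes from day `start` to day t-1 are used:
        # nothing from day t or later enters the feature.
        if start >= 0:
            feature[t] = prices[t - 1] / prices[start] - 1.0
    return feature


prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]
mom = momentum_feature(prices, lookback=2)

# Changing the most recent close must not change any feature value,
# because no feature is allowed to see its own day's price.
altered = prices[:-1] + [999.0]
unchanged = momentum_feature(altered, lookback=2) == mom
```

A quick way to audit any feature for look-ahead is the check at the end: perturb data from day t onward and confirm the feature values up to day t do not move.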

Example: Feed a model purely random or look-ahead-contaminated features and even the best algorithm produces noise (or results that look great and then fail). Feed a simpler model a handful of well-constructed, economically-grounded features (momentum, quality, valuation, all point-in-time correct) and it can extract genuine, if modest, predictive value. The difference is the inputs, not the model.
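The contrast in this example can be illustrated on synthetic data. The sketch below assumes a made-up data-generating process (a weak driver with coefficient 0.1 plus unit-variance noise); the variable names and numbers are illustrative only and are not market estimates. It compares how strongly three kinds of feature correlate with the return being predicted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Assumed toy process: a weak but real driver, swamped by noise.
signal = rng.standard_normal(n)                  # known before the return is realised
target = 0.1 * signal + rng.standard_normal(n)   # the return we want to predict

# A feature with no relationship to the target at all.
noise_feature = rng.standard_normal(n)

# Look-ahead bug: the feature averages the target itself with the
# following period's return (np.roll wraps at the end, harmless here).
leaky_feature = 0.5 * (target + np.roll(target, -1))


def corr(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


results = {
    "noise": corr(noise_feature, target),
    "leaky": corr(leaky_feature, target),
    "genuine": corr(signal, target),
}
```

Under these assumptions the leaky feature shows by far the strongest relationship, but only because it contains the answer; that relationship would not exist in live trading. The genuine feature is much weaker yet real, and the noise feature sits near zero.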

Key takeaway: Garbage in, garbage out. A model only finds signal that is in its features, and no algorithm creates predictive power from inputs that lack it. Value lives in feature engineering (encode real economic signal, point-in-time, few not many), not in fancy algorithms. Good features + simple model beats mediocre features + complex model.

FAQs

Should I throw hundreds of features at the model and let it sort them out?

No. That invites overfitting and noise-fitting, especially in markets. Prefer a *small set* of meaningful, economically-grounded, leak-free features. More features mean more ways to fit noise and more chances of hidden look-ahead. Disciplined feature selection beats brute-force feature dumping, which is a classic route to an ML model that backtests beautifully and then fails live.
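The cost of feature dumping can be sketched with ordinary least squares on synthetic data. Everything here is an assumption made for illustration: one genuine feature with a weak effect (coefficient 0.3), 200 irrelevant features, and 300 observations each for training and testing. The helper names `make_data` and `r_squared` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_noise = 300, 300, 200


def make_data(n):
    """Return (genuine feature only, genuine + noise features, target)."""
    signal = rng.standard_normal((n, 1))              # one genuine feature
    noise = rng.standard_normal((n, n_noise))         # irrelevant features
    y = 0.3 * signal[:, 0] + rng.standard_normal(n)   # weak true relationship
    return signal, np.hstack([signal, noise]), y


def r_squared(X_fit, y_fit, X_eval, y_eval):
    """Fit OLS with an intercept on (X_fit, y_fit); score R^2 on (X_eval, y_eval)."""
    A_fit = np.column_stack([np.ones(len(X_fit)), X_fit])
    A_eval = np.column_stack([np.ones(len(X_eval)), X_eval])
    coef, *_ = np.linalg.lstsq(A_fit, y_fit, rcond=None)
    resid = y_eval - A_eval @ coef
    return 1.0 - np.sum(resid**2) / np.sum((y_eval - y_eval.mean()) ** 2)


small_tr, big_tr, y_tr = make_data(n_train)
small_te, big_te, y_te = make_data(n_test)

in_small = r_squared(small_tr, y_tr, small_tr, y_tr)    # 1 feature, in-sample
in_big = r_squared(big_tr, y_tr, big_tr, y_tr)          # 201 features, in-sample
out_small = r_squared(small_tr, y_tr, small_te, y_te)   # 1 feature, out-of-sample
out_big = r_squared(big_tr, y_tr, big_te, y_te)         # 201 features, out-of-sample
```

In this setup the 201-feature model fits the training data far better than the one-feature model, simply because it has more ways to fit noise, and then does worse on fresh data. That is the backtest-versus-live gap in miniature.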