The indicator sets the ceiling that no model can break through. A linear regression on a high-quality indicator beats a deep neural network on a low-quality one. Most R&D effort is spent on the model, where the marginal returns are smallest. The bigger gains live in the inputs.
A garbage indicator has at least one of four structural defects: a non-stationary distribution, heavy tails, clumped values, or lookback artifacts. The model treats your input as the truth and propagates the defect into the forecast. Diagnose the indicator before you train anything.
Raw indicators rarely satisfy the geometric assumptions a model needs: stable scale, spread distribution, bounded tails. Six transforms cover most repairs. Each fixes a specific defect, costs a hyperparameter, and risks lookahead if computed non-causally.
Relative entropy as a quality score is the cheapest single-number test for whether an indicator uses the range it lives on. The catch: a well-shaped histogram of pure noise scores as well as a well-shaped histogram of signal.
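A minimal sketch of the score, assuming an equal-width histogram and normalizing by log(bins) so the result runs from 0 to 1. The function name and the default of 20 bins are illustrative choices, not taken from the source.

```python
import math

def relative_entropy(values, bins=20):
    """Shannon entropy of an equal-width histogram divided by log(bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # every value sits in one bin: none of the range is used
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    entropy = -sum(c / n * math.log(c / n) for c in counts if c)
    return entropy / math.log(bins)

spread = [i / 999 for i in range(1000)]              # fills the range evenly
clumped = [0.5] * 990 + [i / 9 for i in range(10)]   # piled on one value

print(relative_entropy(spread), relative_entropy(clumped))
```

The evenly spread series scores near 1 and the clumped one near 0. The score reads only the shape, so shuffled noise with the same histogram would score the same.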
R/IQR is the ratio of total range to interquartile range. The denominator is anchored to the body of the distribution. The numerator follows the tails. The ratio is the only honest tail measurement on data where the standard deviation is already contaminated by the tails it is supposed to describe.
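The ratio can be sketched in a few lines. This assumes the quartiles come from Python's `statistics.quantiles` with its default method; the function name is illustrative.

```python
import math
import statistics

def range_over_iqr(values):
    """Total range divided by interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        return math.inf  # body has collapsed to a point: the ratio is undefined
    return (max(values) - min(values)) / iqr

clean = list(range(101))        # evenly spread body, no tail
with_tail = clean + [1000]      # same body plus one extreme value

print(range_over_iqr(clean), range_over_iqr(with_tail))
```

One extreme value barely moves the denominator and stretches the numerator tenfold, which is the behavior the ratio is built to expose.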
R/IQR detects stretched distributions but says nothing about whether the stretch carries the signal. On market data the stretch usually carries it. The Tail Concentration Ratio splits per-decile mutual information and tells you whether the tails are noise to squash or signal to preserve.
Scanning 41 RSI thresholds and reporting the best one inflates the naive p-value by an order of magnitude. The right test shuffles the target, re-runs the full threshold scan thousands of times, and compares the observed best statistic to the distribution of best statistics from noise.
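A hedged sketch of that test. The statistic used here (excess hit rate above a threshold), the `min_rows` guard, and the synthetic data are assumptions made for illustration; the structure is what matters: the whole scan is repeated inside every shuffle.

```python
import random

def best_threshold_stat(x, y, thresholds, min_rows=10):
    """Best excess hit rate over the scan: mean(y | x >= thr) - mean(y)."""
    base = sum(y) / len(y)
    best = float("-inf")
    for thr in thresholds:
        picked = [yi for xi, yi in zip(x, y) if xi >= thr]
        if len(picked) >= min_rows:
            best = max(best, sum(picked) / len(picked) - base)
    return best

def scan_permutation_pvalue(x, y, thresholds, n_perm=1000, seed=0):
    """Compare the observed best statistic to best statistics from shuffled targets."""
    rng = random.Random(seed)
    observed = best_threshold_stat(x, y, thresholds)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between indicator and target
        if best_threshold_stat(x, shuffled, thresholds) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

data_rng = random.Random(1)
rsi = [data_rng.uniform(0, 100) for _ in range(300)]   # synthetic indicator
signal = [1 if r > 50 else 0 for r in rsi]             # target tied to the indicator
noise = [data_rng.randint(0, 1) for _ in rsi]          # target with no link at all
thresholds = range(30, 71)                             # 41 candidate thresholds

p_signal = scan_permutation_pvalue(rsi, signal, thresholds, n_perm=200)
p_noise = scan_permutation_pvalue(rsi, noise, thresholds, n_perm=200)
print(p_signal, p_noise)
```

Because each shuffle also gets to pick its own best threshold, the selection bias is present on both sides of the comparison and cancels.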
Long-side and short-side threshold scans on the same indicator are two hypotheses, not one. Equity drift, return skew, and conditional-distribution asymmetry break the mirror.
Raw price is non-stationary in mean, non-stationary in variance, and incomparable across instruments. A model trained on SPX from 1990 to 2010 sees 71% of the 2010 to 2026 test rows outside its training support. The in-sample AUC of 0.582 collapses to 0.498 live.
Six transforms turn non-stationary prices into stationary indicators, from log returns at the light end to forced centering at the heavy end. Pick the lightest one that passes the ADF, coverage, and rolling-variance checks. ATR-norm 20d momentum wins on SPX.
ATR captures within-bar and between-bar movement in the instrument's own units. On SPX 20d momentum, 2020/2017 std ratio drops 5.0 raw → 1.05 ATR-normalized, no MI loss. Structural, not heuristic.
CMMA fixes Close − MA(k) with log prices, prev-bar MA, ATR with current bar, √(k+1) divisor. On SPX, test/train std ratio drops 5.25 → 1.03; MI lifts 1.4 → 2.0. Pool-compatible.
The histogram is the primary diagnostic. Scalars (R/IQR, RE, MI, TCR) confuse shapes needing different fixes. Identical scalars hide light-tail, heavy-tail, or bimodal — only bimodal needs a split.
A wrong-scale sigmoid destroys the signal. Center on training median; scale α so IQR lands in the linear region (α≈1.2 for tanh). On SPX Range/Close, α=1.2 lifts AUC 0.503 → 0.519.
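One way to read that recipe, as a sketch: fit the median and IQR on training data only, then apply tanh(α · (x − median) / IQR). Dividing by the IQR before applying α is an assumption about how the scale is defined; the source gives only α≈1.2.

```python
import math
import statistics

def fit_tanh_squash(train, alpha=1.2):
    """Return a squashing function whose center and scale come from training data only."""
    med = statistics.median(train)
    q1, _, q3 = statistics.quantiles(train, n=4)
    iqr = q3 - q1

    def squash(x):
        # Assumed form: alpha stretches the IQR-standardized value before tanh.
        return math.tanh(alpha * (x - med) / iqr)

    return squash

train = [float(i) for i in range(1, 100)] + [10_000.0]  # body plus one outlier
squash = fit_tanh_squash(train)
print(squash(50.5), squash(75.75), squash(10_000.0))
```

The training median maps to zero, the upper quartile stays in the roughly linear part of the curve, and the outlier is pinned near 1 without having influenced the center or the scale.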
The mean has a breakdown point of 1/n; the median's is 0.5. On SPX 100-bar center through March 2020, mean spikes 5σ; median moves 0.4. Median for centering, MAD/IQR for spread, Spearman for correlation, LAD for regression.
Feature engineering contributes 90-95% of the edge; model selection 5-10%. Same XGBoost: raw OHLC → SPX AUC 0.502; full statistical pipeline → 0.524. Features are IP. The model is a commodity.
Every linear filter is y(t)=Σ b_k x(t−k)−Σ a_k y(t−k). H(z) has only negative powers, no z⁺¹. Filters summarize the past; they don't extrapolate. SMA(50) on next-day return: R²≈0.0002.
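The difference equation translates directly into code. In this sketch `b` holds b_0, b_1, ... and `a` holds a_1, a_2, ... (an indexing assumption), with zero initial state.

```python
def linear_filter(x, b, a=()):
    """y[t] = sum_k b[k]*x[t-k] - sum_k a[k]*y[t-1-k], using past and present bars only."""
    y = []
    for t in range(len(x)):
        acc = sum(b[k] * x[t - k] for k in range(len(b)) if t - k >= 0)
        acc -= sum(a[k] * y[t - 1 - k] for k in range(len(a)) if t - 1 - k >= 0)
        y.append(acc)
    return y

step = [1.0] * 6
ema_half = linear_filter(step, b=[0.5], a=[-0.5])   # EMA with alpha = 0.5
sma3 = linear_filter([3.0, 6.0, 9.0, 12.0], b=[1 / 3] * 3)
print(ema_half[:3], sma3[2:])
```

No index in either sum reaches past t, so the output on the first five bars is identical whether or not later bars exist. That is the "no z⁺¹" statement in executable form.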
Every MA pays a lag tax. Symmetric FIR length N has lag (N−1)/2. On SPX, 200-day SMA captures 18% of a 10% move; 50-day captures 56%. At equal length the WMA buys a shorter lag and pays for it in phase distortion. Structural trade.
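The (N−1)/2 figure is easy to check on a pure ramp, where lag shows up as a constant gap between input and output. A small sketch with an illustrative helper:

```python
def sma(x, n):
    """Simple moving average; the first output aligns with bar n-1."""
    return [sum(x[t - n + 1 : t + 1]) / n for t in range(n - 1, len(x))]

n = 21
ramp = [float(t) for t in range(100)]   # price rising one unit per bar
smoothed = sma(ramp, n)
gaps = [ramp[t] - smoothed[t - (n - 1)] for t in range(n - 1, len(ramp))]
print(gaps[0], gaps[-1])
```

Every gap equals 10 bars' worth of trend, which is (21 − 1)/2.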
SMA's sinc response has the worst sidelobes of any common smoother: −13.3 dB (22% leakage) regardless of N. Same lag, Hann leaks 2.6%, Blackman 0.12%. Critical period ~2N. Use Hann or Blackman.
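The sidelobe levels can be measured directly from the filter weights. This sketch evaluates the magnitude response on a dense frequency grid, walks down the main lobe to its first null, and reports the largest peak beyond it. The grid size and the symmetric Hann formula are assumptions of the sketch.

```python
import math

def peak_sidelobe_db(weights, grid=4000):
    """Largest response beyond the main lobe, in dB relative to the DC gain."""
    total = sum(weights)
    mags = []
    for i in range(grid + 1):
        w = math.pi * i / grid
        re = sum(c * math.cos(w * k) for k, c in enumerate(weights))
        im = sum(c * math.sin(w * k) for k, c in enumerate(weights))
        mags.append(math.hypot(re, im) / total)
    i = 0
    while i < grid and mags[i + 1] <= mags[i]:
        i += 1  # descend the main lobe until the response turns back up
    return 20 * math.log10(max(mags[i:]))

def hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

sma_20 = peak_sidelobe_db([1.0] * 20)
sma_50 = peak_sidelobe_db([1.0] * 50)
hann_50 = peak_sidelobe_db(hann(50))
print(round(sma_20, 1), round(sma_50, 1), round(hann_50, 1))
```

The SMA figure stays near −13 dB at both lengths, while the Hann weights push the leakage far lower at the same length.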
EMA is the 1-pole IIR low-pass: y_t = α x_t + (1−α) y_{t−1}. One state, one parameter, no sidelobes. EMA wins on O(1) state and composability — the building block for HPF, BPF, AGC, decycler.
Each pole adds 6 dB/oct rolloff and one bar of lag at critical period T. 1-pole EMA: 6 dB/oct. 2-pole and super-smoother: 12 dB/oct. The super-smoother is critically damped with clean step response.
HPF output = input − LPF. Each pole adds 6 dB/octave rolloff and one bar of lag. The 2-pole HPF at critical period T is the cleanest detrender for daily mean-reversion features.
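The subtraction identity can be sketched with a 1-pole EMA as the low-pass. This is the single-pole case only, not the 2-pole design the text recommends, and the seeding of the EMA at the first bar is an assumption.

```python
def ema(x, alpha):
    """1-pole low-pass, seeded at the first input value."""
    y, out = x[0], []
    for v in x:
        y = alpha * v + (1 - alpha) * y
        out.append(y)
    return out

def one_pole_highpass(x, alpha):
    """High-pass as input minus low-pass."""
    return [v - s for v, s in zip(x, ema(x, alpha))]

flat = one_pole_highpass([100.0] * 50, alpha=0.2)
trend = one_pole_highpass([float(t) for t in range(200)], alpha=0.2)
print(flat[-1], trend[-1])
```

A constant level is removed entirely. A steady trend is reduced to a constant offset of slope × (1 − α)/α, here 4, which is why a single pole alone does not fully detrend.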
The band-pass rejects both trend and noise, keeping only the cycle band. Direct second-order form has two parameters (T, δ). On SPX at T = 20, δ = 0.3, BPF MI is 1.9 vs 1.2 for close-minus-MA.
The decycler extracts trend by subtracting the cycle, not by smoothing. On SPX, decycler at T=30 cuts EMA(50)'s 18% cycle leakage to 4% with less lag. Specify the cycle to cancel, not a lookback.
At a turning point, the MA reports the prior regime as the present. EMA(50) signaled the SPX March 2020 bottom 27 bars late; cycle-mode detectors fired in 2-5 bars. Structural lag, not a tuning bug.
Every linear indicator is a filter with a unique frequency response. RSI(14), MACD(12,26), Stochastic(14) read the same 15-40 bar cycle band three different ways. The confluence is redundancy.
Every linear indicator has known lag. Sum the lags across a cascade and compare the total to the time-to-half-move. EMA+RSI on 5-bar mean reversion: 17-bar lag, structurally broken. Audit lag before backtesting, not after deployment.
Linear filters integrate every bar including outliers. Median filters select the typical value. For volume, true range, and tick data, median (or Hampel) is the default; linear smoothers come after.
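A small illustration of the difference on a volume series with one bad print. Both helpers are causal, use a shrinking window at the start, and are illustrative names rather than anything from the source.

```python
import statistics

def rolling_median(x, n):
    """Causal rolling median over the last n bars."""
    return [statistics.median(x[max(0, t - n + 1) : t + 1]) for t in range(len(x))]

def rolling_mean(x, n):
    """Causal rolling mean over the last n bars."""
    return [statistics.fmean(x[max(0, t - n + 1) : t + 1]) for t in range(len(x))]

volume = [100] * 10 + [5000] + [100] * 10   # one spike in otherwise flat volume
med = rolling_median(volume, 5)
avg = rolling_mean(volume, 5)
print(max(med), max(avg))
```

The median never leaves 100, while the mean carries the spike for five bars.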
Indicators inherit the input's amplitude. BPF on SPX swings ±0.4% in 2017 and ±3.1% in March 2020. AGC rescales continuously so thresholds become regime-independent. The fix is one pipeline stage.
The dominant cycle is measurable. Autocorrelation periodogram gives a number with known precision. Elliott Wave, Gann, Bradley dates produce no falsifiable measurement. Numbers feed adaptive logic.
Market cycles exist but are evanescent: period drifts, amplitude decays, phase loses coherence. SPX cycle ranged 8 to 28 bars across six regimes in seven years. Gate cycle-mode strategies by regime.
Every indicator is a filter: a machine that reshapes price. Learn signal, frequency, lag, and the four filter jobs once, and the whole cycles literature stops being a wall.
Write any linear indicator as a transfer function H(z), a ratio of two short polynomials, and its full behavior reads out: gain at every cycle, lag in bars, and the poles that make it ring.
The Butterworth is the maximally flat low-pass: slow cycles pass with no ripple, each pole cuts noise harder. The bill is lag, about one EMA's worth per pole, so two poles is the sweet spot.
The sinc is the exact brick-wall low-pass: full pass below the cutoff, nothing above. But it runs infinitely long and needs future bars, so you build only a truncated, windowed version that rings.
A fixed EMA smooths the same in calm and chaos. The adaptive EMA ties alpha to cycle speed: track tight when clean, smooth hard when noisy, at most one bar lag, if you can estimate omega.
The zero-lag EMA borrows the Kalman predict step: smooth Price + kappa*velocity, so the slope cancels lag in trends. The same guess overshoots reversals, so it is no turn detector.
Estimate slope from a 4-point cubic with skipped taps and the modified EMA turns faster and lags less. But a derivative is no low-pass: the output is not smooth, and that is the real trade.
A wavelet is a band-pass whose width scales, so one Mexican Hat (the 2nd derivative of a Gaussian) scans every cycle at once. Compact support makes it usable; the resolution tradeoff makes it honest.
A two-bar RSI screams on every shock. Regress it on a slow 20-bar RSI, inverse-logistic its U-shaped distribution first, and trade the residual: how far price strayed from where the trend says it belongs.
Model price as one local sine and solve its frequency from four bars: difference out the offset, ratio out the amplitude, read cos(omega). Low lag, high variance, and it lies when no cycle exists.
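One way to carry out those three steps, as a sketch. For a sine sampled once per bar, first differences d satisfy d[t+1] + d[t−1] = 2·cos(ω)·d[t], so three differences from four bars are enough. The function name and the guard tolerances are assumptions.

```python
import math

def sine_omega_from_four_bars(p0, p1, p2, p3):
    """Angular frequency (radians per bar) of a local sine, or None if unsolvable."""
    d1, d2, d3 = p1 - p0, p2 - p1, p3 - p2   # differencing removes the offset
    if abs(d2) < 1e-12:
        return None                          # middle difference too small to divide by
    c = (d1 + d3) / (2 * d2)                 # the ratio cancels the amplitude
    if abs(c) > 1:
        return None                          # not consistent with any real sine
    return math.acos(c)

period = 16
bars = [100 + 3 * math.sin(2 * math.pi * t / period + 0.7) for t in range(4)]
omega = sine_omega_from_four_bars(*bars)
print(2 * math.pi / omega)
```

On an exact sine the period comes back exactly. On real bars the division by a small middle difference is where the high variance comes from, and a straight trend returns ω = 0 rather than an error.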
The raw stochastic lurches every time an old high or low drops out of its window, noise from a denominator that keeps redefining itself. Smooth it once or twice, center at 50, and the same trick fixes the StochRSI.
Model price as a local sine and its velocity and acceleration are exact derivatives, with no differencing lag. Acceleration flags a swing running out of gas early, but rides on a fragile sine fit.
Short MA minus long MA swings tenfold between calm and chaos. Divide by ATR times the square root of the bar-distance between the two averages' centers, lag the long one, compress the tails, and the trend gap finally holds still.
Momentum is price now minus price N-1 bars ago, a high-pass filter: it kills the trend, peaks on a passband set by N, goes blind at nulls, and carries a phase lag of (N-1)*omega/2, which is (N-1)/2 bars of delay. Not a strength gauge.
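The gain of a difference over N−1 bars has a closed form, 2·|sin(ω(N−1)/2)|, which makes the trend kill, the passband peak, and the nulls easy to tabulate. A sketch with an illustrative function name:

```python
import math

def momentum_gain(period, n):
    """Amplitude gain of x[t] - x[t-(n-1)] for a cycle of the given period in bars."""
    omega = 2 * math.pi / period
    return 2 * abs(math.sin(omega * (n - 1) / 2))

n = 11  # difference spans 10 bars
for period in (1000, 20, 10, 5):
    print(period, round(momentum_gain(period, n), 4))
```

With a 10-bar span, very long cycles are almost erased, the 20-bar cycle is doubled, and cycles of exactly 10 or 5 bars vanish.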
Most indicators keep the close and toss the rest of the bar. Price intensity reads open-to-close travel against the true range, a pure intrabar conviction gauge, and the volume-free version is a clean mean-reversion feature once smoothed and normalized.
Design an indicator by stating the phase (lag) you want, then solving for the filter. Causality couples magnitude to phase, so you move lag around the spectrum but never delete it. Zero-lag, buried.
The ADX is not a price indicator, it is a ratio of ATR-normalized range expansions with two hidden smoothing stages. Build it right and you get a stationary trend-strength filter, but respect the lag: it stays high after sharp trends die.
Every indicator is one of two families: non-recursive (FIR: inputs only, stable, lag = degree/2) or recursive (IIR: eats its own output, cheap, sharp, can ring). The family predicts the failures.
Aroon ignores price size and measures time: how long since the last new high or low. The Up-minus-Down difference is a bounded, stationary, cross-instrument oscillator for free, but it reads timing, not magnitude.
EMA, low-pass, high-pass, band-pass, band-stop are one second-order equation with different coefficients, built from the period via alpha. One engine and a recipe table, not a drawer of indicators.
Fit a trend, project it one bar forward, and subtract it from the actual close. A small gap means the trend holds; a big one means it just broke. A regime-break kill switch, built on log prices and ATR, not raw price.
Sample once per bar and the fastest cycle you can see is 2 bars (Nyquist). Anything faster aliases: it folds down and masquerades as a slow cycle that was never there, and no filter can unmask it.
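The fold-down is exact, not approximate. In this sketch a cycle with a 1.25-bar period, sampled once per bar, produces the same numbers as a 5-bar cycle; the two periods are chosen only for illustration.

```python
import math

bars = range(40)
fast = [math.cos(2 * math.pi * t / 1.25) for t in bars]  # 1.25-bar cycle, above Nyquist
slow = [math.cos(2 * math.pi * t / 5) for t in bars]     # 5-bar cycle

worst = max(abs(f - s) for f, s in zip(fast, slow))
print(worst)
```

The two sampled series agree to floating-point precision, so nothing downstream of the sampling can tell them apart.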
Short-term movement minus a long-term baseline, normalized by ATR, reads volatility expansion against contraction. But normalize too hard and you delete the very regime signal you wanted: the stationarity-vs-information trade-off, with a dial.
The SMA is the optimal least-squares fit of a flat line to the window: the intercept is the average. But that assumes the market is constant plus noise, so it lags trends and erases matched cycles.
Reactivity weights momentum by an aspect ratio, range per unit of volume, so a big move on thin volume scores high. Powerful for trading cycles, but it multiplies two noisy parts, so normalize both halves or it lies.
A filter's critical period is its half-power point: amplitude 0.707, not 0.5. For an SMA it's twice the length, so SMA(50) targets a 100-bar cycle. Pick length on purpose; the lag comes attached.
Raw Intraday Intensity whipsaws and drifts upward for years as volume grows, which makes it useless as a model input. Divide summed money flow by summed volume and you get Chaikin Money Flow: bounded, stationary, model-ready.
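A sketch of the calculation, using the standard close-location multiplier ((close − low) − (high − close)) / (high − low). The default window of 20 bars and the zero-range guard are assumptions of this sketch.

```python
def chaikin_money_flow(high, low, close, volume, n=20):
    """Summed money flow over summed volume for each full n-bar window."""
    out = []
    for t in range(n - 1, len(close)):
        flow = 0.0
        vol = 0.0
        for i in range(t - n + 1, t + 1):
            bar_range = high[i] - low[i]
            # Close location in the bar: +1 at the high, -1 at the low.
            mult = ((close[i] - low[i]) - (high[i] - close[i])) / bar_range if bar_range > 0 else 0.0
            flow += mult * volume[i]
            vol += volume[i]
        out.append(flow / vol if vol > 0 else 0.0)
    return out

bars = 40
high, low = [101.0] * bars, [99.0] * bars
growing_volume = [1000.0 * 1.1 ** i for i in range(bars)]   # volume rising about 45x
at_highs = chaikin_money_flow(high, low, [101.0] * bars, growing_volume)
at_mids = chaikin_money_flow(high, low, [100.0] * bars, growing_volume)
print(at_highs[-1], at_mids[-1])
```

Volume grows by more than an order of magnitude across the sample, yet the output stays pinned at +1 for closes at the high and 0 for closes at the midpoint.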
The WMA ramps weights toward recent bars, so its coefficients are asymmetric, breaking linear phase and warping shape. It lags more than the SMA at equal noise rejection; a real window beats both.
OBV's running sum wanders like a random walk and depends on when you started counting. Window it, divide signed volume by total volume, scale by root-lookback, and you get a bounded, stationary flow oscillator.
Subtract a long-cutoff high-pass from a short-cutoff one and you get the decycler oscillator, a band-pass whose zero crossings flag trend transitions. Use it to classify the regime, not as a trigger.
The volume-weighted MA ratio asks where the crowd transacted, not which way it pushed: VWMA over SMA, above one when expensive bars carried the volume. Log it, scale by root-lookback, compress, and it's model-ready.
Q is the band-pass narrowness dial: center frequency over half-power bandwidth (30% gives Q~3.33). High Q isolates a pure cycle but rings and lags; low Q is fast but blurry. Pick Q from the job.
Volume Momentum ignores price and asks one thing: is the tape hotter than its own baseline. Short volume over long volume, logged and CDF-squashed into a bounded regime gauge that tells you if your signals have fuel.
Run price through a band-pass, count bars between zero crossings, double it: that's the dominant cycle, at a few bars' lag. But widen the passband, or a narrow filter measures its own tuned period.
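The counting step, sketched on its own. The input is assumed to be already band-passed and centered on zero; the band-pass itself is not shown, and averaging the spacing over all crossings is a choice made for this sketch.

```python
import math

def dominant_cycle_from_zero_crossings(x):
    """Twice the average spacing between sign changes, or None if fewer than two."""
    crossings = [t for t in range(1, len(x)) if (x[t - 1] < 0) != (x[t] < 0)]
    if len(crossings) < 2:
        return None
    spacing = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 2 * spacing  # one crossing per half cycle

filtered = [math.sin(2 * math.pi * t / 20 + 0.3) for t in range(200)]
print(dominant_cycle_from_zero_crossings(filtered))
```

On a clean 20-bar sine the estimate is 20. The measurement only updates at a crossing, which is where the few bars of lag come from.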
Markets have a color: the slope of their log power spectrum. White means no memory, Brownian means random walk, pink means long memory. One number tells you whether price remembers its past.
Oscillators drift and whipsaw because raw price feeds them trend and noise at once. The roofing filter, a high-pass plus a SuperSmoother, strips both and passes only the tradeable cycle band first.
Long cycles are bigger, not just slower, so a raw spectrum reports them as dominant by default. Across several octaves you must compensate for this amplitude tilt or your cycle estimate lies.
The Fourier transform demands an infinite, stationary, whole-cycle window that price never provides. The autocorrelation periodogram builds the spectrum from correlation instead, with less lag and no amplitude tilt.
At a real reversal the autocorrelation flips at every lag at once, not just one. Sum the bar-to-bar changes across all lags and the spike flags the turn, no cycle period required.
Stop hardcoding RSI(14). Set the lookback to half the measured dominant cycle, feed it a band-pass input, and recompute every bar. Tuned to a persistent cycle it can even lead, but only in cycle mode.
Convert the swing to a clean sine wave, then advance its phase to read the cycle a few bars early. Genuine prediction, valid only in cycle mode; in a trend it fires confident false reversals, so gate it hard.
A market turn is a reflection in time, so fold price about a candidate bar and correlate the halves. A bright, scale-persistent symmetry stripe marks the reversal, at the cost of honest, unavoidable confirmation lag.
Pair price with its quarter-cycle shift to read instantaneous amplitude and phase. The textbook Hilbert transformer lags forever; the modified version is accurate only across the cycle band, which is the band you trade.
Most indicators tell you which way price went. Fit a line to the window with Legendre polynomials and divide the slope by the within-window scatter, and you get a t-statistic for trend: direction plus how much to trust it.
The clean linear factor model fails because its weights are not constants. Momentum and carry only pay in the right vol regime, and interactions are the one thing a linear model cannot represent.
Relabel every long as a short and a return model should flip its sign. Linear models get this free through their weights; a tree relearns each split's mirror, so engineer the symmetry back.
One tree is fragile: a small data change flips its top split. Bagging averages many trees to cut variance; boosting fits each new tree to the residual to cut bias. XGBoost and LightGBM are boosting.
A decision tree mines conditional alphas by carving feature space into boxes. It picks each split to maximize a similarity gain and predicts the mean return per leaf. Stop early or it memorizes noise.
Pick the model by interaction strength: ridge when near-additive, XGBoost when interactions dominate. XGBoost also handles NaNs natively, avoiding imputation lookahead. When neither case is clear, use an IC-weighted ensemble.
Replace the volatility term with volume in most alphas and the backtest barely moves, because they ride the same information clock. Feeding a model both is double-counting one factor. Keep one scale, and add their ratio, Amihud illiquidity, as the residual that actually carries new information.
OLS is the best linear unbiased fit only under assumptions markets shatter, so on correlated alphas it hands you wild coefficients. Add an L2 or L1 penalty: biased, lower variance, steadier out of sample.