2.63 The Normalized Moving-Average Difference

Short MA minus long MA swings tenfold between calm and chaos. Divide by ATR times the square root of the bar-distance between the two averages' centers, lag the long one, compress the tails, and the trend gap finally holds still.

Subtract a long moving average from a short one and you get the oldest trend indicator there is, the bones of the MACD and a hundred crossover systems. You also get a number whose variance swings by an order of magnitude between calm and volatile periods, which makes it nearly worthless as a model feature in its raw form. The old article "Why ATR Normalization Is More Than a Volatility Trick" showed that the right denominator is not a cosmetic rescale but a structural match to how the numerator was generated. The moving-average difference is where that idea pays off most cleanly, because the noise contaminating it comes from a random walk whose variance you can write down.

Why the raw difference is unusable

The short MA minus the long MA measures how far recent price has pulled away from its slower baseline. The trouble is that the size of that gap depends far more on the volatility regime than on the trend. In a quiet market the two averages hug each other and the difference is tiny; in a violent one the same trend strength produces a difference five or ten times larger. Feed that into a model and the feature spends most of its variance encoding which year it is, not what the trend is doing, which is exactly the non-stationarity the old ATR article was built to kill.
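A quick way to see the regime problem is to compute the raw gap on two synthetic random walks that are identical except for per-bar volatility. This is an illustrative sketch only: the Gaussian walk, the 10/50 lookbacks, and the helper names (`random_walk`, `sma`, `raw_ma_diff`) are assumptions, not part of any published implementation.

```python
import random
import statistics

def random_walk(n, sigma, seed):
    """Gaussian random walk starting at 100 with per-bar step std sigma."""
    rng = random.Random(seed)
    price, path = 100.0, []
    for _ in range(n):
        price += rng.gauss(0.0, sigma)
        path.append(price)
    return path

def sma(prices, end, length):
    """Simple moving average of the `length` prices ending at index `end` (inclusive)."""
    return sum(prices[end - length + 1 : end + 1]) / length

def raw_ma_diff(prices, short_len, long_len):
    """Short SMA minus long SMA at every bar where both are defined."""
    return [sma(prices, t, short_len) - sma(prices, t, long_len)
            for t in range(long_len - 1, len(prices))]

# Same seed, so both paths use identical shocks; only the per-bar volatility differs.
calm = raw_ma_diff(random_walk(5000, sigma=0.1, seed=1), 10, 50)
wild = raw_ma_diff(random_walk(5000, sigma=1.0, seed=1), 10, 50)
ratio = statistics.pstdev(wild) / statistics.pstdev(calm)
print(f"std ratio wild/calm: {ratio:.2f}")
```

Because both walks share a seed, the volatile path's deviations from the starting price are exactly ten times the calm path's, so the ratio of standard deviations comes out at ten up to floating-point rounding. A pure random walk carries no trend at all, so every bit of that tenfold swing is regime, not signal.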

There is a second, subtler source of noise. The two moving averages have centers of mass at different points in the past, and the price has to random-walk from one center to the other. That walk injects error variance into the difference that has nothing to do with trend, and crucially, the amount of it scales with the distance between the two centers.

Normalize by the random walk that contaminates you

The variance of a random walk's net change from start to finish equals the number of steps times the per-step variance. So if you know how many bars separate the center of the long MA from the center of the short MA, you know, to a first approximation, how much walk noise is baked into the difference, and the square root of that distance is the natural multiplier on a per-bar volatility unit to cancel it. ATR supplies that per-bar unit. The old ATR article argued it is the structurally correct choice because it lives in the instrument's own price units and scales with the same noise the numerator does.
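The random-walk law the normalization leans on is easy to check by simulation. This sketch assumes Gaussian steps and uses a hypothetical helper, `net_change_variance`, to estimate the variance of the net change over d steps and compare it with d times the per-step variance; the last printed column is the simulated spread divided by sigma times the square root of d, which should hover near 1.

```python
import random
import statistics

def net_change_variance(steps, sigma, trials, seed=0):
    """Monte Carlo estimate of Var(P[t] - P[t - steps]) for a Gaussian random walk."""
    rng = random.Random(seed)
    changes = [sum(rng.gauss(0.0, sigma) for _ in range(steps))
               for _ in range(trials)]
    return statistics.pvariance(changes)

sigma = 0.5  # assumed per-bar step standard deviation
results = {}
for d in (4, 16, 64):
    est = net_change_variance(d, sigma, trials=20000)
    results[d] = est
    # columns: steps, simulated variance, theoretical d * sigma^2, spread / (sigma * sqrt(d))
    print(d, round(est, 3), d * sigma**2, round(est**0.5 / (sigma * d**0.5), 3))
```

Quadrupling the step count doubles the spread, which is why the denominator uses the square root of the center distance rather than the distance itself.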

$$ \text{NormDiff} = \frac{\text{SMA}_{\text{short}} - \text{SMA}_{\text{long}}}{\text{ATR}\,\sqrt{\big(0.5(L-1) + \text{lag}\big) - 0.5(S-1)}} $$

Read the denominator carefully, because every piece earns its place. L and S are the long and short lookbacks, and the center of mass of a simple moving average of length k sits 0.5(k-1) bars back. The expression inside the square root is the distance between the two centers: the long MA's center, pushed back by the lag term, minus the short MA's center. ATR supplies the per-bar volatility, and the square root of the inter-center distance scales it to the approximate amount of random-walk noise the difference carries. Divide by that product and the result has roughly constant variance across regimes, which is the entire goal.
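The denominator is small enough to compute by hand, but a sketch makes the center-of-mass bookkeeping explicit. The function names are hypothetical, and the window convention (bar 0 is the current bar, counting backward) is an assumption consistent with the text. With S = 10, L = 50 and a lag of 10, the long center sits 34.5 bars back and the short center 4.5 bars back, so the distance is 30 and the multiplier on ATR is sqrt(30), about 5.48.

```python
import math

def sma_center_of_mass(length, lag=0):
    """Mean bars-back of an equally weighted window covering bars lag .. lag+length-1 (0 = current bar)."""
    return sum(range(lag, lag + length)) / length

def inter_center_distance(short_len, long_len, lag):
    """Bars between the lagged long-MA center and the short-MA center, as in the formula."""
    return (0.5 * (long_len - 1) + lag) - 0.5 * (short_len - 1)

def norm_diff_denominator(atr, short_len, long_len, lag):
    """ATR times the square root of the inter-center distance."""
    distance = inter_center_distance(short_len, long_len, lag)
    if distance <= 0:
        raise ValueError("long MA center must lie further back than the short MA center")
    return atr * math.sqrt(distance)

print(sma_center_of_mass(10), sma_center_of_mass(50, lag=10))  # 4.5 34.5
print(inter_center_distance(10, 50, 10))                        # 30.0
print(round(norm_diff_denominator(2.0, 10, 50, 10), 3))          # 10.954
```

Summing the window's bar offsets directly and dividing by its length reproduces the closed form 0.5(k-1) + lag, which is a cheap sanity check on the formula's inner expression.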

Lag the long average so the two windows do not overlap

The lag term in that formula is the third parameter, and it is the part most people leave at zero. By default both moving averages include the current bar as their most recent point, so the long MA is partly tracking the same recent movement the short MA is built to capture. The two windows overlap, and the difference double-counts the present. Lag the long MA backward by an amount equal to the short MA's lookback and the windows become contiguous but non-overlapping: the short MA covers the recent stretch, the long MA covers the stretch before it, and the difference cleanly compares now against before.
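The overlap argument can be checked directly by listing which bars each window covers. In this sketch the helpers `window_bars_back` and `lagged_sma` are hypothetical, and bar 0 is the current bar. A zero lag shares all ten of the short window's bars with the long window, while a lag equal to the short lookback leaves neither overlap nor gap.

```python
def window_bars_back(length, lag=0):
    """Bars-back indices (0 = current bar) covered by an SMA of this length, lagged by `lag` bars."""
    return set(range(lag, lag + length))

def lagged_sma(prices, t, length, lag=0):
    """SMA of the `length` prices ending `lag` bars before index t (lag=0 includes bar t)."""
    end = t - lag
    if end - length + 1 < 0:
        raise ValueError("not enough history for this window")
    return sum(prices[end - length + 1 : end + 1]) / length

short_len, long_len = 10, 50
short_w = window_bars_back(short_len)
for lag in (0, short_len):
    long_w = window_bars_back(long_len, lag)
    overlap = len(short_w & long_w)
    contiguous = min(long_w) == max(short_w) + 1
    print(f"lag={lag}: overlapping bars={overlap}, contiguous={contiguous}")
# lag=0: overlapping bars=10, contiguous=False
# lag=10: overlapping bars=0, contiguous=True
```

In a live pipeline `lagged_sma(closes, t, long_len, lag=short_len)` would supply the long average and `lagged_sma(closes, t, short_len)` the short one.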

$$ \text{MADiff} = 100 \cdot \Phi\big(c\,\text{NormDiff}\big) - 50 $$

The last step tames the tails. Even after volatility normalization, a huge price swing can throw the difference into a fat-tailed extreme, so push it through the normal cumulative distribution function, Φ. The constant c sets how hard you compress. The normal CDF maps the bell-shaped NormDiff onto a bounded range while leaving its near-linear middle mostly intact, and the final scaling recenters the output at zero, so MADiff stays strictly between -50 and +50. This mirrors the construction logic of the old CMMA article almost exactly: log-clean the numerator, divide by ATR times the square root of the relevant lookback distance to cancel both the volatility regime and the lookback dependence, then optionally squash the tails. CMMA does it for close-minus-average; this does it for short-average-minus-long-average.
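The compression step needs nothing beyond the standard library, since the normal CDF can be written with the error function. This sketch assumes the argument is c times NormDiff, so that a zero NormDiff lands exactly at zero; the default c = 1.0 is a placeholder, not a recommended value.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def compress(norm_diff, c=1.0):
    """Squash NormDiff into the range (-50, 50): 100 * Phi(c * NormDiff) - 50."""
    return 100.0 * normal_cdf(c * norm_diff) - 50.0

for x in (-4.0, -1.0, 0.0, 0.25, 1.0, 4.0):
    print(f"{x:+.2f} -> {compress(x):+.2f}")
```

Near zero the slope is about 40 output units per unit of NormDiff (100 times the normal density at zero, times c), so small readings pass through almost linearly, while a NormDiff of 4 already sits within a hair of the +50 ceiling. Raising c steepens the middle and saturates sooner.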

What you traded away

The normalized difference is a roughly stationary, cross-instrument-comparable trend feature, but the normalization is not free. Dividing by ATR times the root-distance removes the volatility regime and the lookback dependence, but it also strips the indicator of any sense of absolute move size, so a two-ATR pull in a calm market and a two-ATR pull in chaos read the same, which is the point and also a real information loss you accepted. The lag parameter, the compression constant c, and the two lookbacks are all knobs, and every knob is a chance to fit noise, so set them from the timescale you intend to trade rather than from whatever maximizes the backtest. The ATR window must be much longer than the lookbacks and must exclude the current bar, or you reintroduce the lookahead the old ATR article warned about and inflate every metric that follows. A beautifully normalized difference is still a trend feature that has to prove predictive value on your instrument before it is worth the trouble.
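Putting the pieces together, a minimal sketch of the whole indicator at one bar might look like the following. It assumes plain (non-log) closes as in the formula above, a simple average of true range for ATR (Wilder smoothing would be an equally defensible choice), and an ATR window that ends one bar before the current bar, per the lookahead rule. The function names and the default c = 1.0 are hypothetical.

```python
import math

def true_range(high, low, prev_close):
    """Largest of the bar's range and its gaps from the previous close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))

def trailing_atr(highs, lows, closes, t, window):
    """Mean true range over the `window` bars ending at t - 1, so bar t is excluded."""
    if t - window < 1:
        raise ValueError("not enough history for the ATR window")
    trs = [true_range(highs[i], lows[i], closes[i - 1]) for i in range(t - window, t)]
    return sum(trs) / window

def ma_difference(highs, lows, closes, t, short_len, long_len, lag, atr_window, c=1.0):
    """Compressed, ATR-normalized short-minus-lagged-long SMA difference at bar t."""
    end = t - lag
    if end - long_len + 1 < 0:
        raise ValueError("not enough history for the lagged long MA")
    short_ma = sum(closes[t - short_len + 1 : t + 1]) / short_len
    long_ma = sum(closes[end - long_len + 1 : end + 1]) / long_len
    distance = (0.5 * (long_len - 1) + lag) - 0.5 * (short_len - 1)
    if distance <= 0:
        raise ValueError("long MA center must lie further back than the short MA center")
    denom = trailing_atr(highs, lows, closes, t, atr_window) * math.sqrt(distance)
    if denom == 0.0:
        return 0.0  # dead-flat history: no volatility unit to normalize by
    norm_diff = (short_ma - long_ma) / denom
    return 100.0 * 0.5 * (1.0 + math.erf(c * norm_diff / math.sqrt(2.0))) - 50.0

# Synthetic check: a steady uptrend of one point per bar with a constant 2-point true range.
n = 1100
closes = [float(i) for i in range(n)]
highs = [x + 1.0 for x in closes]
lows = [x - 1.0 for x in closes]
value = ma_difference(highs, lows, closes, n - 1, 10, 50, lag=10, atr_window=1000)
scaled = ma_difference([x * 10 for x in highs], [x * 10 for x in lows],
                       [x * 10 for x in closes], n - 1, 10, 50, lag=10, atr_window=1000)
print(round(value, 2), round(scaled, 2))
```

The uptrend reads strongly positive, a little under +50. Because numerator and ATR scale together, multiplying every price by ten leaves the output unchanged, which is the cross-regime comparability the section is after; a dead-flat close series returns exactly zero.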

KEY POINTS

  • Short MA minus long MA is the classic trend gap, but its raw variance swings tenfold between calm and volatile regimes, so the feature encodes the era instead of the trend.
  • A second noise source is the random walk price takes between the two MA centers; its variance is roughly the bar-distance between the centers times the per-bar variance, so the square root of that distance is the correct multiplier on ATR.
  • Divide the difference by ATR times the square root of the inter-center distance to get a roughly variance-stationary measure, using ATR as the structurally correct per-bar denominator from the old ATR-normalization article.
  • Lag the long MA backward by the short lookback so the two windows are contiguous but non-overlapping, which stops the difference from double-counting the current bar.
  • Compress the result with the normal CDF to bound the tails and recenter at zero. This is the same build pattern as the old CMMA article: clean numerator, cancel volatility and lookback with ATR times a root-length, then squash tails.
  • Normalization discards absolute move size; the lag, compression, and lookbacks are tunable knobs to set from the trade timescale, and the ATR window must exclude the current bar to avoid lookahead.

References