5. Microstructure Alpha June 2, 2026 5 min

5.16 Fill Probability from Trade Size CDFs

A quote's PnL is easy; its fill probability is the hard part. Build a CDF of taker order sizes and read off the chance the next trade is big enough to clear the queue ahead of you plus your own size.

A market maker's quote has two unknowns: how much you make if it fills, and how likely it is to fill. The first is easy: it is the distance from your quote to fair value. The second is the hard one, and getting it wrong is what makes a desk quote too wide and never trade, or too tight and barely profit. The clean way to estimate fill probability uses the distribution of taker order sizes: the cumulative distribution function (CDF) of how big incoming market orders tend to be, read against how much liquidity sits in front of your order.

This article builds the fill-probability estimate that the EV-maximization quoting model depends on. It pairs with the improved market making strategy that optimizes expected PnL times probability of fill.

The quote is an expected-value problem

Quoting means maximizing profit per unit time, which makes it an expected-value problem. The profit side is simple: your PnL on a fill is the distance from your quote to the mid (or wherever you can exit), minus fees.

$$ \text{PnL}_{\text{buy}} = \text{distance}_{\text{buy}} - \text{fee}, \qquad \text{EV} = \text{PnL} \cdot P[\text{filled}] $$

The PnL on a buy quote is its distance from fair value minus the fee, and the expected value of the quote is that PnL times the probability it fills. Quote too wide and the PnL per fill is large but the fill probability collapses to near zero. Quote too tight and you fill constantly but the PnL per fill is thin. The EV product peaks somewhere in between, and finding that peak requires the fill probability, the only hard term in the expression.
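A minimal sketch of that trade-off, assuming a toy fill-probability curve that decays exponentially with distance. The curve, the `decay` parameter, the tick values, and the function names are illustrative assumptions, not part of the model; the real fill probability comes from the taker size distribution.

```python
import math

def quote_ev(distance, fee, p_fill):
    """EV of one quote: (distance - fee) * P[filled]."""
    return (distance - fee) * p_fill

def toy_fill_probability(distance, decay=0.5):
    """Placeholder curve (an assumption): fills get rarer as the quote moves away."""
    return math.exp(-decay * distance)

def best_distance(distances, fee, decay=0.5):
    """Return the candidate distance with the highest EV under the toy curve."""
    return max(
        distances,
        key=lambda d: quote_ev(d, fee, toy_fill_probability(d, decay)),
    )

# Hypothetical candidate distances from fair value, in ticks.
candidates = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 10.0]
fee = 0.5
best = best_distance(candidates, fee)
```

With these made-up numbers the peak lands at 3 ticks: the tightest quote earns nothing after fees, and the widest one almost never fills.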

Estimating fill probability from the size CDF

The fill-probability estimate runs on the statistics of taker order sizes. The procedure has three steps. First, collect historical trade data over some window; an hour works. Second, build the cumulative distribution function of taker order sizes from that data. Strictly, the CDF gives the probability that an incoming market order is at most a given size, so the number you want is its complement (the survival function): the probability that an order is at least that size. Third, read your fill probability off that complement: the probability that the next trade is large enough to clear all the liquidity resting between the touch and your order, plus your own order size.

$$ P[\text{fill on next trade}] = P[\text{taker size} \geq q_{\text{ahead}} + q_{\text{own}}] $$

Your order fills on the next trade if that trade's size is at least the quantity resting ahead of you in the queue plus your own order size, and the complement of the CDF gives that probability directly. The model assumes one big order wipes out everything in front of you and fills you in the same sweep, which is the simplifying assumption that makes the estimate tractable. The deeper you place, the more liquidity sits ahead of you, the larger the order needed to reach you, and the lower the probability the estimate returns.
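The read-off can be sketched in a few lines of Python. This is one possible implementation under stated assumptions: the trade sizes are a hypothetical sample, and `make_survival` and `fill_probability` are illustrative names. It counts the share of observed taker orders that are at least the required size.

```python
import bisect

def make_survival(taker_sizes):
    """Return s(x) = empirical P[taker size >= x] from a sample of trade sizes."""
    ordered = sorted(taker_sizes)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one trade")

    def survival(x):
        # bisect_left counts the trades strictly smaller than x.
        smaller = bisect.bisect_left(ordered, x)
        return (n - smaller) / n

    return survival

def fill_probability(survival, q_ahead, q_own):
    """P[fill on next trade] = P[taker size >= q_ahead + q_own]."""
    return survival(q_ahead + q_own)

# Hypothetical taker sizes from one window of trades.
sizes = [1, 1, 2, 2, 3, 5, 5, 8, 13, 40]
s = make_survival(sizes)
p_shallow = fill_probability(s, q_ahead=2, q_own=1)   # needs a trade >= 3
p_deep = fill_probability(s, q_ahead=10, q_own=3)     # needs a trade >= 13
```

In this sample the shallow quote fills on 60% of trades and the deep one on 20%, which is the deeper-means-rarer pattern in miniature.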

From one-trade probability to time

Fill probability per trade is not yet what the EV needs, because EV is per unit time, so convert it. Count the number of taker orders per minute, which gives the arrival rate of trades. If you treat trades as independent draws, the average number of trades you must wait for before one is large enough to fill you is one over the per-trade fill probability. Divide that count by the arrival rate to get an expected wait time. Now you have an expected time to fill, which turns the EV into profit per unit time and lets you compare quotes at different depths on the same footing.
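A hedged sketch of the conversion. It assumes trades are independent draws from the same size distribution, so the number of trades until the first filling one is geometric with mean one over the per-trade probability. It also reads "profit per unit time" as PnL divided by the expected wait, which is one interpretation. All names and inputs are illustrative.

```python
def expected_trades_to_fill(p_fill):
    """Mean of a geometric distribution: average trades until the first filling one."""
    if not 0.0 < p_fill <= 1.0:
        raise ValueError("p_fill must be in (0, 1]")
    return 1.0 / p_fill

def expected_minutes_to_fill(p_fill, trades_per_minute):
    """Convert a count of trades into minutes using the taker arrival rate."""
    if trades_per_minute <= 0:
        raise ValueError("trades_per_minute must be positive")
    return expected_trades_to_fill(p_fill) / trades_per_minute

def ev_per_minute(pnl, p_fill, trades_per_minute):
    """PnL divided by the expected wait (equals pnl * p_fill * trades_per_minute)."""
    return pnl / expected_minutes_to_fill(p_fill, trades_per_minute)

# Hypothetical inputs: 20% per-trade fill probability, 30 takers per minute.
wait = expected_minutes_to_fill(p_fill=0.2, trades_per_minute=30)
rate = ev_per_minute(pnl=2.5, p_fill=0.2, trades_per_minute=30)
```

Here the order waits five trades on average, about ten seconds at 30 takers per minute. Two quotes at different depths can then be ranked by `rate` rather than by raw PnL.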

The full quoting model optimizes the PnL times fill probability on both sides of the book independently, a buy PnL with a buy fill probability and a sell PnL with a sell fill probability, since the quotes are not symmetric once skew enters.
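As an illustration, the per-side optimization might look like the sketch below. It assumes the trade history is split by aggressor side, so that market sells feed the bid estimate and market buys feed the ask estimate. That split, the book levels, and the sizes are assumptions made for the example.

```python
import bisect

def survival(sorted_sizes, x):
    """Empirical P[size >= x] from an ascending list of sizes."""
    n = len(sorted_sizes)
    return (n - bisect.bisect_left(sorted_sizes, x)) / n

def best_level(levels, taker_sizes, q_own, fee):
    """Pick the level with the highest PnL * P[fill] for one side of the book.

    levels: candidates as (distance_from_fair, quantity_ahead) pairs.
    taker_sizes: sizes of the taker orders that trade against this side.
    """
    ordered = sorted(taker_sizes)

    def ev(level):
        distance, q_ahead = level
        return (distance - fee) * survival(ordered, q_ahead + q_own)

    return max(levels, key=ev)

# Hypothetical skewed book: the two sides carry different queues.
bid_levels = [(1.0, 2), (2.0, 6), (3.0, 20)]
ask_levels = [(1.0, 8), (2.0, 12), (3.0, 15)]
sell_takers = [1, 2, 3, 4, 8, 10, 25, 30]   # market sells hit resting bids
buy_takers = [1, 1, 2, 2, 3, 5, 9, 16]      # market buys lift resting asks

best_bid = best_level(bid_levels, sell_takers, q_own=1, fee=0.2)
best_ask = best_level(ask_levels, buy_takers, q_own=1, fee=0.2)
```

With these inputs the best bid sits two ticks out and the best ask three ticks out, so the two sides land at different depths once their queues and taker flows differ.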

The moving-target caveat

The estimate rests on assumptions that drift. The mid price wanders, so your distance to the mid constantly changes. Your fill probability is therefore not static: it moves as the price moves relative to your resting order, and a snapshot estimate goes stale. The one-big-order assumption is a simplification: real fills often come from several smaller trades chipping at the queue rather than a single sweep, so the CDF read is an approximation. And the taker size distribution itself is non-stationary, clustering and spiking, so the hour of history you fit the CDF on may not describe the next hour. You can build an order book simulator from these statistical properties to stress the estimate, but the fill probability remains a model of a moving target: useful for placing quotes, and wrong if you trust it as exact.
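One cheap check on the non-stationarity caveat is to compute the same fill probability on two adjacent windows and look at the gap. The sketch below uses two made-up windows of taker sizes. The function names and the data are assumptions for illustration only.

```python
def survival_at(sizes, threshold):
    """Empirical P[taker size >= threshold] for one window of trades."""
    if not sizes:
        raise ValueError("window is empty")
    return sum(1 for s in sizes if s >= threshold) / len(sizes)

def fill_probability_drift(prev_window, next_window, q_ahead, q_own):
    """Return (p_prev, p_next, p_next - p_prev) for the same resting order."""
    need = q_ahead + q_own
    p_prev = survival_at(prev_window, need)
    p_next = survival_at(next_window, need)
    return p_prev, p_next, p_next - p_prev

# Hypothetical taker sizes from two consecutive hours.
quiet_hour = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6]
busy_hour = [1, 2, 4, 6, 8, 8, 10, 15, 20, 40]

p_prev, p_next, drift = fill_probability_drift(
    quiet_hour, busy_hour, q_ahead=4, q_own=1
)
```

A large gap between the two windows suggests the fitted distribution is already stale for the quote you are about to place.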

KEY POINTS

A quote has two unknowns: PnL if filled (easy, the distance to fair value minus fees) and probability of fill (hard). The EV is their product, and it peaks between quoting too wide and too tight.
Estimate fill probability from taker order sizes: collect an hour of trades, build the CDF of taker sizes, and read off the probability that the next trade is at least the liquidity ahead of you plus your own size.
The model assumes one big order wipes out everything in front of you and fills you in the same sweep. Deeper placement means more liquidity ahead, a larger order needed, and lower fill probability.
Convert per-trade probability to time: use the taker arrival rate (takers per minute) to get the average number of trades to wait, then the expected time to fill, so EV becomes profit per unit time.
The full model optimizes PnL times fill probability on each side independently, since skew makes the quotes asymmetric.
Caveats: the mid wanders so fill probability is not static; the one-big-order assumption is a simplification, since real fills often come from several smaller trades; and the taker size distribution is non-stationary, so an hour of history may not describe the next hour.

A note on AI. The ideas, research, analysis, and conclusions in this article are my own. I use AI tools to help with editing and wordsmithing, because English is not my first language, and I am not shy about that. AI-generated ideas and AI-assisted writing are not the same thing: the first is empty slop from a generic prompt, the second is a tool for communicating years of real research more clearly. Judge the work by its substance, not by whether software helped polish the prose.