9.10 Trillions of Outcomes, 200 Constraints: Reading Tournament Rules as Assert Statements

A bracket has 9.2 quintillion outcomes and you can describe the legal ones with about 200 rules. Read them as assert statements, hand the solver a direction, and it returns one valid outcome at a time.

9.10 Trillions of Outcomes, 200 Constraints: Reading Tournament Rules as Assert Statements

An NCAA bracket has about 9.2 quintillion possible outcomes. You cannot store that list, cannot loop over it, cannot project onto a shape whose corners you are unable to write down. The article "The Marginal Polytope" left you with exactly this wall: the right geometric object, and no way to touch it. The escape is to stop listing outcomes and start describing them with rules. A few hundred linear inequalities can pin down which of the quintillion combinations are legal, and a solver can then hand you valid outcomes on demand without ever enumerating the set. If you have written a function full of assert statements, you already understand the trick.

Describe the set, do not list it

The valid payoff set is a collection of ones-and-zeros vectors. Instead of storing them, write down the conditions every valid vector must satisfy.

$$ Z = \{\, z \in \{0,1\}^{|\mathcal{I}|} : A^\top z \ge b \,\} $$

Read it as a membership rule, not a formula to evaluate. The z is a candidate outcome, a vector of ones and zeros over the securities. The matrix A and vector b encode the logical dependencies; the inequality is a stack of linear rules. Any z that satisfies all of them is a valid outcome, and any z that violates one is impossible. The whole exponential set is now the solution set of a small system of inequalities. A solver like Gurobi searches that system directly and never builds the list.

The compression this buys is not marginal.

Market Number of outcomes Constraints Compression
Duke vs Cornell (bracket slice) 16,384 3 about 5,461 to 1
NCAA tournament about 9.2 quintillion around 200 astronomical
2024 US election 17,218 conditions around 1,500 dependency pairs very large

Three rules stand in for 16,384 combinations. Around 200 rules stand in for 9.2 quintillion. The market looks exponential and the constraint structure collapses it to something polynomial, and that collapse is what makes real-time arbitrage detection possible at all.

The worked case: two teams, three lines of logic

Take a bracket slice, Duke against Cornell. Each team has seven securities, one for each possible number of wins from 0 through 6. Fourteen binary variables, so a naive space of 2 to the 14, which is 16,384 combinations. Three constraints carve out the legal ones.

$$ \sum_{k=0}^{6} z_{\text{Duke},k} = 1, \qquad \sum_{k=0}^{6} z_{\text{Cornell},k} = 1, \qquad z_{\text{Duke},5} + z_{\text{Duke},6} + z_{\text{Cornell},5} + z_{\text{Cornell},6} \le 1 $$

The first says Duke finishes with exactly one win total, some single number between 0 and 6. The second says the same for Cornell. The third says both teams cannot reach five or more wins, because to do that they would have to meet in the semifinals and one must lose. Read those three the way you would read code.

assert sum(duke_wins[0:7]) == 1        # Duke lands on exactly one win total
assert sum(cornell_wins[0:7]) == 1     # Cornell lands on exactly one win total
assert duke_wins[5] + duke_wins[6] + cornell_wins[5] + cornell_wins[6] <= 1
                                       # at most one of them reaches the semifinals

Any assignment of the fourteen booleans that passes all three assertions is a legal tournament outcome. Any assignment that trips one is a fantasy the market cannot pay out on. The solver's job is to search the passing assignments efficiently, and it does that without trying all 16,384. You wrote three lines of logic and eliminated 16,381 combinations you never had to look at.

A vocabulary of dependencies

Most real market relationships come from a short list of constraint shapes, and learning the four covers the overwhelming majority of cases.

Dependency Constraint Example
Subset / implication z_A less than or equal to z_B "margin greater than 5" implies "candidate wins"
Mutual exclusion z_A plus z_B less than or equal to 1 two teams cannot both win the final
Partition sum of z_k equals 1 exactly one outcome per group
Conditional z_A less than or equal to z_B "wins championship" implies "reaches the final four"

Subset and conditional share the same algebra, an implication written as an inequality: A can only be 1 if B is also 1. Mutual exclusion caps a pair at one. Partition forces exactly one member of a group to fire. Encode a market's rulebook with these four and you have described a polytope with quintillions of corners in a page of inequalities. The article "Outcome Geometry" showed why the valid set collapses below the naive cube; this is the collapse written in a form a solver can read.

The one thing the solver actually needs

The projection machinery from the article "Arbitrage Is Just Projection" does not need the full set Z. It needs one function, called repeatedly. Give it a cost vector, and it returns the single cheapest valid outcome under that cost.

$$ \text{oracle}(c) = \arg\min_{z \,:\, A^\top z \ge b}\; c \cdot z $$

The c is a direction handed in by the projection algorithm. The oracle returns the valid outcome z that scores lowest against it. That is a single integer linear program, one solve, and a modern solver returns one optimal corner without walking the rest. This is the entire interface between the geometry and the computation: a black box you query with a direction and that answers with a corner. The article "Frank-Wolfe: Solving a Quintillion-Vertex Problem in 100 Steps" is built on nothing but repeated calls to this oracle, roughly a hundred of them, to find a near-optimal trade over a set with quintillions of members.

Solve times are not uniform, and the pattern is worth knowing. Early in a tournament, with few games settled, a call runs on the order of a second. Mid-tournament, with 30 to 40 games decided, seconds to tens of seconds. Late, with 50-plus settled, a few seconds or less. The acceleration is not luck: every settled game locks a variable, which shrinks the free space the solver searches. The problem gets easier as the world resolves, which is convenient, because that is also when prices move fastest.

Where this helps and where it does not

The compression is genuine and it is the reason any of this runs in production. It also does nothing for the two problems that come next.

It does not make the projection converge on a log-scoring market. The oracle answers questions; it does not run the search. The search is Frank-Wolfe, and on a log market maker the standard version fails to converge because the gradient blows up at the boundary, which is why the article "Why Your Arbitrage Solver Crashes at 99 Cents" exists. Encoding the set correctly and still watching the solver diverge is a real and common failure.

And it does nothing for execution. A perfectly encoded, perfectly solved trade is a set of target orders, not filled positions. The 5,461-to-1 compression buys you the ability to find the trade in time. Whether the second leg is still there when your first fills, whether the fees leave anything behind, whether two faster bots already took it, all live in the article "Execution Is Part of Expected Value," and that is where most of the theoretical profit from a correctly encoded market quietly disappears.

KEY POINTS

  • Do not list outcomes, describe them. The valid set is the solution set of a small system of linear inequalities, A-transpose z greater than or equal to b, and a solver finds valid outcomes on demand without enumerating the exponential list.
  • Three constraints replace 16,384 combinations in the Duke-Cornell slice; around 200 replace 9.2 quintillion for a full NCAA bracket. The market looks exponential and the constraint structure collapses it to polynomial size.
  • Read the constraints as assert statements: each valid outcome is a boolean assignment that passes every assertion, and the solver searches only the passing assignments.
  • Four dependency shapes cover most markets: implication (z_A at most z_B), mutual exclusion (z_A plus z_B at most 1), partition (a group sums to 1), and conditional (same algebra as implication).
  • The projection needs one thing from all this: a linear-minimization oracle that takes a direction and returns the single cheapest valid outcome, one integer-program solve per call. Frank-Wolfe is roughly a hundred such calls.
  • Encoding solves the size problem, not the numerical one or the execution one. The solver still diverges on log-market boundaries without the barrier fix, and a solved trade is target orders, not filled positions.

References