Stream 04
Polymarket Arbitrage
LLM probability vs market price. We act when the gap is wide.
How it works
Two probabilities. One edge.
Every two hours we pull every Polymarket contract trading with at least $10,000 of 24-hour volume. For each contract we ask Copilot for an independent probability estimate, with a written reasoning trace and an explicit confidence band. The model has no access to the live market price during inference.
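The scan step can be sketched as follows. This is a minimal illustration, not the desk's code: the `Market` record, its field names, and the prompt wording are all hypothetical. It shows the two properties the paragraph describes, namely the $10,000 volume floor and a prompt that never includes the live price.

```python
from dataclasses import dataclass
from typing import List

MIN_VOLUME_24H_USD = 10_000.0  # volume floor stated in the text


@dataclass(frozen=True)
class Market:
    # Hypothetical record; field names are assumptions for this sketch.
    question: str
    volume_24h_usd: float
    yes_price: float  # live price, deliberately kept out of the prompt


def eligible(markets: List[Market]) -> List[Market]:
    """Keep only contracts with at least the minimum 24-hour volume."""
    return [m for m in markets if m.volume_24h_usd >= MIN_VOLUME_24H_USD]


def build_prompt(market: Market) -> str:
    """Build a price-blind prompt: only the question text is passed on."""
    return (
        "Estimate the probability that the following resolves YES. "
        "Give a written reasoning trace and an explicit confidence band.\n"
        f"Question: {market.question}"
    )
```

Keeping the price out of `build_prompt` is what makes the model's estimate independent of the market it is later compared against.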
We compute the absolute gap between the model's probability and the market price. If the gap is at least fifteen percentage points and the market is liquid enough to absorb a $500 ticket without moving the mid by more than 1%, we open a position. We buy whichever side is underpriced: YES when the model's probability is above the market price, NO when it is below.
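The decision rule above can be sketched in a few lines. This assumes probabilities and prices are fractions between 0 and 1 and that the market price is the price of the YES share. The names `evaluate` and `Signal` are hypothetical, and the liquidity test is passed in as a precomputed flag rather than implemented here.

```python
from dataclasses import dataclass
from typing import Optional

EDGE_THRESHOLD_PP = 15.0  # minimum absolute gap, from the text


@dataclass(frozen=True)
class Signal:
    side: str       # "YES" or "NO"
    edge_pp: float  # signed gap in percentage points (model minus market)


def evaluate(model_prob: float, yes_price: float, is_liquid: bool) -> Optional[Signal]:
    """Return a Signal when the gap and liquidity conditions both hold, else None."""
    edge_pp = (model_prob - yes_price) * 100.0
    if abs(edge_pp) < EDGE_THRESHOLD_PP or not is_liquid:
        return None
    # Model above market: YES looks cheap. Model below market: NO looks cheap.
    side = "YES" if edge_pp > 0 else "NO"
    return Signal(side=side, edge_pp=edge_pp)
```

For example, a model probability of 0.65 against a YES price of 0.066 gives a YES signal with an edge of about +58 percentage points, while a model probability of 0.50 against a price of 0.45 gives no signal.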
Position size scales with liquidity, capped at the per-ticket max. Every trade carries a written rationale captured from the model output. The desk reviews the rationale before the order is sent — there is no fully automatic execution on prediction markets in v1.
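The text does not give the exact sizing rule, so the following is only one plausible sketch. It assumes size is a fixed fraction of the dollar depth resting within 1% of the mid, capped at a per-ticket maximum. The `participation` fraction, the function names, and the use of $500 as the cap (borrowed from the liquidity test) are all assumptions.

```python
from typing import List, Tuple


def depth_within_band(asks: List[Tuple[float, float]], mid: float, band: float = 0.01) -> float:
    """Dollar notional on the ask side priced no worse than mid * (1 + band).

    `asks` is a list of (price, shares) levels for the token being bought.
    """
    limit = mid * (1.0 + band)
    return sum(price * shares for price, shares in asks if price <= limit)


def position_size(depth_usd: float, max_ticket_usd: float = 500.0,
                  participation: float = 0.5) -> float:
    """Scale the ticket with available depth, never exceeding the cap.

    Both defaults are assumptions made for this sketch.
    """
    return min(max_ticket_usd, participation * depth_usd)
```

Under these assumptions a thin book with $400 of nearby depth yields a $200 ticket, and a deep book is capped at $500.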
Live opportunities · latest scan
Where the gap is wide.
| Market | Side | YES price | Copilot prob (YES) | Edge (prob − price) |
|---|---|---|---|---|
| Strait of Hormuz traffic returns to normal by May 15 | YES | $0.0660 | 65% | +58pp |
| PSG win 2025-26 Champions League | NO | $0.5750 | 15% | -42pp |
| LoL Worlds: WB defeats WE | NO | $0.9995 | 55% | -45pp |
| Internazionali BNL d'Italia — set winner | NO | $0.9995 | 55% | -45pp |
Why this works (and why it might not)
Real edge, real limits.
Polymarket prices reflect real money. They are not poll numbers and they are not opinion. That is the basis for the edge: when a contract is mispriced, the mispricing can be traded. But the books are thinner than those on equity venues, and the tickets that move the price are smaller than the tickets that move a stock. Capacity is the binding constraint, not signal quality.
The other limit is the model. Large language models are biased toward consensus — they are trained on the world's recent prose, and recent prose tends to repeat the consensus view. On fast-moving events, the model is stale by hours or days. We compensate by re-prompting after major news drops and by ignoring opportunities where the model's reasoning trace cites information older than 24 hours on a fast-developing market.
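The staleness rule can be sketched as a simple filter. This assumes each citation in the reasoning trace has been given a timestamp and that markets are already flagged as fast-developing or not; how either is determined is not covered here. The function name and the reading of the rule as "any citation older than 24 hours" are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

MAX_CITATION_AGE = timedelta(hours=24)  # cutoff stated in the text


def is_stale(cited_at: Iterable[datetime], now: datetime, fast_moving: bool) -> bool:
    """True when a fast-developing market's trace cites anything older than 24 hours.

    Markets not flagged as fast-developing are never treated as stale here.
    """
    if not fast_moving:
        return False
    return any(now - ts > MAX_CITATION_AGE for ts in cited_at)


# Usage: skip the opportunity when is_stale(...) returns True.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = now - timedelta(hours=2)
old = now - timedelta(hours=30)
skip = is_stale([fresh, old], now, fast_moving=True)
```

A looser variant would check only the most recent citation; which reading the desk applies is not stated in the text.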
The edge is real. It is also capacity-limited and model-limited. Both limits are features for us: they are why the edge is not arbed away by a hedge fund with a billion dollars and a Bloomberg terminal.
Risk frame