Methodology

Two years. Three commodities. 1,000 predictions.

We back-tested our options-implied probability methodology against realized spot from 2024-01-02 to 2026-05-14, evaluating five synthetic strikes (95% / 98% / 100% / 102% / 105% of spot) at a 7-day horizon for silver, gold, and oil. Results are out-of-sample: for each day we rebuilt the IV smile from that day's historical options chain, computed the probability fresh, and then resolved it against what spot actually did seven days later.
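As a rough illustration of the per-day computation described above, here is a minimal sketch. It assumes a YES outcome means spot finishes above the strike, uses the Black-Scholes risk-neutral probability N(d2), and stands in a hypothetical `iv_for_strike` callable for the smile lookup. None of these names come from our production code, and the flat 35% IV is a placeholder.

```python
import math

# The five synthetic strikes, as fractions of the day's spot.
STRIKE_MULTIPLIERS = (0.95, 0.98, 1.00, 1.02, 1.05)


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_above(spot, strike, iv, days, r=0.0, q=0.0):
    """Risk-neutral P(S_T > K) under Black-Scholes, i.e. N(d2).

    Assumption: YES means spot finishes above the strike.
    r and q are annualized rate and carry/yield (both default to 0 here).
    """
    t = days / 365.0
    d2 = (math.log(spot / strike) + (r - q - 0.5 * iv ** 2) * t) / (iv * math.sqrt(t))
    return norm_cdf(d2)


def strike_ladder(spot, iv_for_strike, days=7, r=0.0, q=0.0):
    """Return [(strike, probability)] for the five synthetic strikes.

    iv_for_strike is a hypothetical callable standing in for the
    smile lookup built from that day's options chain.
    """
    ladder = []
    for m in STRIKE_MULTIPLIERS:
        strike = spot * m
        ladder.append((strike, prob_above(spot, strike, iv_for_strike(strike), days, r, q)))
    return ladder


# Placeholder inputs: spot of 100 and a flat 35% smile.
ladder = strike_ladder(100.0, lambda strike: 0.35)
for strike, p in ladder:
    print(f"strike {strike:7.2f}  P(above) = {p:.3f}")
```

With a flat smile the probabilities fall as the strike rises. A real smile would assign a different IV to each strike, which is why the lookup is passed in as a function.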

What you want from this chart: monotonicity. A bar that says "predicted 60%" should resolve YES more often than one that says "predicted 30%". The absolute level is informative too — when predicted and realized track the diagonal, our model is well-calibrated; when they diverge, it tells you something about the regime.

1,000 of 1,000 predictions resolved (100.0%). Any unresolved remainder would be recent dates whose 7-day horizon has not yet landed; in this run there are none.

Summary by commodity

Commodity  Predictions  Resolved  YES hit rate  Mean predicted  Avg IV
silver     350          350       60.0%         50.4%           35.5%
gold       280          280       61.1%         51.3%           27.5%
oil        370          370       62.4%         49.6%           37.5%

Calibration plots

Each decile pairs the mean predicted probability against the realized hit rate for predictions that fell in that bucket. Dashed diagonal is perfect calibration.
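The decile pairing can be sketched as follows. This is an illustrative implementation, not our production code: it assumes equal-width buckets, places a prediction of exactly 100% in the top bucket, and skips empty buckets. The function name and sample inputs are made up for the example.

```python
def calibration_table(predicted, outcomes, n_buckets=10):
    """Pair mean predicted probability with realized hit rate per bucket.

    predicted: probabilities in [0, 1]
    outcomes:  matching 0/1 (or False/True) resolutions, 1 = YES
    """
    # Per bucket: [sum of predictions, YES count, number of predictions]
    acc = [[0.0, 0, 0] for _ in range(n_buckets)]
    for p, y in zip(predicted, outcomes):
        i = min(int(p * n_buckets), n_buckets - 1)  # p == 1.0 goes in the top bucket
        acc[i][0] += p
        acc[i][1] += int(y)
        acc[i][2] += 1

    rows = []
    for i, (sum_p, yes, n) in enumerate(acc):
        if n == 0:
            continue  # nothing fell in this bucket
        lo = i * 100 // n_buckets
        hi = (i + 1) * 100 // n_buckets
        rows.append({
            "bucket": f"{lo}-{hi}%",
            "n": n,
            "mean_predicted": sum_p / n,
            "hit_rate": yes / n,
        })
    return rows


# Made-up sample data, only to show the output shape.
rows = calibration_table(
    predicted=[0.05, 0.08, 0.55, 0.52, 0.95, 1.0],
    outcomes=[0, 0, 1, 0, 1, 1],
)
for row in rows:
    print(row)
```

A bucket sits on the diagonal when its `mean_predicted` and `hit_rate` agree. The `n` column matters when reading the plots, since thin buckets are noisy.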

[Calibration charts, one per commodity. X-axis: predicted probability (decile). Y-axis: realized hit rate, 0% to 100%. Series: predicted (mean) and realized. Dashed line: perfect calibration.]

silver: calibration (350 predictions)
Predictions per decile: 0-10% n=12, 10-20% n=34, 20-30% n=43, 30-40% n=49, 40-50% n=34, 50-60% n=44, 60-70% n=39, 70-80% n=43, 80-90% n=26, 90-100% n=26

gold: calibration (280 predictions)
Predictions per decile: 0-10% n=35, 10-20% n=34, 20-30% n=27, 30-40% n=10, 40-50% n=16, 50-60% n=56, 60-70% n=11, 70-80% n=12, 80-90% n=23, 90-100% n=56

oil: calibration (370 predictions)
Predictions per decile: 0-10% n=9, 10-20% n=29, 20-30% n=63, 30-40% n=45, 40-50% n=34, 50-60% n=48, 60-70% n=48, 70-80% n=61, 80-90% n=30, 90-100% n=3

How to read the bias

Silver and gold systematically under-predict in the middle buckets: predictions near 50% resolve YES about 66% of the time. This is the classic signature of a bull regime. Black-Scholes prices off the risk-neutral drift (r − q); realized drift over the 2024-2026 window was much higher than that, so YES outcomes happened more often than the risk-neutral measure suggested.
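The drift argument can be written out under a simplifying assumption that is ours for illustration only: spot follows geometric Brownian motion with constant volatility σ, and YES means spot finishes above strike K at horizon T. The symbol μ (real-world drift) is introduced here and does not appear elsewhere in this write-up.

```latex
\begin{align*}
P_{\text{model}}(S_T > K)
  &= N\!\left(\frac{\ln(S_0/K) + \left(r - q - \tfrac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}\right)
  && \text{(risk-neutral drift } r - q\text{)} \\[4pt]
P_{\text{real}}(S_T > K)
  &= N\!\left(\frac{\ln(S_0/K) + \left(\mu - \tfrac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}\right)
  && \text{(real-world drift } \mu\text{)} \\[4pt]
\text{shift in the argument of } N
  &= \frac{\bigl(\mu - (r - q)\bigr)\sqrt{T}}{\sigma}
\end{align*}
```

When μ exceeds r − q the shift is positive, so the real-world YES probability sits above the model's at every strike. When μ is close to r − q the two coincide, which is the sideways case.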

Oil is the tightest — it traded sideways over the same window and the model lands almost on the diagonal. That's the cleanest evidence the methodology itself is well-shaped: the bias scales with the underlying's realized drift, not with anything our process is doing wrong.

All three are monotonic across every bucket: a higher predicted decile resolves higher. That's the table-stakes claim — when you see us publish a 70% read on a Kalshi market, you should trust it as a 70% probability under our model, with absolute bias bounded by realized drift.
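A monotonicity check of this kind is simple to state in code. The sketch below is illustrative: the function name, the optional tolerance, and the sample hit rates are assumptions for the example, not back-test figures.

```python
def is_monotonic(hit_rates, tolerance=0.0):
    """True if each bucket's realized hit rate is at least the previous
    bucket's, allowing an optional slack for small-sample noise."""
    return all(b >= a - tolerance for a, b in zip(hit_rates, hit_rates[1:]))


# Hypothetical realized hit rates, ordered from lowest to highest decile.
rising = [0.05, 0.20, 0.35, 0.60, 0.90]
dipping = [0.05, 0.40, 0.30, 0.60, 0.90]

print(is_monotonic(rising))
print(is_monotonic(dipping))
print(is_monotonic(dipping, tolerance=0.15))
```

The tolerance argument is a design choice for thin buckets: with only a handful of predictions in a decile, a small dip can be noise rather than a real ordering failure.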

Methodology disclosure