Polymarket research: opportunities and arbitrage API

Elyra’s Polymarket research pipeline fetches active markets from the Polymarket Gamma API in paginated batches, clusters them by semantic similarity to find related markets, detects mispricing relative to cluster peers, and ranks opportunities by a composite score of liquidity, volume, and probability deviation. The result is a structured report of the top trading opportunities, arbitrage candidates, and mispriced markets, ready for programmatic consumption or terminal output.

CLI usage

Run the pipeline from the command line using main.py with the polymarket command, or invoke the module directly.

# Top 5 opportunities (default)
python3 main.py polymarket

# JSON output
python3 main.py polymarket --json

# Custom parameters
python3 main.py polymarket --top 10 --max-markets 800

# Direct module
python3 -m skills.trade_research.trade_research --top 5 --json

CLI flags

--top

integer

Number of top opportunities and arbitrage rows to return per table. Defaults to 5.

--max-markets

integer

Maximum number of active markets to fetch from the Polymarket Gamma API before analysis begins. Defaults to 600. The API is paged in batches of 200; fetching stops early if fewer markets are returned than the batch size.
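
The paging rule described for --max-markets can be sketched as follows. This is a minimal illustration, not the pipeline's actual fetch_all_markets: the fetch_page callable and the function name fetch_markets_paged are hypothetical stand-ins for the real HTTP call, and truncating the final result to max_markets is an assumption.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 200  # page size stated in the --max-markets description


def fetch_markets_paged(
    fetch_page: Callable[[int, int], List[Dict]],
    max_markets: int = 600,
    batch_size: int = BATCH_SIZE,
) -> List[Dict]:
    """Collect markets by calling fetch_page(offset, limit) one batch at a time.

    fetch_page is a hypothetical callable standing in for the Gamma API request.
    """
    markets: List[Dict] = []
    offset = 0
    while len(markets) < max_markets:
        page = fetch_page(offset, batch_size)
        markets.extend(page)
        if len(page) < batch_size:
            # A short page means the upstream list is exhausted: stop early.
            break
        offset += batch_size
    # Assumption: anything fetched beyond max_markets is dropped.
    return markets[:max_markets]
```

With the default of 600, a source that always returns full pages is queried three times; a source holding fewer markets ends the loop on its first short page.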

--json

flag

Print raw JSON to stdout instead of the formatted Rich table. Pipe this output to jq or any JSON processor for downstream use.

Python usage

Call run_research directly from your own code. It returns the same structured dict that the CLI serialises to JSON.

import asyncio
from skills.trade_research.trade_research import run_research

result = asyncio.run(run_research(max_markets=600, top_n=5))
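
Once run_research returns, the dict can be walked like any other Python mapping. The sketch below formats the top_opportunities rows using only the documented field names; format_opportunities is a hypothetical helper, and the sample dict holds made-up placeholder values so the snippet runs without network access.

```python
from typing import Any, Dict, List


def format_opportunities(result: Dict[str, Any]) -> List[str]:
    """Render one summary line per row in result["top_opportunities"]."""
    lines = []
    for row in result.get("top_opportunities", []):
        link = row.get("url") or "(no url)"  # url may be null/None
        lines.append(
            f"#{row['rank']} {row['question']} "
            f"YES={row['yes_price']:.2f} score={row['score']:.3f} {link}"
        )
    return lines


# Placeholder shaped like the documented return value; the values are invented.
sample = {
    "top_opportunities": [
        {
            "rank": 1,
            "market_id": "example-id",
            "question": "Example question?",
            "yes_price": 0.58,
            "no_price": 0.46,
            "liquidity": 1000.0,
            "volume": 5000.0,
            "reason": "yes_plus_no=1.04",
            "score": 0.25,
            "url": None,
        }
    ],
    "arbitrage": [],
    "mispriced_markets": [],
}

for line in format_opportunities(sample):
    print(line)
```

In real use, replace sample with the value returned by asyncio.run(run_research(...)).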

Parameters

max_markets

integer

Maximum markets to fetch before analysis. Passed through to fetch_all_markets. Defaults to 600.

top_n

integer

Number of rows to include in each output section (top_opportunities and arbitrage). Defaults to 5.

Return value

run_research returns a dict with three top-level keys.

{
  "top_opportunities": [...],
  "arbitrage": [...],
  "mispriced_markets": [...]
}

`top_opportunities`

An array of ranked trading opportunities, sorted by a composite score of liquidity, volume, and probability deviation from cluster peers. If the detector finds fewer scored opportunities than top_n, the remaining slots are filled with the highest-activity markets by log(liquidity) × log(volume).
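
The fill-in rule above can be sketched like this. The helper names activity and fill_to_top_n are hypothetical, the natural logarithm is assumed, and treating liquidity or volume at or below 1 as zero activity is an assumption made to keep the logarithms non-negative.

```python
import math
from typing import Dict, List


def activity(market: Dict) -> float:
    """log(liquidity) * log(volume), with small inputs counted as no activity."""
    liq = market.get("liquidity", 0.0)
    vol = market.get("volume", 0.0)
    if liq <= 1 or vol <= 1:  # assumption: avoids negative or undefined logs
        return 0.0
    return math.log(liq) * math.log(vol)


def fill_to_top_n(scored: List[Dict], all_markets: List[Dict], top_n: int) -> List[Dict]:
    """Keep the scored rows, then pad with the most active remaining markets."""
    rows = list(scored[:top_n])
    taken = {m["market_id"] for m in rows}
    leftovers = sorted(
        (m for m in all_markets if m["market_id"] not in taken),
        key=activity,
        reverse=True,
    )
    for m in leftovers[: top_n - len(rows)]:
        # Fill-in rows carry the documented reason string and a score of 0.0.
        rows.append({**m, "reason": "high_liquidity_volume (activity)", "score": 0.0})
    return rows
```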

rank

integer

Position in the ranked list, starting from 1.

market_id

string

Polymarket market identifier (condition ID or numeric ID from the Gamma API).

question

string

Market question text, truncated to 80 characters with a trailing ... if longer.

yes_price

float

Current YES outcome price as a decimal between 0 and 1.

no_price

float

Current NO outcome price as a decimal between 0 and 1.

liquidity

float

Total on-book liquidity in USD.

volume

float

Total traded volume in USD.

reason

string

Pipe-separated list of detection signals that triggered this opportunity, such as prob_diff_vs_cluster=0.18 | low_liq_volume_spike or yes_plus_no=1.04. High-activity fill-ins carry high_liquidity_volume (activity).

score

float

Composite score used for ranking: (log(liquidity) / log(max_liquidity)) × mispricing × (log(volume) / log(max_volume)). Higher is better. Fill-in rows score 0.0.

url

string | null

Direct Polymarket event URL (https://polymarket.com/event/{slug}) when a slug is available; null otherwise.
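
As a worked illustration of the score field, the formula can be written out directly. This is a sketch under stated assumptions: composite_score is a hypothetical name, the natural logarithm is assumed (the ratio is the same for any base), and returning 0.0 when any input is at or below 1 is a guard added here, not documented behaviour.

```python
import math


def composite_score(
    liquidity: float,
    volume: float,
    mispricing: float,
    max_liquidity: float,
    max_volume: float,
) -> float:
    """(log(liq) / log(max_liq)) * mispricing * (log(vol) / log(max_vol))."""
    if min(liquidity, volume, max_liquidity, max_volume) <= 1:
        return 0.0  # assumption: guards against log(1) == 0 and negative logs
    liq_term = math.log(liquidity) / math.log(max_liquidity)
    vol_term = math.log(volume) / math.log(max_volume)
    return liq_term * mispricing * vol_term
```

Both log ratios are at most 1 for a market at or below the maximum, so the score never exceeds the market's own mispricing; the most liquid, most traded market scores exactly its mispricing.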

`arbitrage`

An array of arbitrage candidates and watchlist entries, ranked by total implied probability descending. The detector flags any market where YES + NO ≥ 1.01 as a same_market arbitrage. When fewer than top_n structural opportunities exist, the list is padded with same_market_relaxed entries (threshold 1.005), then price_sum_deviation watchlist entries, then high-liquidity leaders.
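
The two price-sum thresholds above can be expressed as a small classifier. This sketch covers only same_market and same_market_relaxed; the criteria for price_sum_deviation and liquidity_leader entries are not specified here, so the hypothetical classify_price_sum returns None for everything else. Rounding the sum to six decimals is an assumption to keep floating-point noise away from the thresholds.

```python
from typing import Optional

STRICT_THRESHOLD = 1.01    # same_market
RELAXED_THRESHOLD = 1.005  # same_market_relaxed


def classify_price_sum(yes_price: float, no_price: float) -> Optional[str]:
    """Map YES + NO onto the documented same-market types, or None."""
    total = round(yes_price + no_price, 6)  # assumption: trims float noise
    if total >= STRICT_THRESHOLD:
        return "same_market"
    if total >= RELAXED_THRESHOLD:
        return "same_market_relaxed"
    return None
```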

rank

integer

Position in the ranked list, starting from 1.

type

string

Classification of the entry. One of: same_market, same_market_relaxed, price_sum_deviation, or liquidity_leader.

market_ids

array of strings

Market IDs involved. Single-market entries contain one ID; multi-leg entries list all legs.

questions

array of strings

Question text for each market in market_ids, truncated to 60 characters.

total_probability

float

Sum of the YES and NO prices. Values above 1.0 indicate a potential arbitrage; values below 1.0 indicate a correlated discount.

profit_potential_pct

float

Estimated gross profit as a percentage of capital deployed, calculated as (total_probability − 1.0) × 100. Does not account for trading fees or slippage.

details

string

Human-readable description of the signal, for example YES=0.58 + NO=0.46 = 1.04 or a watchlist note to verify executable prices.

url

string | null

Polymarket event URL for single-market entries; null for multi-leg entries or when no slug is available.
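
The arithmetic behind total_probability, profit_potential_pct, and the details string can be checked with a few lines. The function names are hypothetical, and rounding the percentage to two decimals is an assumption made here for readable output.

```python
def profit_potential_pct(yes_price: float, no_price: float, ndigits: int = 2) -> float:
    """(total_probability - 1.0) * 100, before fees and slippage."""
    total_probability = yes_price + no_price
    return round((total_probability - 1.0) * 100, ndigits)  # rounding is an assumption


def describe_price_sum(yes_price: float, no_price: float) -> str:
    """Build a details-style string such as 'YES=0.58 + NO=0.46 = 1.04'."""
    total = yes_price + no_price
    return f"YES={yes_price:.2f} + NO={no_price:.2f} = {total:.2f}"


print(describe_price_sum(0.58, 0.46))
print(profit_potential_pct(0.58, 0.46))
```

For YES=0.58 and NO=0.46 the sum is 1.04, giving a gross profit potential of 4 percent of capital deployed.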

`mispriced_markets`

Markets whose YES price deviates from the mean YES price of their semantic cluster by at least 0.15. These are candidates for mean-reversion trades within a thematic group.

market_id

string

Polymarket market identifier.

question

string

Market question text.

yes_price

float

Current YES price on this market.

cluster_mean_yes

float

Mean YES price across all other markets in the same semantic cluster.

mispricing

float

Absolute deviation |yes_price − cluster_mean_yes|, rounded to 3 decimal places. The detection threshold is 0.15.
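
A minimal sketch of the detection rule for one cluster follows. It assumes, per the cluster_mean_yes description, that the mean is taken over the other markets in the cluster (leave-one-out); find_mispriced is a hypothetical name and the input dicts carry only the fields used here.

```python
from typing import Dict, List

MISPRICING_THRESHOLD = 0.15  # detection threshold stated above


def find_mispriced(cluster: List[Dict], threshold: float = MISPRICING_THRESHOLD) -> List[Dict]:
    """Flag markets whose YES price deviates from their peers' mean YES price."""
    if len(cluster) < 2:
        return []  # a peer mean needs at least one other market
    total_yes = sum(m["yes_price"] for m in cluster)
    flagged = []
    for m in cluster:
        # Assumption: leave-one-out mean over the other cluster members.
        peer_mean = (total_yes - m["yes_price"]) / (len(cluster) - 1)
        deviation = abs(m["yes_price"] - peer_mean)
        if deviation >= threshold:
            flagged.append(
                {
                    "market_id": m["market_id"],
                    "question": m["question"],
                    "yes_price": m["yes_price"],
                    "cluster_mean_yes": peer_mean,
                    "mispricing": round(deviation, 3),
                }
            )
    return flagged
```

For a cluster priced at 0.30, 0.31, 0.32, 0.33, and 0.60, only the last market is flagged: its peers average 0.315, a deviation of 0.285.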

Semantic clustering

Before scoring, the pipeline groups markets by topic using cosine similarity on question text. It prefers sentence-transformers/all-MiniLM-L6-v2 and falls back to TF-IDF (via scikit-learn) when sentence-transformers is not installed. Markets are merged into a cluster when their pairwise similarity meets the threshold (0.75 for sentence-transformers, 0.60 for the TF-IDF fallback). Clusters with fewer than two members are discarded.

Clustering is used to compute cluster_mean_yes for mispricing detection and to populate the mispriced_markets list. It does not affect the arbitrage detector, which operates on individual market price sums.
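
Threshold-based merging of this kind can be sketched with a union-find over pairwise similarities. Two assumptions are worth stating loudly: the merge is treated as single-linkage (any pair at or above the threshold joins two groups), and bow_cosine is a plain bag-of-words cosine used as a dependency-free stand-in for the embedding or TF-IDF similarity. Neither is confirmed to match the pipeline's implementation.

```python
import math
from collections import Counter
from typing import Callable, List


def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for embedding/TF-IDF cosine."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def cluster_by_threshold(
    questions: List[str],
    threshold: float,
    similarity: Callable[[str, str], float] = bow_cosine,
) -> List[List[int]]:
    """Group question indices whose pairwise similarity meets the threshold."""
    parent = list(range(len(questions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(questions)):
        for j in range(i + 1, len(questions)):
            if similarity(questions[i], questions[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(questions)):
        groups.setdefault(find(i), []).append(i)
    # Clusters with fewer than two members are discarded.
    return [members for members in groups.values() if len(members) >= 2]
```

The pairwise loop is quadratic in the number of markets, which is manageable at the default of 600 but would need a vectorised similarity matrix for much larger inputs.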

The first run downloads sentence-transformers/all-MiniLM-L6-v2 from Hugging Face Hub and caches it to .cache/huggingface/ inside your project root. This download is roughly 90 MB and only happens once. To skip it entirely, omit sentence-transformers from your environment; the pipeline will use TF-IDF clustering automatically.
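
The fallback decision can be pictured as a simple availability check. This is a sketch, not the pipeline's own code: pick_clustering_backend is a hypothetical function, and it only reports which backend and threshold would apply without loading any model.

```python
import importlib.util
from typing import Tuple


def pick_clustering_backend() -> Tuple[str, float]:
    """Return (backend name, similarity threshold) based on what is installed."""
    # sentence_transformers is the import name of the sentence-transformers package.
    if importlib.util.find_spec("sentence_transformers") is not None:
        return "sentence-transformers/all-MiniLM-L6-v2", 0.75
    return "tfidf", 0.60


backend, threshold = pick_clustering_backend()
print(f"clustering backend: {backend} (threshold {threshold})")
```

Running this in a given environment shows ahead of time whether a first run would trigger the model download.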
