Elyra’s Polymarket research pipeline fetches active markets from the Polymarket Gamma API in paginated batches, clusters them by semantic similarity to find related markets, detects mispricing relative to cluster peers, and ranks opportunities by a composite score of liquidity, volume, and probability deviation. The result is a structured report of the top trading opportunities, arbitrage candidates, and mispriced markets, suitable for programmatic consumption or terminal output.

CLI usage

Run the pipeline from the command line using main.py with the polymarket command, or invoke the module directly.
# Top 5 opportunities (default)
python3 main.py polymarket

# JSON output
python3 main.py polymarket --json

# Custom parameters
python3 main.py polymarket --top 10 --max-markets 800

# Direct module
python3 -m skills.trade_research.trade_research --top 5 --json

CLI flags

--top
integer
Number of top opportunities and arbitrage rows to return per table. Defaults to 5.
--max-markets
integer
Maximum number of active markets to fetch from the Polymarket Gamma API before analysis begins. Defaults to 600. The API is paged in batches of 200; fetching stops early if fewer markets are returned than the batch size.
--json
flag
Print raw JSON to stdout instead of the formatted Rich table. Pipe this output to jq or any JSON processor for downstream use.
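
The paging behaviour described for --max-markets can be sketched as follows. This is a minimal illustration, not the pipeline's actual fetch_all_markets: the fetch_paged name and the fetch_page(offset, limit) callback are hypothetical, and the assumption is that each request asks for a full batch of 200 and the combined list is trimmed to the cap at the end.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 200  # batch size stated in the flag description


def fetch_paged(
    fetch_page: Callable[[int, int], List[Dict]],
    max_markets: int = 600,
    page_size: int = PAGE_SIZE,
) -> List[Dict]:
    """Collect up to max_markets items, stopping early on a short page.

    fetch_page(offset, limit) is a hypothetical callback standing in for
    one request to the markets endpoint.
    """
    markets: List[Dict] = []
    offset = 0
    while len(markets) < max_markets:
        page = fetch_page(offset, page_size)
        markets.extend(page)
        if len(page) < page_size:
            break  # fewer rows than the batch size: nothing left to fetch
        offset += page_size
    return markets[:max_markets]  # trim any overshoot from the last page


def fake_page(offset: int, limit: int) -> List[Dict]:
    """Stand-in data source with 450 markets, used only for this demo."""
    return [{"id": str(i)} for i in range(offset, min(offset + limit, 450))]


print(len(fetch_paged(fake_page, max_markets=600)))
print(len(fetch_paged(fake_page, max_markets=300)))
```

Injecting the page fetcher as a callback keeps the loop testable without network access.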

Python usage

Call run_research directly from your own code. It returns the same structured dict that the CLI serialises to JSON.
import asyncio
from skills.trade_research.trade_research import run_research

result = asyncio.run(run_research(max_markets=600, top_n=5))

Parameters

max_markets
integer
Maximum markets to fetch before analysis. Passed through to fetch_all_markets. Defaults to 600.
top_n
integer
Number of rows to include in each output section (top_opportunities and arbitrage). Defaults to 5.

Return value

run_research returns a dict with three top-level keys.
{
  "top_opportunities": [...],
  "arbitrage": [...],
  "mispriced_markets": [...]
}
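
As a rough illustration of consuming this structure, the sketch below parses a JSON report (for example, text captured from the --json flag) and checks that the three documented keys are present. The load_report helper and the sample payload are invented for illustration. Real rows carry the fields documented in the sections that follow.

```python
import json
from typing import Any, Dict

EXPECTED_KEYS = ("top_opportunities", "arbitrage", "mispriced_markets")


def load_report(json_text: str) -> Dict[str, Any]:
    """Parse a JSON report and verify the three documented top-level keys."""
    report = json.loads(json_text)
    missing = [key for key in EXPECTED_KEYS if key not in report]
    if missing:
        raise KeyError(f"report is missing keys: {missing}")
    return report


# Invented sample payload, trimmed to a few fields for brevity.
sample = (
    '{"top_opportunities": [{"rank": 1, "market_id": "123", "score": 0.12}],'
    ' "arbitrage": [], "mispriced_markets": []}'
)
report = load_report(sample)
counts = {key: len(report[key]) for key in EXPECTED_KEYS}
print(counts)
```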

top_opportunities

An array of ranked trading opportunities, sorted by a composite score of liquidity, volume, and probability deviation from cluster peers. If the detector finds fewer scored opportunities than top_n, the remaining slots are filled with the highest-activity markets by log(liquidity) × log(volume).
rank
integer
Position in the ranked list, starting from 1.
market_id
string
Polymarket market identifier (condition ID or numeric ID from the Gamma API).
question
string
Market question text, truncated to 80 characters with a trailing ... if longer.
yes_price
float
Current YES outcome price as a decimal between 0 and 1.
no_price
float
Current NO outcome price as a decimal between 0 and 1.
liquidity
float
Total on-book liquidity in USD.
volume
float
Total traded volume in USD.
reason
string
Pipe-separated list of detection signals that triggered this opportunity, such as prob_diff_vs_cluster=0.18 | low_liq_volume_spike or yes_plus_no=1.04. High-activity fill-ins carry high_liquidity_volume (activity).
score
float
Composite score used for ranking: (log(liquidity) / log(max_liquidity)) × mispricing × (log(volume) / log(max_volume)). Higher is better. Fill-in rows score 0.0.
url
string | null
Direct Polymarket event URL (https://polymarket.com/event/{slug}) when a slug is available; null otherwise.
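
The score and fill-in formulas above can be written out as a short sketch. The function names are hypothetical, natural logarithms are assumed (the base cancels in the ratios anyway), and the guard that returns 0.0 for values at or below 1.0 is an assumption added here to avoid dividing by zero or taking the log of a non-positive number. The docs do not say how the pipeline handles those cases.

```python
import math


def composite_score(
    liquidity: float,
    volume: float,
    mispricing: float,
    max_liquidity: float,
    max_volume: float,
) -> float:
    """(log(liq) / log(max_liq)) * mispricing * (log(vol) / log(max_vol))."""
    # Assumed guard: log(x) <= 0 for x <= 1 would break the ratios.
    if min(liquidity, volume, max_liquidity, max_volume) <= 1.0:
        return 0.0
    liquidity_term = math.log(liquidity) / math.log(max_liquidity)
    volume_term = math.log(volume) / math.log(max_volume)
    return liquidity_term * mispricing * volume_term


def activity_score(liquidity: float, volume: float) -> float:
    """Fill-in ordering key: log(liquidity) * log(volume)."""
    if min(liquidity, volume) <= 1.0:  # same assumed guard as above
        return 0.0
    return math.log(liquidity) * math.log(volume)


# Illustrative numbers only, not real market data.
score = composite_score(
    liquidity=10_000,
    volume=100_000,
    mispricing=0.18,
    max_liquidity=1_000_000,
    max_volume=10_000_000,
)
print(round(score, 4))
```

Because both log ratios are at most 1 for markets at or below the maxima, the score under these assumptions is bounded above by the mispricing value itself.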

arbitrage

An array of arbitrage candidates and watchlist entries, ranked by total implied probability descending. The detector flags any market where YES + NO ≥ 1.01 as a same_market arbitrage. When fewer than top_n structural opportunities exist, the list is padded with same_market_relaxed entries (threshold 1.005), then price_sum_deviation watchlist entries, then high-liquidity leaders.
rank
integer
Position in the ranked list, starting from 1.
type
string
Classification of the entry. One of: same_market, same_market_relaxed, price_sum_deviation, or liquidity_leader.
market_ids
array of strings
Market IDs involved. Single-market entries contain one ID; multi-leg entries list all legs.
questions
array of strings
Question text for each market in market_ids, truncated to 60 characters.
total_probability
float
Sum of YES and NO prices (YES + NO). Values above 1.0 indicate a potential arbitrage; values below 1.0 indicate a correlated discount.
profit_potential_pct
float
Estimated gross profit as a percentage of capital deployed, calculated as (total_probability − 1.0) × 100. Does not account for trading fees or slippage.
details
string
Human-readable description of the signal, for example YES=0.58 + NO=0.46 = 1.04 or a watchlist note to verify executable prices.
url
string | null
Polymarket event URL for single-market entries; null for multi-leg entries or when no slug is available.
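
The two price-sum thresholds and the profit formula above can be sketched as follows. The classify_price_sum helper is hypothetical and covers only the same_market and same_market_relaxed types. The rules for price_sum_deviation and liquidity_leader entries are not specified here, so the sketch returns None for anything below the relaxed threshold. Rounding precision and the details format (modelled on the example string above) are assumptions.

```python
from typing import Dict, Optional

STRICT_THRESHOLD = 1.01    # same_market, as documented
RELAXED_THRESHOLD = 1.005  # same_market_relaxed, as documented


def classify_price_sum(
    yes_price: float, no_price: float
) -> Optional[Dict[str, object]]:
    """Classify a single market by its YES + NO price sum."""
    total = yes_price + no_price
    if total >= STRICT_THRESHOLD:
        kind = "same_market"
    elif total >= RELAXED_THRESHOLD:
        kind = "same_market_relaxed"
    else:
        return None  # watchlist and liquidity-leader rules are not covered
    return {
        "type": kind,
        "total_probability": round(total, 4),  # assumed precision
        # Gross figure only: fees and slippage are not deducted.
        "profit_potential_pct": round((total - 1.0) * 100, 2),
        "details": f"YES={yes_price:.2f} + NO={no_price:.2f} = {total:.2f}",
    }


print(classify_price_sum(0.58, 0.46))
```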

mispriced_markets

Markets whose YES price deviates from the mean YES price of their semantic cluster by at least 0.15. These are candidates for mean-reversion trades within a thematic group.
market_id
string
Polymarket market identifier.
question
string
Market question text.
yes_price
float
Current YES price on this market.
cluster_mean_yes
float
Mean YES price across all other markets in the same semantic cluster.
mispricing
float
Absolute deviation |yes_price − cluster_mean_yes|, rounded to 3 decimal places. The detection threshold is 0.15.
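
A minimal sketch of the deviation check, under stated assumptions: the find_mispriced name is hypothetical, the input is one cluster given as a list of dicts with market_id and yes_price, and the mean is taken over the other members of the cluster, following the cluster_mean_yes description above. Rounding cluster_mean_yes to 3 decimal places is also an assumption.

```python
from typing import Dict, List

MISPRICING_THRESHOLD = 0.15  # documented detection threshold


def find_mispriced(cluster: List[Dict]) -> List[Dict]:
    """Flag markets whose YES price deviates from the mean of their peers."""
    if len(cluster) < 2:
        return []  # a peer mean needs at least one other market
    total_yes = sum(m["yes_price"] for m in cluster)
    flagged = []
    for market in cluster:
        # Mean over the other members (leave-one-out).
        peer_mean = (total_yes - market["yes_price"]) / (len(cluster) - 1)
        deviation = abs(market["yes_price"] - peer_mean)
        if deviation >= MISPRICING_THRESHOLD:
            flagged.append(
                {
                    "market_id": market["market_id"],
                    "yes_price": market["yes_price"],
                    "cluster_mean_yes": round(peer_mean, 3),
                    "mispricing": round(deviation, 3),
                }
            )
    return flagged


# Illustrative cluster, not real market data.
cluster = [
    {"market_id": "a", "yes_price": 0.30},
    {"market_id": "b", "yes_price": 0.32},
    {"market_id": "c", "yes_price": 0.31},
    {"market_id": "d", "yes_price": 0.60},
]
flagged = find_mispriced(cluster)
print(flagged)
```

In this example only market d is flagged: its peers average about 0.31, a deviation of about 0.29, while the other three sit within 0.15 of their own peer means.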

Semantic clustering

Before scoring, the pipeline groups markets by topic using cosine similarity on question text. It prefers sentence-transformers/all-MiniLM-L6-v2 embeddings and falls back to TF-IDF vectors (via scikit-learn) when sentence-transformers is unavailable. Markets are merged into a cluster when their pairwise similarity meets the threshold: 0.75 for sentence-transformers, 0.60 for the TF-IDF fallback. Clusters with fewer than two members are discarded. Clustering supplies cluster_mean_yes for mispricing detection and populates the mispriced_markets list. It does not affect the arbitrage detector, which operates on individual market price sums.
The first run downloads sentence-transformers/all-MiniLM-L6-v2 from Hugging Face Hub and caches it to .cache/huggingface/ inside your project root. This download is roughly 90 MB and only happens once. To skip it entirely, omit sentence-transformers from your environment; the pipeline will use TF-IDF clustering automatically.
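
The threshold-based merging described in this section can be illustrated with a dependency-free sketch. Several things here are assumptions rather than the pipeline's behaviour: plain word-count vectors stand in for MiniLM embeddings or TF-IDF vectors, the cluster_questions name is hypothetical, and pairs above the threshold are merged transitively with a union-find structure, which is one plausible reading of "merged into a cluster".

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_questions(
    questions: List[str], threshold: float = 0.60
) -> List[List[int]]:
    """Group question indices whose pairwise similarity meets the threshold."""
    # Stand-in vectors: raw word counts, not real embeddings or TF-IDF.
    vectors = [Counter(q.lower().split()) for q in questions]
    parent = list(range(len(questions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(questions)):
        for j in range(i + 1, len(questions)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(len(questions)):
        groups.setdefault(find(i), []).append(i)
    # Clusters with fewer than two members are discarded.
    return [members for members in groups.values() if len(members) >= 2]


questions = [
    "will bitcoin close above 100k in 2025",
    "will bitcoin close above 120k in 2025",
    "will it rain in london tomorrow",
]
clusters = cluster_questions(questions)
print(clusters)
```

The two bitcoin questions share six of seven tokens (similarity about 0.86) and form one cluster, while the weather question overlaps only on common words and is dropped as a singleton.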