Elyra’s Polymarket research pipeline fetches active markets from the Polymarket Gamma API in paginated batches, clusters them by semantic similarity to find related markets, detects mispricing relative to cluster peers, and ranks opportunities by a composite score of liquidity, volume, and probability deviation. The result is a structured report of the top trading opportunities, arbitrage candidates, and mispriced markets, ready for programmatic consumption or terminal output.
CLI usage
Run the pipeline from the command line using main.py with the polymarket command, or invoke the module directly.
CLI flags
Number of top opportunities and arbitrage rows to return per table. Defaults to 5.
Maximum number of active markets to fetch from the Polymarket Gamma API before analysis begins. Defaults to 600. The API is paged in batches of 200; fetching stops early if fewer markets are returned than the batch size.
Print raw JSON to stdout instead of the formatted Rich table. Pipe this output to jq or any JSON processor for downstream use.
Python usage
Call run_research directly from your own code. It returns the same structured dict that the CLI serialises to JSON.
Parameters
Maximum markets to fetch before analysis. Passed through to fetch_all_markets. Defaults to 600.
Number of rows to include in each output section (top_opportunities and arbitrage). Defaults to 5.
Return value
run_research returns a dict with three top-level keys.
top_opportunities
An array of ranked trading opportunities, sorted by a composite score of liquidity, volume, and probability deviation from cluster peers. If the detector finds fewer scored opportunities than top_n, the remaining slots are filled with the highest-activity markets by log(liquidity) × log(volume).
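As a rough illustration of the ranking arithmetic, the sketch below computes the composite score documented for this section, log(liquidity) / log(max_liquidity) × mispricing × log(volume) / log(max_volume), along with the log(liquidity) × log(volume) activity key used for fill-in rows. The function names and the handling of values at or below 1 USD are assumptions made for the example, not the pipeline's actual code.

```python
import math


def composite_score(liquidity, volume, mispricing, max_liquidity, max_volume):
    """Score = log(liq)/log(max_liq) * mispricing * log(vol)/log(max_vol)."""
    # Assumption: values at or below 1 USD (log <= 0) score 0.0, which also
    # avoids dividing by log(1) == 0.
    if min(liquidity, volume, max_liquidity, max_volume) <= 1:
        return 0.0
    liquidity_term = math.log(liquidity) / math.log(max_liquidity)
    volume_term = math.log(volume) / math.log(max_volume)
    return liquidity_term * mispricing * volume_term


def activity(liquidity, volume):
    """Fill-in ranking key: log(liquidity) * log(volume)."""
    if liquidity <= 1 or volume <= 1:
        return 0.0  # assumption: negligible markets rank last
    return math.log(liquidity) * math.log(volume)


# Hypothetical sample values, in USD.
score = composite_score(
    liquidity=50_000, volume=200_000, mispricing=0.18,
    max_liquidity=1_000_000, max_volume=5_000_000,
)
print(f"score={score:.4f}, activity={activity(50_000, 200_000):.2f}")
```

Because both log ratios are at most 1 for markets no larger than the maxima, the score under these assumptions never exceeds the mispricing term itself.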
Position in the ranked list, starting from 1.
Polymarket market identifier (condition ID or numeric ID from the Gamma API).
Market question text, truncated to 80 characters with a trailing ... if longer.
Current YES outcome price as a decimal between 0 and 1.
Current NO outcome price as a decimal between 0 and 1.
Total on-book liquidity in USD.
Total traded volume in USD.
Pipe-separated list of detection signals that triggered this opportunity, such as prob_diff_vs_cluster=0.18 | low_liq_volume_spike or yes_plus_no=1.04. High-activity fill-ins carry high_liquidity_volume (activity).
Composite score used for ranking: log(liquidity) / log(max_liquidity) × mispricing × log(volume) / log(max_volume). Higher is better. Fill-in rows score 0.0.
Direct Polymarket event URL (https://polymarket.com/event/{slug}) when a slug is available; null otherwise.
arbitrage
An array of arbitrage candidates and watchlist entries, ranked by total implied probability descending. The detector flags any market where YES + NO ≥ 1.01 as a same_market arbitrage. When fewer than top_n structural opportunities exist, the list is padded with same_market_relaxed entries (threshold 1.005), then price_sum_deviation watchlist entries, then high-liquidity leaders.
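The two price-sum thresholds can be sketched as a small classifier. This is a minimal example assuming only the thresholds stated above (1.01 and 1.005) and the documented gross-profit formula (total_probability − 1.0) × 100. The function name is hypothetical, and the price_sum_deviation and liquidity_leader padding tiers are left out.

```python
def classify_price_sum(yes_price, no_price, strict=1.01, relaxed=1.005):
    """Return (kind, total_probability, gross_profit_pct) for one market."""
    # Round first so float noise (e.g. 0.58 + 0.46) does not shift the total.
    total = round(yes_price + no_price, 4)
    if total >= strict:
        kind = "same_market"
    elif total >= relaxed:
        kind = "same_market_relaxed"
    else:
        kind = None  # not flagged by either threshold
    gross_profit_pct = round((total - 1.0) * 100, 2)  # before fees and slippage
    return kind, total, gross_profit_pct


# Hypothetical (YES, NO) price pairs.
prices = [(0.58, 0.46), (0.50, 0.507), (0.50, 0.50)]
flagged = [r for r in (classify_price_sum(y, n) for y, n in prices) if r[0]]
# Rank by total implied probability, descending.
flagged.sort(key=lambda r: r[1], reverse=True)
for kind, total, profit in flagged:
    print(kind, total, profit)
```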
Position in the ranked list, starting from 1.
Classification of the entry. One of: same_market, same_market_relaxed, price_sum_deviation, or liquidity_leader.
Market IDs involved. Single-market entries contain one ID; multi-leg entries list all legs.
Question text for each market in market_ids, truncated to 60 characters.
Sum of YES and NO prices (YES + NO). Values above 1.0 indicate a potential arbitrage; values below indicate a correlated discount.
Estimated gross profit as a percentage of capital deployed, calculated as (total_probability − 1.0) × 100. Does not account for trading fees or slippage.
Human-readable description of the signal, for example YES=0.58 + NO=0.46 = 1.04 or a watchlist note to verify executable prices.
Polymarket event URL for single-market entries; null for multi-leg entries or when no slug is available.
mispriced_markets
Markets whose YES price deviates from the mean YES price of their semantic cluster by at least 0.15. These are candidates for mean-reversion trades within a thematic group.
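A minimal sketch of this check for a single cluster follows. It assumes the cluster mean excludes the market under test, that the deviation is rounded to 3 decimal places before comparison, and that the input is a plain mapping of market ID to YES price. The function name and tuple layout are hypothetical.

```python
def find_mispriced(cluster_prices, threshold=0.15):
    """cluster_prices maps market ID -> YES price for one semantic cluster.

    Returns (market_id, yes_price, cluster_mean_yes, deviation) tuples.
    """
    flagged = []
    if len(cluster_prices) < 2:
        return flagged  # a peer mean needs at least one other market
    for market_id, yes_price in cluster_prices.items():
        # Assumption: the mean is taken over the *other* cluster members.
        peers = [p for m, p in cluster_prices.items() if m != market_id]
        cluster_mean_yes = sum(peers) / len(peers)
        deviation = round(abs(yes_price - cluster_mean_yes), 3)
        if deviation >= threshold:
            flagged.append((market_id, yes_price, round(cluster_mean_yes, 3), deviation))
    return flagged


# Hypothetical cluster of three related markets.
cluster = {"a": 0.30, "b": 0.32, "c": 0.60}
for row in find_mispriced(cluster):
    print(row)
```

In this sample, market b sits 0.13 from its peers and is not flagged, while a and c both clear the 0.15 threshold.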
Polymarket market identifier.
Market question text.
Current YES price on this market.
Mean YES price across all other markets in the same semantic cluster.
Absolute deviation |yes_price − cluster_mean_yes|, rounded to 3 decimal places. The detection threshold is 0.15.
Semantic clustering
Before scoring, the pipeline groups markets by topic using cosine similarity on question text. It prefers sentence-transformers/all-MiniLM-L6-v2 and falls back to TF-IDF (via scikit-learn) when the library is unavailable. Markets are merged into a cluster when their pairwise similarity meets the threshold (0.75 for sentence-transformers, 0.60 for the TF-IDF fallback). Clusters with fewer than two members are discarded.
Clustering is used to compute cluster_mean_yes for mispricing detection and to populate the mispriced_markets list. It does not affect the arbitrage detector, which operates on individual market price sums.
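The threshold-merge step can be illustrated without either library. The sketch below swaps in a plain bag-of-words cosine similarity for the sentence-transformers or TF-IDF vectors, and assumes markets are merged transitively (connected components via union-find). Both choices are assumptions for the example, not a description of the pipeline's internals.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_questions(questions, threshold=0.60):
    """Group question indices whose pairwise similarity meets the threshold."""
    # Stand-in vectors: raw word counts instead of embeddings or TF-IDF.
    vectors = [Counter(q.lower().split()) for q in questions]
    parent = list(range(len(questions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(questions)):
        groups.setdefault(find(i), []).append(i)
    # Clusters with fewer than two members are discarded.
    return [members for members in groups.values() if len(members) >= 2]


# Hypothetical question texts.
sample = [
    "will bitcoin close above 100k in june",
    "will bitcoin close above 100k in july",
    "will it rain in paris tomorrow",
]
print(cluster_questions(sample))
```

With real embeddings, only the vector construction and the similarity call would change; the merge and the singleton filter stay the same.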
The first run downloads sentence-transformers/all-MiniLM-L6-v2 from Hugging Face Hub and caches it to .cache/huggingface/ inside your project root. This download is roughly 90 MB and only happens once. To skip it entirely, omit sentence-transformers from your environment; the pipeline will use TF-IDF clustering automatically.