# Polymarket research: opportunities and arbitrage API

> Scan up to 800 active Polymarket prediction markets, cluster them semantically, score by edge, and surface the top N by expected value.

Elyra's Polymarket research pipeline fetches active markets from the Polymarket Gamma API in paginated batches, clusters them by semantic similarity to find related markets, detects mispricing relative to cluster peers, and ranks opportunities by a composite score of liquidity, volume, and probability deviation. The result is a structured report of the top trading opportunities, arbitrage candidates, and mispriced markets, ready for programmatic consumption or terminal output.

## CLI usage

Run the pipeline from the command line using `main.py` with the `polymarket` command, or invoke the module directly.

```bash
# Top 5 opportunities (default)
python3 main.py polymarket

# JSON output
python3 main.py polymarket --json

# Custom parameters
python3 main.py polymarket --top 10 --max-markets 800

# Direct module
python3 -m skills.trade_research.trade_research --top 5 --json
```

### CLI flags

- `--top`: Number of top opportunities and arbitrage rows to return per table. Defaults to `5`.
- `--max-markets`: Maximum number of active markets to fetch from the Polymarket Gamma API before analysis begins. Defaults to `600`. The API is paged in batches of 200; fetching stops early when a page returns fewer markets than the batch size.
- `--json`: Print raw JSON to stdout instead of the formatted Rich table. Pipe this output to `jq` or any JSON processor for downstream use.

## Python usage

Call `run_research` directly from your own code. It returns the same structured dict that the CLI serialises to JSON.

```python
import asyncio
from skills.trade_research.trade_research import run_research

result = asyncio.run(run_research(max_markets=600, top_n=5))
```

### Parameters

- `max_markets`: Maximum markets to fetch before analysis. Passed through to `fetch_all_markets`. Defaults to `600`.
- `top_n`: Number of rows to include in each output section (`top_opportunities` and `arbitrage`). Defaults to `5`.

## Return value

`run_research` returns a `dict` with three top-level keys.

```json
{
  "top_opportunities": [...],
  "arbitrage": [...],
  "mispriced_markets": [...]
}
```

### `top_opportunities`

An array of ranked trading opportunities, sorted by a composite score of liquidity, volume, and probability deviation from cluster peers. If the detector finds fewer scored opportunities than `top_n`, the remaining slots are filled with the highest-activity markets by `log(liquidity) × log(volume)`.

- Position in the ranked list, starting from `1`.
- Polymarket market identifier (condition ID or numeric ID from the Gamma API).
- Market question text, truncated to 80 characters with a trailing `...` if longer.
- Current YES outcome price as a decimal between `0` and `1`.
- Current NO outcome price as a decimal between `0` and `1`.
- Total on-book liquidity in USD.
- Total traded volume in USD.
- Pipe-separated list of detection signals that triggered this opportunity, such as `prob_diff_vs_cluster=0.18 | low_liq_volume_spike` or `yes_plus_no=1.04`. High-activity fill-ins carry `high_liquidity_volume (activity)`.
- Composite score used for ranking: `log(liquidity) / log(max_liquidity) × mispricing × log(volume) / log(max_volume)`. Higher is better. Fill-in rows score `0.0`.
- Direct Polymarket event URL (`https://polymarket.com/event/{slug}`) when a slug is available; `null` otherwise.

### `arbitrage`

An array of arbitrage candidates and watchlist entries, ranked by total implied probability descending. The detector flags any market where `YES + NO ≥ 1.01` as a `same_market` arbitrage.
When fewer than `top_n` structural opportunities exist, the list is padded with `same_market_relaxed` entries (threshold `1.005`), then `price_sum_deviation` watchlist entries, then high-liquidity leaders.

- Position in the ranked list, starting from `1`.
- Classification of the entry. One of: `same_market`, `same_market_relaxed`, `price_sum_deviation`, or `liquidity_leader`.
- Market IDs involved. Single-market entries contain one ID; multi-leg entries list all legs.
- Question text for each market in `market_ids`, truncated to 60 characters.
- Sum of YES and NO prices (`YES + NO`). Values above `1.0` indicate a potential arbitrage; values below indicate a correlated discount.
- Estimated gross profit as a percentage of capital deployed, calculated as `(total_probability − 1.0) × 100`. Does not account for trading fees or slippage.
- Human-readable description of the signal, for example `YES=0.58 + NO=0.46 = 1.04` or a watchlist note to verify executable prices.
- Polymarket event URL for single-market entries; `null` for multi-leg entries or when no slug is available.

### `mispriced_markets`

Markets whose YES price deviates from the mean YES price of their semantic cluster by at least `0.15`. These are candidates for mean-reversion trades within a thematic group.

- Polymarket market identifier.
- Market question text.
- Current YES price on this market.
- Mean YES price across all other markets in the same semantic cluster.
- Absolute deviation `|yes_price − cluster_mean_yes|`, rounded to 3 decimal places. The detection threshold is `0.15`.

## Semantic clustering

Before scoring, the pipeline groups markets by topic using cosine similarity on question text. It prefers `sentence-transformers/all-MiniLM-L6-v2` and falls back to TF-IDF (via scikit-learn) when the library is unavailable. Markets are merged into a cluster when their pairwise similarity meets the threshold (`0.75` for sentence-transformers, `0.60` for the TF-IDF fallback). Clusters with fewer than two members are discarded.
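
The clustering and mispricing steps described above can be sketched in a few lines. This is an illustration only, not Elyra's implementation: the function names `cluster_indices` and `find_mispriced` are hypothetical, the vectors are toy stand-ins for real embeddings, and transitive (single-linkage) merging via union-find is an assumption, since the text does not say how pairwise matches are combined into clusters.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_indices(vectors, threshold):
    """Group vector indices whose pairwise similarity meets the threshold.

    Assumption: merging is transitive (single linkage), done with union-find.
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(vectors)):
        groups[find(i)].append(i)
    # Clusters with fewer than two members are discarded.
    return [members for members in groups.values() if len(members) >= 2]


def find_mispriced(yes_prices, clusters, min_deviation=0.15):
    """Flag markets whose YES price sits at least min_deviation from the peer mean."""
    flagged = []
    for members in clusters:
        for i in members:
            # Mean over the *other* markets in the cluster, as the field text describes.
            peers = [yes_prices[j] for j in members if j != i]
            cluster_mean_yes = sum(peers) / len(peers)
            deviation = abs(yes_prices[i] - cluster_mean_yes)
            if deviation >= min_deviation:
                flagged.append({
                    "index": i,
                    "yes_price": yes_prices[i],
                    "cluster_mean_yes": round(cluster_mean_yes, 3),
                    "deviation": round(deviation, 3),
                })
    return flagged


# Toy embeddings: the first three point in similar directions, the fourth does not.
vectors = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1], [0.0, 0.0, 1.0]]
yes_prices = [0.30, 0.32, 0.60, 0.50]

clusters = cluster_indices(vectors, threshold=0.75)
mispriced = find_mispriced(yes_prices, clusters)
```

In a real run the vectors would come from the embedding model or the TF-IDF fallback, with the threshold set to `0.75` or `0.60` accordingly.
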

Clustering is used to compute `cluster_mean_yes` for mispricing detection and to populate the `mispriced_markets` list. It does not affect the arbitrage detector, which operates on individual market price sums.

The first run downloads `sentence-transformers/all-MiniLM-L6-v2` from Hugging Face Hub and caches it to `.cache/huggingface/` inside your project root. This download is roughly 90 MB and only happens once. To skip it entirely, omit `sentence-transformers` from your environment; the pipeline will use TF-IDF clustering automatically.
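
To make the formulas from the return-value sections concrete, here is a minimal sketch of the composite score and the same-market price-sum check. The function names `composite_score` and `classify_price_sum` are hypothetical, the exact definition of `mispricing` is not given in this page (it is passed in as a plain number here), and returning `0.0` for inputs at or below `1` and `None` for sums below `1.005` are assumptions made to keep the example well defined.

```python
import math


def composite_score(liquidity, volume, mispricing, max_liquidity, max_volume):
    """log(liquidity)/log(max_liquidity) x mispricing x log(volume)/log(max_volume).

    Assumption: inputs at or below 1 would give a zero or negative logarithm,
    so they score 0.0 here.
    """
    if min(liquidity, volume, max_liquidity, max_volume) <= 1:
        return 0.0
    liquidity_term = math.log(liquidity) / math.log(max_liquidity)
    volume_term = math.log(volume) / math.log(max_volume)
    return liquidity_term * mispricing * volume_term


def classify_price_sum(yes_price, no_price):
    """Return (type, total_probability, estimated_profit_pct) for one market.

    Thresholds come from the text: 1.01 for same_market and 1.005 for
    same_market_relaxed. Returning None below 1.005 is an assumption.
    """
    total_probability = yes_price + no_price
    if total_probability >= 1.01:
        kind = "same_market"
    elif total_probability >= 1.005:
        kind = "same_market_relaxed"
    else:
        kind = None
    # Gross estimate only: trading fees and slippage are not deducted.
    estimated_profit_pct = (total_probability - 1.0) * 100
    return kind, round(total_probability, 4), round(estimated_profit_pct, 2)


score = composite_score(
    liquidity=1_000, volume=10_000, mispricing=0.18,
    max_liquidity=100_000, max_volume=1_000_000,
)
strict = classify_price_sum(0.58, 0.46)
relaxed = classify_price_sum(0.50, 0.507)
```

With the `YES=0.58 + NO=0.46` example from the `arbitrage` section, the sum of 1.04 clears the `1.01` threshold and the gross estimate is 4 percent of capital deployed.
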