{"id":770,"date":"2026-01-08T11:09:25","date_gmt":"2026-01-08T11:09:25","guid":{"rendered":"https:\/\/maskproxy.io\/blog\/?p=770"},"modified":"2026-01-08T11:10:02","modified_gmt":"2026-01-08T11:10:02","slug":"scrape-ebay-proxies","status":"publish","type":"post","link":"https:\/\/maskproxy.io\/blog\/scrape-ebay-proxies\/","title":{"rendered":"eBay Proxies for Scraping eBay in 2026 with Stable Results"},"content":{"rendered":"\n<p>If your eBay scraping keeps hitting <strong>403<\/strong>, <strong>captchas<\/strong>, <strong>missing price or variants<\/strong>, or <strong>slow runs<\/strong>, it\u2019s usually caused by one of four things: request pace is too aggressive, rotation is too jumpy, exits are mismatched to the task, or you\u2019re parsing a challenge page instead of real content. Start with a small benchmark, lock a steady cadence, rotate on a schedule, and validate field completeness for every page type.<\/p>\n\n\n\n<p><strong>The shortest working route<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Confirm you\u2019re only scraping <strong>public pages<\/strong> and not stepping into login walls or personal data.<\/li>\n\n\n\n<li>Benchmark <strong>20\u201350 URLs<\/strong> across product, search, category, and seller pages.<\/li>\n\n\n\n<li>Choose proxy type based on the job (stable monitoring vs bulk discovery).<\/li>\n\n\n\n<li>Start slow per exit and scale only when block rate stays low.<\/li>\n\n\n\n<li>Troubleshoot in a fixed order: rate \u2192 headers \u2192 exit quality \u2192 parsing.<\/li>\n<\/ol>\n\n\n\n<p>If you\u2019re setting up a proxy plan specifically for eBay, this overview of <a href=\"https:\/\/maskproxy.io\/ebay-proxy.html\">eBay Proxies<\/a> aligns well with how the page types behave in practice.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Decide what to scrape on eBay so the data is actually usable<\/h2>\n\n\n\n<p>Treat eBay as four different scraping 
tasks. Each one needs slightly different pacing and validation.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Product pages<\/strong>: title, current price, shipping, condition, and variant logic<\/li>\n\n\n\n<li><strong>Search results<\/strong>: discovery at scale, pagination, URL harvesting, ranking order<\/li>\n\n\n\n<li><strong>Category pages<\/strong>: broad inventory coverage, more stable \u201cfeed-like\u201d discovery<\/li>\n\n\n\n<li><strong>Seller pages<\/strong>: seller-level summaries and inventory sampling without personal detail collection<\/li>\n<\/ul>\n\n\n\n<p>Before you scale anything, confirm you\u2019re not scraping restricted areas. A quick reference for how robots rules work is the robots standard explanation at <a href=\"https:\/\/www.robotstxt.org\/robotstxt.html\" target=\"_blank\" rel=\"noopener\">robotstxt.org<\/a>.<\/p>\n\n\n\n<p>Next step: list the fields you truly need for your use case (price monitoring, inventory discovery, competitor tracking) and remove anything you do not need.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Choose proxies without wasting money by matching the task<\/h2>\n\n\n\n<p>Buying proxies is a decision problem, not a feature checklist. 
\u201cMore IPs\u201d won\u2019t fix a bad rhythm, and \u201cfast rotation\u201d often increases captchas.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Proxy type comparison for scraping eBay<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Proxy type<\/th><th>Speed<\/th><th>Stability<\/th><th>Typical cost<\/th><th>Block risk<\/th><th>Best for<\/th><\/tr><\/thead><tbody><tr><td>Datacenter<\/td><td>High<\/td><td>Medium<\/td><td>Low<\/td><td>Higher<\/td><td>tiny tests, very low-frequency checks<\/td><\/tr><tr><td>Residential<\/td><td>Medium<\/td><td>High<\/td><td>Medium-high<\/td><td>Lower<\/td><td>stable long-running scraping and monitoring<\/td><\/tr><tr><td>ISP<\/td><td>Medium-high<\/td><td>High<\/td><td>Medium-high<\/td><td>Lower<\/td><td>stable scraping with better speed consistency<\/td><\/tr><tr><td>Rotating residential<\/td><td>Medium<\/td><td>Medium-high<\/td><td>High<\/td><td>Medium<\/td><td>multi-region discovery with controlled rotation<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>For beginners who want stable completion rates on product pages, many teams start with a residential pool such as <a href=\"https:\/\/maskproxy.io\/residential-proxies.html\">Residential Proxies<\/a> and only scale after the benchmark proves the cadence is safe. 
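<\/p>\n\n\n\n<p>The per-exit pacing can be enforced with a small limiter instead of ad-hoc sleeps. This is an illustrative sketch, not provider tooling: the <code>PacedExit<\/code> name and the 3-second default are assumptions to tune against your own benchmark.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import time\n\nclass PacedExit:\n    \"\"\"Keep a minimum gap between requests sent through one proxy exit.\"\"\"\n\n    def __init__(self, min_gap_s: float = 3.0):\n        self.min_gap_s = min_gap_s\n        self._last = 0.0\n\n    def wait(self) -&gt; float:\n        # Sleep only as long as needed to hold the cadence; returns the pause applied.\n        elapsed = time.monotonic() - self._last\n        pause = max(0.0, self.min_gap_s - elapsed)\n        if pause:\n            time.sleep(pause)\n        self._last = time.monotonic()\n        return pause\n<\/code><\/pre>\n\n\n\n<p>Call <code>wait()<\/code> right before each request on that exit; a 3-second gap sits inside the 1-request-every-2\u20135-seconds starting range.<\/p>\n\n\n\n<p>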
MaskProxy users typically get the best consistency when they rotate less often than they think they need to.<\/p>\n\n\n\n<p><strong>Rules that work in real runs<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Per-exit pace<\/strong>: begin at <strong>1 request every 2\u20135 seconds<\/strong> per exit<\/li>\n\n\n\n<li><strong>Concurrency per exit<\/strong>: keep it at <strong>1\u20132<\/strong> until blocks are near zero<\/li>\n\n\n\n<li><strong>Rotation<\/strong>: rotate on schedule, not randomly<\/li>\n\n\n\n<li><strong>Region<\/strong>: keep exits aligned to the market you\u2019re scraping to avoid price\/shipping inconsistencies<\/li>\n<\/ul>\n\n\n\n<p>Next step: pick one proxy type and run a 200-request benchmark before you buy more capacity.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Understand why 403 and 429 are not random events<\/h2>\n\n\n\n<p>When your scraper \u201csuddenly\u201d starts failing, it\u2019s usually because you changed traffic shape: faster cadence, higher concurrency, noisier rotation, or mixed regions.<\/p>\n\n\n\n<p>Use these as your quick meaning checks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>403 meaning: <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/HTTP\/Status\/403\" target=\"_blank\" rel=\"noopener\">HTTP 403 Forbidden<\/a><\/li>\n\n\n\n<li>429 meaning: <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/HTTP\/Status\/429\" target=\"_blank\" rel=\"noopener\">HTTP 429 Too Many Requests<\/a><\/li>\n<\/ul>\n\n\n\n<p>Practical interpretation:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>429 rising<\/strong> usually means \u201cslow down and respect Retry-After.\u201d<\/li>\n\n\n\n<li><strong>403 rising<\/strong> usually means \u201creduce traffic shape risk and stabilize exits,\u201d not \u201cretry harder.\u201d<\/li>\n<\/ul>\n\n\n\n<p>Next step: start logging status code, latency, proxy ID, and whether key 
fields were present. Debugging without these numbers is guesswork.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Use a simple scraping architecture that stays stable when you scale<\/h2>\n\n\n\n<p>Here\u2019s the smallest architecture that still behaves like a real system.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart LR\n  A&#091;URL inputs product search category seller] --&gt; B&#091;Queue with dedupe and priority]\n  B --&gt; C&#091;Fetcher proxies rate limit retries]\n  C --&gt; D&#091;Parser HTML and embedded scripts]\n  D --&gt; E&#091;Cleaner normalize validate completeness]\n  E --&gt; F&#091;Exporter CSV JSON DB]\n  C --&gt; G&#091;Logs status latency proxy attempts]\n  G --&gt; H&#091;Metrics success rate block rate cost]\n  H --&gt; B\n<\/code><\/pre>\n\n\n\n<p>Next step: decide the \u201ckey fields\u201d for each page type and treat missing key fields as a failure that needs a controlled second attempt.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Quick Start for a minimal working scraper in minutes<\/h2>\n\n\n\n<p>This quick start is intentionally small: fetch HTML, parse a few fields, and keep it stable. 
It uses <strong>httpx<\/strong> because proxy handling and timeouts are straightforward.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Install<\/h3>\n\n\n\n<p><code>pip install httpx lxml beautifulsoup4 pandas<\/code><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Minimal product page fetch and parse<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import httpx\nfrom bs4 import BeautifulSoup\n\ndef fetch_html(url: str, proxy: str | None = None, timeout=20) -&gt; str:\n    headers = {\n        \"User-Agent\": \"Mozilla\/5.0\",\n        \"Accept-Language\": \"en-US,en;q=0.9\",\n        \"Accept\": \"text\/html,application\/xhtml+xml\",\n    }\n    with httpx.Client(\n        proxy=proxy,  # httpx 0.26+ takes \"proxy\"; the old \"proxies\" argument was removed in 0.28\n        headers=headers,\n        timeout=timeout,\n        follow_redirects=True\n    ) as client:\n        r = client.get(url)\n        r.raise_for_status()\n        return r.text\n\ndef parse_product_basic(html: str) -&gt; dict:\n    soup = BeautifulSoup(html, \"lxml\")\n    title = soup.select_one(\"h1\")\n    price = soup.select_one('&#091;itemprop=\"price\"], .x-price-primary span')\n    condition = soup.select_one(\".x-item-condition-text span\")\n    return {\n        \"title\": title.get_text(strip=True) if title else None,\n        \"price\": (price.get(\"content\") if price and price.has_attr(\"content\") else (price.get_text(strip=True) if price else None)),\n        \"condition\": condition.get_text(strip=True) if condition else None,\n    }\n\nif __name__ == \"__main__\":\n    url = \"https:\/\/www.ebay.com\/itm\/176212861437\"\n    proxy = None  # e.g. \"http:\/\/user:pass@host:port\"\n    html = fetch_html(url, proxy=proxy)\n    print(parse_product_basic(html))\n<\/code><\/pre>\n\n\n\n<p>Next step: run this on 10 product URLs and note how often price and condition are missing. 
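<\/p>\n\n\n\n<p>A throwaway counter makes that check concrete. Sketch only: it takes the parse function as an argument, so you can pass <code>parse_product_basic<\/code> from above or any stub.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def missing_field_counts(htmls, parse, fields=(\"title\", \"price\", \"condition\")):\n    \"\"\"Count how often each key field failed to parse across sample pages.\"\"\"\n    counts = {f: 0 for f in fields}\n    for html in htmls:\n        row = parse(html)\n        for f in fields:\n            if not row.get(f):\n                counts&#091;f] += 1\n    return counts\n\n# e.g. missing_field_counts(html_samples, parse_product_basic)\n<\/code><\/pre>\n\n\n\n<p>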
If missing fields are common, you\u2019ll solve that in the product-page section before scaling.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Scrape product, search, category, and seller pages with a repeatable workflow<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Product pages: prevent missing price and variants<\/h3>\n\n\n\n<p>Missing price or variants often happens because the values live in embedded scripts, not the visible HTML. Treat it as a data placement problem first.<\/p>\n\n\n\n<p><strong>Workflow<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Parse visible HTML for basic fields.<\/li>\n\n\n\n<li>Validate completeness.<\/li>\n\n\n\n<li>If incomplete, extract embedded JSON or script data.<\/li>\n\n\n\n<li>Only use Playwright if data is truly rendered after load.<\/li>\n<\/ol>\n\n\n\n<p>Here is a safe pattern: if key fields are missing, mark the row incomplete, then attempt a script extraction pass.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import re, json\n\ndef extract_embedded_json_candidates(html: str) -&gt; list&#091;dict]:\n    # Heuristic: find JSON-like blocks in scripts and try decoding a few candidates.\n    candidates = &#091;]\n    for m in re.finditer(r\"&lt;script&#091;^&gt;]*&gt;(.*?)&lt;\/script&gt;\", html, flags=re.S | re.I):\n        s = m.group(1).strip()\n        if len(s) &lt; 200:\n            continue\n        # Look for obvious JSON objects\n        if \"{\" in s and \"}\" in s:\n            # Try to locate a JSON object substring\n            j = re.search(r\"(\\{.*\\})\", s, flags=re.S)\n            if not j:\n                continue\n            chunk = j.group(1)\n            try:\n                candidates.append(json.loads(chunk))\n            except Exception:\n                pass\n    return candidates\n\ndef completeness_score(row: dict, required: list&#091;str]) -&gt; int:\n    present = sum(1 for k in required if row.get(k))\n    return 
int((present \/ max(1, len(required))) * 100)\n<\/code><\/pre>\n\n\n\n<p>Next step: define a required field list for product pages like <code>[\"title\", \"price\", \"condition\"]<\/code> and store completeness score. It makes \u201cdata quality\u201d measurable.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Search pages: pagination, dedupe, and checkpointing<\/h3>\n\n\n\n<p>Search pages are where you gather item URLs safely and repeatedly.<\/p>\n\n\n\n<p><strong>Workflow<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Fetch page 1<\/li>\n\n\n\n<li>Extract item links<\/li>\n\n\n\n<li>Dedupe<\/li>\n\n\n\n<li>Save a progress checkpoint<\/li>\n\n\n\n<li>Continue to the next page<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>import json\nimport httpx\nfrom bs4 import BeautifulSoup\nfrom pathlib import Path\n\nPROGRESS = Path(\"search_progress.json\")\n\ndef load_progress():\n    if PROGRESS.exists():\n        return json.loads(PROGRESS.read_text(\"utf-8\"))\n    return {\"page\": 1, \"seen\": &#091;]}  # \"seen\" persists as a list; rebuild a set for dedupe_urls\n\ndef save_progress(state):\n    PROGRESS.write_text(json.dumps(state, ensure_ascii=False, indent=2), \"utf-8\")\n\ndef search_page_links(keyword: str, page: int, proxy: str | None = None) -&gt; list&#091;str]:\n    base = \"https:\/\/www.ebay.com\/sch\/i.html\"\n    params = {\"_nkw\": keyword, \"_pgn\": page}\n    headers = {\"User-Agent\": \"Mozilla\/5.0\", \"Accept-Language\": \"en-US,en;q=0.9\"}\n    with httpx.Client(proxy=proxy, headers=headers, timeout=25, follow_redirects=True) as client:\n        r = client.get(base, params=params)\n        r.raise_for_status()\n    soup = BeautifulSoup(r.text, \"lxml\")\n    out = &#091;]\n    for a in soup.select(\"a.s-item__link\"):\n        href = a.get(\"href\")\n        if href and \"\/itm\/\" in href:\n            out.append(href.split(\"?\")&#091;0])\n    return out\n\ndef dedupe_urls(urls: list&#091;str], seen: set&#091;str]) -&gt; 
list&#091;str]:\n    out = &#091;]\n    for u in urls:\n        if u not in seen:\n            out.append(u)\n            seen.add(u)\n    return out\n<\/code><\/pre>\n\n\n\n<p>Next step: stop when a page produces almost no new URLs after dedupe. That\u2019s a natural stability limit for discovery runs.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Category pages: use them as stable inventory feeds<\/h3>\n\n\n\n<p>Category pages behave like \u201ccurated search.\u201d They\u2019re useful when keyword search results are noisy.<\/p>\n\n\n\n<p>Approach:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Treat category pages as URL sources.<\/li>\n\n\n\n<li>Extract item URLs the same way.<\/li>\n\n\n\n<li>Keep pacing conservative because category pages are often heavily crawled.<\/li>\n<\/ul>\n\n\n\n<p>Next step: run category discovery with a slower cadence than search runs, and rotate less frequently to reduce \u201cjumpy\u201d behavior.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h3 class=\"wp-block-heading\">Seller pages: collect summary metrics without crossing lines<\/h3>\n\n\n\n<p>Seller pages are best for summary-level monitoring:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>feedback score summary<\/li>\n\n\n\n<li>listing count<\/li>\n\n\n\n<li>store inventory sampling<\/li>\n<\/ul>\n\n\n\n<p>Avoid collecting identifying personal details. 
If your use case can be covered by official APIs, check the entry point at <a href=\"https:\/\/developer.ebay.com\/\" target=\"_blank\" rel=\"noopener\">eBay Developers Program<\/a>.<\/p>\n\n\n\n<p>Next step: keep seller scraping to summary fields and listing URLs only, and drop anything you don\u2019t need for analysis.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Build a production version with retries, backoff, caching, concurrency, and logs<\/h2>\n\n\n\n<p>Production stability is mainly about what you do when things fail.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Backoff decision flow<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>flowchart TD\n  A&#091;Request URL] --&gt; B{Status code}\n  B --&gt;|200| C&#091;Parse and validate]\n  C --&gt;|Complete| D&#091;Write output]\n  C --&gt;|Incomplete| E&#091;Mark incomplete and attempt script extraction]\n  B --&gt;|429| F&#091;Read Retry-After or apply exponential backoff]\n  F --&gt; G&#091;Lower rate and concurrency]\n  G --&gt; A\n  B --&gt;|403| H&#091;Slow down switch exit check headers]\n  H --&gt; I{Still blocked}\n  I --&gt;|Yes| J&#091;Pause reduce scope or use API coverage]\n  I --&gt;|No| A\n  B --&gt;|Captcha page| K&#091;Stop forcing stabilize exits reduce rotation]\n  K --&gt; A\n<\/code><\/pre>\n\n\n\n<p>Before you scale, make sure your proxy strings and schemes are consistent across your team. 
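<\/p>\n\n\n\n<p>One way to enforce that is to validate every proxy string at startup. A standard-library sketch; the accepted scheme set is an assumption, so adjust it to your provider.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from urllib.parse import urlsplit\n\nALLOWED_SCHEMES = {\"http\", \"https\", \"socks5\", \"socks5h\"}\n\ndef check_proxy_url(proxy: str) -&gt; str:\n    \"\"\"Return the proxy string unchanged if it parses cleanly, else raise.\"\"\"\n    parts = urlsplit(proxy)\n    if parts.scheme not in ALLOWED_SCHEMES:\n        raise ValueError(f\"unsupported scheme: {parts.scheme!r}\")\n    if not parts.hostname or not parts.port:\n        raise ValueError(\"proxy must include host and port\")\n    return proxy\n<\/code><\/pre>\n\n\n\n<p>Running it over the whole pool before a run catches missing schemes and ports before they show up as confusing connection errors.<\/p>\n\n\n\n<p>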
This reference is useful for keeping formats correct: <a href=\"https:\/\/maskproxy.io\/proxy-protocols.html\">Proxy Protocols<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Production fetcher with cache and smarter retry<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>import time, json, hashlib\nimport httpx\nfrom pathlib import Path\n\nCACHE_DIR = Path(\".\/cache\")\nCACHE_DIR.mkdir(exist_ok=True)\n\ndef cache_key(url: str) -&gt; str:\n    return hashlib.md5(url.encode(\"utf-8\")).hexdigest()\n\ndef load_cache(url: str) -&gt; str | None:\n    p = CACHE_DIR \/ f\"{cache_key(url)}.html\"\n    return p.read_text(encoding=\"utf-8\") if p.exists() else None\n\ndef save_cache(url: str, html: str) -&gt; None:\n    p = CACHE_DIR \/ f\"{cache_key(url)}.html\"\n    p.write_text(html, encoding=\"utf-8\")\n\ndef fetch_with_retry(url: str, proxy: str | None, max_attempts=4, base_sleep=2.0):\n    headers = {\n        \"User-Agent\": \"Mozilla\/5.0\",\n        \"Accept-Language\": \"en-US,en;q=0.9\",\n        \"Accept\": \"text\/html,application\/xhtml+xml\",\n    }\n\n    cached = load_cache(url)\n    if cached:\n        return 200, cached\n\n    with httpx.Client(proxy=proxy, headers=headers, timeout=25, follow_redirects=True) as client:\n        last_status = 0\n        for attempt in range(1, max_attempts + 1):\n            t0 = time.time()\n            r = client.get(url)\n            latency_ms = int((time.time() - t0) * 1000)\n            status = r.status_code\n            last_status = status\n\n            print(json.dumps({\n                \"url\": url,\n                \"status\": status,\n                \"latency_ms\": latency_ms,\n                \"attempt\": attempt,\n                \"proxy\": proxy or \"DIRECT\",\n            }, ensure_ascii=False))\n\n            if status == 200:\n                save_cache(url, r.text)\n                return status, r.text\n\n            if status == 429:\n                ra = r.headers.get(\"Retry-After\")\n          
      sleep_s = float(ra) if (ra and ra.isdigit()) else base_sleep * (2 ** (attempt - 1))\n                time.sleep(sleep_s)\n                continue\n\n            if status == 403:\n                time.sleep(base_sleep * (2 ** (attempt - 1)))\n                continue\n\n            time.sleep(base_sleep)\n\n        return last_status, \"\"\n<\/code><\/pre>\n\n\n\n<p>Next step: start with concurrency-per-exit set to 1. If your success rate stays high and captcha rate stays low, increase slowly.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Apply proxy rotation and sticky sessions with clear rules<\/h2>\n\n\n\n<p>Rotation helps when it is controlled and auditable. It hurts when it is random.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"574\" src=\"https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-proxy-rotation-sticky-session-rules-1024x574.webp\" alt=\"Proxy rotation and sticky sessions for stable eBay scraping\" class=\"wp-image-773\" srcset=\"https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-proxy-rotation-sticky-session-rules-1024x574.webp 1024w, https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-proxy-rotation-sticky-session-rules-300x168.webp 300w, https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-proxy-rotation-sticky-session-rules-768x431.webp 768w, https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-proxy-rotation-sticky-session-rules.webp 1125w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Simple rules for rotation frequency, sessions, and regions.<\/figcaption><\/figure>\n\n\n\n<p><strong>Rotation frequency<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Search and category discovery: rotate every <strong>5\u201320 requests<\/strong> per exit<\/li>\n\n\n\n<li>Product details: rotate every <strong>10\u201350 requests<\/strong> 
per exit<\/li>\n\n\n\n<li>If captcha rate rises: rotate <strong>less<\/strong>, not more<\/li>\n<\/ul>\n\n\n\n<p><strong>Sticky sessions<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Keep the same exit for <strong>10 minutes<\/strong> when scraping a single page type<\/li>\n\n\n\n<li>Avoid mixing regions in the same batch if you care about consistent pricing and shipping<\/li>\n<\/ul>\n\n\n\n<p><strong>Region selection<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Align exits to the market domain you\u2019re scraping<\/li>\n\n\n\n<li>Keep separate pools per market so your benchmarks stay comparable<\/li>\n<\/ul>\n\n\n\n<p>For controlled rotation at scale, <a href=\"https:\/\/maskproxy.io\/rotating-proxies.html\">Rotating Proxies<\/a> fits well when you want predictable churn while keeping data completeness measurable. MaskProxy setups generally perform best when rotation is treated as a parameter you tune, not a lever you spam.<\/p>\n\n\n\n<p>Next step: lock one region, one proxy type, and one cadence. Benchmark first, then scale.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Clean, dedupe, and export so the results can be used immediately<\/h2>\n\n\n\n<p>For price monitoring, competitor tracking, and inventory discovery, \u201cusable output\u201d usually means:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>deduped URLs<\/li>\n\n\n\n<li>normalized currency and shipping fields<\/li>\n\n\n\n<li>missing fields flagged rather than silently dropped<\/li>\n\n\n\n<li>stable schema exported to CSV or JSON<\/li>\n<\/ul>\n\n\n\n<p>A beginner-safe rule: never overwrite your schema mid-run. Add new fields as optional columns and keep older output readable.<\/p>\n\n\n\n<p>Next step: add a boolean <code>parse_ok<\/code> and a numeric completeness score per record. 
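<\/p>\n\n\n\n<p>Wiring that in takes a few lines with pandas, which is already in the install list. The column names here are illustrative:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\n\nREQUIRED = &#091;\"title\", \"price\", \"condition\"]\n\ndef finalize(rows: list&#091;dict]) -&gt; pd.DataFrame:\n    df = pd.DataFrame(rows)\n    # Ensure every required column exists even if no row produced it.\n    for col in REQUIRED:\n        if col not in df.columns:\n            df&#091;col] = None\n    present = df&#091;REQUIRED].notna()\n    df&#091;\"completeness\"] = (present.sum(axis=1) * 100 \/\/ len(REQUIRED)).astype(int)\n    df&#091;\"parse_ok\"] = present.all(axis=1)\n    return df\n\n# e.g. finalize(rows).to_csv(\"products.csv\", index=False)\n<\/code><\/pre>\n\n\n\n<p>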
It turns \u201cbad runs\u201d into searchable causes.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"574\" src=\"https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-scraping-troubleshooting-playbook-1024x574.webp\" alt=\"Troubleshooting 403, 429, and captcha when scraping eBay\" class=\"wp-image-772\" srcset=\"https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-scraping-troubleshooting-playbook-1024x574.webp 1024w, https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-scraping-troubleshooting-playbook-300x168.webp 300w, https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-scraping-troubleshooting-playbook-768x431.webp 768w, https:\/\/maskproxy.io\/blog\/wp-content\/uploads\/ebay-scraping-troubleshooting-playbook.webp 1125w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">A quick routine to identify the cause and fix it fast.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Troubleshoot 403, 429, captchas, missing data, and render-loaded content<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">The 10-minute isolation routine<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Check the ratio of 429 vs 403<\/li>\n\n\n\n<li>Test the same URL direct vs proxy<\/li>\n\n\n\n<li>Reduce concurrency and slow the cadence<\/li>\n\n\n\n<li>Stabilize exits and reduce rotation<\/li>\n\n\n\n<li>Confirm you are not parsing a challenge page by logging HTML length and title<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Symptom to cause to fix table<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Symptom<\/th><th>Likely cause<\/th><th>First fix that usually works<\/th><\/tr><\/thead><tbody><tr><td>403 rises after scaling<\/td><td>traffic shape too aggressive, exit reputation, unstable rotation<\/td><td>cut concurrency in half, 
double delay, stabilize exits<\/td><\/tr><tr><td>429 increases<\/td><td>rate limit triggered<\/td><td>respect Retry-After, slow per IP<\/td><\/tr><tr><td>repeated captchas<\/td><td>jumpy exits, region mixing, too-fast rotation<\/td><td>rotate less, keep sticky sessions, unify region<\/td><\/tr><tr><td>price or variants missing<\/td><td>data embedded in scripts<\/td><td>validate completeness and extract embedded data<\/td><\/tr><tr><td>many fields missing<\/td><td>challenge page served instead of content<\/td><td>log HTML length and title, slow down and retry<\/td><\/tr><tr><td>run becomes slow over time<\/td><td>timeouts and retries dominate<\/td><td>lower max retries, improve exit quality, cache responses<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Next step: treat \u201cmissing key fields\u201d as a failure mode, not a partial success. That\u2019s how you prevent silent data corruption.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Use a benchmark template to choose proxy plans by results, not claims<\/h2>\n\n\n\n<p>Buy proxies with numbers, not promises.<\/p>\n\n\n\n<p><strong>Metrics to track<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>success rate: 200 plus key fields present<\/li>\n\n\n\n<li>block rate: 403 plus challenge-page frequency<\/li>\n\n\n\n<li>rate-limit rate: 429 frequency<\/li>\n\n\n\n<li>completeness: price, shipping, variants present<\/li>\n\n\n\n<li>cost per 1000 successful requests: total spend \u00f7 successes \u00d7 1000<\/li>\n<\/ul>\n\n\n\n<p><strong>Reusable evaluation table<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Plan<\/th><th>Proxy type<\/th><th>Exits<\/th><th>Per-exit pace<\/th><th>Total requests<\/th><th>Success<\/th><th>403<\/th><th>429<\/th><th>Captcha<\/th><th>Avg latency ms<\/th><th>Cost per 1000 
successes<\/th><th>Notes<\/th><\/tr><\/thead><tbody><tr><td>A<\/td><td>Datacenter<\/td><td>5<\/td><td>0.5 req\/s<\/td><td>1000<\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><tr><td>B<\/td><td>Residential<\/td><td>10<\/td><td>0.3 req\/s<\/td><td>1000<\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><tr><td>C<\/td><td>ISP<\/td><td>10<\/td><td>0.4 req\/s<\/td><td>1000<\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><tr><td>D<\/td><td>Rotating<\/td><td>1 pool<\/td><td>0.3 req\/s<\/td><td>1000<\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><td><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Next step: run 200 requests first. If the trend is stable, scale to 1000 and compare cost per 1000 successes across plans.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Use Playwright only when the data appears after render<\/h2>\n\n\n\n<p>Use Playwright when the HTML source does not contain the data you need, and you can confirm it only appears after render or interaction.<\/p>\n\n\n\n<p>Minimal Playwright example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from playwright.sync_api import sync_playwright\n\ndef fetch_rendered(url: str, proxy_server: str | None = None) -&gt; str:\n    with sync_playwright() as p:\n        launch_args = {}\n        if proxy_server:\n            launch_args&#091;\"proxy\"] = {\"server\": proxy_server}\n        browser = p.chromium.launch(headless=True, **launch_args)\n        page = browser.new_page()\n        page.goto(url, wait_until=\"networkidle\", timeout=45000)\n        html = page.content()\n        browser.close()\n        return html\n<\/code><\/pre>\n\n\n\n<p>Next step: keep Playwright as a second-pass tool for incomplete records only. 
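<\/p>\n\n\n\n<p>That second pass can be a short loop over flagged records. Sketch under the assumption that <code>fetch_rendered<\/code> and <code>parse_product_basic<\/code> from earlier sections are in scope and that each record stores <code>parse_ok<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def second_pass(records: list&#091;dict], fetch, parse) -&gt; int:\n    \"\"\"Re-fetch only incomplete records with the rendered fetcher.\"\"\"\n    fixed = 0\n    for rec in records:\n        if rec.get(\"parse_ok\"):\n            continue  # complete rows never pay the browser cost\n        row = parse(fetch(rec&#091;\"url\"]))\n        if all(row.get(k) for k in (\"title\", \"price\", \"condition\")):\n            rec.update(row, parse_ok=True)\n            fixed += 1\n    return fixed\n\n# e.g. second_pass(records, fetch_rendered, parse_product_basic)\n<\/code><\/pre>\n\n\n\n<p>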
It saves cost and reduces risk.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<h2 class=\"wp-block-heading\">Keep scraping within clean boundaries to improve stability and reduce risk<\/h2>\n\n\n\n<p>Clean scope improves stability. Public pages are simpler, and you avoid unnecessary issues by not collecting personal identifiers or bypassing access controls.<\/p>\n\n\n\n<p>For the practical rules of robots files and typical interpretation, robotstxt.org is a useful reference. For eBay\u2019s own service rules, review their user agreement at <a href=\"https:\/\/www.ebay.com\/help\/policies\/member-behaviour-policies\/user-agreement?id=4259\" target=\"_blank\" rel=\"noopener\">eBay User Agreement<\/a>.<\/p>\n\n\n\n<p>Next step: keep your dataset focused on product and listing information, and remove anything that is not required for your business decision.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p>Once your benchmark shows stable success rate and completeness, scale by increasing exits gradually and widening coverage while keeping cadence measurable. 
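<\/p>\n\n\n\n<p>The comparison metric from the benchmark section reduces to one helper, worth keeping so every plan is scored the same way:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def cost_per_1000(total_spend: float, successes: int) -&gt; float:\n    \"\"\"Cost per 1000 successful requests: total spend \/ successes * 1000.\"\"\"\n    if successes == 0:\n        return float(\"inf\")  # a plan with zero successes is unusable at any price\n    return total_spend \/ successes * 1000\n\n# e.g. 12.50 spend with 800 successes -&gt; 15.625 per 1000\n<\/code><\/pre>\n\n\n\n<p>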
For large discovery runs where you want controlled churn but still care about data completeness, <a href=\"https:\/\/maskproxy.io\/rotating-residential-proxies.html\">Rotating Residential Proxies<\/a> can act as the rotation layer while you track cost per 1000 successful requests.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n<div class=\"wp-block-post-author\"><div class=\"wp-block-post-author__avatar\"><img alt='' src='https:\/\/maskproxy.io\/blog\/wp-content\/litespeed\/avatar\/34f0c677e3cc9e830b660d3ceb872148.jpg?ver=1776488023' srcset='https:\/\/maskproxy.io\/blog\/wp-content\/litespeed\/avatar\/b2346ff8f485776ddfb5623f5c63b9ab.jpg?ver=1776487418 2x' class='avatar avatar-48 photo' height='48' width='48' \/><\/div><div class=\"wp-block-post-author__content\"><p class=\"wp-block-post-author__name\">Daniel Harris<\/p><\/div><\/div>\n\n\n<p>Daniel Harris is a Content Manager and Full-Stack SEO Specialist with 7+ years of hands-on experience across content strategy and technical SEO. He writes about proxy usage in everyday workflows, including SEO checks, ad previews, pricing scans, and multi-account work. He\u2019s drawn to systems that stay consistent over time and writing that stays calm, concrete, and readable. Outside work, Daniel is usually exploring new tools, outlining future pieces, or getting lost in a long book.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">FAQ<\/h2>\n\n\n<div id=\"rank-math-faq\" class=\"rank-math-block\">\n<div class=\"rank-math-list \">\n<div id=\"faq-question-1767868715925\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">1. 
What is the first thing to do when eBay scraping returns 403?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Slow down and reduce concurrency first, then test the same URL direct vs proxy to isolate exit quality versus request pattern.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1767868760606\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">2. How do I handle 429 Too Many Requests on eBay?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Respect Retry-After if present and reduce per-IP cadence. Treat 429 as a hard signal to slow down.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1767868770323\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">3. Which proxy type is best for stable eBay scraping?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Residential or ISP exits are typically more consistent for long-running stability. Rotating pools help for discovery when rotation is controlled.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1767868787335\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">4. Should I rotate IPs as fast as possible?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>No. Fast rotation often increases captchas and instability. Rotate on a schedule you can measure and tune.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1767868793383\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">5. Why do I miss price or variants on product pages?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Pricing and variants are often embedded in scripts rather than visible HTML. Validate completeness and attempt script extraction before switching tools.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1767868803704\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">6. 
How do I scrape eBay search pagination reliably?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Use page parameters, dedupe item URLs, and checkpoint progress after each page so you can resume without repeating work.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1767868817161\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">7. When should I use Playwright for eBay scraping?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Only when the data appears after render or requires interaction. Otherwise httpx or requests is faster and cheaper.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1767868837441\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">8. How can I tell if I received a challenge page instead of real content?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Log HTML length and title when parsing fails. Challenge pages often have unusual titles and very different HTML sizes.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1767868844674\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">9. How do I benchmark proxy plans for eBay scraping?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Measure success rate, block rate, latency, completeness, and cost per 1000 successful requests. Compare plans using your own runs.<\/p>\n\n<\/div>\n<\/div>\n<div id=\"faq-question-1767868855931\" class=\"rank-math-list-item\">\n<h3 class=\"rank-math-question \">10. How do I keep scraping within safe boundaries?<\/h3>\n<div class=\"rank-math-answer \">\n\n<p>Stick to public pages, avoid login walls, avoid collecting personal identifiers, and keep scope focused on product and listing information.<\/p>\n\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Scrape eBay reliably with the right proxy strategy. 
Fix 403, 429, captchas, missing variants, and slow runs with step-by-step workflows, quick-start code, and production checks.<\/p>\n","protected":false},"author":2,"featured_media":771,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[200,277,87],"tags":[380,386,378,383,381,385,382,379,384,377],"class_list":["post-770","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-residential-proxies","category-rotating-proxies","category-rotating-residential-proxies","tag-ebay-captcha-scraping","tag-ebay-price-monitoring-scraper","tag-ebay-proxies","tag-ebay-scraper","tag-ebay-scraping-proxies","tag-fix-ebay-403","tag-fix-ebay-429","tag-residential-proxies-for-ebay","tag-rotating-proxies-for-ebay","tag-scrape-ebay"],"_links":{"self":[{"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/posts\/770","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/comments?post=770"}],"version-history":[{"count":1,"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/posts\/770\/revisions"}],"predecessor-version":[{"id":774,"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/posts\/770\/revisions\/774"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/media\/771"}],"wp:attachment":[{"href":"https:\/\/maskproxy.io\/bl
og\/wp-json\/wp\/v2\/media?parent=770"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/categories?post=770"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/maskproxy.io\/blog\/wp-json\/wp\/v2\/tags?post=770"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}