wfgsss/pinduoduo-scraper-example

Scrape product data from Pinduoduo (拼多多) - China's largest group-buying e-commerce platform. Extract prices, sales volume, product images, and seller info.

Pinduoduo Scraper 🛒

Scrape product data from Pinduoduo (拼多多), China's largest group-buying e-commerce platform with 900M+ users.

What It Does

Search for any product keyword and extract structured data:

  • Product title and goods ID
  • Price (current + original)
  • Sales volume (parsed from Chinese text like "已拼10万+件")
  • Product images (CDN URLs)
  • Seller info (shop name, rating)
  • Review count
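
The sales-volume parsing mentioned above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual parser: the function name `parseSalesVolume` and the unit table are assumptions, and only the "万" (10,000) unit appears in this README's examples.

```javascript
// Hypothetical helper: converts Pinduoduo sales text into an approximate number.
// Assumption: "万" = 10,000 and "亿" = 100,000,000 are the only magnitude units.
const UNIT_MULTIPLIERS = { '万': 1e4, '亿': 1e8 };

function parseSalesVolume(salesText) {
    if (typeof salesText !== 'string') return null;
    // Capture the numeric part and an optional Chinese magnitude unit.
    const match = salesText.match(/(\d+(?:\.\d+)?)\s*(万|亿)?/);
    if (!match) return null;
    const value = parseFloat(match[1]);
    const multiplier = match[2] ? UNIT_MULTIPLIERS[match[2]] : 1;
    // A trailing "+" means "at least", so the result is a lower bound.
    return Math.round(value * multiplier);
}

console.log(parseSalesVolume('已拼10万+件')); // 100000
console.log(parseSalesVolume('已拼5万+件'));  // 50000
console.log(parseSalesVolume('已拼328件'));   // 328
```

Because the source text only says "10万+", the parsed number is best treated as a floor rather than an exact count.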

Quick Start

Option 1: Run Locally

git clone https://github.com/wfgsss/pinduoduo-scraper-example.git
cd pinduoduo-scraper-example
npm install
npm start

Option 2: Run with Apify CLI

npm install -g apify-cli
apify init
apify run

Option 3: Use as a Module

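const { scrapeSearch } = require('./src/main');

// Top-level await is not available in CommonJS, so wrap the call in an async function.
async function main() {
    // Search for products
    const results = await scrapeSearch({
        searchKeywords: ['蓝牙耳机', '手机壳'],
        maxItems: 50,
        includeDetails: false,
    });
    console.log(results);
}

main().catch(console.error);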

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| searchKeywords | string[] | ["手机壳"] | Keywords to search (Chinese recommended) |
| maxItems | number | 50 | Max products per keyword |
| includeDetails | boolean | false | Visit detail pages for more data (slower) |
| proxy | object | Residential | Proxy config (residential recommended) |
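
As a rough illustration of how these defaults could be applied, the sketch below merges user input over the documented defaults and rejects obviously invalid values. `normalizeInput` and `DEFAULT_INPUT` are hypothetical names, not part of the repository. The `proxy` option is passed through untouched because its exact shape is not documented here.

```javascript
// Hypothetical helper; the defaults mirror the parameter table in this README.
const DEFAULT_INPUT = {
    searchKeywords: ['手机壳'],
    maxItems: 50,
    includeDetails: false,
};

function normalizeInput(input = {}) {
    // User-supplied keys override the defaults; unknown keys (e.g. proxy) pass through.
    const merged = { ...DEFAULT_INPUT, ...input };
    if (!Array.isArray(merged.searchKeywords) || merged.searchKeywords.length === 0) {
        throw new TypeError('searchKeywords must be a non-empty array of strings');
    }
    if (!Number.isInteger(merged.maxItems) || merged.maxItems <= 0) {
        throw new RangeError('maxItems must be a positive integer');
    }
    merged.includeDetails = Boolean(merged.includeDetails);
    return merged;
}

console.log(normalizeInput({ maxItems: 100 }));
```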

Example Input

{
    "searchKeywords": ["蓝牙耳机", "手机壳", "充电宝"],
    "maxItems": 100,
    "includeDetails": true
}

Sample Output

[
    {
        "goodsId": "123456789",
        "title": "2026新款蓝牙耳机无线降噪入耳式超长续航",
        "price": 29.9,
        "priceText": "¥29.9",
        "salesVolume": 100000,
        "salesText": "已拼10万+件",
        "imageUrl": "https://img.pddpic.com/gaudit-image/xxx.jpeg",
        "detailUrl": "https://mobile.yangkeduo.com/goods2.html?goods_id=123456789",
        "searchKeyword": "蓝牙耳机",
        "scrapedAt": "2026-02-15T03:00:00.000Z"
    },
    {
        "goodsId": "987654321",
        "title": "苹果15手机壳新款硅胶防摔保护套",
        "price": 9.9,
        "priceText": "¥9.9",
        "salesVolume": 50000,
        "salesText": "已拼5万+件",
        "imageUrl": "https://img.pddpic.com/gaudit-image/yyy.jpeg",
        "detailUrl": "https://mobile.yangkeduo.com/goods2.html?goods_id=987654321",
        "searchKeyword": "手机壳",
        "scrapedAt": "2026-02-15T03:00:00.000Z"
    },
    {
        "goodsId": "456789123",
        "title": "充电宝20000毫安大容量超薄便携移动电源",
        "price": 49.9,
        "priceText": "¥49.9",
        "salesVolume": 200000,
        "salesText": "已拼20万+件",
        "imageUrl": "https://img.pddpic.com/gaudit-image/zzz.jpeg",
        "detailUrl": "https://mobile.yangkeduo.com/goods2.html?goods_id=456789123",
        "searchKeyword": "充电宝",
        "scrapedAt": "2026-02-15T03:00:00.000Z"
    }
]
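
Since the output is a flat array of records, post-processing needs nothing beyond plain JavaScript. The sketch below ranks records by `salesVolume`, using a trimmed copy of the sample records above. `topSellers` is a hypothetical helper, not something the scraper exports.

```javascript
// Records trimmed from the sample output above.
const items = [
    { goodsId: '123456789', price: 29.9, salesVolume: 100000, searchKeyword: '蓝牙耳机' },
    { goodsId: '987654321', price: 9.9, salesVolume: 50000, searchKeyword: '手机壳' },
    { goodsId: '456789123', price: 49.9, salesVolume: 200000, searchKeyword: '充电宝' },
];

// Hypothetical helper: returns the n records with the highest salesVolume.
// Copies the array first so the caller's data is not reordered.
function topSellers(records, n) {
    return [...records].sort((a, b) => b.salesVolume - a.salesVolume).slice(0, n);
}

console.log(topSellers(items, 2).map((item) => item.goodsId));
// [ '456789123', '123456789' ]
```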

Who Is This For

  • E-commerce researchers analyzing China's group-buying market
  • Dropshippers finding trending products at factory prices
  • Market analysts tracking pricing and sales trends
  • Supply chain professionals sourcing from Chinese manufacturers
  • Data scientists studying consumer behavior patterns

Technical Details

  • Uses Playwright to render Pinduoduo's React SPA (mobile H5 site)
  • Emulates iPhone Safari for mobile-optimized pages
  • Blocks unnecessary resources (images, fonts, trackers) for faster loading
  • Built-in scroll handling for infinite-scroll product lists
  • Automatic CAPTCHA detection with debug screenshots
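
The points above could be wired together along these lines. This is a sketch under assumptions rather than the repository's actual code: the function names, the tracker host patterns, the choice of the `iPhone 13` device profile, and the scroll timings are all illustrative. The Playwright calls used (`webkit.launch`, `devices`, `page.route`, `route.abort`, `route.continue`, `page.evaluate`) are part of its public API.

```javascript
// Resource types to skip. The tracker host patterns are an assumption for illustration.
const BLOCKED_TYPES = new Set(['image', 'font', 'media']);
const BLOCKED_HOST_PATTERNS = [/google-analytics\.com$/, /doubleclick\.net$/];

// Pure decision function, so the blocking rule can be checked without a browser.
function shouldBlock(resourceType, url) {
    if (BLOCKED_TYPES.has(resourceType)) return true;
    const { hostname } = new URL(url);
    return BLOCKED_HOST_PATTERNS.some((pattern) => pattern.test(hostname));
}

async function openMobilePage(url) {
    // Loaded lazily so shouldBlock() stays usable without Playwright installed.
    const { webkit, devices } = require('playwright');
    const browser = await webkit.launch();
    // The device profile sets the viewport, user agent, and touch support.
    const context = await browser.newContext({ ...devices['iPhone 13'] });
    const page = await context.newPage();
    await page.route('**/*', (route) => {
        const request = route.request();
        return shouldBlock(request.resourceType(), request.url())
            ? route.abort()
            : route.continue();
    });
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return { browser, page };
}

// Scrolls in-page via window.scrollBy to trigger lazy loading of more products.
async function scrollForMore(page, rounds = 5) {
    for (let i = 0; i < rounds; i += 1) {
        await page.evaluate(() => window.scrollBy(0, document.body.scrollHeight));
        await page.waitForTimeout(1000); // crude pause; waiting on a selector is more robust
    }
}
```

Keeping `shouldBlock` separate from the browser setup makes the blocking rule easy to unit-test. Note that blocking images does not affect the scraped image URLs, since those come from the page markup rather than the downloaded files.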

Tips

  • Use Chinese keywords for best results (e.g. "手机壳" not "phone case")
  • Residential proxies are strongly recommended, because datacenter IPs get blocked quickly
  • Set includeDetails: false for faster runs when you only need basic info
  • Keep maxItems reasonable (50-100) to avoid triggering rate limits

⚠️ Known Limitation

Pinduoduo currently requires phone number + SMS verification for all mobile H5 pages. This means:

  • The scraper may be redirected to a login page
  • A valid Pinduoduo session/cookie is needed for full functionality
  • See test results for details

We're actively exploring workarounds (Temu international version, API reverse engineering, etc.).
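
One way to notice the login redirect described above, and to supply an existing session, is sketched below. It makes several assumptions: that the login wall sits at a URL whose path contains "login", that cookies are available as Playwright-style objects (`name`, `value`, `domain`, `path`), and that the names `isLoginRedirect` and `gotoWithSession` are illustrative only.

```javascript
// Assumption: the login wall is served from a path containing "login".
function isLoginRedirect(currentUrl) {
    const { pathname } = new URL(currentUrl);
    return /login/i.test(pathname);
}

// `context` and `page` are a Playwright BrowserContext and Page created elsewhere.
// `cookies` is an array like [{ name, value, domain, path }] from a logged-in session.
async function gotoWithSession(context, page, url, cookies) {
    await context.addCookies(cookies);
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    if (isLoginRedirect(page.url())) {
        // Keep a screenshot for debugging, then fail loudly instead of scraping the login form.
        await page.screenshot({ path: 'debug-login-redirect.png' });
        throw new Error(`Redirected to login page: ${page.url()}`);
    }
    return page;
}

console.log(isLoginRedirect('https://mobile.yangkeduo.com/goods2.html?goods_id=123456789')); // false
```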

⚠️ Disclaimer

This tool is provided for educational and research purposes only. Pinduoduo's robots.txt disallows automated access. Users are responsible for ensuring their use complies with applicable laws and Pinduoduo's terms of service. Do not use this tool to scrape personal/private data.

License

MIT
