Digikala Product Reviews Spider

Digikala Product Reviews Spider collects detailed customer review data from Digikala product pages, helping teams understand real user sentiment at scale.
It turns unstructured feedback into clean, structured data for analysis, reporting, and decision-making.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for digikala-product-reviews-spider, you've just found your team. Let's chat.

Introduction

This project extracts comprehensive product review information from Digikala product pages and converts it into structured datasets.
It solves the challenge of manually analyzing large volumes of customer feedback by automating review collection.
It is built for analysts, e-commerce teams, researchers, and product managers who need reliable insight into customer opinions.

Customer Feedback Intelligence on Digikala

  • Processes multiple product URLs in a single run
  • Captures both textual reviews and quantitative ratings
  • Associates reviews with product variants and sellers
  • Outputs clean, analytics-ready structured data
  • Designed for stable, repeatable large-scale runs

Features

| Feature | Description |
|---|---|
| Review Content Extraction | Collects full review text, ratings, and creation timestamps. |
| Buyer Verification | Identifies whether a review is written by a verified buyer. |
| User Metadata | Extracts available reviewer name and profile details. |
| Variant-Level Insights | Includes color, seller, warranty, and variant rating data. |
| Scalable Collection | Handles many product URLs efficiently in one execution. |
| Structured Output | Produces consistent JSON suitable for analytics pipelines. |

What Data This Scraper Extracts

| Field Name | Field Description |
|---|---|
| id | Unique identifier of the review. |
| body | Full review text written by the user. |
| created_at | Date and time when the review was submitted. |
| rate | Star rating assigned by the reviewer. |
| is_buyer | Indicates whether the reviewer purchased the product. |
| user_name | Display name of the reviewer. |
| variant.id | Unique identifier of the product variant. |
| variant.color | Color associated with the reviewed variant. |
| variant.seller | Seller offering the reviewed variant. |
| variant.warranty | Warranty details for the variant. |
| social_profile | Optional social profile metadata if available. |
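
To make the field list concrete, here is a minimal sketch of how one record could be modeled in Python. The class names `Review` and `Variant` and the helper `parse_review` are hypothetical and are not taken from the project's source; the sketch only assumes the field names listed above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Variant:
    # Every variant attribute is optional because it is only present "when available".
    id: Optional[int]
    color: Optional[str]
    seller: Optional[str]
    warranty: Optional[str]


@dataclass
class Review:
    id: int
    body: str
    created_at: str
    rate: int
    is_buyer: bool
    user_name: Optional[str]
    variant: Optional[Variant]
    social_profile: Optional[dict[str, Any]]


def parse_review(raw: dict[str, Any]) -> Review:
    """Build a Review from one raw record (hypothetical helper)."""
    raw_variant = raw.get("variant")
    variant = None
    if raw_variant:
        variant = Variant(
            id=raw_variant.get("id"),
            color=raw_variant.get("color"),
            seller=raw_variant.get("seller"),
            warranty=raw_variant.get("warranty"),
        )
    return Review(
        id=raw["id"],
        body=raw["body"],
        created_at=raw["created_at"],
        rate=raw["rate"],
        is_buyer=raw["is_buyer"],
        user_name=raw.get("user_name"),
        variant=variant,
        social_profile=raw.get("social_profile"),
    )
```

Treating the core review fields as required and everything else as optional is a design assumption; adjust it if your runs show that core fields can also be missing.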

Example Output

[
  {
    "id": 75424198,
    "body": "عالیه،سبکه و خیلی روان هست با این قیمت گوشی خیلی خوبیه...",
    "created_at": "2025-06-13 14:32:14",
    "rate": 5,
    "is_buyer": true,
    "user_name": "فاطمه کمالی روستا",
    "variant": {
      "id": 51931203,
      "rate": 96,
      "status": "marketable",
      "color": "آبی",
      "seller": "دیجی‌کالا",
      "warranty": "گارانتی 18 ماهه کاوش تیم"
    },
    "social_profile": {
      "username": null,
      "bio": null,
      "photo": null,
      "name": null
    }
  }
]
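
As a rough illustration of working with output in this shape, the sketch below computes a few summary figures. The function name `summarize` and the inline sample are hypothetical, and the timestamp format is assumed from the single example above.

```python
import json
from datetime import datetime
from statistics import mean

# Small inline sample in the same shape as the example output (values are illustrative).
SAMPLE = json.loads("""
[
  {"id": 1, "rate": 5, "is_buyer": true, "created_at": "2025-06-13 14:32:14"},
  {"id": 2, "rate": 3, "is_buyer": false, "created_at": "2025-06-14 09:05:00"}
]
""")

# Assumed from the example record; other runs may use a different format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize(reviews):
    """Return review count, mean rating, verified-buyer share, and date range."""
    rates = [r["rate"] for r in reviews]
    buyers = sum(1 for r in reviews if r.get("is_buyer"))
    dates = [datetime.strptime(r["created_at"], TIMESTAMP_FORMAT) for r in reviews]
    return {
        "count": len(reviews),
        "mean_rate": mean(rates) if rates else None,
        "buyer_share": buyers / len(reviews) if reviews else None,
        "earliest": min(dates) if dates else None,
        "latest": max(dates) if dates else None,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```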

Directory Structure Tree

Digikala Product Reviews Spider/
├── src/
│   ├── runner.py
│   ├── parsers/
│   │   ├── review_parser.py
│   │   └── variant_parser.py
│   ├── utils/
│   │   ├── request_handler.py
│   │   └── normalizer.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── input.sample.json
│   └── output.sample.json
├── requirements.txt
└── README.md

Use Cases

  • Market researchers use it to analyze customer sentiment, so they can identify product strengths and weaknesses.
  • E-commerce teams use it to monitor feedback trends, so they can optimize listings and descriptions.
  • Product managers use it to compare variants, so they can prioritize inventory and features.
  • Competitor analysts use it to benchmark similar products, so they can spot competitive gaps.
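
The variant comparison use case could look something like the following sketch, which groups ratings by a variant attribute. `rating_by_variant` is a hypothetical helper; it assumes records shaped like the example output, with `rate` at the top level and attributes such as `color` or `seller` nested under `variant`.

```python
from collections import defaultdict
from statistics import mean


def rating_by_variant(reviews, key="color"):
    """Group review ratings by one variant attribute, e.g. "color" or "seller"."""
    buckets = defaultdict(list)
    for review in reviews:
        variant = review.get("variant") or {}
        # Reviews without the attribute are collected under "unknown".
        label = variant.get(key) or "unknown"
        buckets[label].append(review["rate"])
    return {
        label: {"count": len(rates), "mean_rate": mean(rates)}
        for label, rates in buckets.items()
    }
```

Calling `rating_by_variant(reviews, key="seller")` gives the same breakdown per seller instead of per color.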

FAQs

Can I scrape reviews from multiple products at once?
Yes, the project is designed to process multiple product URLs in a single run while keeping outputs clearly separated.
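
One way to keep per-product results separated is to key them by product ID, as in this sketch. It assumes product URLs contain a `dkp-<digits>` segment, which is how Digikala product links commonly appear but is not confirmed by this project. `fetch_reviews` is a hypothetical callable standing in for whatever actually retrieves the reviews.

```python
import re
from typing import Callable, Iterable, Optional

# Assumed URL pattern: .../product/dkp-<digits>/...
_PRODUCT_ID = re.compile(r"/dkp-(\d+)")


def product_id_from_url(url: str) -> Optional[str]:
    """Return the numeric product ID found in the URL, or None if absent."""
    match = _PRODUCT_ID.search(url)
    return match.group(1) if match else None


def collect_by_product(
    urls: Iterable[str],
    fetch_reviews: Callable[[str], list],
) -> dict:
    """Fetch reviews for each URL once and key the results by product ID."""
    results = {}
    for url in urls:
        product_id = product_id_from_url(url)
        if product_id is None or product_id in results:
            continue  # skip non-product URLs and duplicate products
        results[product_id] = fetch_reviews(url)
    return results
```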

Does it include variant-specific information?
Yes, reviews are linked to product variants such as color, seller, and warranty when available.

Is the output suitable for analytics tools?
Yes, the structured JSON format is designed to integrate smoothly with dashboards, BI tools, or data warehouses.
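
Many BI tools and warehouses prefer flat tables, so nested objects such as `variant` usually need flattening first. The sketch below shows one possible approach using only the standard library. The helper names `flatten` and `to_csv` are hypothetical, and the dotted column names (for example `variant.color`) are a convention chosen here, not something the project prescribes.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_csv(reviews):
    """Render review records as CSV text, one row per review."""
    rows = [flatten(review) for review in reviews]
    # Collect column names in first-seen order so the header is stable.
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Null values and missing columns both end up as empty cells, which most CSV importers read as missing data.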

How reliable is the collected data?
The extraction logic focuses on completeness and consistency to ensure high-quality datasets across runs.
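
If you want to check completeness on your own runs, a per-field fill rate is a simple starting point. This is a sketch of a check you could run on the output, not part of the project itself; `field_completeness` is a hypothetical helper that takes dotted paths such as `variant.color`.

```python
def field_completeness(reviews, fields):
    """Return, per field, the share of reviews with a non-empty value."""
    total = len(reviews)
    report = {}
    for field in fields:
        present = 0
        for review in reviews:
            value = review
            # Walk dotted paths such as "variant.color" one level at a time.
            for part in field.split("."):
                value = value.get(part) if isinstance(value, dict) else None
            if value not in (None, ""):
                present += 1
        report[field] = present / total if total else 0.0
    return report
```

A field such as `is_buyer` with the value `false` still counts as present, because only `null` and empty strings are treated as missing.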


Performance Benchmarks and Results

Primary Metric: Processes hundreds of reviews per product with consistent extraction accuracy.

Reliability Metric: Maintains stable execution with a high success rate across repeated runs.

Efficiency Metric: Optimized request handling minimizes overhead while maximizing throughput.

Quality Metric: Delivers high data completeness, capturing both textual feedback and structured metadata.

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★