Digikala Product Reviews Spider

Digikala Product Reviews Spider collects detailed customer review data from Digikala product pages, helping teams understand real user sentiment at scale.
It turns unstructured feedback into clean, structured data for analysis, reporting, and decision-making.

Created by Bitbash to showcase our approach to scraping and automation.

Introduction

This project extracts comprehensive product review information from Digikala product pages and converts it into structured datasets.
It automates review collection, so large volumes of customer feedback no longer have to be gathered and read by hand.
It is built for analysts, e-commerce teams, researchers, and product managers who need reliable insight into customer opinions.

Customer Feedback Intelligence on Digikala

Processes multiple product URLs in a single run
Captures both textual reviews and quantitative ratings
Associates reviews with product variants and sellers
Outputs clean, analytics-ready structured data
Designed for stable, repeatable large-scale runs

Features

Feature	Description
Review Content Extraction	Collects full review text, ratings, and creation timestamps.
Buyer Verification	Identifies whether a review is written by a verified buyer.
User Metadata	Extracts the reviewer's display name and any available profile details.
Variant-Level Insights	Includes color, seller, warranty, and variant rating data.
Scalable Collection	Handles many product URLs efficiently in one execution.
Structured Output	Produces consistent JSON suitable for analytics pipelines.

What Data This Scraper Extracts

Field Name	Field Description
id	Unique identifier of the review.
body	Full review text written by the user.
created_at	Date and time when the review was submitted.
rate	Star rating assigned by the reviewer.
is_buyer	Indicates whether the reviewer purchased the product.
user_name	Display name of the reviewer.
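variant.id	Unique identifier of the product variant.
variant.rate	Rating value attached to the variant (96 in the example output).
variant.status	Status of the variant ("marketable" in the example output).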
variant.color	Color associated with the reviewed variant.
variant.seller	Seller offering the reviewed variant.
variant.warranty	Warranty details for the variant.
social_profile	Optional social profile metadata (username, bio, photo, name); each field is null when unavailable.
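
The fields above map naturally onto typed records. The following is a minimal sketch of one way to model them in Python, not the project's actual code: the `Variant` and `Review` classes and the `parse_review` helper are hypothetical names, and the timestamp format is assumed from the sample value shown in the example output.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class Variant:
    id: int
    color: Optional[str]
    seller: Optional[str]
    warranty: Optional[str]


@dataclass
class Review:
    id: int
    body: str
    created_at: datetime
    rate: int
    is_buyer: bool
    user_name: Optional[str]
    variant: Optional[Variant]


def parse_review(raw: dict[str, Any]) -> Review:
    """Convert one raw review dict into a typed Review (hypothetical helper)."""
    raw_variant = raw.get("variant")
    variant = None
    if raw_variant:
        variant = Variant(
            id=raw_variant["id"],
            color=raw_variant.get("color"),
            seller=raw_variant.get("seller"),
            warranty=raw_variant.get("warranty"),
        )
    return Review(
        id=raw["id"],
        body=raw["body"],
        # Assumed layout, based on the sample value "2025-06-13 14:32:14".
        created_at=datetime.strptime(raw["created_at"], "%Y-%m-%d %H:%M:%S"),
        rate=raw["rate"],
        is_buyer=raw["is_buyer"],
        user_name=raw.get("user_name"),
        variant=variant,
    )
```

Parsing `created_at` into a `datetime` up front makes later filtering by date range straightforward; records without a variant simply carry `variant=None`.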

Example Output

[
  {
    "id": 75424198,
    "body": "عالیه،سبکه و خیلی روان هست با این قیمت گوشی خیلی خوبیه...",
    "created_at": "2025-06-13 14:32:14",
    "rate": 5,
    "is_buyer": true,
    "user_name": "فاطمه کمالی روستا",
    "variant": {
      "id": 51931203,
      "rate": 96,
      "status": "marketable",
      "color": "آبی",
      "seller": "دیجی‌کالا",
      "warranty": "گارانتی 18 ماهه کاوش تیم"
    },
    "social_profile": {
      "username": null,
      "bio": null,
      "photo": null,
      "name": null
    }
  }
]
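
As a rough illustration of how output in this shape might be consumed, the sketch below computes a few summary figures. The `summarize` function is a hypothetical helper, and the second record in `SAMPLE` is made up purely for the demonstration.

```python
import json
from statistics import mean

# Shortened records in the shape of the example output.
# The second record is invented for illustration only.
SAMPLE = """[
  {"id": 75424198, "rate": 5, "is_buyer": true},
  {"id": 2, "rate": 3, "is_buyer": false}
]"""


def summarize(reviews):
    """Return review count, mean star rating, and share of verified buyers."""
    if not reviews:
        return {"count": 0, "mean_rate": None, "buyer_share": None}
    buyers = sum(1 for review in reviews if review.get("is_buyer"))
    return {
        "count": len(reviews),
        "mean_rate": mean(review["rate"] for review in reviews),
        "buyer_share": buyers / len(reviews),
    }


reviews = json.loads(SAMPLE)
print(summarize(reviews))
```

To run the same summary on a real file, replace `json.loads(SAMPLE)` with `json.load(...)` on an opened output file, for instance the `data/output.sample.json` listed in the directory tree.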

Directory Structure Tree

Digikala Product Reviews Spider/
├── src/
│   ├── runner.py
│   ├── parsers/
│   │   ├── review_parser.py
│   │   └── variant_parser.py
│   ├── utils/
│   │   ├── request_handler.py
│   │   └── normalizer.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── input.sample.json
│   └── output.sample.json
├── requirements.txt
└── README.md

Use Cases

Market researchers use it to analyze customer sentiment, so they can identify product strengths and weaknesses.
E-commerce teams use it to monitor feedback trends, so they can optimize listings and descriptions.
Product managers use it to compare variants, so they can prioritize inventory and features.
Competitor analysts use it to benchmark similar products, so they can spot competitive gaps.
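
For the variant comparison use case, a small aggregation over the output is often enough. This is a sketch under the assumption that each record carries the `rate` and `variant` fields described earlier; `mean_rate_by_variant` is a hypothetical helper name.

```python
from collections import defaultdict


def mean_rate_by_variant(reviews, key="color"):
    """Average star rating per variant attribute, e.g. "color" or "seller"."""
    buckets = defaultdict(list)
    for review in reviews:
        # Reviews without variant data are grouped under "unknown".
        variant = review.get("variant") or {}
        label = variant.get(key) or "unknown"
        buckets[label].append(review["rate"])
    return {label: sum(rates) / len(rates) for label, rates in buckets.items()}
```

Calling `mean_rate_by_variant(reviews, key="seller")` groups by seller instead of color, which suits the benchmarking scenario.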

FAQs

Can I scrape reviews from multiple products at once?
Yes, the project is designed to process multiple product URLs in a single run while keeping outputs clearly separated.
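
One way to prepare a multi-product run is to deduplicate the input URLs by product first. The sketch below assumes that product URLs contain an identifier of the form `dkp-<digits>`; that pattern and the `product_ids` helper are illustrative assumptions, not part of the project's documented interface.

```python
import re

# Assumption: product URLs carry an identifier such as "dkp-1234567".
PRODUCT_ID_RE = re.compile(r"dkp-(\d+)")


def product_ids(urls):
    """Extract unique product IDs from URLs, preserving input order."""
    seen = {}
    for url in urls:
        match = PRODUCT_ID_RE.search(url)
        if match:
            # Keep the first URL seen for each product ID.
            seen.setdefault(match.group(1), url)
    return list(seen)
```

Keying results by product ID is also a simple way to keep the reviews of each product separate downstream.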

Does it include variant-specific information?
Yes, reviews are linked to product variants such as color, seller, and warranty when available.

Is the output suitable for analytics tools?
Yes, the structured JSON format is designed to integrate smoothly with dashboards, BI tools, and data warehouses.
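
Many BI tools prefer flat tables over nested JSON. The sketch below flattens the nested `variant` object into columns and writes CSV using only the standard library; the column names and the `flatten` and `to_csv` helpers are hypothetical choices for illustration.

```python
import csv
import io

COLUMNS = [
    "id", "created_at", "rate", "is_buyer", "user_name",
    "variant_id", "variant_color", "variant_seller", "variant_warranty", "body",
]


def flatten(review):
    """Turn one nested review dict into a flat row keyed by COLUMNS."""
    variant = review.get("variant") or {}
    return {
        "id": review.get("id"),
        "created_at": review.get("created_at"),
        "rate": review.get("rate"),
        "is_buyer": review.get("is_buyer"),
        "user_name": review.get("user_name"),
        "variant_id": variant.get("id"),
        "variant_color": variant.get("color"),
        "variant_seller": variant.get("seller"),
        "variant_warranty": variant.get("warranty"),
        "body": review.get("body"),
    }


def to_csv(reviews):
    """Serialize reviews to CSV text; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for review in reviews:
        writer.writerow(flatten(review))
    return buffer.getvalue()
```

When writing the result to disk, opening the file with `encoding="utf-8"` keeps the Persian review text intact.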

How reliable is the collected data?
The extraction logic aims for completeness and consistency: every record follows the same schema, and unavailable optional values, such as the social_profile fields in the example output, appear as null.
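
Completeness can also be checked on the consumer side. This is a minimal validation sketch, assuming the core fields from the data table are required and that star ratings fall between 0 and 5; both the required-field list and the range are assumptions, and `find_problems` is a hypothetical helper.

```python
# Assumed set of required fields, taken from the data table.
REQUIRED_FIELDS = ("id", "body", "created_at", "rate", "is_buyer")


def find_problems(review):
    """Return a list of human-readable problems found in one review dict."""
    problems = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if review.get(name) is None
    ]
    rate = review.get("rate")
    # Assumed rating range of 0 to 5 stars.
    if rate is not None and not 0 <= rate <= 5:
        problems.append(f"rate out of range: {rate}")
    return problems
```

An empty list means the record passed every check, so a dataset can be screened with `[r for r in reviews if find_problems(r)]`.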

Performance Benchmarks and Results

Primary Metric: Processes hundreds of reviews per product with consistent extraction accuracy.

Reliability Metric: Maintains stable execution with a high success rate across repeated runs.

Efficiency Metric: Optimized request handling minimizes overhead while maximizing throughput.

Quality Metric: Delivers high data completeness, capturing both textual feedback and structured metadata.

