Digikala reviews data extraction
Digikala Product Reviews Spider
Digikala Product Reviews Spider collects detailed customer review data from Digikala product pages, helping teams understand real user sentiment at scale.
It turns unstructured feedback into clean, structured data for analysis, reporting, and decision-making.
Created by Bitbash to showcase our approach to scraping and automation.
Introduction
This project extracts comprehensive product review information from Digikala product pages and converts it into structured datasets.
It solves the challenge of manually analyzing large volumes of customer feedback by automating review collection.
It is built for analysts, e-commerce teams, researchers, and product managers who need reliable insight into customer opinions.
Customer Feedback Intelligence on Digikala
- Processes multiple product URLs in a single run
- Captures both textual reviews and quantitative ratings
- Associates reviews with product variants and sellers
- Outputs clean, analytics-ready structured data
- Designed for stable, repeatable large-scale runs
Features
| Feature | Description |
|---|---|
| Review Content Extraction | Collects full review text, ratings, and creation timestamps. |
| Buyer Verification | Identifies whether a review is written by a verified buyer. |
| User Metadata | Extracts the reviewer's display name and profile details when available. |
| Variant-Level Insights | Includes color, seller, warranty, and variant rating data. |
| Scalable Collection | Handles many product URLs efficiently in one execution. |
| Structured Output | Produces consistent JSON suitable for analytics pipelines. |
What Data This Scraper Extracts
| Field Name | Field Description |
|---|---|
| id | Unique identifier of the review. |
| body | Full review text written by the user. |
| created_at | Date and time when the review was submitted. |
| rate | Star rating assigned by the reviewer. |
| is_buyer | Indicates whether the reviewer purchased the product. |
| user_name | Display name of the reviewer. |
| variant.id | Unique identifier of the product variant. |
| variant.color | Color associated with the reviewed variant. |
| variant.seller | Seller offering the reviewed variant. |
| variant.warranty | Warranty details for the variant. |
| social_profile | Optional social profile metadata if available. |
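The field table above can be mirrored in a small typed model. The following is a minimal sketch, not the project's actual parser: the `Variant` and `Review` classes and the `parse_review` helper are hypothetical names introduced for illustration, and the choice of which fields are optional is an assumption based on the table's "if available" wording.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Variant:
    # Mirrors the variant.* rows of the field table.
    id: int
    color: Optional[str] = None
    seller: Optional[str] = None
    warranty: Optional[str] = None


@dataclass
class Review:
    # Mirrors the top-level rows of the field table.
    id: int
    body: str
    created_at: str
    rate: int
    is_buyer: bool
    user_name: Optional[str] = None
    variant: Optional[Variant] = None
    social_profile: Optional[dict] = None


def parse_review(raw: dict[str, Any]) -> Review:
    """Build a Review from one raw record; absent optional fields become None."""
    raw_variant = raw.get("variant")
    variant = None
    if raw_variant:
        variant = Variant(
            id=raw_variant["id"],
            color=raw_variant.get("color"),
            seller=raw_variant.get("seller"),
            warranty=raw_variant.get("warranty"),
        )
    return Review(
        id=raw["id"],
        body=raw.get("body", ""),
        created_at=raw["created_at"],
        rate=raw["rate"],
        is_buyer=bool(raw.get("is_buyer", False)),
        user_name=raw.get("user_name"),
        variant=variant,
        social_profile=raw.get("social_profile"),
    )
```

Keeping `created_at` as a string here is a deliberate simplification; downstream code can parse it once the timestamp format is confirmed.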
Example Output
[
  {
    "id": 75424198,
    "body": "عالیه،سبکه و خیلی روان هست با این قیمت گوشی خیلی خوبیه...",
    "created_at": "2025-06-13 14:32:14",
    "rate": 5,
    "is_buyer": true,
    "user_name": "فاطمه کمالی روستا",
    "variant": {
      "id": 51931203,
      "rate": 96,
      "status": "marketable",
      "color": "آبی",
      "seller": "دیجیکالا",
      "warranty": "گارانتی 18 ماهه کاوش تیم"
    },
    "social_profile": {
      "username": null,
      "bio": null,
      "photo": null,
      "name": null
    }
  }
]
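Records shaped like the example output can be summarized with the standard library alone. This is a minimal sketch under two assumptions: that `created_at` always follows the `YYYY-MM-DD HH:MM:SS` layout seen in the example, and that `rate` and `is_buyer` are present on every record. The `summarize` and `parse_timestamp` names are hypothetical helpers, not part of the project.

```python
from datetime import datetime
from statistics import mean

# Assumed timestamp layout, taken from the single example record.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(value):
    """Parse a created_at string into a datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def summarize(reviews):
    """Return review count, mean star rating, buyer share, and latest timestamp."""
    if not reviews:
        return {"count": 0, "mean_rate": None, "buyer_share": None, "latest": None}
    buyers = sum(1 for review in reviews if review["is_buyer"])
    return {
        "count": len(reviews),
        "mean_rate": mean(review["rate"] for review in reviews),
        "buyer_share": buyers / len(reviews),
        "latest": max(parse_timestamp(review["created_at"]) for review in reviews),
    }
```

To use it on a saved run, load the file with `json.load` and pass the resulting list to `summarize`.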
Directory Structure Tree
Digikala Product Reviews Spider/
├── src/
│   ├── runner.py
│   ├── parsers/
│   │   ├── review_parser.py
│   │   └── variant_parser.py
│   ├── utils/
│   │   ├── request_handler.py
│   │   └── normalizer.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── input.sample.json
│   └── output.sample.json
├── requirements.txt
└── README.md
Use Cases
- Market researchers use it to analyze customer sentiment, so they can identify product strengths and weaknesses.
- E-commerce teams use it to monitor feedback trends, so they can optimize listings and descriptions.
- Product managers use it to compare variants, so they can prioritize inventory and features.
- Competitor analysts use it to benchmark similar products, so they can spot competitive gaps.
FAQs
Can I scrape reviews from multiple products at once?
Yes, the project is designed to process multiple product URLs in a single run while keeping outputs clearly separated.
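One way to keep outputs separated is to key them by product. The sketch below is illustrative only: it assumes product URLs contain a `dkp-<digits>` segment, and `fetch_reviews` is a hypothetical callable supplied by the caller, not a function from this project. No network access happens in this code.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional

# Assumption: the product ID appears in the URL path as "dkp-<digits>".
PRODUCT_ID_PATTERN = re.compile(r"/dkp-(\d+)")


def product_id_from_url(url: str) -> Optional[str]:
    """Return the numeric product ID from a product URL, or None if absent."""
    match = PRODUCT_ID_PATTERN.search(url)
    return match.group(1) if match else None


def collect_by_product(
    urls: Iterable[str],
    fetch_reviews: Callable[[str], Iterable[dict]],
) -> Dict[str, List[dict]]:
    """Group fetched reviews under their product ID, skipping unrecognized URLs."""
    results: Dict[str, List[dict]] = {}
    for url in urls:
        product_id = product_id_from_url(url)
        if product_id is None:
            continue  # URL did not match the assumed pattern
        results[product_id] = list(fetch_reviews(product_id))
    return results
```

Passing the fetcher in as an argument keeps the grouping logic testable without touching the network.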
Does it include variant-specific information?
Yes, reviews are linked to product variants such as color, seller, and warranty when available.
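Because each review carries its variant, ratings can be compared across colors, sellers, or warranties. The following is a minimal sketch over records shaped like the example output; `mean_rate_by` is a hypothetical helper, and bucketing reviews without variant data under `"unknown"` is an assumption made for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_rate_by(reviews, variant_field):
    """Average the star rating per value of one variant field (e.g. "color")."""
    groups = defaultdict(list)
    for review in reviews:
        variant = review.get("variant") or {}
        # Reviews with no variant, or a missing field, share one bucket.
        key = variant.get(variant_field) or "unknown"
        groups[key].append(review["rate"])
    return {key: mean(rates) for key, rates in groups.items()}
```

For example, `mean_rate_by(reviews, "seller")` would return one average per seller name found in the data.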
Is the output suitable for analytics tools?
Yes, the structured JSON output is designed to load directly into dashboards, BI tools, or data warehouses.
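Tools that expect flat tables may need the nested `variant` and `social_profile` objects flattened first. This is a minimal sketch using only the standard library; `flatten` and `to_csv` are hypothetical helpers, and the dotted column names (such as `variant.color`) simply follow the naming used in the field table.

```python
import csv
import io


def flatten(record, parent="", sep="."):
    """Flatten nested dicts into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_csv(records):
    """Render records as CSV text; columns appear in first-seen order."""
    rows = [flatten(record) for record in records]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()
```

When writing to a file instead of a string, open it with `newline=""` and `encoding="utf-8"` so that Persian text and line endings are preserved.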
How reliable is the collected data?
The extraction logic focuses on completeness and consistency to ensure high-quality datasets across runs.
Performance Benchmarks and Results
Primary Metric: Processes hundreds of reviews per product with consistent extraction accuracy.
Reliability Metric: Maintains stable execution with a high success rate across repeated runs.
Efficiency Metric: Optimized request handling minimizes overhead while maximizing throughput.
Quality Metric: Delivers high data completeness, capturing both textual feedback and structured metadata.
