nightking-oliver-powers/mercadolivre-reviews-spider
Mercadolivre reviews extraction tool
Mercadolivre Reviews Spider Scraper
A production-ready tool for extracting detailed customer reviews from Mercadolivre product pages at scale.
It helps teams turn raw Mercadolivre reviews into structured insights for sentiment analysis, benchmarking, and decision-making.
Created by Bitbash to showcase our approach to scraping and automation.
Introduction
This project extracts structured customer review data from Mercadolivre product listings, transforming unstructured feedback into clean, analyzable datasets.
It removes the need to collect and normalize large volumes of product reviews by hand.
It is built for e-commerce analysts, marketers, data teams, and researchers.
Customer Review Intelligence for Mercadolivre
- Collects full review details including ratings, text, dates, and images
- Handles multiple product URLs in a single run
- Produces consistent, analytics-ready structured output
- Designed for large-scale review analysis and reporting
Features
| Feature | Description |
|---|---|
| Comprehensive Review Extraction | Captures ratings, titles, bodies, dates, images, and source URLs per review. |
| Scalable Crawling | Processes multiple product pages efficiently with parallel execution. |
| Structured Output | Outputs clean, normalized JSON ready for storage or analytics pipelines. |
| Proxy Support | Supports configurable proxy usage to improve access reliability. |
| Error Recovery | Retries failed requests and logs issues for stable long-running jobs. |
| Custom Inputs | Allows precise targeting through user-defined product URLs. |
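The "Scalable Crawling" and "Custom Inputs" rows describe fetching many user-supplied product URLs in parallel. A minimal sketch of that pattern with the standard library's `concurrent.futures.ThreadPoolExecutor` is shown here. The `crawl_all` function, the `fetch_reviews` callable and its signature, and the default worker count are hypothetical stand-ins, not the project's actual crawler in `src/crawler/reviews_collector.py`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: takes one product URL, returns that product's review dicts.
Fetcher = Callable[[str], List[dict]]


def crawl_all(
    urls: List[str], fetch_reviews: Fetcher, max_workers: int = 4
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Fetch reviews for many product URLs concurrently.

    Returns (results, errors): a failure on one URL is recorded in `errors`
    instead of aborting the whole batch.
    """
    results: Dict[str, List[dict]] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it is processing.
        futures = {pool.submit(fetch_reviews, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # isolate per-URL failures
                errors[url] = repr(exc)
    return results, errors


def fake_fetch(url: str) -> List[dict]:
    # Stand-in for a real network call, used only for this demo.
    if url.endswith("BAD"):
        raise RuntimeError("HTTP 503")
    return [{"URL": url, "Rating": 5}]


ok, failed = crawl_all(["https://example.com/A", "https://example.com/BAD"], fake_fetch)
print(sorted(ok), sorted(failed))  # ['https://example.com/A'] ['https://example.com/BAD']
```

Threads fit here because review fetching is I/O-bound; the per-URL error map keeps one failing product page from discarding the rest of the batch.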
What Data This Scraper Extracts
| Field Name | Field Description |
|---|---|
| Review_Id | Unique identifier assigned to each customer review. |
| Product_Id | Identifier of the product associated with the review. |
| Rating | Numerical rating given by the customer. |
| Title | Short headline of the review. |
| Body | Full review text written by the customer. |
| Date | Date when the review was published. |
| Full_Review | Title and body combined into a single string, in the form "Title: Body". |
| Image_URLs | List of image URLs attached to the review. |
| URL | Source URL where the review was collected. |
| Crawled_Date | Date on which the data was extracted. |
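One way to model a record with these fields is a Python dataclass. This is a sketch only: the class name `Review`, the automatic derivation of `Full_Review`, and the `MM-DD-YYYY` format for `Crawled_Date` are assumptions inferred from the sample output, not the project's confirmed schema.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class Review:
    """Hypothetical record type mirroring the documented output fields (same order)."""

    Review_Id: str
    Product_Id: str
    Rating: int
    Title: str
    Body: str
    Date: str  # kept exactly as published on the page; format not normalized here
    Full_Review: str = ""
    Image_URLs: List[str] = field(default_factory=list)
    URL: str = ""
    Crawled_Date: str = ""

    def __post_init__(self) -> None:
        # Derive Full_Review as "Title: Body", matching the sample output.
        if not self.Full_Review:
            self.Full_Review = f"{self.Title}: {self.Body}" if self.Title else self.Body
        # Assumption: MM-DD-YYYY, inferred from the sample's "11-18-2025".
        if not self.Crawled_Date:
            self.Crawled_Date = date.today().strftime("%m-%d-%Y")


review = Review("1830050664", "MLM2031633061", 5, "excelente", "Esta robusta", "03-02-2025")
print(review.Full_Review)  # excelente: Esta robusta
print(list(asdict(review)))  # field names in documented order
```

`dataclasses.asdict` returns the fields in declaration order, so serializing a `Review` with `json.dumps(asdict(review))` yields keys in the same order as the table.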
Example Output
```json
[
  {
    "Review_Id": "1830050664",
    "Product_Id": "MLM2031633061",
    "Rating": 5,
    "Title": "excelente",
    "Body": "Esta robusta y tiene buenas funciones junto con la app, la recomiendo...",
    "Date": "03-02-2025",
    "Full_Review": "excelente: Esta robusta y tiene buenas funciones junto con la app...",
    "Image_URLs": [
      "https://http2.mlstatic.com/D_NQ_NP_982383-MLA82227086969_022025-F.jpg",
      "https://http2.mlstatic.com/D_NQ_NP_786388-MLA81946094768_022025-F.jpg"
    ],
    "URL": "https://articulo.mercadolibre.com.mx/noindex/catalog/reviews/MLM2031633061",
    "Crawled_Date": "11-18-2025"
  }
]
```
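Because the output is a JSON array of flat objects, a quick post-run sanity check needs only the standard `json` module. The sketch assumes the layout of the sample output; the `REQUIRED_FIELDS` tuple and the `validate_reviews` function are illustrative names, not part of the project.

```python
import json
from typing import List

# Field names taken from the documented output schema.
REQUIRED_FIELDS = (
    "Review_Id", "Product_Id", "Rating", "Title", "Body", "Date",
    "Full_Review", "Image_URLs", "URL", "Crawled_Date",
)


def validate_reviews(raw: str) -> List[str]:
    """Return human-readable problems found in a JSON array of review records."""
    records = json.loads(raw)
    if not isinstance(records, list):
        return ["top-level value is not a JSON array"]
    problems: List[str] = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in rec]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
        rating = rec.get("Rating")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(rating, bool) or not isinstance(rating, (int, float)):
            problems.append(f"record {i}: Rating is not numeric")
        if not isinstance(rec.get("Image_URLs", []), list):
            problems.append(f"record {i}: Image_URLs is not a list")
    return problems
```

Run it on the contents of `data/sample_output.json`; an empty list means every record carries the documented fields with the expected basic types.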
Directory Structure Tree
```
Mercadolivre Reviews Spider/
├── src/
│   ├── main.py
│   ├── crawler/
│   │   ├── reviews_collector.py
│   │   └── pagination_handler.py
│   ├── parsers/
│   │   ├── review_parser.py
│   │   └── text_cleaner.py
│   ├── utils/
│   │   ├── request_manager.py
│   │   └── logger.py
│   └── config/
│       └── settings.example.json
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── requirements.txt
└── README.md
```
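The tree includes `src/config/settings.example.json`, but its keys are not documented here. As a hedged sketch, assuming hypothetical `proxy_url`, `timeout_seconds`, and `max_retries` keys, a settings file could drive proxy usage through the standard library's `urllib.request.ProxyHandler`:

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical keys and defaults; the project's settings.example.json may differ.
DEFAULTS: Dict[str, Any] = {"proxy_url": None, "timeout_seconds": 30, "max_retries": 3}


def load_settings(raw_json: str) -> Dict[str, Any]:
    """Overlay user-supplied settings on the defaults."""
    settings = dict(DEFAULTS)
    settings.update(json.loads(raw_json))
    return settings


def make_opener(settings: Dict[str, Any]) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS traffic through the configured proxy."""
    proxy: Optional[str] = settings.get("proxy_url")
    if not proxy:
        # No proxy configured: default handlers, which honor http_proxy/https_proxy env vars.
        return urllib.request.build_opener()
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)
```

A request would then be made with `make_opener(settings).open(url, timeout=settings["timeout_seconds"])`.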
Use Cases
- E-commerce analysts use it to analyze Mercadolivre reviews, so they can identify product strengths and weaknesses.
- Marketing teams use it to monitor customer sentiment, so they can optimize messaging and positioning.
- Competitive researchers use it to compare similar products, so they can benchmark performance.
- Content teams use it to aggregate real user feedback, so they can create authentic review-based content.
- Academic researchers use it to study consumer behavior trends, so they can support data-driven publications.
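For the benchmarking and strengths/weaknesses use cases, a per-product rating summary is often a first step. This is a minimal sketch over records shaped like the example output; the `summarize` name and the chosen metrics (average rating, share of 5-star reviews, count of ratings at or below 2) are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def summarize(reviews: Iterable[dict]) -> Dict[str, dict]:
    """Group reviews by Product_Id and compute simple rating statistics."""
    ratings: Dict[str, List[float]] = defaultdict(list)
    for r in reviews:
        ratings[r["Product_Id"]].append(r["Rating"])
    summary = {}
    for pid, values in ratings.items():
        n = len(values)
        summary[pid] = {
            "reviews": n,
            "avg_rating": round(sum(values) / n, 2),
            "pct_5_star": round(100 * sum(v == 5 for v in values) / n, 1),
            "low_ratings": sum(v <= 2 for v in values),  # candidates for "weaknesses"
        }
    return summary


sample = [
    {"Product_Id": "MLM1", "Rating": 5},
    {"Product_Id": "MLM1", "Rating": 2},
    {"Product_Id": "MLM2", "Rating": 4},
]
print(summarize(sample)["MLM1"])
# {'reviews': 2, 'avg_rating': 3.5, 'pct_5_star': 50.0, 'low_ratings': 1}
```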
FAQs
Can I scrape reviews from multiple products at once?
Yes, the tool supports multiple product URLs in a single run, allowing batch collection at scale.
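When preparing a batch, it can help to deduplicate input URLs by product. The sketch assumes IDs look like the sample's `MLM2031633061` (three uppercase letters followed by digits); accepting an optional hyphen after the prefix is an extra assumption about other URL shapes. The `product_ids` function and `PRODUCT_ID_RE` pattern are hypothetical, not the project's input parser.

```python
import re
from typing import Iterable, List

# Assumed ID shape, based on the sample "MLM2031633061": three uppercase letters + digits.
PRODUCT_ID_RE = re.compile(r"\b([A-Z]{3})-?(\d+)")


def product_ids(urls: Iterable[str]) -> List[str]:
    """Extract one normalized product ID per URL, dropping duplicates and misses."""
    seen = set()
    ids: List[str] = []
    for url in urls:
        match = PRODUCT_ID_RE.search(url)
        if not match:
            continue  # hypothetical policy: skip URLs without a recognizable ID
        pid = match.group(1) + match.group(2)  # drop any hyphen after the prefix
        if pid not in seen:
            seen.add(pid)
            ids.append(pid)
    return ids
```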
Does it include review images and ratings?
Yes, ratings, text content, and all available image URLs are extracted per review.
Is the output easy to integrate with analytics tools?
The output is structured JSON, making it straightforward to load into databases, dashboards, or BI tools.
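As one example of that integration, the records can be loaded into SQLite with the standard `sqlite3` module. The table and column names below are hypothetical; `Full_Review` is omitted because it can be rebuilt from title and body, and `Image_URLs` is stored as a JSON string. Using `INSERT OR REPLACE` keyed on `Review_Id` makes re-runs idempotent.

```python
import json
import sqlite3
from typing import List


def load_into_sqlite(conn: sqlite3.Connection, records: List[dict]) -> int:
    """Create a reviews table if needed and upsert records; returns rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               review_id TEXT PRIMARY KEY,
               product_id TEXT,
               rating INTEGER,
               title TEXT,
               body TEXT,
               review_date TEXT,
               image_urls TEXT,  -- JSON-encoded list
               url TEXT,
               crawled_date TEXT
           )"""
    )
    rows = [
        (r["Review_Id"], r["Product_Id"], r["Rating"], r["Title"], r["Body"],
         r["Date"], json.dumps(r.get("Image_URLs", [])), r["URL"], r["Crawled_Date"])
        for r in records
    ]
    conn.executemany("INSERT OR REPLACE INTO reviews VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```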
How does it handle failed requests?
Built-in retry logic and logging help maintain stability and data completeness during long runs.
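The retry schedule is not specified here. A common approach is exponential backoff with jitter, sketched below with hypothetical names (`with_retries`, `attempts`, `base_delay`). By default it retries only `OSError` subclasses, which include timeouts and `urllib`'s `URLError` and `HTTPError`; in practice you may want to exclude 4xx responses such as 404.

```python
import logging
import random
import time
from typing import Callable, Tuple, Type, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying errors in `retry_on` with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            # Delays grow 1x, 2x, 4x ... base_delay, plus random jitter up to base_delay.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            logger.warning("attempt %d/%d failed (%r); retrying in %.1fs",
                           attempt, attempts, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

The injectable `sleep` parameter keeps the function easy to test without real waiting.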
Performance Benchmarks and Results
- Throughput: hundreds of reviews per minute, depending on page size and network conditions.
- Reliability: automatic retries on transient failures keep the extraction success rate high.
- Efficiency: request handling avoids redundant page loads, which shortens execution time.
- Data quality: fields are populated consistently across reviews, giving high data completeness.
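Because throughput depends on page size and network conditions, it is worth measuring on your own workload. A minimal timing harness, where `run` is a hypothetical zero-argument callable wrapping one crawl:

```python
import time
from typing import Callable, Iterable


def reviews_per_minute(run: Callable[[], Iterable[dict]]) -> float:
    """Time one crawl run and return its throughput in reviews per minute."""
    start = time.perf_counter()
    count = sum(1 for _ in run())  # consume the results so lazy iterables are timed too
    elapsed = time.perf_counter() - start
    if count == 0:
        return 0.0
    return count * 60 / max(elapsed, 1e-9)  # guard against a zero-length timing
```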
