# Content-based Image Retrieval (CBIR)

A system that finds similar images using visual features rather than metadata. This project compares various feature extraction algorithms and similarity metrics to determine the most effective combination for image retrieval tasks.
## Overview

Content-based image retrieval (CBIR) is a computer vision technique for searching digital images in large databases based on their visual content rather than metadata or text annotations.
The process consists of four steps:

1. Extract features from an image database to form a feature database
2. Extract features from the query image
3. Find the most similar features in the feature database
4. Return the images associated with the most similar features
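The four steps above can be sketched in plain Python. Here `extract_features` is a hypothetical stand-in (a trivial 8-bin grayscale histogram, so the sketch is self-contained), not one of the extractors this project actually compares:

```python
# Minimal sketch of the four-step CBIR pipeline.
from math import dist

def extract_features(image: list[list[int]]) -> list[float]:
    """Hypothetical extractor: normalized 8-bin intensity histogram."""
    bins = [0] * 8
    pixels = [p for row in image for p in row]
    for p in pixels:
        bins[min(p // 32, 7)] += 1
    return [b / len(pixels) for b in bins]

def retrieve(query, database, top_k=3):
    """Run the pipeline and return the names of the top_k closest images."""
    feature_db = {name: extract_features(img)                     # step 1
                  for name, img in database.items()}
    query_features = extract_features(query)                      # step 2
    ranked = sorted(feature_db,                                   # step 3
                    key=lambda n: dist(query_features, feature_db[n]))
    return ranked[:top_k]                                         # step 4
```

In practice step 1 is done once offline (this project stores the vectors under `data/features/`), and only steps 2 to 4 run per query.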
## Research Objective

This project compares different combinations of feature extraction algorithms and similarity metrics to identify the most effective approach for image retrieval tasks.
## Feature Extraction Methods

- Traditional Computer Vision:
  - AKAZE - Accelerated-KAZE local feature descriptor
  - ORB - Oriented FAST and Rotated BRIEF descriptor
- Deep Learning:
  - VGG16 - 16-layer CNN architecture
  - NasNet - Neural Architecture Search Network
  - EfficientNet - Scalable and efficient CNN
## Similarity Metrics
- Cosine Similarity - Measure of similarity between two non-zero vectors
- Manhattan Distance - Sum of absolute differences between coordinates
- Euclidean Distance - "Ordinary" straight-line distance between points
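The three measures can be written out in a few lines of plain Python, operating on feature vectors of equal length (a sketch, not the project's implementation):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 norm)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """'Ordinary' straight-line distance (L2 norm)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note that cosine is a similarity (higher is closer) while the other two are distances (lower is closer), so rankings must sort in opposite directions.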
## Evaluation Dataset

We used the Apparel Images Dataset from Kaggle, which contains various clothing items organized by category.
## Evaluation Metrics
- Mean Average Precision (MAP) - Assesses overall system retrieval quality
- Mean Reciprocal Rank (MRR) - Evaluates the rank of the first relevant item
- First Rank Accuracy - Percentage of queries where the first result is relevant
- Average Query Time - Mean time to answer a single query, in seconds
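The ranking metrics can be computed from per-query lists of relevance flags (1 = relevant result at that rank). This is a sketch of the standard definitions; the exact aggregation in this project's `src/addons/metrics.py` may differ in details:

```python
def average_precision(relevance):
    """Mean of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def mean_average_precision(all_relevance):
    """MAP: average precision, averaged over all queries."""
    return sum(average_precision(r) for r in all_relevance) / len(all_relevance)

def mean_reciprocal_rank(all_relevance):
    """MRR: average of 1/rank of the first relevant item (0 if none)."""
    def rr(relevance):
        for k, rel in enumerate(relevance, start=1):
            if rel:
                return 1.0 / k
        return 0.0
    return sum(rr(r) for r in all_relevance) / len(all_relevance)

def first_rank_accuracy(all_relevance):
    """Fraction of queries whose top-ranked result is relevant."""
    return sum(1 for r in all_relevance if r and r[0]) / len(all_relevance)
```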
## Project Structure

```
├── data/               # Data directory (created by make prepare)
│   ├── inputs/         # Raw and processed input images
│   ├── features/       # Extracted feature vectors
│   └── evaluation/     # Evaluation results
├── notebooks/          # Jupyter notebooks for analysis and visualization
├── reports/            # Generated evaluation reports and images
│   └── images/         # Visualizations and diagrams
├── src/                # Source code
│   ├── addons/         # Core functionality
│   │   ├── extraction/ # Feature extraction algorithms
│   │   └── metrics.py  # Evaluation metrics
│   ├── data/           # Data processing utilities
│   ├── features/       # Feature generation scripts
│   └── models/         # Prediction and model utilities
├── tests/              # Unit tests
├── Makefile            # Automation scripts
├── pyproject.toml      # Project dependencies and metadata
└── README.md           # Project documentation
```

## Installation
### Prerequisites
- Python 3.12+
- uv for dependency management
- Kaggle account with API credentials (get from Kaggle Settings > API)
### Setup

1. Clone the repository:

   ```sh
   git clone https://github.com/schalappe/content-based-image-retrieval.git
   cd content-based-image-retrieval
   ```

2. Set up the environment and install dependencies:

   ```sh
   make venv
   ```

3. Prepare the project structure and configuration:

   ```sh
   make prepare
   ```

4. Add your Kaggle credentials to `.env` (get them from Kaggle Settings > API):

   ```
   KAGGLE_USERNAME=your_username
   KAGGLE_KEY=your_api_key
   ```

5. Download the Apparel Images Dataset from Kaggle:

   ```sh
   make download
   ```

6. Generate feature vectors:

   ```sh
   make features
   ```
## Usage

### Running Evaluations

To run the complete evaluation pipeline:

```sh
make predict
```

### Exploring Results

To launch Jupyter notebook for result analysis:

```sh
make notebook
```

### Running Tests

```sh
python -m unittest discover tests
```

## Results and Findings
Our evaluation revealed significant differences in performance between traditional computer vision and deep learning approaches:
### Performance Comparison
| Method | Similarity | MAP | MRR | First Rank Accuracy | Avg Query Time (s) |
|---|---|---|---|---|---|
| EfficientNet | Euclidean | 0.75 | 0.78 | 0.70 | 0.008 |
| VGG16 | Manhattan | 0.72 | 0.76 | 0.68 | 0.009 |
| NasNet | Euclidean | 0.65 | 0.70 | 0.64 | 0.012 |
| AKAZE | Manhattan | 0.41 | 0.47 | 0.39 | 0.005 |
| ORB | Euclidean | 0.35 | 0.40 | 0.33 | 0.004 |
### Key Findings
- Neural networks significantly outperform traditional descriptors in retrieval accuracy
- Euclidean and Manhattan distances consistently work better than cosine similarity
- EfficientNet with Euclidean distance provides the best balance of accuracy and performance
- Traditional methods (AKAZE, ORB) are faster but substantially less accurate
- Cosine similarity performs poorly across all extraction methods
For detailed analysis and visualization of results, see the evaluation report.
## License
This project is licensed under the MIT License.
