# Content-based Image Retrieval (CBIR)

A system that finds similar images using visual features rather than metadata. This project compares various feature extraction algorithms and similarity metrics to determine the most effective combination for image retrieval tasks.
## Overview

Content-based image retrieval (CBIR) is a computer vision technique for searching digital images in large databases based on their visual content rather than metadata or text annotations.
The process consists of four steps:

1. Extract features from an image database to form a feature database
2. Extract features from the query image
3. Find the most similar features in the feature database
4. Return the images associated with the most similar features
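The four steps above can be sketched in plain Python. Here `extract_features` is a hypothetical stand-in (a trivial 8-bin grayscale histogram, so the sketch is self-contained), not one of the extractors this project actually compares:

```python
# Minimal sketch of the four-step CBIR pipeline.
from math import dist

def extract_features(image: list[list[int]]) -> list[float]:
    """Hypothetical extractor: normalized 8-bin intensity histogram."""
    bins = [0] * 8
    pixels = [p for row in image for p in row]
    for p in pixels:
        bins[min(p // 32, 7)] += 1
    return [b / len(pixels) for b in bins]

def retrieve(query, database, top_k=3):
    """Run the pipeline and return the names of the top_k closest images."""
    feature_db = {name: extract_features(img)                     # step 1
                  for name, img in database.items()}
    query_features = extract_features(query)                      # step 2
    ranked = sorted(feature_db,                                   # step 3
                    key=lambda n: dist(query_features, feature_db[n]))
    return ranked[:top_k]                                         # step 4
```

In practice step 1 is done once offline (this project stores the vectors under `data/features/`), and only steps 2 to 4 run per query.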
## Research Objective

This project compares different combinations of feature extraction algorithms and similarity metrics to identify the most effective approach for image retrieval tasks.
## Feature Extraction Methods

- Traditional Computer Vision:
  - AKAZE - Accelerated-KAZE local feature descriptor
  - ORB - Oriented FAST and Rotated BRIEF descriptor
- Deep Learning:
  - VGG16 - 16-layer CNN architecture
  - NasNet - Neural Architecture Search Network
  - EfficientNet - Scalable and efficient CNN
## Similarity Metrics
- Cosine Similarity - Measure of similarity between two non-zero vectors
- Manhattan Distance - Sum of absolute differences between coordinates
- Euclidean Distance - "Ordinary" straight-line distance between points
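The three measures can be written out in a few lines of plain Python, operating on feature vectors of equal length (a sketch, not the project's implementation):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (L1 norm)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """'Ordinary' straight-line distance (L2 norm)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note that cosine is a similarity (higher is closer) while the other two are distances (lower is closer), so rankings must sort in opposite directions.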
## Evaluation Dataset

We used the Apparel Images Dataset from Kaggle, which contains various clothing items organized by category.
## Evaluation Metrics
- Mean Average Precision (MAP) - Assesses overall system retrieval quality
- Mean Reciprocal Rank (MRR) - Evaluates the rank of the first relevant item
- First Rank Accuracy - Percentage of queries where the first result is relevant
- Average Query Time - Mean time to answer a single query, in seconds
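The ranking metrics can be computed from per-query lists of relevance flags (1 = relevant result at that rank). This is a sketch of the standard definitions; the exact aggregation in this project's `src/addons/metrics.py` may differ in details:

```python
def average_precision(relevance):
    """Mean of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

def mean_average_precision(all_relevance):
    """MAP: average precision, averaged over all queries."""
    return sum(average_precision(r) for r in all_relevance) / len(all_relevance)

def mean_reciprocal_rank(all_relevance):
    """MRR: average of 1/rank of the first relevant item (0 if none)."""
    def rr(relevance):
        for k, rel in enumerate(relevance, start=1):
            if rel:
                return 1.0 / k
        return 0.0
    return sum(rr(r) for r in all_relevance) / len(all_relevance)

def first_rank_accuracy(all_relevance):
    """Fraction of queries whose top-ranked result is relevant."""
    return sum(1 for r in all_relevance if r and r[0]) / len(all_relevance)
```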
## Project Structure

```
├── data/               # Data directory (created by make prepare)
│   ├── inputs/         # Raw and processed input images
│   ├── features/       # Extracted feature vectors
│   └── evaluation/     # Evaluation results
├── notebooks/          # Jupyter notebooks for analysis and visualization
├── reports/            # Generated evaluation reports and images
│   └── images/         # Visualizations and diagrams
├── src/                # Source code
│   ├── addons/         # Core functionality
│   │   ├── extraction/ # Feature extraction algorithms
│   │   └── metrics.py  # Evaluation metrics
│   ├── data/           # Data processing utilities
│   ├── features/       # Feature generation scripts
│   └── models/         # Prediction and model utilities
├── tests/              # Unit tests
├── Makefile            # Automation scripts
├── pyproject.toml      # Project dependencies and metadata
└── README.md           # Project documentation
```

## Installation
### Prerequisites
- Python 3.12+
- uv for dependency management
- Kaggle account with API credentials (get from Kaggle Settings > API)
### Setup

1. Clone the repository:

   ```sh
   git clone https://github.com/schalappe/content-based-image-retrieval.git
   cd content-based-image-retrieval
   ```

2. Set up the environment and install dependencies:

   ```sh
   make venv
   ```

3. Prepare the project structure and configuration:

   ```sh
   make prepare
   ```

4. Add your Kaggle credentials to `.env` (get them from Kaggle Settings > API):

   ```
   KAGGLE_USERNAME=your_username
   KAGGLE_KEY=your_api_key
   ```

5. Download the Apparel Images Dataset from Kaggle:

   ```sh
   make download
   ```

6. Generate feature vectors:

   ```sh
   make features
   ```
## Usage

### Running Evaluations

To run the complete evaluation pipeline:

```sh
make predict
```

### Exploring Results

To launch Jupyter notebook for result analysis:

```sh
make notebook
```

### Running Tests

```sh
python -m unittest discover tests
```

## Results and Findings
Our evaluation revealed significant differences in performance between traditional computer vision and deep learning approaches:
### Performance Comparison
| Method | Similarity | MAP | MRR | First Rank Accuracy | Avg Query Time (s) |
|---|---|---|---|---|---|
| EfficientNet | Euclidean | 0.75 | 0.78 | 0.70 | 0.008 |
| VGG16 | Manhattan | 0.72 | 0.76 | 0.68 | 0.009 |
| NasNet | Euclidean | 0.65 | 0.70 | 0.64 | 0.012 |
| AKAZE | Manhattan | 0.41 | 0.47 | 0.39 | 0.005 |
| ORB | Euclidean | 0.35 | 0.40 | 0.33 | 0.004 |
### Key Findings
- Neural networks significantly outperform traditional descriptors in retrieval accuracy
- Euclidean and Manhattan distances consistently work better than cosine similarity
- EfficientNet with Euclidean distance provides the best balance of accuracy and performance
- Traditional methods (AKAZE, ORB) are faster but substantially less accurate
- Cosine similarity performs poorly across all extraction methods
For detailed analysis and visualization of results, see the evaluation report.
## License
This project is licensed under the MIT License.
