schalappe/content-based-image-retrieval

The goal is to find the best algorithm for content-based image retrieval.

Content-based Image Retrieval (CBIR)

A system that finds similar images using visual features rather than metadata. This project compares various feature
extraction algorithms and similarity metrics to determine the most effective combination for image retrieval tasks.

Table of Contents

  1. Overview
  2. Project Structure
  3. Installation
  4. Usage
  5. Results and Findings
  6. License

Overview

Content-based image retrieval (CBIR) is a computer
vision technique for searching digital images in large databases based on their visual content rather than metadata
or text annotations.

The process consists of four steps:

  1. Extract features from an image database to form a feature database
  2. Extract features from the query image
  3. Find the most similar features in the database
  4. Return the images associated with the most similar features
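
The four steps above can be sketched with plain numpy, treating each image as a single fixed-length feature vector. This is a minimal illustration, not the project's actual pipeline; the toy descriptors below are made up.

```python
import numpy as np

# Step 1: a toy feature database, one 4-dimensional descriptor per image.
database = np.array([
    [1.0, 0.0, 0.0, 0.0],   # image 0
    [0.0, 1.0, 0.0, 0.0],   # image 1
    [0.9, 0.1, 0.0, 0.0],   # image 2
])

# Step 2: the descriptor extracted from the query image.
query = np.array([1.0, 0.05, 0.0, 0.0])

# Step 3: Euclidean distance from the query to every stored descriptor.
distances = np.linalg.norm(database - query, axis=1)

# Step 4: return image indices ordered from most to least similar.
ranking = np.argsort(distances)
print(ranking)  # [0 2 1]
```

In a real system the database holds thousands of vectors and an index structure can replace the brute-force scan, but the retrieval logic stays the same.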

Research Objective

This project compares different combinations of feature extraction algorithms and similarity metrics to identify
the most effective approach for image retrieval tasks.

Feature Extraction Methods

  • Traditional Computer Vision:
    • AKAZE - Fast local feature detector and descriptor
    • ORB - Oriented FAST and Rotated BRIEF features
  • Deep Learning:
    • VGG16 - 16-layer CNN architecture
    • NasNet - Neural Architecture Search Network
    • EfficientNet - Scalable and efficient CNN
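
Local detectors such as AKAZE and ORB return a variable-sized set of descriptors per image, while the CNNs yield one fixed-length global descriptor, typically by pooling the last convolutional feature map. Below is a numpy sketch of that pooling step; the 7x7x1280 shape mirrors EfficientNet-B0's final feature map and is used here only as an illustrative assumption.

```python
import numpy as np

# Hypothetical CNN output: a 7x7 spatial grid of 1280-channel activations.
feature_map = np.random.rand(7, 7, 1280)

# Global average pooling collapses the spatial grid into one vector,
# giving every image a descriptor of identical length.
descriptor = feature_map.mean(axis=(0, 1))

# L2-normalisation is a common final step before distance comparisons.
descriptor /= np.linalg.norm(descriptor)
print(descriptor.shape)  # (1280,)
```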

Similarity Metrics

Three measures of feature-vector similarity were compared:

  • Euclidean distance - Straight-line (L2) distance between vectors
  • Manhattan distance - Sum of absolute coordinate differences (L1)
  • Cosine similarity - Angle between vectors, independent of magnitude
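
The Euclidean, Manhattan, and cosine measures used in the comparison can each be defined in one line of numpy; this is an illustrative sketch, not the project's implementation:

```python
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line (L2) distance; lower means more similar.
    return float(np.linalg.norm(a - b))

def manhattan(a: np.ndarray, b: np.ndarray) -> float:
    # Sum of absolute coordinate differences (L1); lower is more similar.
    return float(np.abs(a - b).sum())

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between the vectors; 1.0 means same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(euclidean(a, b))          # ≈ 3.742
print(manhattan(a, b))          # 6.0
print(cosine_similarity(a, b))  # ≈ 1.0 (b points the same way as a)
```

Note that cosine similarity ignores vector magnitude entirely, which is one plausible reason it underperforms the distance-based metrics in the findings below.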

Evaluation Dataset

We used the Apparel Images Dataset from Kaggle, which
contains various clothing items organized by category.

Evaluation Metrics

  • Mean Average Precision (MAP) - Assesses overall system retrieval quality
  • Mean Reciprocal Rank (MRR) - Evaluates the rank of the first relevant item
  • First Rank Accuracy - Percentage of queries where the first result is relevant
  • Average query time - Performance measurement in seconds
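
MAP and MRR are both computed from per-query relevance judgements. The sketch below (toy data, not the project's metrics.py) shows both on two hypothetical queries:

```python
import numpy as np

def average_precision(relevant):
    # Precision at each relevant hit, averaged over the relevant hits.
    hits, total = 0, 0.0
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def reciprocal_rank(relevant):
    # 1 / rank of the first relevant result; 0.0 if nothing relevant.
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            return 1.0 / rank
    return 0.0

# Each list marks the retrieved images of one query as relevant or not.
queries = [
    [True, False, True, False],   # relevant at ranks 1 and 3
    [False, True, False, False],  # first relevant item at rank 2
]

map_score = np.mean([average_precision(q) for q in queries])
mrr_score = np.mean([reciprocal_rank(q) for q in queries])
print(round(map_score, 3), round(mrr_score, 3))  # 0.667 0.75
```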

Project Structure

├── data/               # Data directory (created by make prepare)
│   ├── inputs/         # Raw and processed input images
│   ├── features/       # Extracted feature vectors
│   └── evaluation/     # Evaluation results
├── notebooks/          # Jupyter notebooks for analysis and visualization
├── reports/            # Generated evaluation reports and images
│   └── images/         # Visualizations and diagrams
├── src/                # Source code
│   ├── addons/         # Core functionality
│   │   ├── extraction/ # Feature extraction algorithms
│   │   └── metrics.py  # Evaluation metrics
│   ├── data/           # Data processing utilities
│   ├── features/       # Feature generation scripts
│   └── models/         # Prediction and model utilities
├── tests/              # Unit tests
├── Makefile            # Automation scripts
├── pyproject.toml      # Project dependencies and metadata
└── README.md           # Project documentation

Installation

Prerequisites

  • Python and GNU Make (the project is driven by the provided Makefile)
  • A Kaggle account, for downloading the evaluation dataset

Setup

  1. Clone the repository:

    git clone https://github.com/schalappe/content-based-image-retrieval.git
    cd content-based-image-retrieval
  2. Set up the environment and install dependencies:

    make venv
  3. Prepare the project structure and configuration:

    make prepare
  4. Add your Kaggle credentials to .env (get from Kaggle Settings > API):

    KAGGLE_USERNAME=your_username
    KAGGLE_KEY=your_api_key
    
  5. Download the Apparel Images Dataset from Kaggle:

    make download
  6. Generate feature vectors:

    make features

Usage

Running Evaluations

To run the complete evaluation pipeline:

make predict

Exploring Results

To launch a Jupyter notebook for result analysis:

make notebook

Running Tests

python -m unittest discover tests

Results and Findings

Our evaluation revealed significant differences in performance between traditional computer vision and deep learning approaches:

Performance Comparison

| Method       | Similarity | MAP  | MRR  | First Rank Accuracy | Avg Query Time (s) |
|--------------|------------|------|------|---------------------|--------------------|
| EfficientNet | Euclidean  | 0.75 | 0.78 | 0.70                | 0.008              |
| VGG16        | Manhattan  | 0.72 | 0.76 | 0.68                | 0.009              |
| NasNet       | Euclidean  | 0.65 | 0.70 | 0.64                | 0.012              |
| AKAZE        | Manhattan  | 0.41 | 0.47 | 0.39                | 0.005              |
| ORB          | Euclidean  | 0.35 | 0.40 | 0.33                | 0.004              |

Key Findings

  • Neural networks significantly outperform traditional descriptors in retrieval accuracy
  • Euclidean and Manhattan distances consistently work better than cosine similarity
  • EfficientNet with Euclidean distance provides the best balance of accuracy and performance
  • Traditional methods (AKAZE, ORB) are faster but substantially less accurate
  • Cosine similarity performs poorly across all extraction methods

For detailed analysis and visualization of results, see the evaluation report.

License

This project is licensed under the MIT License.