ptthanh02/airflow-vector-search-etl

A pipeline designed for intelligent, semantic document searching

🚀 Airflow Vector Search ETL

Project Overview

This project implements an ETL pipeline for semantic document search. An Apache Airflow DAG writes documents to MongoDB, transfers them to QdrantDB as vectors, and then queries them by vector similarity. All services run under Docker Compose.

Key Technologies

  • Apache Airflow
  • MongoDB
  • QdrantDB (Vector Database)
  • Python
  • Docker Compose

🌟 Features

  • Automated data ingestion pipeline
  • Vector-based semantic search
  • Scalable microservices architecture
  • Easy deployment with Docker

🛠 Prerequisites

  • Docker
  • Docker Compose
  • Python 3.9+

🚦 Getting Started

1. Clone the Repository

git clone https://github.com/ptthanh02/airflow-vector-search-etl.git
cd airflow-vector-search-etl

2. Load Docker Images

docker load -i mongo.tar
docker load -i qdrant.tar
docker load -i postgres.tar
docker load -i redis.tar
docker load -i python3911.tar

3. Start the Data Pipeline

cd PhamTienThanh_12345678
docker compose up --build
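
Once the containers are up, a quick reachability check can confirm that the services accept connections. The sketch below is only an illustration: the names in ASSUMED_SERVICES and their ports are the upstream image defaults (Airflow webserver 8080, MongoDB 27017, Qdrant REST 6333), not values read from this project's compose file, so adjust them to whatever the compose file actually publishes.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Hypothetical mapping: upstream default ports, NOT taken from this repository.
ASSUMED_SERVICES = {
    "airflow-webserver": 8080,
    "mongodb": 27017,
    "qdrant": 6333,
}


def check_services(services, host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}
```

Calling check_services(ASSUMED_SERVICES) returns a dictionary of service names to booleans. An open port only shows that something is listening; it does not prove the service behind it is healthy.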

🔍 Access Points

📝 Pipeline Workflow

The Airflow DAG performs the following steps:

  1. Initialize collection in QdrantDB
  2. Insert random data into MongoDB
  3. Transfer data to QdrantDB
  4. Count and verify data
  5. Perform vector-based search
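
The five steps above can be sketched in plain Python to show how data flows through the pipeline. This is not the project's DAG code: the in-memory lists stand in for MongoDB and QdrantDB, the random vectors stand in for real embeddings, and every function name, the DIM constant, and the choice of cosine similarity are assumptions made for illustration.

```python
import math
import random

DIM = 4  # vector size; an arbitrary choice for this sketch


def init_collection():
    """Step 1: create an empty vector collection (stand-in for QdrantDB)."""
    return []


def insert_random_documents(n, seed=0):
    """Step 2: generate random documents (stand-in for MongoDB inserts)."""
    rng = random.Random(seed)
    return [
        {"_id": i, "text": f"doc-{i}",
         "vector": [rng.uniform(-1, 1) for _ in range(DIM)]}
        for i in range(n)
    ]


def transfer(documents, collection):
    """Step 3: copy each document into the vector collection as a point."""
    for doc in documents:
        collection.append({
            "id": doc["_id"],
            "vector": doc["vector"],
            "payload": {"text": doc["text"]},
        })


def verify_counts(documents, collection):
    """Step 4: check that both stores hold the same number of records."""
    return len(documents) == len(collection)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(collection, query, limit=3):
    """Step 5: brute-force nearest-neighbour search by cosine similarity."""
    ranked = sorted(collection, key=lambda p: cosine(p["vector"], query),
                    reverse=True)
    return ranked[:limit]


def run_pipeline(n=10):
    """Run the five steps in order and return the top search hits."""
    collection = init_collection()
    documents = insert_random_documents(n)
    transfer(documents, collection)
    if not verify_counts(documents, collection):
        raise RuntimeError("source and vector store counts differ")
    # Query with the first document's own vector, so it should rank first.
    return search(collection, documents[0]["vector"])
```

In an Airflow deployment each of these functions would typically become its own task, with the ordering expressed as task dependencies. A real vector database replaces the brute-force scan in search with an approximate nearest-neighbour index.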