ptthanh02/airflow-vector-search-etl
A pipeline designed for intelligent, semantic document searching
Airflow Vector Search ETL
Project Overview
This project implements an Airflow-orchestrated ETL pipeline for semantic document search. Documents are inserted into MongoDB, transferred to QdrantDB as vectors, and then retrieved by vector similarity.
Key Technologies
- Apache Airflow
- MongoDB
- QdrantDB (Vector Database)
- Python
- Docker Compose
Features
- Automated data ingestion pipeline
- Vector-based semantic search
- Scalable microservices architecture
- Easy deployment with Docker
Prerequisites
- Docker
- Docker Compose
- Python 3.9+
Getting Started
1. Clone the Repository
git clone https://github.com/yourusername/intelligent-document-search.git
cd intelligent-document-search
2. Load Docker Images
docker load -i mongo.tar
docker load -i qdrant.tar
docker load -i postgres.tar
docker load -i redis.tar
docker load -i python3911.tar
3. Start the Data Pipeline
cd PhamTienThanh_12345678
docker compose up --build
Access Points
- Airflow UI: http://localhost:8080
  - Username: airflow
  - Password: airflow
- QdrantDB Dashboard: http://localhost:6333/dashboard
- MongoDB Compass Connection:
  - URL: mongodb://localhost:27017
  - Username: admin
  - Password: admin
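Once the containers are up, a quick way to confirm that the three services are listening is to try a TCP connection to each port. The following is a minimal sketch using only the Python standard library; the helper names `is_port_open` and `check_services` are hypothetical and not part of this repository, and the ports are the ones listed above.

```python
import socket

# Host/port pairs taken from the access points listed above.
SERVICES = {
    "Airflow UI": ("localhost", 8080),
    "QdrantDB": ("localhost", 6333),
    "MongoDB": ("localhost", 27017),
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

An open port only shows that something is listening; it does not confirm that the service has finished starting or that the credentials are accepted.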
Pipeline Workflow
The Airflow DAG performs the following steps:
- Initialize collection in QdrantDB
- Insert random data into MongoDB
- Transfer data to QdrantDB
- Count and verify data
- Perform vector-based search
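The five steps above can be illustrated with an in-memory sketch. This is not the project's DAG code: a plain dict stands in for MongoDB, a list stands in for the QdrantDB collection, and the vector size `DIM`, the document fields, and all function names are assumptions made for illustration. It also assumes each random document already carries its vector, which the real pipeline may handle differently.

```python
import math
import random

DIM = 4  # hypothetical vector size, chosen only for this sketch


def init_collection():
    """Step 1: create an empty vector collection (stand-in for QdrantDB)."""
    return []


def insert_random_documents(store, n, rng):
    """Step 2: insert n random documents into a dict (stand-in for MongoDB)."""
    for doc_id in range(n):
        store[doc_id] = {
            "title": f"doc-{doc_id}",
            "vector": [rng.uniform(-1.0, 1.0) for _ in range(DIM)],
        }


def transfer(store, collection):
    """Step 3: copy every document into the vector collection as a point."""
    for doc_id, doc in store.items():
        collection.append(
            {"id": doc_id, "vector": doc["vector"], "payload": {"title": doc["title"]}}
        )


def verify_counts(store, collection):
    """Step 4: check that both stores hold the same number of records."""
    return len(store) == len(collection)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(collection, query, limit=3):
    """Step 5: return the `limit` points most similar to the query vector."""
    ranked = sorted(collection, key=lambda p: cosine(p["vector"], query), reverse=True)
    return ranked[:limit]


def run_pipeline(n=10, seed=0):
    """Run the five steps in order and return the search hits."""
    rng = random.Random(seed)
    mongo = {}
    qdrant = init_collection()
    insert_random_documents(mongo, n, rng)
    transfer(mongo, qdrant)
    assert verify_counts(mongo, qdrant), "record counts differ"
    query = mongo[0]["vector"]  # query with a known document's vector
    return search(qdrant, query)


if __name__ == "__main__":
    for hit in run_pipeline():
        print(hit["id"], hit["payload"]["title"])
```

Querying with a stored document's own vector is a simple sanity check: that document should come back as the top hit, since its cosine similarity with itself is the maximum possible.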
Languages: Jupyter Notebook 78.6%, Python 20.3%, Dockerfile 1.2%
MIT License
Created March 5, 2025
Updated September 10, 2025