NickLitwinow/XLSX-Assembler-Public
XLSX Assembler – ETL Tool for automating the extraction, transformation, and loading of data from multiple Excel files. Built with: Python, Airflow, Cron, Redis, Pandas, Openpyxl, PyQt5, Docker.
XLSX Assembler – ETL Tool for Merging Excel Data
Architecture
Built With
This project was built using these technologies.
- Python
- Airflow
- Cron
- Redis
- Pandas
- Openpyxl
- PyQt5
- Docker
Features
🚀 Efficient ETL Process
Automates the extraction, transformation, and loading (ETL) of data from multiple Excel files using Airflow.
Note: only Excel files with a specific structure are supported.
📊 Advanced Data Processing
Leverages the power of Pandas and Openpyxl for fast and accurate data reading, processing, and styling.
💻 Intuitive GUI with PyQt5
Includes a user-friendly graphical interface for selecting files and tracking real-time progress.
⚡ Performance Optimization
Optimized for reduced system load and faster data processing using Redis, ensuring efficient handling of large datasets.
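As a rough illustration of the merge step described under Advanced Data Processing, the sketch below stacks the rows of several workbooks into one using Pandas (with Openpyxl as the `.xlsx` engine). It is not the project's actual code: the function names `merge_frames` and `merge_excel_files` are hypothetical, and it assumes every input file has the same columns on its first sheet.

```python
import pandas as pd


def merge_frames(frames):
    """Stack DataFrames row-wise and renumber the index from 0."""
    return pd.concat(frames, ignore_index=True)


def merge_excel_files(paths, output_path):
    """Read the first sheet of each workbook and write one merged workbook.

    Hypothetical helper: assumes all workbooks share the same column layout.
    """
    # pandas uses openpyxl to read and write .xlsx files
    frames = [pd.read_excel(path) for path in paths]
    merged = merge_frames(frames)
    merged.to_excel(output_path, index=False)
    return merged
```

Keeping the concatenation in its own function makes it possible to check the merge logic on in-memory DataFrames without touching any files.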
Getting Started
Prerequisites:
Python and Docker installed on your machine
🛠 Installation and Setup Instructions
- Clone the repository: `git clone https://github.com/NickLitwinow/XLSXAssembler_Public.git`
- Navigate into the `src` directory: `cd src/`
- (Terminal 1) Run the ETL client: `python app.py`
- (Terminal 2) Build the Docker image (`sudo` may be required): `docker build . --tag extending_airflow:latest`
- (Terminal 2) Run `docker-compose up -d` to start the Docker services.
- (Terminal 2) (Optional) Run `docker-compose down -v` to stop the Docker services (the `-v` flag also removes their volumes).
Running `python app.py` launches the PyQt5 GUI, where you can select multiple Excel files and begin the ETL process. The app runs in development mode.
Usage Instructions Example
- In the ETL client, click the `Add File` button and select files from the `example files` directory (you can add them again later if you want).
- (Optional) To remove a file from the selection, click its path in the black selection window, then click `Remove File`.
- Click `Merge Files` to name the output file and choose its destination. The ETL process starts afterwards.
- To view the Airflow DAG process:
  - Open `http://localhost:8080/home` in your browser.
  - Enter Login: `airflow` and Password: `airflow`.
  - (Info) If you have just run `docker-compose up -d`, Airflow may take some time to load.
- To view the Redis database:
  - Open `http://localhost:8001/` in your browser.
  - Accept "EULA and Privacy Settings".
  - Click `I already have a database`.
  - Click `Connect to a Redis Database` and enter Host: `redis`, Port: `6379`, Name: `redis-local`.
  - Click `ADD REDIS DATABASE`.
  - Select the `redis-local` database.
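
Because the services can take a while to come up after `docker-compose up -d`, a small script can poll the ports before you open the browser. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, not part of this project, and it assumes ports 8080 and 8001 are published on `localhost` as the URLs above suggest.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to see if anything is listening
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: give up after the deadline, else retry
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8080, timeout=120)` returns `True` as soon as something accepts connections on the Airflow port. An open port only shows that the web server is listening, so the Airflow UI may still need a few more seconds before the login page responds.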
Show your support
Give a ⭐ if you like this project!


