AlejandroPqLz/tf-diffusion-scratch
Implementing a Denoising Diffusion Probabilistic Model (DDPM) in TensorFlow from scratch for Pokémon sprite synthesis
🔍 Project Overview
Implementing a conditioned Denoising Diffusion Probabilistic Model (DDPM) in TensorFlow from scratch for Pokémon generation, while understanding the mathematics and theory behind it. To achieve this goal, the Pokémon sprites dataset is used: Pokémon sprite images with license: 
This project has been developed for my Bachelor's Thesis in Data Science and Artificial Intelligence at Universidad Politécnica de Madrid (UPM).
NOTE: Since this project was developed for a Spanish university, the Markdown cells of the Jupyter notebooks and the thesis document are in Spanish 🇪🇸. However, the code and comments are in English 🇬🇧.
📂 Structure
The structure of the repository is as follows:
📦tf-diffusion-scratch
┣ 📂.devcontainer
┣ 📂app
┃ ┣ 📂src_app
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┣ 📜icon_loader.py
┃ ┃ ┗ 📜model_loader.py
┃ ┗ 📜diffusion_app.py
┣ 📂data
┃ ┣ 📂interim
┃ ┣ 📂processed
┃ ┗ 📂raw
┣ 📂docs
┃ ┣ 📂bachelor_thesis
┃ ┣ 📂papers
┃ ┗ 📂study
┣ 📂figures
┃ ┣ 📂app_figures
┃ ┣ 📂notebook_figures
┃ ┣ 📂readme_figures
┃ ┗ 📂sampling_model_figures # RESULT IMAGES
┣ 📂model_weights
┃ ┣ 📂interim
┃ ┣ 📂overfitting
┃ ┃ ┗ 📜overfitting_diffusion_32x32_batch128_epochs200.weights.h5
┃ ┣ 📂test_upload
┃ ┗ 📜final_diffusion_model.weights.h5
┣ 📂notebooks
┃ ┣ 📂test
┃ ┣ 📜00-Intro-and-Analysis.ipynb
┃ ┣ 📜01-Dataset-Creation.ipynb
┃ ┣ 📜02-Diffusion-Model-Architecture.ipynb
┃ ┣ 📜03-Diffusion-Process.ipynb
┃ ┣ 📜04-Training-Diffusion-Model.ipynb
┃ ┗ 📜05-DDPM-final-model.ipynb
┣ 📂src
┃ ┣ 📂data
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┣ 📜create_dataset.py
┃ ┃ ┣ 📜path_loader.py
┃ ┃ ┗ 📜preprocess.py
┃ ┣ 📂model
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┣ 📜build_model.py
┃ ┃ ┣ 📜diffusion_funcionality.py
┃ ┃ ┗ 📜model_callbacks.py
┃ ┣ 📂utils
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┣ 📜config.py
┃ ┃ ┗ 📜utils.py
┃ ┣ 📂visualization
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┗ 📜visualize.py
┃ ┗ 📜__init__.py
┣ 📜.gitattributes
┣ 📜.gitignore
┣ 📜LICENSE
┣ 📜README.md
┣ 📜config.ini
┗ 📜setup.py
🚀 Prerequisites
This project has dependencies outside the Python ecosystem, so some additional setup steps are required.
It is recommended to use a Linux (Ubuntu) distribution for this project, since it is the most common OS for data science and artificial intelligence tasks and, for that reason, NVIDIA GPU configurations are easier to set up.
It is also the simplest way to configure and maintain the project over time, since on Linux the project runs inside a Docker container. This avoids compatibility issues with the OS, and if an update or upgrade causes any issue, it can be resolved by simply rebuilding the container.
However, you can also use Windows with WSL2 or macOS. The requirements for each OS are as follows:
| Windows | Linux (Ubuntu) recommended | macOS |
|---|---|---|
🔧 OS Configuration
1. NVIDIA GPU Configuration (Windows and Linux)
In order to use the GPU for training the model, you need to install the NVIDIA drivers, CUDA and cuDNN.
Although not every CUDA and cuDNN version is compatible with the TensorFlow version used in this project, the NVIDIA drivers, CUDA and cuDNN installed on the system should be the most recent ones for the GPU to work properly. The TensorFlow-compatible CUDA and cuDNN versions are installed later, inside the Conda environment (WSL2) or through the Docker image (Ubuntu).
1.1 Install NVIDIA drivers:
| Windows | Linux (Ubuntu) |
|---|---|
After these steps, running the nvidia-smi command should produce output similar to the following (the GPU model, driver version and CUDA version depend on your machine):
user@user:~$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3060 ... Off | 00000000:01:00.0 On | N/A |
| N/A 41C P8 15W / 70W | 73MiB / 6144MiB | 18% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
1.2 Install CUDA toolkit:
Download and install the CUDA toolkit by following the instructions for your OS. If you run into any issues, see the CUDA installation guide:
- Windows: Install CUDA toolkit on Windows
- WSL2: Install CUDA toolkit on WSL2
- Ubuntu: Install CUDA toolkit on Ubuntu
After that, open a terminal and run the following commands for your OS to check the CUDA installation:
- For WSL2 and Ubuntu:
  sudo apt install nvidia-cuda-toolkit # to avoid any issues with the CUDA installation
  nvcc --version # to check the CUDA version
- For Windows:
  nvcc --version # to check the CUDA version
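If you would rather read the CUDA release from Python (for example, from a notebook), the sketch below calls nvcc and parses its output. It assumes nvcc is on PATH and that its output contains a line such as "Cuda compilation tools, release 12.4, V12.4.131"; the helper names are hypothetical and not part of the project's src package.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Run `nvcc --version` if nvcc is on PATH and return the parsed release."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # CUDA toolkit not installed, or not on PATH
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit release:", installed_cuda_release())
```

This only reports the system-wide toolkit; it can legitimately be newer than the CUDA version TensorFlow needs, which is installed separately later on.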
1.3 Install cuDNN:
Install cuDNN by following the instructions for your OS. If you run into any issues, see the cuDNN installation guide:
- Windows (WSL2): Install cuDNN on Windows
- Ubuntu: Install cuDNN on Ubuntu
2. Windows Subsystem for Linux (WSL2) Configuration
After installing the NVIDIA drivers, CUDA and cuDNN, if you are going to develop the project on Windows, you need to set up WSL2 to use the GPU for training the model. To do this, follow the steps below:
2.1 Conda Environment
We will use Conda to manage the Python environment. You can install it by following the Miniconda installation guide. After installing Miniconda, create a new environment with the following commands:
# Create the environment
conda create -n diffusion_env python=3.12 -y
# Activate the environment
conda activate diffusion_env
2.2 CUDA and cuDNN compatible versions
Since the model is implemented in TensorFlow, you need to install the CUDA and cuDNN versions that are compatible with the TensorFlow version you are using. For more information, see the TensorFlow versions compatibility. This project uses TensorFlow 2.16.1, which requires CUDA 12.3 and cuDNN 8.9. To install them, execute the following commands:
# Install CUDA 12.3
conda install nvidia/label/cuda-12.3.2::cuda-toolkit
# Install cuDNN 8.9
conda install -c conda-forge cudnn=8.9
Finally, set the environment variables so that the CUDA and cuDNN libraries are used every time the environment is activated:
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
2.3 External Dependencies
Once the environment is activated, you can install the external dependencies by running the following command:
pip install -e .
And you are ready to go!
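Optionally, you can confirm from Python that the activation script above took effect. Deactivate and reactivate the environment first, since scripts in activate.d only run on activation. The sketch below is a hypothetical helper, not part of the repository. It only checks the LD_LIBRARY_PATH entry written to env_vars.sh and does not import TensorFlow.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def conda_lib_on_ld_path(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if $CONDA_PREFIX/lib is listed in LD_LIBRARY_PATH.

    Mirrors the export written to activate.d/env_vars.sh; pass a dict
    instead of the real environment to test it.
    """
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return False  # no Conda environment is active
    expected = Path(prefix, "lib")
    # LD_LIBRARY_PATH is colon-separated; skip empty entries (e.g. a leading ":").
    entries = [e for e in env.get("LD_LIBRARY_PATH", "").split(":") if e]
    # Path comparison ignores the trailing slash used in env_vars.sh ("$CONDA_PREFIX/lib/").
    return any(Path(e) == expected for e in entries)


if __name__ == "__main__":
    print("CUDA/cuDNN libraries on LD_LIBRARY_PATH:", conda_lib_on_ld_path())
```

Once this prints True, tf.config.list_physical_devices("GPU") in TensorFlow should list your GPU if the rest of the setup is correct.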
3. Linux (Ubuntu) Configuration
After installing the NVIDIA drivers, CUDA and cuDNN, if you are going to develop the project on Ubuntu, you could follow the same steps as in the Windows Subsystem for Linux (WSL2) Configuration section. However, since you are working on a native Linux distribution, it is recommended to use Docker to create a container with all the dependencies installed and avoid any compatibility and version issues.
⚠ WARNING: The Docker setup approach is not recommended for WSL2 or Windows, since there are many issues regarding CPU usage that make it unworkable (more info).
3.1 Install the NVIDIA Container Toolkit
Follow the NVIDIA Container Toolkit Guide
After installing the NVIDIA Container Toolkit, you can check the installation by running the following command:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
If you get an error when checking the installation, follow these steps:
# Restart the Docker service
sudo systemctl restart docker
# Open the Docker configuration file of nvidia-container-runtime
sudo nano /etc/nvidia-container-runtime/config.toml
# Set no-cgroups = true
...
no-cgroups = true
...
# Save and close the file and check the installation again
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
3.2 Pull the tensorflow-gpu-jupyter image (Optional)
This image contains all the correct dependencies for TensorFlow, with CUDA and cuDNN installed, and a Jupyter notebook server to develop the project (if you skip this step, the image will be pulled automatically in the next one). You can pull the image with the following command:
docker pull tensorflow/tensorflow:latest-gpu-jupyter
3.3 Build the container
Since the project has a Dev Container configuration file in the .devcontainer folder, you just need to open the project folder in VS Code and click the Reopen in Container button that appears in the bottom-right corner of the window. You can also do this at any time by opening the command palette with Ctrl+Shift+P and typing Reopen in Container.
This will pull the tensorflow-gpu-jupyter image if it has not been pulled before, and build a container with all the needed dependencies using the project's custom Dockerfile.
In order to avoid possible issues with the container not detecting some versions of the libraries, just run the following command in the container terminal to install the external dependencies declared in the setup.py file:
pip install -e .
Finally, when running any Jupyter notebook, choose the Python version that matches the one the image was built with. To check the Python version, run the following command in the container terminal:
python --version
At the time of writing, the image is built with Python 3.11.0rc1, so you need to select the Python 3.11.0 kernel in the Jupyter notebook.
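To double-check the kernel choice from inside a notebook, a first cell like the one below can warn you early. It is a hedged sketch: the expected (3, 11) comes from the image version mentioned above and will need updating when the image changes, and the function name is hypothetical.

```python
import sys
from typing import Sequence

# Python version of the tensorflow/tensorflow:latest-gpu-jupyter image at the time
# of writing (3.11.0rc1); only major.minor is compared, so the "rc1" suffix is ignored.
EXPECTED_MAJOR_MINOR = (3, 11)


def kernel_matches_image(version: Sequence[int] = sys.version_info[:2],
                         expected: Sequence[int] = EXPECTED_MAJOR_MINOR) -> bool:
    """Return True if the running kernel has the same major.minor version as the image."""
    return tuple(version[:2]) == tuple(expected)


if not kernel_matches_image():
    running = "{}.{}".format(*sys.version_info[:2])
    print(f"Warning: this kernel runs Python {running}, but the image uses "
          f"{EXPECTED_MAJOR_MINOR[0]}.{EXPECTED_MAJOR_MINOR[1]}; pick another kernel.")
```

Comparing only major.minor is deliberate: packages installed with pip inside the container are built for a given major.minor version, so patch or release-candidate differences do not matter here.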
And voilà! You have a container with all the dependencies installed and ready to go!
After that, if any issue or problem arises, just rebuild the container by opening the command palette and selecting the Rebuild Container option.
4. macOS Configuration
Finally, if you are going to develop the project on macOS, follow the steps below, based on TensorFlow Metal and adapted to the project dependencies:
4.1 Conda Environment
We will follow the same first steps as in the Windows Subsystem for Linux (WSL2) Configuration section, since we are going to use a Conda environment to manage the dependencies. Therefore, install Miniconda by following the Miniconda installation guide. After installing Miniconda, create a new environment with the following commands:
# Create the environment
conda create -n diffusion_env python=3.12 -y
# Activate the environment
conda activate diffusion_env
# Install external dependencies
pip install -e .
4.2 TensorFlow for macOS
TensorFlow does not support GPU acceleration on macOS through CUDA and cuDNN, so you need to install the tensorflow-metal plugin, which relies on Apple's Metal framework instead. To do so, run the following command:
pip install tensorflow-metal
Now you are ready to go!
📊 Data
As mentioned before, the dataset used in this project is the Pokémon sprite images from Kaggle.
The dataset contains over 10,000 Pokémon sprites in PNG format at 96x96 resolution (half of them are shiny variants), covering 898 Pokémon across different games, together with a CSV file of labels for each Pokémon that may relate to its design. These aspects are analysed in more depth in the 00-Intro-and-Analysis.ipynb notebook.
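To verify the 96x96 resolution on your own copy of the data, the stdlib-only sketch below reads each sprite's width and height straight from its PNG header (the IHDR chunk) without decoding the image. The data/raw location and the helper names are assumptions for illustration; the repository's own preprocessing lives in src/data and may handle this differently.

```python
import struct
from pathlib import Path
from typing import Dict, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> Tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file.

    A PNG starts with an 8-byte signature followed by the IHDR chunk:
    a 4-byte length, the 4-byte type b"IHDR", then big-endian width and height.
    """
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def unexpected_sizes(folder: Path, expected: Tuple[int, int] = (96, 96)) -> Dict[str, Tuple[int, int]]:
    """Map every PNG under `folder` (recursively) whose size differs from `expected` to its size."""
    mismatches = {}
    for path in sorted(folder.rglob("*.png")):
        with path.open("rb") as f:
            size = png_size(f.read(24))
        if size != expected:
            mismatches[str(path)] = size
    return mismatches


if __name__ == "__main__":
    raw_dir = Path("data/raw")  # hypothetical: wherever the Kaggle sprites were extracted
    if raw_dir.is_dir():
        print(unexpected_sizes(raw_dir))
```

An empty dictionary means every sprite found matches the expected resolution.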
🛠️ Usage
After following the steps described in the Prerequisites section, you can start using the project by running the notebooks in the notebooks folder, which cover the whole process, from dataset creation to model training.
Before diving into the notebooks, have a look at the config.ini file in the root of the project and adapt it to your needs. This file contains all the hyperparameters for model training. Once that is done, run the notebooks in the pre-established order:
- 00-Intro-and-Analysis.ipynb: Introduces the project and analyses the Pokémon sprites dataset and the pokedex.csv file.
- 01-Dataset-Creation.ipynb: Offers multiple options to create the dataset for the model, as well as a raw dataset to customise the dataset creation process. Finally, it saves the dataset in the data/processed/pokemon_tf_dataset folder as a TensorFlow Dataset.
- 02-Diffusion-Model-Architecture.ipynb: Defines the Unet model architecture and explains the theory behind it.
- 03-Diffusion-Process.ipynb: Defines and explains the diffusion functionalities for the model architecture (forward, reverse and sample), leaving the training process for the next notebook.
- 04-Training-Diffusion-Model.ipynb: Defines and explains the diffusion training process and trains the model on the dataset created in the 01-Dataset-Creation.ipynb notebook.
- 05-Evaluate-Diffusion-Samples.ipynb: Generates samples from the trained model.
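To give a feel for what notebooks 03 and 04 cover, the stdlib-only sketch below writes out the standard DDPM equations from Ho et al. (2020): a linear beta schedule, the closed-form forward process q(x_t | x_0), the simplified noise-prediction loss, one ancestral reverse step and the sampling loop. It is not the code in src/model/diffusion_funcionality.py, which is a conditioned TensorFlow implementation. Here the data are flat lists of floats, the model is any hypothetical callable that predicts the noise, and the defaults (T = 1000, betas from 1e-4 to 0.02) are the paper's, not necessarily the values in this project's config.ini.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical model interface: maps (noisy x_t, timestep t) to a noise prediction.
NoiseModel = Callable[[Vector, int], Vector]


def linear_beta_schedule(timesteps: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> Vector:
    """Linearly spaced noise variances; index t = 0 is the paper's t = 1."""
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def cumulative_alphas(betas: Vector) -> Vector:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffusion(x0: Vector, t: int, alpha_bars: Vector, noise: Vector) -> Vector:
    """Closed-form sample of q(x_t | x_0): sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a_bar = alpha_bars[t]
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, noise)]


def noise_prediction_loss(model: NoiseModel, x0: Vector, t: int, alpha_bars: Vector,
                          rng: random.Random) -> float:
    """Simplified DDPM training objective: MSE between the true and the predicted noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    predicted = model(forward_diffusion(x0, t, alpha_bars, noise), t)
    return sum((e - p) ** 2 for e, p in zip(noise, predicted)) / len(x0)


def reverse_step(xt: Vector, t: int, predicted_noise: Vector, betas: Vector,
                 alpha_bars: Vector, rng: Optional[random.Random] = None) -> Vector:
    """One ancestral step x_t -> x_{t-1}, using sigma_t^2 = beta_t."""
    rng = random.Random() if rng is None else rng
    beta, a_bar = betas[t], alpha_bars[t]
    coef = beta / math.sqrt(1.0 - a_bar)
    # Mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_theta) / sqrt(alpha_t), alpha_t = 1 - beta_t
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(xt, predicted_noise)]
    if t == 0:
        return mean  # the final step returns the mean without adding noise
    return [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]


def sample(model: NoiseModel, size: int, betas: Vector,
           rng: Optional[random.Random] = None) -> Vector:
    """Start from pure Gaussian noise x_T and denoise step by step down to x_0."""
    rng = random.Random() if rng is None else rng
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for t in reversed(range(len(betas))):
        x = reverse_step(x, t, model(x, t), betas, alpha_bars, rng)
    return x
```

In the actual project, x would be a batch of image tensors and the model the Unet from notebook 02; training minimises noise_prediction_loss over random timesteps, and sampling runs reverse_step from t = T - 1 down to 0.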
🎨 Streamlit App
The project also contains a Streamlit app to generate Pokémon sprites using the trained model. The app is located in the app folder.
⚠ WARNING: Before running the app, make sure you have decompressed the final_diffusion_model.weights.h5 file in the model_weights folder. To do so, run the following commands from the root of the project:
cd model_weights
7z x final_diffusion_model.7z.001
To run the app, just execute the following command in the root of the project:
streamlit run app/diffusion_app.py
This will open a new tab in your default browser with the app running. You can select the Pokémon type and the number of samples to generate, then click the Generate button to see the results. After that, you can download the generated sprites by clicking the Download button, as shown in the following screenshot:
📚 Resources
- The thesis report, along with resources and tutorials found useful for this project, is located in the /docs folder.
- Conda environment installation and management: Conda documentation.
- Docker installation and management: Docker documentation.
- NVIDIA GPU configuration: NVIDIA documentation, CUDA installation guide, cuDNN installation guide.
- TensorFlow installation: TensorFlow documentation.
- Git LFS to upload large files into the repository: Git Large File Storage (LFS) replaces large files such as datasets, models or weights with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise. For more info, visit: Git LFS repository.
  ⚠ WARNING: Every account using Git Large File Storage receives 1 GiB of free storage and 1 GiB a month of free bandwidth, so to avoid issues when uploading heavy files, it is recommended to upload them one at a time, without committing any other changes alongside them.
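A related pitfall: if the repository is cloned without Git LFS installed, large files are checked out as small text pointers, and steps such as extracting the model weights fail. The stdlib-only sketch below, with hypothetical helper names, detects that case by checking for the version line that starts every LFS pointer file. The archive path is the one used in the Streamlit App section.

```python
from pathlib import Path

# First line of every Git LFS pointer file (Git LFS pointer specification v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(data: bytes) -> bool:
    """Return True if the given file contents start like a Git LFS pointer."""
    return data.startswith(LFS_POINTER_PREFIX)


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` holds an LFS pointer instead of the real file contents."""
    with path.open("rb") as f:
        return looks_like_lfs_pointer(f.read(len(LFS_POINTER_PREFIX)))


if __name__ == "__main__":
    archive = Path("model_weights/final_diffusion_model.7z.001")
    if archive.exists() and is_lfs_pointer(archive):
        print("Only the LFS pointer is present; run `git lfs pull` to fetch the real file.")
```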
🌱 Contributing
If you wish to make contributions to this project, please initiate the process by opening an issue or submitting a pull request that encapsulates your proposed modifications.
🗞️ License
This project is licensed under the MIT License - see the LICENSE file for details.
👥 Contact
Should you have any inquiries or require assistance, please do not hesitate to contact Alejandro Pequeño Lizcano.
Gotta create 'em all!




