Implementation of FiLM-Enhanced WGAN for Conditional Password Generation (IEEE ICAIC 2026)
FiLM-Enhanced WGAN for Conditional Password Generation
Official implementation of the research paper:
“Conditional Password Generation Using a FiLM-Enhanced WGAN: A Controlled Comparison Against Standard GAN Baselines.”
Published in:
5th IEEE International Conference on AI in Cybersecurity (ICAIC 2026)
University of Houston, USA.
Abstract
This project investigates Feature-wise Linear Modulation (FiLM) as the conditioning mechanism in a conditional Wasserstein GAN with gradient penalty (WGAN-GP) for modeling password distributions.
The study compares four architectures under a unified preprocessing and training pipeline:
- PassGAN
- PaC-GAN
- WGAN-CGAN
- FiLM-WGAN-CGAN (proposed model)
The FiLM-enhanced generator improves conditional fidelity and structural consistency while maintaining competitive diversity.
All experiments follow strict ethical guidelines and report aggregate statistics only.
Repository Structure
film-wgan-password-generation
│
├── notebooks
│ ├── 01_preprocess.ipynb
│ ├── 02_train_model.ipynb
│ └── 03_evaluation.ipynb
│
├── src
│ ├── preprocess.py
│ ├── train.py
│ ├── evaluate.py
│ └── model.py
│
├── figures
│ ├── architecture.png
│ └── loss_curve.png
│
├── requirements.txt
├── README.md
├── LICENSE
└── .github/workflows/build.yml
Dataset
This project uses the PasswordCollection dataset.
Source:
https://github.com/yuqian5/PasswordCollection
The dataset is not included in this repository.
Preprocessing converts passwords into a fixed-length integer representation (16 tokens).
Preprocessing Pipeline
The preprocessing pipeline performs the following steps:
- Unicode normalization (NFKD)
- ASCII filtering (94 printable characters)
- Password truncation or padding to length 16
- Integer token encoding
- Dataset export as train_data.npy and chars.txt
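The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the contents of src/preprocess.py: the names `encode_password`, `CHARS`, `CHAR_TO_ID`, and `PAD_ID` are hypothetical, and two details are assumptions (the 94 characters are taken to be `!` through `~`, and characters outside that set are dropped rather than causing the whole password to be discarded).

```python
import unicodedata

SEQ_LEN = 16
# Assumption: the 94 printable ASCII characters are '!' (0x21) through '~' (0x7E).
CHARS = [chr(code) for code in range(0x21, 0x7F)]
PAD_ID = 0  # hypothetical padding token; real characters start at 1
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(CHARS)}


def encode_password(password, seq_len=SEQ_LEN):
    """Return a fixed-length list of integer tokens for one password."""
    # NFKD splits accented characters into a base character plus combining marks.
    normalized = unicodedata.normalize("NFKD", password)
    # Assumption: characters outside the 94-symbol vocabulary are dropped.
    kept = [ch for ch in normalized if ch in CHAR_TO_ID]
    # Truncate to seq_len, then pad on the right with PAD_ID.
    tokens = [CHAR_TO_ID[ch] for ch in kept[:seq_len]]
    return tokens + [PAD_ID] * (seq_len - len(tokens))
```

Every call returns exactly 16 integers, so a list of encoded passwords can be stacked into a 2-D array before being written to train_data.npy.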
Training Configuration
All models are trained using identical hyperparameters:
| Parameter | Value |
|---|---|
| Optimizer | Adam |
| β1 | 0 |
| β2 | 0.9 |
| Generator LR | 2×10⁻⁵ |
| Critic LR | 1×10⁻⁴ |
| Batch Size | 96 |
| Latent Dimension | 128 |
| Sequence Length | 16 |
| Critic Iterations | 5 |
| Gradient Penalty | λ = 10 |
| Training Budget | 60 epochs |
Conditional experiments use a balanced subset of 250k samples.
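For reference, the gradient penalty weight and critic iteration count in the table correspond to the standard WGAN-GP objective. The conditional form below is a sketch: the symbols $G$ (generator), $D$ (critic), $z$ (latent vector), $c$ (condition), and $\hat{x}$ (interpolated sample) are notation assumed here, and how the condition enters each network is not specified in this README.

```math
\begin{aligned}
L_D &= \mathbb{E}_{z}\big[D(G(z, c), c)\big] - \mathbb{E}_{x}\big[D(x, c)\big]
      + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}, c) \rVert_2 - 1\big)^2\Big] \\
L_G &= -\,\mathbb{E}_{z}\big[D(G(z, c), c)\big] \\
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, G(z, c), \qquad \epsilon \sim U[0, 1]
\end{aligned}
```

With the values above, $\lambda = 10$ and the critic loss $L_D$ is minimized for 5 steps per generator step.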
Training Environment
Experiments were executed on Kaggle notebooks using:
- GPU: NVIDIA T4 (16 GB VRAM)
- Python: 3.11
- TensorFlow: 2.15
Evaluation Metrics
The following metrics are used for evaluation:
Uniqueness
Fraction of unique samples among generated outputs.
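A minimal sketch of this metric, assuming the generated samples are available as a list of strings (the function name `uniqueness` is illustrative and not taken from src/evaluate.py):

```python
def uniqueness(samples):
    """Fraction of distinct strings among the generated samples."""
    if not samples:
        return 0.0
    return len(set(samples)) / len(samples)
```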
N-gram Coverage
Coverage of 2-gram, 3-gram, and 4-gram patterns between generated and real samples.
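One plausible reading of this metric is the share of distinct n-grams in the real data that also appear in the generated data. The sketch below implements that reading; the direction of the comparison and the names `ngrams` and `ngram_coverage` are assumptions, not confirmed by the repository.

```python
def ngrams(samples, n):
    """Set of all character n-grams occurring in a list of strings."""
    return {s[i:i + n] for s in samples for i in range(len(s) - n + 1)}


def ngram_coverage(generated, real, n):
    """Share of distinct real n-grams that also occur in the generated set."""
    real_grams = ngrams(real, n)
    if not real_grams:
        return 0.0
    return len(real_grams & ngrams(generated, n)) / len(real_grams)
```

Calling `ngram_coverage(generated, real, n)` for n = 2, 3, and 4 yields the three coverage values.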
Character-Class Fidelity
Comparison of character class distributions between generated and real data.
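As an illustration, the comparison can be made by computing the share of characters in each class and measuring how far the two distributions are apart. The four classes and the use of total variation distance below are assumptions chosen for the sketch; the README does not state which classes or distance the evaluation uses.

```python
from collections import Counter

CLASSES = ("lower", "upper", "digit", "symbol")  # assumed class set


def char_class(ch):
    """Map one ASCII character to its class name."""
    if ch.islower():
        return "lower"
    if ch.isupper():
        return "upper"
    if ch.isdigit():
        return "digit"
    return "symbol"


def class_distribution(samples):
    """Share of characters belonging to each class across all samples."""
    counts = Counter(char_class(ch) for s in samples for ch in s)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CLASSES}


def class_distance(generated, real):
    """Total variation distance between the class distributions (0 = identical)."""
    p, q = class_distribution(generated), class_distribution(real)
    return 0.5 * sum(abs(p[c] - q[c]) for c in CLASSES)
```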
Conditional Fidelity
Fraction of generated passwords whose length falls within the requested length bucket.
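A minimal sketch of this metric follows. The bucket boundaries in `BUCKETS` are hypothetical placeholders, since the README does not list the actual length buckets; the function names are likewise illustrative.

```python
# Hypothetical bucket boundaries (bucket id -> allowed lengths); not from the paper.
BUCKETS = {0: range(1, 7), 1: range(7, 11), 2: range(11, 17)}


def length_bucket(password):
    """Return the bucket id whose range contains len(password), or None."""
    for bucket_id, lengths in BUCKETS.items():
        if len(password) in lengths:
            return bucket_id
    return None


def conditional_fidelity(generated, requested):
    """Fraction of samples whose length falls in the bucket that was requested."""
    if not generated:
        return 0.0
    hits = sum(length_bucket(p) == b for p, b in zip(generated, requested))
    return hits / len(generated)
```

Here `generated[i]` is the decoded sample (padding removed) produced for condition `requested[i]`.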
Quick Start
Install dependencies:
pip install -r requirements.txt
Preprocess dataset:
python src/preprocess.py
Train model:
python src/train.py
Evaluate results:
python src/evaluate.py
Ethical Considerations
This repository does not release raw password data or generated password samples.
Only aggregate statistics are reported to prevent misuse.
Model Architecture
The proposed FiLM-WGAN-CGAN architecture used for conditional password generation.
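The core FiLM operation scales and shifts each feature using values computed from the condition: FiLM(x | c) = γ(c) · x + β(c). The dependency-free sketch below shows that operation on plain Python lists. It is not the code in src/model.py: the helper names `linear` and `film`, the use of single dense projections for γ and β, and the shapes are assumptions, and where the modulation is applied inside the generator is not stated here.

```python
def linear(vec, weights, bias):
    """Dense layer: `weights` holds one row per output feature."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def film(features, condition, gamma_params, beta_params):
    """Feature-wise linear modulation: gamma(c) * x + beta(c)."""
    gamma = linear(condition, *gamma_params)  # per-feature scale from the condition
    beta = linear(condition, *beta_params)    # per-feature shift from the condition
    return [g * x + b for g, x, b in zip(gamma, features, beta)]


# Toy parameters: 3 features, 2 condition classes given as one-hot vectors.
gamma_params = ([[1.0, 2.0]] * 3, [0.0] * 3)
beta_params = ([[0.0, 0.5]] * 3, [0.0] * 3)

unchanged = film([1.0, 2.0, 3.0], [1, 0], gamma_params, beta_params)
modulated = film([1.0, 2.0, 3.0], [0, 1], gamma_params, beta_params)
```

With these toy weights the first condition leaves the features untouched, while the second doubles each feature and adds 0.5, which shows how one set of generator features can be steered per condition.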
Training Loss Curves
Representative generator and critic loss curves during training.
Generated Length Distribution
Comparison of password length distributions between real data and generated samples.
Conditional Fidelity Confusion Matrix
Confusion matrix comparing requested length buckets vs generated outputs.
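A matrix of this kind can be built by counting, for each requested bucket, which bucket the generated sample actually landed in. The sketch below assumes bucket ids are integers from 0 to `num_buckets - 1` and that samples falling outside every bucket were filtered out beforehand; the function name is illustrative.

```python
def confusion_matrix(requested, produced, num_buckets):
    """matrix[r][p] counts samples requested in bucket r that landed in bucket p."""
    matrix = [[0] * num_buckets for _ in range(num_buckets)]
    for r, p in zip(requested, produced):
        matrix[r][p] += 1
    return matrix
```

The diagonal holds the correctly conditioned samples, so the sum of the diagonal divided by the total count gives the overall conditional fidelity.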
Character Class Distribution
Comparison of character-class frequencies between generated passwords and real data.
License
This project is licensed under the MIT License.
See LICENSE for details.
The dataset is downloaded during preprocessing and is not included in this repository.




