Tolga Cangöz
tolgacangoz
...to boldly go where no one has gone before...
Top Repositories
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
The Modular Platform (includes MAX & Mojo)
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, caching, and more.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
Machine Learning Engineering Open Book
Repositories (43)
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
The Modular Platform (includes MAX & Mojo)
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, caching, and more.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
Machine Learning Engineering Open Book
No description provided.
Optimizing diffusion for production-ready speeds
Making Flux go brrr on GPUs.
Faster generation with text-to-image diffusion models.
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Learn GPU Programming in Mojo🔥 by Solving Puzzles
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
Create your own programming language with Rust
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Fast and memory-efficient exact attention
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Tile primitives for speedy kernels
A unified inference and post-training framework for accelerated video generation.
Development repository for the Triton language and compiler
SGLang is a fast serving framework for large language models and vision language models.
Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
A framework for efficient model inference with omni-modality models
Learn how to design, develop, deploy and iterate on production-grade ML applications.
A batched offline inference oriented version of segment-anything
Full stack, modern web application template. Using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS and more.
Examples of programs built using Modal
BentoDiffusion: A collection of diffusion models served with BentoML
Complete deep learning project developed in Full Stack Deep Learning, 2022 edition. Generated automatically from https://github.com/full-stack-deep-learning/fsdl-text-recognizer-2022
200+ detailed flashcards useful for reviewing topics in machine learning, computer vision, and computer science.