Sakib Ahamed
zsxkib
Born too late to explore the earth. \\ Born too early to explore the universe. \\ Born just in time for the AI uprising.
Languages
Repos
167
Stars
422
Forks
131
Top Language
Python
Top Repositories
A WebUI to create song covers with any RVC v2 trained AI voice from YouTube videos or audio files.
🎨 Fill in masked parts of images with FLUX.1-dev 🖌️
Replicate Repo for InstantID: Instant Faceswap AI Avatars in Seconds 🔥
Playground v2 is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at Playground.
Create your own RVC v2 dataset from a YouTube video
🎭Cogified version of MeiGen-AI/InfiniteTalk: unlimited-length talking video generation that supports image-to-video and video-to-video generation🗣️
Repositories
167
No description provided.
🖼️Cogified implementation of FramePack: video diffusion, but feels like image diffusion
🎭Cogified version of MeiGen-AI/InfiniteTalk: unlimited-length talking video generation that supports image-to-video and video-to-video generation🗣️
🙊Cogified speech-to-text model nvidia/canary-qwen-2.5b (best ASR model according to hf-audio/open_asr_leaderboard as of 18/Jul/2025)🎙️
🎨 Fill in masked parts of images with FLUX.1-dev 🖌️
A ComfyUI-based Wan (video generation) LoRA Trainer
No description provided.
Easily create video datasets with auto-captioning for Hunyuan-Video LoRA finetuning
Run ComfyUI with an API
Replicate Cog'ified MMAudio
🎨 Native AI image generation for Apple Silicon with Qwen-Image. Lightning LoRA acceleration for fast 4–8 step runs. Zero Docker, just works.
Lightweight coding agent that runs in your terminal
Read-only MCP server for Slack workspace data
A WebUI to create song covers with any RVC v2 trained AI voice from YouTube videos or audio files.
Create your own RVC v2 dataset from a YouTube video
AuraSR v2: Second-gen GAN-based Super-Resolution for real-world applications
Voice data <= 10 mins can also be used to train a good VC model!
No description provided.
🚀 Google's compact 300M parameter embedding model for production-ready semantic search and text similarity tasks 🎯
🗣️Generate high-quality multilingual speech from text with reference audio styling, supporting 23 languages
🎼Cog'd Advancing Audio Intelligence with Fully Open Large Audio-Language Models🎶
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
🎙️Hololive text-to-speech and voice-to-voice (Japanese🇯🇵 + English🇬🇧)
Playground v2 is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at Playground.
🗣️MultiTalk all wrapped in Cog🎙️
🖋️➡️📱Converts handwritten text images into digital text
Replicate Repo for InstantID: Instant Faceswap AI Avatars in Seconds 🔥
TTDS Group Project: Video Games Search Engine. Sakib Ahamed, Dan Buxton, Kenza Amira, Wini Lau, Mansoor Ahmad
Cog wrapper for SeedVR2 (3B/7B) video & image restoration with optional color fix
🤪Cogified version of Tencent (Hunyuan)'s Open-Source Lip-Sync Model HunyuanVideo-Avatar🫦