BohaoSu
Subohao
Postdoctoral Researcher @ CMU-LTI WAVLab | Ph.D. @ NTHU-EE
Languages
Repos
23
Stars
3
Forks
1
Top Language
Python
Top Repositories
Repositories
23
End-to-End Speech Processing Toolkit
No description provided.
Versatile Evaluation of Speech and Audio
Foundation Models and Data for Human-Human and Human-AI interactions.
No description provided.
ICASSP26
SALMONN family: A suite of advanced multi-modal LLMs
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
A TTS model capable of generating ultra-realistic dialogue in one pass.
AudioLDM: Generate speech, sound effects, music and beyond, with text.
A family of diffusion models for text-to-audio generation.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Visiting scholar at CMU-LTI WAVLab
Reproducible, flexible LLM evaluations
Audio sample demo
No description provided.
Dual fisheye video stitching
No description provided.
No description provided.
No description provided.