Yuchen Hu
YUCHEN005
Ph.D. student at NTU. Research focus: speech, LLMs, and multimodal learning.
Repositories
Code for paper "Unsupervised Noise Adaptation using Data Simulation"
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
Code for paper "Large Language Models are Efficient Learners of Noise-Robust Speech Recognition"
Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators"
Code for paper "Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition"
Code for paper "Noise-aware Speech Enhancement using Diffusion Probabilistic Model"
Code for paper "Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition"
Code for paper "Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition"
Code for paper "Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation"
A public repository for the RATS Channel-A Speech Data, a noisy speech dataset available from LDC for a fee. Here we release its log-Mel filterbank (Fbank) features and several raw waveform listening samples.
AcadHomepage: A Modern and Responsive Academic Personal Homepage
Code for paper "MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition"
Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"
Single-blind supplementary materials for NeurIPS 2023 submission