
RMVPE

Warning

This repo is a vibe-coded wrapper that allows easy pip installs. I haven't checked it beyond basic functionality, and this code should not be relied upon in public projects until a proper review and a publication strategy are in place. For the same reason, I haven't attempted to merge it upstream yet.

A pip-installable Python package for vocal pitch estimation in polyphonic music using the RMVPE (Robust Model for Vocal Pitch Estimation) deep learning model.

This is the PyTorch implementation of "RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music".

Features

  • 🎵 Accurate pitch estimation for vocals in polyphonic music
  • 🚀 Easy-to-use Python API
  • 📦 Automatic model downloading from HuggingFace
  • 🎯 Command-line interface for batch processing
  • ⚡ GPU acceleration support
  • 🔧 Half-precision (FP16) support for faster inference

Installation

From source (for development)

# Clone the repository
git clone https://github.com/xavriley/RMVPE.git
cd RMVPE

# Install in editable mode
pip install -e .

Dependencies

The package requires:

  • Python >= 3.7
  • PyTorch >= 1.7.0
  • NumPy 1.26.4
  • librosa
  • tqdm >= 4.50.0

Usage

Python API

import librosa
from rmvpe import RMVPE

# Initialize the model (automatically downloads on first use)
model = RMVPE()

# Load audio file (librosa resamples to the 16kHz the model expects)
audio, sr = librosa.load("audio.wav", sr=16000)

# Estimate pitch
f0 = model.infer_from_audio(audio, thred=0.03)

# f0 is a numpy array of F0 values in Hz
# Zero values indicate unvoiced frames
# Hop length is 160 samples (10ms at 16kHz)

print(f"Estimated {len(f0)} pitch frames")
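Since unvoiced frames come back as zeros, downstream analysis usually masks them out before computing statistics. A minimal sketch (the `summarize_f0` helper is illustrative, not part of the package):

```python
import numpy as np

def summarize_f0(f0):
    """Summarize an F0 contour where zeros mark unvoiced frames."""
    voiced = f0[f0 > 0]  # keep only voiced frames
    if voiced.size == 0:
        return {"voiced_ratio": 0.0, "median_hz": None}
    return {
        "voiced_ratio": voiced.size / f0.size,   # fraction of voiced frames
        "median_hz": float(np.median(voiced)),   # typical pitch in Hz
    }

# Synthetic contour: 2 unvoiced frames followed by 3 voiced frames
print(summarize_f0(np.array([0.0, 0.0, 220.0, 221.0, 219.0])))
```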

Advanced Usage

from rmvpe import RMVPE

# Use custom model path
model = RMVPE(model_path="/path/to/custom/model.pt")

# Use GPU with half precision for faster inference
model = RMVPE(device="cuda", is_half=True)

# Adjust sensitivity threshold (lower = more sensitive)
f0 = model.infer_from_audio(audio, thred=0.01)

Command-Line Interface

Process all audio files in a folder:

# Basic usage
rmvpe input_folder/ output_folder/

# Use custom model
rmvpe input_folder/ output_folder/ --model_path /path/to/model.pt

# Use GPU with half precision
rmvpe input_folder/ output_folder/ --device cuda --is_half

# Adjust sensitivity threshold
rmvpe input_folder/ output_folder/ --thred 0.05

The CLI will:

  • Process all audio files (.wav, .mp3, .flac, .ogg, .m4a) in the input folder
  • Save pitch estimates as CSV files (time, frequency) in the output folder
  • Skip files that have already been processed
  • Show a progress bar during processing

Output Format

The output is a NumPy array (or CSV file for CLI) containing F0 values:

  • F0 values: Fundamental frequency in Hz
  • Zero values: Indicate unvoiced/silent frames
  • Temporal resolution: 10ms per frame (hop length = 160 samples at 16kHz)
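Because the hop is fixed at 160 samples of 16kHz audio, the timestamp of each frame can be reconstructed directly from its index; a quick sketch:

```python
import numpy as np

HOP_LENGTH = 160     # samples per frame
SAMPLE_RATE = 16000  # Hz

def frame_times(num_frames):
    """Timestamp in seconds of each F0 frame (10ms spacing)."""
    return np.arange(num_frames) * HOP_LENGTH / SAMPLE_RATE

print(frame_times(4))  # 0, 0.01, 0.02, 0.03 seconds
```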

Example CSV output:

time,frequency
0.000,0.0
0.010,0.0
0.020,220.5
0.030,221.2
...
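The CSV files written by the CLI can be read back with NumPy; a minimal sketch (an in-memory string stands in for a real file, whose name would depend on your input):

```python
import io
import numpy as np

# Stand-in for a CSV file produced by the CLI (normally: open a real file)
csv_text = "time,frequency\n0.000,0.0\n0.010,0.0\n0.020,220.5\n0.030,221.2\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
times, f0 = data[:, 0], data[:, 1]

print(times)  # frame timestamps in seconds
print(f0)     # F0 in Hz; zeros are unvoiced
```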

Model Details

  • Input: Audio waveform at 16kHz
  • Output: F0 contour with 10ms resolution
  • Architecture: Deep U-Net with BiGRU
  • Model size: ~85MB
  • Cache location: ~/.cache/rmvpe/rmvpe.pt

The model is automatically downloaded from HuggingFace on first use and cached locally.
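Since the weights live at a fixed cache path, you can check ahead of time whether the first `RMVPE()` call will trigger a download; a small sketch using the cache location above:

```python
import os

CACHE_PATH = os.path.expanduser("~/.cache/rmvpe/rmvpe.pt")

if os.path.exists(CACHE_PATH):
    print("Model already cached at", CACHE_PATH)
else:
    print("Model will be downloaded on first RMVPE() call")
```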

API Reference

RMVPE Class

class RMVPE(model_path=None, is_half=False, device=None)

Parameters:

  • model_path (str, optional): Path to model weights. If None, downloads automatically.
  • is_half (bool): Use FP16 precision. Default: False.
  • device (str, optional): Device to use ('cuda' or 'cpu'). Default: auto-detect.

Methods:

infer_from_audio(audio, thred=0.03)

Estimate pitch from audio.

Parameters:

  • audio (np.ndarray): Audio array at 16kHz, shape (samples,)
  • thred (float): Threshold for voiced/unvoiced detection. Default: 0.03. Lower values are more sensitive.

Returns:

  • np.ndarray: F0 array in Hz. Zero values indicate unvoiced frames.
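For note-level analysis, the Hz contour is often converted to fractional MIDI numbers, with the unvoiced zeros masked out to avoid log(0). A sketch using the standard conversion formula (the helper is illustrative, not part of the package):

```python
import numpy as np

def hz_to_midi(f0):
    """Convert F0 in Hz to fractional MIDI numbers; unvoiced frames become NaN."""
    midi = np.full(f0.shape, np.nan)
    voiced = f0 > 0
    midi[voiced] = 69.0 + 12.0 * np.log2(f0[voiced] / 440.0)  # A4 = 440 Hz = MIDI 69
    return midi

print(hz_to_midi(np.array([0.0, 440.0, 220.0])))  # nan, 69.0, 57.0
```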

Training (Research Use)

This package focuses on inference only. For training code, please refer to the original research repository. The training scripts (train.py, evaluate.py) are included in the repository but not installed with the package.

Citation

If you use RMVPE in your research, please cite:

@article{rmvpe2023,
  title={RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music},
  author={},
  journal={arXiv preprint arXiv:2306.15412},
  year={2023}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

  • Original RMVPE paper and research
  • HuggingFace for hosting the pre-trained model
  • PyTorch and librosa communities