# whispr
A general-purpose voice <-> text crate — text-to-speech, speech-to-text, and audio-to-audio transformations. Also supports realtime conversations.
## Overview
Whispr provides a clean, ergonomic API for working with audio AI services. It's designed to be provider-agnostic, though OpenAI is currently the only implemented provider.
## Installation
```toml
[dependencies]
whispr = "0.1"
tokio = { version = "1", features = ["full"] }
```

## Quick Start
```rust
use whispr::{Client, Voice};

#[tokio::main]
async fn main() -> Result<(), whispr::Error> {
    let client = Client::from_env()?; // reads OPENAI_API_KEY

    // Text to speech
    let audio = client
        .speech()
        .text("Hello, world!")
        .voice(Voice::Nova)
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}
```

## Features
### Text to Speech
Convert text to natural-sounding audio with multiple voices and customization options.
```rust
use whispr::{Client, TtsModel, Voice, AudioFormat, prompts};

let client = Client::from_env()?;

let audio = client
    .speech()
    .text("Welcome to whispr!")
    .voice(Voice::Nova)
    .model(TtsModel::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions(prompts::FITNESS_COACH) // Voice personality (gpt-4o-mini-tts only)
    .generate()
    .await?;

std::fs::write("output.mp3", &audio)?;
```

**Available Voices:** `Alloy`, `Ash`, `Ballad`, `Coral`, `Echo`, `Fable`, `Nova`, `Onyx`, `Sage`, `Shimmer`, `Verse`
**Available Models:**

- `Gpt4oMiniTts` — latest model with instruction support
- `Tts1` — optimized for speed
- `Tts1Hd` — optimized for quality
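As a sketch of picking between these models: to favor audio quality over generation speed, swap the model in the same builder used above (note that `.instructions()` only applies to `Gpt4oMiniTts`, so it is omitted here):

```rust
use whispr::{Client, TtsModel, Voice};

let client = Client::from_env()?;

// Tts1Hd trades generation speed for higher audio quality
let audio = client
    .speech()
    .text("High-fidelity narration.")
    .voice(Voice::Onyx)
    .model(TtsModel::Tts1Hd)
    .generate()
    .await?;

std::fs::write("narration.mp3", &audio)?;
```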
### Speech to Text
Transcribe audio files to text with optional language hints.
```rust
let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("Transcription: {}", result.text);
```

From bytes (useful for recorded audio):
```rust
let wav_data: Vec<u8> = record_audio();

let result = client
    .transcription()
    .bytes(wav_data, "recording.wav")
    .transcribe()
    .await?;
```

### Audio to Audio
Transcribe audio and generate new speech in one call — useful for voice transformation, translation, or processing pipelines.
```rust
let (transcription, audio) = client.audio_to_audio("input.mp3").await?;
println!("Said: {}", transcription.text);
std::fs::write("output.mp3", &audio)?;
```

### Streaming
For real-time applications, stream audio as it's generated:
```rust
use futures::StreamExt;

let mut stream = client
    .speech()
    .text("This is a longer text that will be streamed...")
    .generate_stream()
    .await?;

while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // Process audio chunk in real time
}
```

### Prompts
The `prompts` module includes pre-built voice personalities for common use cases:
```rust
use whispr::prompts;

client.speech()
    .text("Let's get moving!")
    .model(TtsModel::Gpt4oMiniTts)
    .instructions(prompts::FITNESS_COACH)
    .generate()
    .await?;
```

## License
MIT License — see LICENSE for details.