# whispr
A general-purpose voice <-> text crate — text-to-speech, speech-to-text, and audio-to-audio transformations. Also supports realtime conversations.
## Overview
Whispr provides a clean, ergonomic API for working with audio AI services. It's designed to be provider-agnostic, though OpenAI is currently the only implemented provider.
## Installation
```toml
[dependencies]
whispr = "0.1"
tokio = { version = "1", features = ["full"] }
```

## Quick Start
```rust
use whispr::{Client, Voice};

#[tokio::main]
async fn main() -> Result<(), whispr::Error> {
    let client = Client::from_env()?; // reads OPENAI_API_KEY

    // Text to speech
    let audio = client
        .speech()
        .text("Hello, world!")
        .voice(Voice::Nova)
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}
```

## Features
### Text to Speech
Convert text to natural-sounding audio with multiple voices and customization options.
```rust
use whispr::{Client, TtsModel, Voice, AudioFormat, prompts};

let client = Client::from_env()?;

let audio = client
    .speech()
    .text("Welcome to whispr!")
    .voice(Voice::Nova)
    .model(TtsModel::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions(prompts::FITNESS_COACH) // Voice personality (gpt-4o-mini-tts only)
    .generate()
    .await?;

std::fs::write("output.mp3", &audio)?;
```

**Available Voices:** `Alloy`, `Ash`, `Ballad`, `Coral`, `Echo`, `Fable`, `Nova`, `Onyx`, `Sage`, `Shimmer`, `Verse`
**Available Models:**

- `Gpt4oMiniTts` — latest model with instruction support
- `Tts1` — optimized for speed
- `Tts1Hd` — optimized for quality
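As a sketch of picking between these models: to favor audio quality over generation speed, swap the model in the same builder used above (note that `.instructions()` only applies to `Gpt4oMiniTts`, so it is omitted here):

```rust
use whispr::{Client, TtsModel, Voice};

let client = Client::from_env()?;

// Tts1Hd trades generation speed for higher audio quality
let audio = client
    .speech()
    .text("High-fidelity narration.")
    .voice(Voice::Onyx)
    .model(TtsModel::Tts1Hd)
    .generate()
    .await?;

std::fs::write("narration.mp3", &audio)?;
```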
### Speech to Text
Transcribe audio files to text with optional language hints.
```rust
let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("Transcription: {}", result.text);
```

From bytes (useful for recorded audio):
```rust
let wav_data: Vec<u8> = record_audio();

let result = client
    .transcription()
    .bytes(wav_data, "recording.wav")
    .transcribe()
    .await?;
```

### Audio to Audio
Transcribe audio and generate new speech in one call — useful for voice transformation, translation, or processing pipelines.
```rust
let (transcription, audio) = client.audio_to_audio("input.mp3").await?;
println!("Said: {}", transcription.text);
std::fs::write("output.mp3", &audio)?;
```

### Streaming
For real-time applications, stream audio as it's generated:
```rust
use futures::StreamExt;

let mut stream = client
    .speech()
    .text("This is a longer text that will be streamed...")
    .generate_stream()
    .await?;

while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // Process audio chunk in real time
}
```

### Prompts
The `prompts` module includes pre-built voice personalities for common use cases:
```rust
use whispr::prompts;

client.speech()
    .text("Let's get moving!")
    .model(TtsModel::Gpt4oMiniTts)
    .instructions(prompts::FITNESS_COACH)
    .generate()
    .await?;
```

## License
MIT License — see LICENSE for details.