Speech-to-Speech Translator

This project uses the Google Cloud Speech-to-Text API to transcribe speech, the DeepL API to translate the transcribed text, and the ElevenLabs API to convert the translated text back to speech. Chained together, these three services form a speech-to-speech translation pipeline.

Prerequisites

Before running this project, make sure you have the following software, Python packages, and API keys:

Python 3.7 or later
Google Cloud SDK (gcloud)
PyAudio
Requests
Pygame
DeepL API key
ElevenLabs API key

Installation

Clone the repository:

git clone https://github.com/bykemalh/S2ST.git
cd S2ST

Set up a virtual environment:

python3 -m venv env
source env/bin/activate  # On Windows use `env\Scripts\activate`

Install the required Python packages:

pip install google-cloud-speech pyaudio deepl requests pygame

Install Google Cloud SDK:
Follow the installation instructions for your operating system in the official Google Cloud SDK documentation.

Authenticate with Google Cloud:

gcloud auth login
gcloud auth application-default login

Enable the Google Cloud Speech-to-Text API:

gcloud services enable speech.googleapis.com

Set up API keys:
Replace the placeholder values in the script with your actual DeepL and ElevenLabs API keys.
```
auth_key = "your-deepl-auth-key"
xi_api_key = "your-elevenlabs-api-key"
```
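
Hardcoded keys are easy to commit to version control by accident. As one possible alternative, the sketch below reads both keys from environment variables. The variable names `DEEPL_AUTH_KEY` and `ELEVENLABS_API_KEY` and the helper `load_api_keys` are hypothetical and not part of the repository.

```python
import os


def load_api_keys(env=None):
    """Return (auth_key, xi_api_key) read from environment variables.

    DEEPL_AUTH_KEY and ELEVENLABS_API_KEY are hypothetical names chosen
    for this sketch; the script itself does not define them.
    """
    env = os.environ if env is None else env
    required = ("DEEPL_AUTH_KEY", "ELEVENLABS_API_KEY")
    # Collect every missing or empty variable so the error names them all.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["DEEPL_AUTH_KEY"], env["ELEVENLABS_API_KEY"]
```

With the variables exported in your shell, the two assignments above could then become `auth_key, xi_api_key = load_api_keys()`.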

Running the Application

To run the application, execute the S2ST_NewAdvanced.py script:

python S2ST_NewAdvanced.py

How It Works

Audio Input:
- The application opens a microphone stream using the pyaudio library and captures audio in real time.
Speech-to-Text:
- The captured audio is sent to the Google Cloud Speech-to-Text API, which returns the transcribed text.
Translation:
- The transcribed text is translated to English using the DeepL API.
Text-to-Speech:
- The translated text is sent to the ElevenLabs API, which converts it to speech; the application then plays the resulting audio back.
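
The four steps above can be pictured as a simple chain of functions. The following is a minimal sketch of that chain, not the actual structure of the script: `speech_to_speech` and the three `fake_*` stand-ins are hypothetical names, and the stand-ins replace the real Google Cloud, DeepL, and ElevenLabs calls so the example runs without network access or API keys.

```python
from typing import Callable


def speech_to_speech(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one utterance through the three stages in order."""
    text = transcribe(audio)           # Speech-to-Text stage
    if not text.strip():
        return b""                     # nothing recognized, nothing to speak
    translated = translate(text)       # Translation stage
    return synthesize(translated)      # Text-to-Speech stage


# Stand-in stages (hypothetical): each mimics the input and output types
# of the real service without calling it.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    return {"merhaba": "hello"}.get(text, text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = speech_to_speech(
    "merhaba".encode("utf-8"), fake_transcribe, fake_translate, fake_synthesize
)
```

Passing the stages in as arguments keeps each service replaceable: a different translation or voice provider only needs a function with the same signature.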

Dependencies

Ensure you have the following libraries installed:

google-cloud-speech
pyaudio
deepl
requests
pygame

You can install these dependencies using the following command:

pip install google-cloud-speech pyaudio deepl requests pygame

Configuration

Modify the following variables in the script to match your settings:

auth_key: Your DeepL API key.
xi_api_key: Your ElevenLabs API key.
voice_id: The voice ID to be used with ElevenLabs API.
RATE: The audio sample rate in Hz (default is 16000).
CHUNK: The audio chunk size (default is 1600).
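
To see how RATE and CHUNK relate, the short calculation below works out how much audio one chunk holds with the default values. It assumes that CHUNK is counted in frames and that the stream is 16-bit mono; neither assumption is stated in this README, so check the script before relying on the numbers.

```python
RATE = 16000   # sample rate in Hz (default from the list above)
CHUNK = 1600   # chunk size (default from the list above), assumed to be in frames

BYTES_PER_SAMPLE = 2  # assumption: 16-bit samples
CHANNELS = 1          # assumption: mono input

chunk_ms = 1000 * CHUNK / RATE                     # duration of one chunk
chunk_bytes = CHUNK * BYTES_PER_SAMPLE * CHANNELS  # size of one chunk
chunks_per_second = RATE / CHUNK                   # chunks read per second

print(f"{chunk_ms:.0f} ms per chunk, {chunk_bytes} bytes per chunk")
# prints "100 ms per chunk, 3200 bytes per chunk"
```

Under these assumptions, a smaller CHUNK lowers latency at the cost of more frequent reads, while a larger one does the opposite.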

Logging

Logging is set up in the script to capture errors during the text-to-speech conversion process. You can enable more detailed logging by uncommenting the logging configuration line.

# logging.basicConfig(level=logging.DEBUG)
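
As an illustration of what error logging around the text-to-speech step might look like, here is a minimal sketch using the standard `logging` module. The logger name `s2st` and the wrapper `safe_synthesize` are hypothetical; the script's own logging code may differ.

```python
import logging

# DEBUG shows every message; raise the level to WARNING to see only problems.
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("s2st")  # hypothetical logger name


def safe_synthesize(synthesize, text):
    """Call a text-to-speech function and log any failure instead of crashing."""
    try:
        return synthesize(text)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Text-to-speech conversion failed for %r", text)
        return None
```

Returning `None` on failure lets the caller skip playback for that utterance and keep listening.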

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

If you wish to contribute to this project, please fork the repository and create a pull request.

Acknowledgments

Developed By

This project was developed by Kemal Hafızoğlu.

bykemalh/S2ST