bykemalh/S2ST
Speech to Speech Translation Python
Speech-to-Speech Translator
This project uses Google Cloud Speech-to-Text API to transcribe speech to text, DeepL API to translate the transcribed text, and ElevenLabs API to convert the translated text back to speech. This creates a seamless speech-to-speech translation system.
Prerequisites
Before running this project, ensure you have the following installed or available:
- Python 3.7 or later
- Google Cloud SDK (gcloud)
- PyAudio
- Requests
- Pygame
- DeepL API key
- ElevenLabs API key
Installation
- Clone the repository:
      git clone https://github.com/bykemalh/S2ST.git
      cd S2ST
- Set up a virtual environment:
      python3 -m venv env
      source env/bin/activate  # On Windows use `env\Scripts\activate`
- Install the required Python packages:
      pip install google-cloud-speech pyaudio deepl requests pygame
- Install Google Cloud SDK:
  Follow the official installation instructions for your operating system.
- Authenticate with Google Cloud:
      gcloud auth login
      gcloud auth application-default login
- Enable the Google Cloud Speech-to-Text API:
      gcloud services enable speech.googleapis.com
- Set up API keys:
  Replace the placeholder values in the script with your actual DeepL and ElevenLabs API keys.
      auth_key = "your-deepl-auth-key"
      xi_api_key = "your-elevenlabs-api-key"
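Keys written directly into the script are easy to commit by accident. As an alternative, here is a minimal sketch that reads the same two values from environment variables. The variable names `DEEPL_AUTH_KEY` and `ELEVENLABS_API_KEY` and the helper `load_api_keys` are assumptions made for illustration; they are not part of the project's script.

```python
import os


def load_api_keys(env=None):
    """Return (auth_key, xi_api_key) read from environment variables.

    The variable names are hypothetical; the project's script assigns
    these two values directly in the source.
    """
    env = os.environ if env is None else env
    names = ("DEEPL_AUTH_KEY", "ELEVENLABS_API_KEY")
    # Collect every missing or empty variable so the error names them all.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["DEEPL_AUTH_KEY"], env["ELEVENLABS_API_KEY"]
```

With both variables exported in the shell, `auth_key, xi_api_key = load_api_keys()` could stand in for the two hardcoded assignments.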
Running the Application
To run the application, execute the S2ST_NewAdvanced.py script:
      python S2ST_NewAdvanced.py
How It Works
- Audio Input:
  - The application opens a microphone stream using the pyaudio library and captures audio in real time.
- Speech-to-Text:
  - The captured audio is sent to the Google Cloud Speech-to-Text API, which returns the transcribed text.
- Translation:
  - The transcribed text is translated to English using the DeepL API.
- Text-to-Speech:
  - The translated text is sent to the ElevenLabs API, which converts it to speech and plays it back.
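The four stages above can be sketched as a simple chain. This is an illustration of the control flow only, not the script's actual code: the function `run_pipeline` and its parameters `transcribe`, `translate`, and `speak` are hypothetical names, and each stage is passed in as a plain callable so the flow can be followed without any API access.

```python
from typing import Callable, Iterable, Iterator, List


def run_pipeline(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[Iterable[bytes]], Iterator[str]],
    translate: Callable[[str], str],
    speak: Callable[[str], None],
) -> List[str]:
    """Chain the stages: audio -> text -> translated text -> speech.

    All three callables are stand-ins (assumed names) for the calls to
    Google Cloud Speech-to-Text, DeepL, and ElevenLabs respectively.
    """
    spoken = []
    for sentence in transcribe(audio_chunks):
        sentence = sentence.strip()
        if not sentence:
            continue  # nothing was recognized, so there is nothing to translate
        translated = translate(sentence)
        speak(translated)  # text-to-speech and playback happen here
        spoken.append(translated)
    return spoken
```

In a real run, `transcribe` would wrap the streaming recognition call, `translate` the DeepL request, and `speak` the ElevenLabs request plus playback. Keeping the stages separate like this also makes it possible to swap in stubs when testing.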
Dependencies
Ensure you have the following libraries installed:
- google-cloud-speech
- pyaudio
- deepl
- requests
- pygame
You can install these dependencies using the following command:
      pip install google-cloud-speech pyaudio deepl requests pygame
Configuration
Modify the following variables in the script to match your settings:
- auth_key: Your DeepL API key.
- xi_api_key: Your ElevenLabs API key.
- voice_id: The voice ID to be used with ElevenLabs API.
- RATE: The audio sample rate (default is 16000).
- CHUNK: The audio chunk size (default is 1600).
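RATE and CHUNK together determine how much audio each buffer holds. The short calculation below works this out for the defaults, assuming RATE is in samples per second, CHUNK is in samples, and the stream is 16-bit mono PCM. The sample width and the helper names are assumptions; they are not stated in the README.

```python
RATE = 16000  # samples per second (default listed above)
CHUNK = 1600  # samples per buffer (default listed above)
SAMPLE_WIDTH_BYTES = 2  # assumption: 16-bit mono PCM


def chunk_duration_ms(rate: int, chunk: int) -> float:
    """Length of one audio buffer in milliseconds."""
    return 1000.0 * chunk / rate


def chunk_size_bytes(chunk: int, sample_width: int = SAMPLE_WIDTH_BYTES) -> int:
    """Size of one audio buffer in bytes for a single channel."""
    return chunk * sample_width


print(chunk_duration_ms(RATE, CHUNK))  # 100.0
print(chunk_size_bytes(CHUNK))         # 3200
```

Under these assumptions, the defaults give 100 ms of audio per buffer. A smaller CHUNK shortens each buffer and lowers capture latency, at the cost of more frequent reads from the microphone stream.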
Logging
Logging is set up in the script to capture errors during the text-to-speech conversion process. You can enable more detailed logging by uncommenting the logging configuration line.
      # logging.basicConfig(level=logging.DEBUG)
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contributing
If you wish to contribute to this project, please fork the repository and create a pull request.
Acknowledgments
Developed By
This algorithm was developed by Kemal Hafızoğlu.