Quickstart

Create your first text-to-speech audio.

Prerequisites

Before you can make any requests, you must first create an API key; you can create one in the Playground. The API key must be included in every subsequent request, either in the X-API-Key header or in the ?api_key= query string parameter. In browsers, the JavaScript WebSocket API does not allow custom headers, so the query string option is preferred there.
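The two authentication styles can be sketched as follows. This is a minimal illustration using only the standard library; the base URL shown is a placeholder, not the real endpoint:

```python
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"
# Placeholder URL for illustration; substitute the actual API endpoint.
BASE_URL = "https://api.example.com/v1/public/tts/en-rt/voices"

# Option 1: send the key in the X-API-Key header.
headers = {"X-API-Key": API_KEY}

# Option 2: append the key as a query string parameter
# (useful for browser WebSockets, which cannot set custom headers).
url_with_key = f"{BASE_URL}?{urlencode({'api_key': API_KEY})}"

print(headers)
print(url_with_key)
```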

Obtain a list of available voices

Before generating audio from your text, you must first select the ID of the voice you’d like to use. To view a list of available voices, send the following request:

GET
/v1/public/tts/en-rt/voices
from respeecher import Respeecher

client = Respeecher(
    api_key="YOUR_API_KEY",
)
client.voices.list()

The result should look something like this:

Response
[
  {
    "id": "samantha",
    "gender": "female",
    "accent": "American",
    "sampling_params": {
      "temperature": 0.6,
      "top_k": -1,
      "top_p": 0.8,
      "min_p": 1,
      "presence_penalty": 0,
      "repetition_penalty": 2,
      "frequency_penalty": 2
    }
  },
  {
    "id": "volodymyr",
    "gender": "male",
    "accent": "Ukrainian",
    "sampling_params": {
      "temperature": 0.4,
      "top_k": -1,
      "top_p": 0.8,
      "min_p": 1,
      "presence_penalty": 0,
      "repetition_penalty": 2,
      "frequency_penalty": 2
    }
  }
]
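Once you have the list, you can select a voice programmatically. Here is a minimal sketch that filters by accent, using a trimmed-down literal copy of the sample response above in place of a live API call:

```python
# Trimmed copy of the sample voices response shown above.
voices = [
    {"id": "samantha", "gender": "female", "accent": "American"},
    {"id": "volodymyr", "gender": "male", "accent": "Ukrainian"},
]

# Pick the ID of the first voice with an American accent.
voice_id = next(v["id"] for v in voices if v["accent"] == "American")
print(voice_id)  # samantha
```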

Generate your first audio with the Bytes endpoint

Now that you have the ID of the voice you would like to use, you can make a POST request to the Bytes endpoint to generate audio data. If you are using curl, redirect the output to a file by appending --output result.wav (or > result.wav) to the command. The resulting result.wav file can then be played in any audio player.

POST
/v1/public/tts/en-rt/tts/bytes
from respeecher import Respeecher

client = Respeecher(
    api_key="YOUR_API_KEY",
)
client.tts.bytes(
    transcript="Hello, World!",
    voice={"id": "samantha"},
)

Stream audio with the Server-Sent Events endpoint

1. Generate the Audio

POST
/v1/public/tts/en-rt/tts/sse
from respeecher import Respeecher

client = Respeecher(
    api_key="YOUR_API_KEY",
)
response = client.tts.sse(
    transcript="Hello, World!",
    voice={"id": "samantha"},
)
for chunk in response.data:
    print(chunk)

The response is a stream of JSON objects:

1{"type": "chunk", "data": "..."}

where the data field contains a base64-encoded chunk of 32-bit floating-point audio samples.
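To see what decoding one of these chunks involves, here is a self-contained round trip using only the standard library. The sample values are made up for illustration (a real chunk comes from the stream), and little-endian byte order is assumed:

```python
import base64
import struct

# Fabricate a chunk of three float32 samples, as the server would encode it.
samples = [0.0, 0.5, -0.25]
raw = struct.pack("<3f", *samples)
encoded = base64.b64encode(raw).decode("ascii")

# Decoding: base64 -> bytes -> little-endian float32 samples (4 bytes each).
decoded_bytes = base64.b64decode(encoded)
decoded = list(struct.unpack(f"<{len(decoded_bytes) // 4}f", decoded_bytes))
print(decoded)  # [0.0, 0.5, -0.25]
```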

Save the response data into a file with the name result.json, with each JSON object on its own line.

2. Assemble the chunks into an audio file

The SSE endpoint streams the audio data in chunks. This is useful for real-time playback; however, for this demo, we will use a short Python script to parse the chunks and assemble them into a complete audio file.

Note: this example requires both the soundfile and numpy modules to run.

import json
import base64
import numpy as np
import soundfile as sf

sample_rate = 22050
infile = "result.json"
outfile = "result.wav"

with open(infile, "r", encoding="utf-8") as f:
    data = [json.loads(line) for line in f]

chunks = []
for chunk in data:
    audio_bytes = base64.b64decode(chunk["data"])
    audio = np.frombuffer(audio_bytes, dtype=np.float32)
    chunks.append(audio)

full_audio = np.concatenate(chunks)
sf.write(outfile, full_audio, sample_rate)
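If you cannot install soundfile, a rough alternative is the standard-library wave module, which writes 16-bit PCM; the float samples must first be scaled and clipped. This is a sketch under that assumption, not part of the official tooling, shown here with a few hard-coded samples in place of the decoded chunks:

```python
import struct
import wave

sample_rate = 22050
# Stand-in for the decoded float32 samples, which lie in [-1.0, 1.0].
samples = [0.0, 0.5, -0.5, 1.0, -1.0]

# Convert to 16-bit signed PCM, clipping to the valid range.
pcm = b"".join(
    struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
)

with wave.open("result.wav", "wb") as wf:
    wf.setnchannels(1)          # mono
    wf.setsampwidth(2)          # 2 bytes per sample = 16-bit
    wf.setframerate(sample_rate)
    wf.writeframes(pcm)
```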
3. Listen to the result

You can now enjoy the result.wav audio file generated by the script.

Stream audio via WebSockets

The API also supports streaming audio via WebSockets. Here is a quick example implementation of a Python client that supports real-time playback using the WebSocket endpoint:

import json
import base64
import pyaudio
import numpy as np
from websocket import create_connection

voice = "<the id of the voice you want to use>"

# connect to pyaudio for audio output
pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paFloat32, channels=1, rate=22050, output=True
)

# connect to the websocket
ws = create_connection("wss://<endpoint>/tts/websocket", header=["X-Api-Key: <ApiKey>"])

while True:
    # read input
    try:
        text = input("> ")
    except EOFError:
        break

    # send the input text to the websocket
    transcript = json.dumps({
        "transcript": text, "voice": {"id": voice}, "context_id": ""
    })
    ws.send_text(transcript)

    # receive the result
    chunks = []
    while True:
        chunk = json.loads(ws.recv())
        if chunk.get("type") == "done":
            break

        audio_bytes = base64.b64decode(chunk.get("data", b""))
        audio = np.frombuffer(audio_bytes, dtype=np.float32)
        chunks.append(audio)

    # concatenate and output the audio data
    if chunks:
        full_audio = np.concatenate(chunks)
        stream.write(full_audio.tobytes())

Support