Quickstart

Create your first text-to-speech audio.

Prerequisites

Before you can make any requests, you must first create an API key; you can create one in the Playground. The API key must be included in every subsequent request, either in the X-API-Key header or in the ?api_key= query string parameter. In browsers, the JavaScript WebSocket API does not allow custom headers, so the query string option is preferred there.
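The two authentication styles can be sketched as follows. This is a minimal illustration using only the standard library; the base URL shown is a placeholder, not the real endpoint:

```python
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"
# Placeholder URL for illustration; substitute the actual API endpoint.
BASE_URL = "https://api.example.com/v1/public/tts/en-rt/voices"

# Option 1: send the key in the X-API-Key header.
headers = {"X-API-Key": API_KEY}

# Option 2: append the key as a query string parameter
# (useful for browser WebSockets, which cannot set custom headers).
url_with_key = f"{BASE_URL}?{urlencode({'api_key': API_KEY})}"

print(headers)
print(url_with_key)
```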

Obtain a list of available voices

Before generating audio from your text, you must first select the ID of the voice you’d like to use. To view a list of available voices, send the following request:

GET
/v1/public/tts/en-rt/voices
from respeecher import Respeecher

client = Respeecher(
    api_key="YOUR_API_KEY",
)
client.voices.list()

The result should look something like this:

Response
[
  {
    "id": "samantha",
    "gender": "female",
    "accent": "American",
    "sampling_params": {
      "temperature": 0.6,
      "top_k": -1,
      "top_p": 0.8,
      "min_p": 1,
      "presence_penalty": 0,
      "repetition_penalty": 2,
      "frequency_penalty": 2
    }
  },
  {
    "id": "volodymyr",
    "gender": "male",
    "accent": "Ukrainian",
    "sampling_params": {
      "temperature": 0.4,
      "top_k": -1,
      "top_p": 0.8,
      "min_p": 1,
      "presence_penalty": 0,
      "repetition_penalty": 2,
      "frequency_penalty": 2
    }
  }
]
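Once you have the list, you can select a voice programmatically. Here is a minimal sketch that filters by accent, using a trimmed-down literal copy of the sample response above in place of a live API call:

```python
# Trimmed copy of the sample voices response shown above.
voices = [
    {"id": "samantha", "gender": "female", "accent": "American"},
    {"id": "volodymyr", "gender": "male", "accent": "Ukrainian"},
]

# Pick the ID of the first voice with an American accent.
voice_id = next(v["id"] for v in voices if v["accent"] == "American")
print(voice_id)  # samantha
```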

Generate your first audio with the Bytes endpoint

Now that you have the ID of the voice you would like to use, you can make a POST request to the Bytes endpoint to generate audio data. If you are using curl, redirect the output to a file by appending --output result.wav (or > result.wav) to the command. The resulting result.wav file can then be played in any audio player.

POST
/v1/public/tts/en-rt/tts/bytes
from respeecher import Respeecher

client = Respeecher(
    api_key="YOUR_API_KEY",
)
client.tts.bytes(
    transcript="Hello, World!",
    voice={"id": "samantha"},
)

Stream audio with the Server-Sent Events endpoint

1. Generate the Audio

POST
/v1/public/tts/en-rt/tts/sse
from respeecher import Respeecher

client = Respeecher(
    api_key="YOUR_API_KEY",
)
response = client.tts.sse(
    transcript="Hello, World!",
    voice={"id": "samantha"},
)
for chunk in response.data:
    print(chunk)

The response is a stream of JSON objects:

1{"type": "chunk", "data": "..."}

where the data field contains a base64-encoded chunk of 32-bit floating-point audio samples.
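To see what decoding one of these chunks involves, here is a self-contained round trip using only the standard library. The sample values are made up for illustration (a real chunk comes from the stream), and little-endian byte order is assumed:

```python
import base64
import struct

# Fabricate a chunk of three float32 samples, as the server would encode it.
samples = [0.0, 0.5, -0.25]
raw = struct.pack("<3f", *samples)
encoded = base64.b64encode(raw).decode("ascii")

# Decoding: base64 -> bytes -> little-endian float32 samples (4 bytes each).
decoded_bytes = base64.b64decode(encoded)
decoded = list(struct.unpack(f"<{len(decoded_bytes) // 4}f", decoded_bytes))
print(decoded)  # [0.0, 0.5, -0.25]
```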

Save the response data into a file with the name result.json, with each JSON object on its own line.

2. Assemble the chunks into an audio file

The SSE endpoint streams the audio data in chunks. This is useful for real-time playback; however, for this demo, we will use a short Python script to parse the chunks and assemble them into a complete audio file.

Note: this example requires both the soundfile and numpy modules to run.

import json
import base64
import numpy as np
import soundfile as sf

sample_rate = 22050
infile = "result.json"
outfile = "result.wav"

with open(infile, "r", encoding="utf-8") as f:
    data = [json.loads(line) for line in f]

chunks = []
for chunk in data:
    audio_bytes = base64.b64decode(chunk["data"])
    audio = np.frombuffer(audio_bytes, dtype=np.float32)
    chunks.append(audio)

full_audio = np.concatenate(chunks)
sf.write(outfile, full_audio, sample_rate)
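If you cannot install soundfile, a rough alternative is the standard-library wave module, which writes 16-bit PCM; the float samples must first be scaled and clipped. This is a sketch under that assumption, not part of the official tooling, shown here with a few hard-coded samples in place of the decoded chunks:

```python
import struct
import wave

sample_rate = 22050
# Stand-in for the decoded float32 samples, which lie in [-1.0, 1.0].
samples = [0.0, 0.5, -0.5, 1.0, -1.0]

# Convert to 16-bit signed PCM, clipping to the valid range.
pcm = b"".join(
    struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
)

with wave.open("result.wav", "wb") as wf:
    wf.setnchannels(1)          # mono
    wf.setsampwidth(2)          # 2 bytes per sample = 16-bit
    wf.setframerate(sample_rate)
    wf.writeframes(pcm)
```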
3. Listen to the result

You can now enjoy the result.wav audio file generated by the script.

Stream audio via WebSockets

The API also supports streaming audio via WebSockets. Here is a quick example implementation of a Python client that supports real-time playback using the WebSocket endpoint:

import json
import base64
import pyaudio
import numpy as np
from websocket import create_connection

voice = "<the id of the voice you want to use>"

# connect to pyaudio for audio output
pa = pyaudio.PyAudio()
stream = pa.open(
    format=pyaudio.paFloat32, channels=1, rate=22050, output=True
)

# connect to the websocket
ws = create_connection("wss://<endpoint>/tts/websocket", header=["X-Api-Key: <ApiKey>"])

while True:
    # read input
    try:
        text = input("> ")
    except EOFError:
        break

    # send the input text to the websocket
    transcript = json.dumps({
        "transcript": text, "voice": {"id": voice}, "context_id": ""
    })
    ws.send_text(transcript)

    # receive the result
    chunks = []
    while True:
        chunk = json.loads(ws.recv())
        if chunk.get("type") == "done":
            break

        audio_bytes = base64.b64decode(chunk.get("data", b""))
        audio = np.frombuffer(audio_bytes, dtype=np.float32)
        chunks.append(audio)

    # concatenate and output the audio data
    if chunks:
        full_audio = np.concatenate(chunks)
        stream.write(full_audio.tobytes())

Support