Quickstart
Create your first text-to-speech audio.
Prerequisites
Before you can make any requests, you must first create an API key. You can create one in the Playground. The API key must be included in all subsequent requests, either with the X-API-Key header or with the ?api_key= query string parameter. In browsers, the JavaScript API for WebSockets may not allow custom headers, so the query string option is preferred there.
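For example (the host is a placeholder; only the authentication parts matter here):

```
# Header form, for the HTTP endpoints:
curl https://api.example.com/voices -H "X-API-Key: YOUR_API_KEY"

# Query-string form, e.g. for browser WebSocket clients:
# wss://api.example.com/tts/websocket?api_key=YOUR_API_KEY
```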
Obtain a list of available voices
Before generating audio from your text, you must first select the ID of the voice you’d like to use. To view a list of available voices, send the following request:
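The sketch below uses curl; the base URL https://api.example.com and the /voices path are placeholders, so substitute the actual values from the API reference:

```
curl https://api.example.com/voices \
  -H "X-API-Key: YOUR_API_KEY"
```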
The result should look something like this:
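(An illustrative sketch only: the real response will differ, and every field besides id is an assumption.)

```
[
  {
    "id": "example-voice-id",
    "name": "Example Voice",
    "description": "A clear, conversational voice.",
    "language": "en"
  }
]
```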
Generate your first audio with the Bytes endpoint
Now that you have the ID of the voice you would like to use, you can make a POST request to the Bytes endpoint and generate some audio data.
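The sketch below assumes a /tts/bytes endpoint and a JSON body with voice_id, text, and output_format fields; the URL and field names are placeholders, so check the API reference for the exact schema:

```
curl -X POST https://api.example.com/tts/bytes \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "YOUR_VOICE_ID",
    "text": "Hello! This is my first generated audio.",
    "output_format": "wav"
  }'
```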
Redirect the output to a file by appending --output result.wav or > result.wav to the curl command. The file result.wav can then be played back in any audio player.
Stream audio with the Server-Sent Events endpoint
Generate the audio
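The sketch below assumes a /tts/sse endpoint that accepts the same JSON body as the Bytes endpoint; as before, the URL and field names are placeholders:

```
curl -X POST https://api.example.com/tts/sse \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "YOUR_VOICE_ID",
    "text": "Hello! This audio is streamed as it is generated."
  }'
```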
The response is a stream of JSON objects:
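(Illustrative sketch: the data values are placeholders, and any field other than data, such as a final done marker, is an assumption.)

```
{"data": "<base64-encoded float32 samples>", "done": false}
{"data": "<base64-encoded float32 samples>", "done": false}
{"done": true}
```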
Here, data contains a base64-encoded chunk of 32-bit floating-point numbers. Save the response data into a file named result.json.
Assemble the chunks into an audio file
The SSE endpoint streams the audio data in chunks. This is useful for real-time playback; however, for this demo, we will use a short Python script to parse the chunks and assemble them into a complete audio file.
Note: this example requires both the soundfile and numpy modules to run.
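A minimal sketch of such a script: it assumes result.json holds one JSON object per line (optionally prefixed with the SSE data: marker), that the audio is mono, and that the sample rate is 44100 Hz; adjust the rate to whatever you requested.

```
import base64
import json

import numpy as np
import soundfile as sf

SAMPLE_RATE = 44100  # assumption: use the sample rate you requested

chunks = []
with open("result.json") as f:
    for line in f:
        line = line.strip()
        if line.startswith("data:"):
            # Tolerate raw SSE framing ("data: {...}") as well as bare JSON lines.
            line = line[len("data:"):].strip()
        if not line:
            continue
        payload = json.loads(line)
        if not payload.get("data"):
            continue  # e.g. a final event that carries no audio
        raw = base64.b64decode(payload["data"])
        chunks.append(np.frombuffer(raw, dtype=np.float32))

# Concatenate all decoded float32 chunks and write them out as a WAV file.
audio = np.concatenate(chunks)
sf.write("result.wav", audio, SAMPLE_RATE)
print(f"Wrote {len(audio) / SAMPLE_RATE:.2f} s of audio to result.wav")
```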
Stream audio via WebSockets
The API also supports streaming audio via WebSockets. Here is a quick example implementation of a Python client that supports real-time playback using the WebSocket endpoint:
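A minimal sketch, not an official client: it assumes a wss://api.example.com/tts/websocket endpoint (placeholder), authentication via the ?api_key= query parameter, a JSON request body like the one used above, and response messages in the same base64-encoded float32 format as the SSE endpoint. It also requires the websockets, sounddevice, and numpy packages.

```
import asyncio
import base64
import json

import numpy as np
import sounddevice as sd
import websockets

API_KEY = "YOUR_API_KEY"
VOICE_ID = "YOUR_VOICE_ID"
SAMPLE_RATE = 44100  # assumption: match the sample rate you request
URL = f"wss://api.example.com/tts/websocket?api_key={API_KEY}"  # placeholder URL


async def main() -> None:
    # An output stream that we feed decoded samples into as they arrive.
    stream = sd.OutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    stream.start()
    try:
        async with websockets.connect(URL) as ws:
            # The request body mirrors the earlier examples; adjust the field
            # names to match the actual API reference.
            await ws.send(json.dumps({
                "voice_id": VOICE_ID,
                "text": "Hello from the WebSocket endpoint!",
            }))
            async for message in ws:
                payload = json.loads(message)
                if payload.get("data"):
                    samples = np.frombuffer(
                        base64.b64decode(payload["data"]), dtype=np.float32
                    )
                    stream.write(samples)  # play each chunk as soon as it arrives
                if payload.get("done"):
                    break
    finally:
        stream.stop()
        stream.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Because each chunk is written to the output stream as soon as it arrives, playback can start before the full utterance has been generated.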