Bytes

This endpoint is the easiest way to generate text-to-speech audio, but it is not suitable for latency-sensitive applications. Use it for transcripts of up to approximately 5,000 characters; for longer texts, use SSE or WebSocket.
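The guidance above can be turned into a small client-side check. The following is a minimal sketch, not part of the API: the function name `pick_transport`, the constant `BYTES_MAX_CHARS`, and the returned labels are hypothetical, and the 5,000-character figure is the approximate limit quoted above rather than a hard constraint.

```python
# Approximate limit taken from the guidance above; it is a rule of thumb,
# not a documented hard limit.
BYTES_MAX_CHARS = 5000


def pick_transport(transcript: str, latency_sensitive: bool = False) -> str:
    """Return "bytes" for short, non-latency-sensitive jobs, else "streaming".

    "streaming" stands for either of the alternatives named above
    (SSE or WebSocket); this sketch does not choose between them.
    """
    if latency_sensitive or len(transcript) > BYTES_MAX_CHARS:
        return "streaming"
    return "bytes"
```

A caller might run this check before building a request, for example `pick_transport(text, latency_sensitive=True)` for an interactive voice application.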

Authentication

X-API-Key (string)
API key authentication via header.

Request

This endpoint expects an object.
transcript (string, required)
Text for narration.
voice (object, required)
Voice for narration.
output_format (object, optional)
Audio format specification.
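As an illustration of how the header and the three fields fit together, here is a minimal sketch that builds (but does not send) a request using only the Python standard library. Several things are assumptions, because this section does not state them: the URL is a placeholder, the HTTP method is assumed to be POST, the body is assumed to be JSON, and the helper name `build_bytes_request` is hypothetical. The inner shape of `voice` and `output_format` is not documented here, so both are passed through as caller-supplied dictionaries.

```python
import json
import urllib.request

# Placeholder only: the real endpoint URL is not given in this section.
BYTES_URL = "https://api.example.com/tts/bytes"


def build_bytes_request(api_key, transcript, voice, output_format=None,
                        url=BYTES_URL):
    """Build a request object carrying the fields listed above.

    Assumptions: POST method and JSON body. `voice` and `output_format`
    are dicts whose keys must come from the API's own reference.
    """
    body = {"transcript": transcript, "voice": voice}
    if output_format is not None:
        # Optional field: left out entirely when not supplied.
        body["output_format"] = output_format
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-API-Key": api_key,  # API key authentication via header
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With a real URL and key, `urllib.request.urlopen(request).read()` would return the response body as bytes. Note that `urllib` stores header names in a normalized case internally; HTTP header names are case-insensitive, so this does not change the request's meaning.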

Response

WAV file (16-bit little-endian PCM).
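Because the response is a WAV file, the returned bytes can be inspected with Python's standard `wave` module. The sketch below is a client-side sanity check under that assumption; the helper name `describe_wav` is hypothetical, and the sample rate and channel count are read from the file header rather than assumed.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the WAV header from response bytes and summarize the audio.

    Raises ValueError if the samples are not 16-bit, which is the
    format stated above.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    if info["bits_per_sample"] != 16:
        raise ValueError(f"expected 16-bit PCM, got {info['bits_per_sample']}-bit")
    return info
```

To keep the audio, write the response bytes to disk unchanged, for example with `open("out.wav", "wb").write(data)`; no re-encoding is needed.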