Create Generation
Create a lip sync generation. This is the JSON form of POST /v2/generate: provide each input by url (or by assetId, for an asset you have already uploaded). It is the same endpoint as “Create Generation with Files”; use that multipart form instead when you want to upload local files directly. Use one content type per request (JSON or multipart, not both).
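A minimal sketch of building the JSON request with Python's standard library. It assumes a placeholder host (`https://api.example.com`), a hypothetical `x-api-key` auth header (this page does not spell out the authentication scheme), and hypothetical `model`, `input` and `type` field names. The request is built but not sent; one input is given by `url` and the other by `assetId`.

```python
import json
import urllib.request

# Placeholder host; substitute the real API base URL (assumption, not from this page).
BASE_URL = "https://api.example.com"


def build_generate_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST /v2/generate request."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v2/generate",
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth header name; check the Authentication section.
            "x-api-key": api_key,
        },
    )


body = {
    "model": "sync-3",  # hypothetical field name
    "input": [
        # Input provided by URL.
        {"type": "video", "url": "https://example.com/clip.mp4"},
        # Input provided by the assetId of an already uploaded asset.
        {"type": "audio", "assetId": "YOUR_ASSET_ID"},
    ],
}
req = build_generate_request(body, "YOUR_API_KEY")
# urllib.request.urlopen(req) would actually send the request.
```

Building the `Request` separately from sending it keeps the payload easy to inspect before any network call is made.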
Authentication
Request
Array of input objects. It must include one video or image input item and at least one audio input item. Image inputs are only supported with the sync-3 model. Each audio input item is either a URL to recorded or captured audio, or a text-to-speech (TTS) input with a TTS provider configuration. When using segments, you can provide multiple audio inputs, each with a unique refId value.
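The rules for the input array can be checked client-side before sending. This is a sketch under stated assumptions: the `type` discriminator and its values (`video`, `image`, `audio`, and `text` for TTS) and the TTS `provider` key are hypothetical names; "one video or image input" is read as exactly one; and multiple audio inputs are assumed to each need a distinct `refId`.

```python
from typing import Any

VISUAL_TYPES = ("video", "image")
AUDIO_TYPES = ("audio", "text")  # "text" = hypothetical TTS input type


def validate_inputs(inputs: list[dict[str, Any]], model: str) -> list[str]:
    """Return rule violations for an input array (empty list if it looks valid)."""
    errors: list[str] = []
    visual = [i for i in inputs if i.get("type") in VISUAL_TYPES]
    audio = [i for i in inputs if i.get("type") in AUDIO_TYPES]

    # One video or image input (interpreted here as exactly one).
    if len(visual) != 1:
        errors.append("exactly one video or image input is required")
    # Image inputs are only supported with the sync-3 model.
    if any(i.get("type") == "image" for i in visual) and model != "sync-3":
        errors.append("image inputs require the sync-3 model")
    # At least one audio input: recorded audio or a TTS input.
    if not audio:
        errors.append("at least one audio input is required")
    for item in audio:
        if item["type"] == "audio" and not ("url" in item or "assetId" in item):
            errors.append("audio inputs need a url or assetId")
        if item["type"] == "text" and "provider" not in item:
            errors.append("TTS inputs need a provider configuration")
    # With segments, multiple audio inputs must carry unique refId values.
    if len(audio) > 1:
        ref_ids = [item.get("refId") for item in audio]
        if None in ref_ids or len(set(ref_ids)) != len(ref_ids):
            errors.append("multiple audio inputs need unique refId values")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once.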
Dubbing parameters. When present, audio is extracted from the video input, dubbed into the target language via ElevenLabs, and lip sync is then run with the dubbed audio. Audio inputs in the input array are ignored when dubbing is enabled, so a single video input (with audio) is sufficient.
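The dubbing rule can be modeled as a small helper that decides which inputs lip sync actually consumes. Sketch only: the `dubbing` key and its `targetLanguage` field are hypothetical, since this page does not show the parameter names, and the `type` discriminator on input items is likewise assumed.

```python
from typing import Any


def effective_inputs(body: dict[str, Any]) -> list[dict[str, Any]]:
    """Inputs that lip sync will use, per the dubbing rule described above."""
    inputs = body.get("input", [])
    if body.get("dubbing"):
        # Dubbing derives audio from the video, so supplied audio inputs are ignored.
        return [i for i in inputs if i.get("type") == "video"]
    return list(inputs)


# A dubbing request needs only the video (which must carry its own audio track).
dub_body = {
    "input": [{"type": "video", "url": "https://example.com/talk.mp4"}],
    "dubbing": {"targetLanguage": "es"},  # hypothetical parameter shape
}
```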
Optionally attach this generation to a project (created via POST /v2/projects) so it appears in Studio under that project. The project must belong to your organization; otherwise the request is rejected with 422.
Response
Stable, machine-readable error code if the generation failed (e.g. generation_input_video_inaccessible). The full catalog of codes, messages and suggested fixes is served unauthenticated at GET /v2/errors.
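One way to turn a failed generation's code into guidance is to fetch the catalog once and look the code up locally. This sketch assumes the catalog is a JSON array of objects with a `code` key and that the generation exposes its code under a hypothetical `errorCode` field; neither shape is documented on this page, and the host is a placeholder.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


def fetch_error_catalog() -> Any:
    """GET /v2/errors; the docs say no authentication is required."""
    with urllib.request.urlopen(f"{BASE_URL}/v2/errors") as resp:
        return json.load(resp)


def explain_failure(
    generation: dict[str, Any], catalog: list[dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Return the catalog entry matching the generation's error code, if any."""
    code = generation.get("errorCode")  # hypothetical field name
    if code is None:
        return None  # generation did not fail, or carries no code
    return next((entry for entry in catalog if entry.get("code") == code), None)
```

Because the codes are stable, the catalog can be cached and refreshed occasionally rather than fetched on every failure.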
The URL of the audio synthesized from a text (TTS) input. Only present for generations created with a TTS text input; reuse it as an audio input to keep the same take across generations.
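To keep the same take, a follow-up request can pass the synthesized audio URL back as a recorded-audio input instead of re-running TTS. Sketch only: `outputTtsUrl` is a hypothetical name for this response field, and the input item shape (`type`, `url`) is assumed.

```python
from typing import Any


def reuse_tts_take(previous: dict[str, Any], video_url: str) -> list[dict[str, Any]]:
    """Build an input array that replays a previous generation's TTS audio."""
    audio_url = previous.get("outputTtsUrl")  # hypothetical response field
    if not audio_url:
        raise ValueError("previous generation has no synthesized TTS audio URL")
    return [
        {"type": "video", "url": video_url},
        # Plain audio input pointing at the earlier take, so nothing is re-synthesized.
        {"type": "audio", "url": audio_url},
    ]
```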

