I'm experiencing a latency difference between TTS-1 and TTS-1-Max

Is there a difference between the two models in this regard?

Hey Harpreet,

There is a difference. For inworld-tts-1, expect ~200-400 ms time-to-first-chunk in streaming mode. Overall latency depends on text length, but this model is optimized for real-time applications. inworld-tts-max is slower but more expressive; right now it’s not ideal for real-time use cases.

Pro tip: Always use streaming for interactive experiences. Perceived latency is much lower because playback can start as soon as the first chunk arrives, instead of waiting for the full audio.
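
If you want to compare the two models yourself, here is a rough sketch of how time-to-first-chunk could be measured on any streaming response. It is not our SDK: `measure_stream` and `fake_tts_stream` are hypothetical names, and the fake stream only simulates delays so the snippet runs on its own. In practice you would pass in whatever iterable of audio chunks your client returns.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Consume a chunk stream and return
    (time_to_first_chunk_s, total_time_s, total_bytes)."""
    start = time.perf_counter()
    first = None
    total_bytes = 0
    for chunk in chunks:
        if first is None:
            # Latency the listener actually perceives in streaming mode.
            first = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    if first is None:
        first = total  # empty stream: no chunk ever arrived
    return first, total, total_bytes


def fake_tts_stream(n_chunks: int = 5,
                    first_delay: float = 0.05,
                    chunk_delay: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (assumption,
    not a real API). Sleeps to mimic model and network latency."""
    time.sleep(first_delay)
    for i in range(n_chunks):
        if i:
            time.sleep(chunk_delay)
        yield b"\x00" * 320  # dummy audio bytes


if __name__ == "__main__":
    ttfc, total, nbytes = measure_stream(fake_tts_stream())
    print(f"first chunk: {ttfc * 1000:.0f} ms, "
          f"total: {total * 1000:.0f} ms, bytes: {nbytes}")
```

Run the same measurement against both models with identical text. For interactive use, the first number is the one that matters most; the total mainly reflects text length.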