E-girls Speaking: Voices That Feel Alive (Devlog)
[p]Have you heard? The NPCs have more voice options now! It's hard to miss, especially with Elysia's British accent hitting hard with so many players. ❤️🔥[/p][p]Curious why you can choose between Legacy and Advanced text-to-speech? Read on to learn how our team yandere'd their way into this one. The end result of that emotional journey is characters that are more dynamic, more emotive, and, dare we say, alive.[/p][p][/p][hr][/hr][h3]Why Voice Options?[/h3][p]Originally, we used Azure's Speech Services for our demo builds because it was stable and fast to set up for a demo project. But Azure came with some limitations. We wanted our girls to sound consistent across languages, yet few of Azure's voices could handle that well. Plus, Azure's TTS costs always ate up a big chunk of our budget. 💸 We're always on the hunt for ways to reduce cost without sacrificing quality.[/p][p]Meanwhile, other TTS services like ElevenLabs and MiniMax emerged, offering more advanced options, including different emotional tones. After many months of testing, we chose MiniMax, which supports multiple languages, delivers more expressive voices, and is significantly more cost-friendly.[/p][p]We first rolled it out in the Steam China region, then for Russian and German, and finally across all languages in the global version this September.[/p][p][/p][hr][/hr][h3]Pros & Cons[/h3][p]MiniMax isn't the only service we tested. Our team compared multiple providers on price, stability, scalability, and technical support. Here's how they stack up for our use case:[/p]
- [p]Azure: Pricey, and few of its voices stay consistent across languages.[/p]
- [p]ElevenLabs: Widely used and good quality, but the most expensive.[/p]
- [p]OpenAI: Also pricey, with no technical support available.[/p]
- [p]Inworld: ❌ Not a fit for our use case.[/p]
- [p]MiniMax: Cost-effective, with voice output that can reflect different emotions, plus excellent technical support from their customer service team.[/p]