We are launching three new voices for Polly. Powered by a new long-form engine, the voices are natural and expressive, with appropriate pauses, emphasis, and tone.
The new long-form voices are well suited to blog posts, news articles, training videos, and marketing content. The underlying machine learning model extracts meaning from the text, learning about speech segments, prosody (the pattern of rhythm and pauses), intonation, and other aspects of expressive speech, so that the synthesized audio can express emotion, especially in dialogs. The new long-form engine uses a deep learning text-to-speech (TTS) model trained to acquire a contextual understanding of the text, which lets it apply prosody appropriately. As a result, the intent of the story drives the vocal performance and produces the emphasis, pauses, and tones of a realistic human voice.
Here are the new voices:
You can use the new voices from the AWS Management Console, AWS Command Line Interface (AWS CLI), or the AWS SDKs. Using the CLI, I start by listing the voices that use the new long-form engine:
I can pick one, or I can try all of them:
My shell script had a small quoting bug, but the resulting audio was too funny not to include!
Programmatically, you can reproduce my example by writing code that calls the
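As a rough sketch of what such code might look like, the Python example below uses boto3 to list the voices that support the long-form engine and then synthesizes a short phrase with each one. The helper names (`list_long_form_voices`, `synthesize_to_file`, `main`), the sample text, and the output file names are my own illustrative choices, not part of the original example. The sketch assumes that AWS credentials are already configured, and it ignores pagination of the voice list.

```python
from typing import Any, List


def list_long_form_voices(polly: Any) -> List[str]:
    """Return the IDs of the voices that support the long-form engine."""
    response = polly.describe_voices(Engine="long-form")
    return [voice["Id"] for voice in response["Voices"]]


def synthesize_to_file(polly: Any, voice_id: str, text: str, path: str) -> int:
    """Synthesize `text` with the long-form engine, write MP3 bytes to `path`,
    and return the number of bytes written."""
    response = polly.synthesize_speech(
        Engine="long-form",
        VoiceId=voice_id,
        OutputFormat="mp3",
        Text=text,
    )
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)


def main() -> None:
    # boto3 is imported here so the helpers above can be exercised
    # with a stand-in client when no AWS access is available.
    import boto3

    # Assumption: us-east-1, the Region named in this post.
    polly = boto3.client("polly", region_name="us-east-1")
    for voice_id in list_long_form_voices(polly):
        text = f"Hello, my name is {voice_id}."  # illustrative sample text
        size = synthesize_to_file(polly, voice_id, text, f"{voice_id}.mp3")
        print(f"{voice_id}: wrote {size} bytes")


# main()  # uncomment to run against your AWS account
```

Passing the client in as an argument keeps the two helpers easy to test, and it means the same code works with whatever session or Region you configure.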
Things to Know
Pricing – Long-form voices are priced at $100 per million characters or Speech Marks requests. Check out the Amazon Polly Pricing page to learn more.
Engines & Voices – Some of the voices that I listed above can be used with more than one engine. For example, the Danielle voice can be used with the new long-form engine and the existing neural engine.
Regions – The new engine and voices are available in the US East (N. Virginia) Region.
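To put the pricing in perspective, here is a small back-of-the-envelope calculation in Python. It simply applies the $100 per million characters rate quoted above. The function name is hypothetical, and the sketch deliberately ignores any free tier, discounts, or details of how billable characters are counted, so treat the result as a rough estimate rather than a bill.

```python
PRICE_PER_MILLION_CHARS_USD = 100.0  # long-form rate quoted in this post


def estimate_long_form_cost(char_count: int) -> float:
    """Rough cost in USD for synthesizing `char_count` characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count * PRICE_PER_MILLION_CHARS_USD / 1_000_000


# A 5,000-character blog post works out to about fifty cents.
print(estimate_long_form_cost(5_000))  # prints 0.5
```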
Check out the new voices, build something awesome, and let me know what you think!