Currently, Elai offers more than 65 languages with more than 300 voices and accents. This is our basic package, available on all subscription plans (Trial, Basic, Advanced, and Corporate).
If you are wondering how to make the pronunciation sound just right, you can check the tips below:
1. Use the Play button
When you type in your text, you can click the "Play" button and listen to how the audio sounds. This will help you detect if there are any mispronunciations.
2. Check the spelling
We all make typos, so make sure to double-check your spelling to get the correct pronunciation. Adding the necessary commas, hyphens, and question marks can be the key to making the AI voice sound the way you want it to.
3. One language at a time
Try not to mix languages on one slide. If you want to use multiple languages, use a separate slide for each language.
To make it just right:
1. You can insert a pause
If you need to insert a pause in the audio, right-click at the spot where you need it. A pop-up menu appears that lets you Cut, Copy, and Paste the text, add Variables, or insert a Pause automatically. Premium voices support SSML tags as well.
For example, if you need a one-second pause in the text, this is how your script should look: Hello everyone! <break time="1s" /> This is my first Elai.io video.
2. Adjust the pronunciation
If you feel a word is not pronounced correctly, you can put a hyphen in the word to help the AI: for example, write "lap-top" instead of "laptop". You can also use the prosody SSML tag, which lets you change how a specific word or sentence sounds. You can adjust rate, volume, pitch, contour, and range.
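As an illustration, a script line using the prosody tag might look like this (the sentence and the rate and pitch values here are only examples, not recommended settings):

```xml
Welcome to our store! <prosody rate="-10%" pitch="+5%">Today only</prosody>, everything is half price.
```

A negative rate slows the wrapped phrase down slightly, and a small positive pitch shift makes it stand out from the surrounding sentence.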
3. Emphasize words
You can emphasize a word by wrapping it in a prosody tag with a decreased rate and adding a short pause before it, like this: Entrance is on the <break time="100ms"/><prosody rate="-25%">ground</prosody> level. Alternatively, you can change the contour of the word: <prosody contour="(1%, +85%)">word</prosody>
4. Use phonemes to improve pronunciation
By using Speech Synthesis Markup Language (SSML) in conjunction with text-to-speech technology, you can dictate the precise pronunciation of spoken words. With SSML, you can utilize phonemes and personalized lexicons to enhance speech clarity. Additionally, you can use SSML to specify the proper enunciation of specific words or mathematical expressions. For more in-depth guidance on incorporating SSML elements to enhance pronunciation, please consult the sections below. If you require further clarification on SSML syntax, see SSML document structure and events.
In SSML documents, the phoneme element serves the purpose of indicating phonetic pronunciation. It is important to include human-readable speech as a fallback option.
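For example, a phoneme element with an IPA transcription might look like this (the word and transcription are illustrative):

```xml
You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>, I say tomato.
```

The text inside the element ("tomato") is the human-readable fallback: if the phonetic string cannot be applied, the engine reads the enclosed text instead.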
Phonetic alphabets consist of phones, which are comprised of letters, numbers, or symbols, often used in combination. Each phone corresponds to a distinct speech sound, unlike the Latin alphabet, in which a single letter may represent multiple spoken sounds.
Consider the different en-US pronunciations of the letter "c" in the words "candy" and "cease" or the different pronunciations of the letter combination "th" in the words "thing" and "those."
You can define how single entities (such as a company name, a medical term, or an emoji) are read in SSML by using the phoneme and sub elements. To define how multiple entities are read, create an XML-structured custom lexicon file, upload it, and reference it with the SSML lexicon element.
After you've published your custom lexicon, you can reference it from your SSML. The following SSML example references a custom lexicon that was uploaded to https://www.example.com/customlexicon.xml.
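An SSML fragment referencing that lexicon might look like the sketch below (the voice name and sentence are illustrative; the lexicon URI is the one from the example above):

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <!-- The lexicon element points at the published custom lexicon file -->
    <lexicon uri="https://www.example.com/customlexicon.xml"/>
    BTW, we will be there at 8:00 tomorrow morning.
  </voice>
</speak>
```

Any lexeme defined in the referenced lexicon (such as an alias for "BTW") is then applied to the text inside that voice element.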
To define how multiple entities are read, you can define them in a custom lexicon XML file with either the .xml or .pls file extension.
Here are some limitations of the custom lexicon file:
- File size: The custom lexicon file size is limited to a maximum of 100 KB. If the file size exceeds the 100-KB limit, the synthesis request fails.
- Lexicon cache refresh: The custom lexicon is cached with the URI as the key on text-to-speech when it's first loaded. A lexicon with the same URI won't be reloaded within 15 minutes, so changes to a custom lexicon can take up to 15 minutes to take effect.
Here are some examples of the supported elements and attributes:
- The lexicon element contains at least one lexeme element. It must include the xml:lang attribute to indicate the locale it applies to; one custom lexicon is limited to one locale by design, so applying it to a different locale won't work. The lexicon element also has an alphabet attribute indicating the alphabet used in the lexicon. The possible values are ipa and x-microsoft-sapi.
- Each lexeme element contains at least one grapheme element and one or more grapheme, alias, and phoneme elements. The lexeme element is case sensitive in the custom lexicon. For example, if you only provide a phoneme for the lexeme "Hello", it won't work for the lexeme "hello".
- The grapheme element contains text that describes the orthography.
- The alias elements are used to indicate the pronunciation of an acronym or an abbreviated term.
- The phoneme element provides text that describes how the lexeme is pronounced. The syllable boundary is '.' in the IPA alphabet. The phoneme element can't contain white space when you use the IPA alphabet.
- When the alias and phoneme elements are provided with the same grapheme element, alias has higher priority.
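A minimal custom lexicon file combining these elements might look like this (the entries are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <!-- alias: the acronym "BTW" is read as "By the way" -->
  <lexeme>
    <grapheme>BTW</grapheme>
    <alias>By the way</alias>
  </lexeme>
  <!-- phoneme: an IPA transcription, with '.' as the syllable boundary -->
  <lexeme>
    <grapheme>Benigni</grapheme>
    <phoneme>bɛˈniː.nji</phoneme>
  </lexeme>
</lexicon>
```

Note that the grapheme matching is case sensitive, so "BTW" here would not match "btw" in the script.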
Microsoft provides a validation tool for the custom lexicon that helps you find errors (with detailed error messages) in the custom lexicon file. Using the tool is recommended before you use the custom lexicon XML file in production with the Speech service.