This content originally appeared on DEV Community and was authored by Jan Küster
When I implemented my first speech-synthesis app using the Web Speech API,
I was shocked by how hard it was to set up and run with cross-browser support in mind:
- Some browsers don't support speech synthesis at all, for instance IE (at least I don't care 🤷♂️) and Opera (I do care 😠) and a few more mobile browsers (I haven't decided yet, whether I care or not 🤔).
- On top of that, each browser implements the API differently or with specific quirks the other browsers don't have.
Just try it yourself: open the MDN speech synthesis example and run it on different browsers and different platforms:
- Linux, Windows, MacOS, BSD, Android, iOS
- Firefox, Chrome, Chromium, Safari, Opera, Edge, IE, Samsung Browser, Android Webview, Safari on iOS, Opera Mini
You will realize that this example only works on a subset of these platform-browser combinations. Worse: once you start researching, you'll be shocked by how quirky and underdeveloped this whole API still is in 2021/2022.
To be fair, it is still labeled as experimental technology. However, it was first drafted almost 10 years ago and still is not a living standard.
This makes it much harder to leverage in our applications, and I hope this guide will help you get the most out of it on as many browsers as possible.
Minimal example
Let's approach this topic step-by-step and start with a minimal example that all browsers (that generally support speech synthesis) should run:
if ('speechSynthesis' in window) {
  window.speechSynthesis.speak(
    new SpeechSynthesisUtterance('Hello, world!')
  )
}
You can simply copy that code and execute it in your browser console.
If you have basic support, you will hear a "default" voice speaking the text 'Hello, world!'; it may or may not sound natural, depending on which default voice is used.
Loading voices
Browsers may detect your current language and select a default voice, if installed. However, this may not be the language you'd like the text to be spoken in.
In that case you need to load the list of voices, which are instances of SpeechSynthesisVoice. This is the first big obstacle, where browsers behave quite differently:
Load voices sync-style
const voices = window.speechSynthesis.getVoices()
voices // Array of voices or empty if none are installed
Firefox and Safari Desktop just load the voices immediately, sync-style. However, this returns an empty array on Chrome Desktop and Chrome Android, and may return an empty array on Firefox Android (see next section).
Load voices async-style
window.speechSynthesis.onvoiceschanged = function () {
  const voices = window.speechSynthesis.getVoices()
  voices // Array of voices or empty if none are installed
}
This method loads the voices asynchronously, so your code needs a callback or a Promise wrapper. Firefox Desktop defines onvoiceschanged as a property of window.speechSynthesis, but does not actually support it, while Safari does not have it at all.
In contrast: Firefox Android loads the voices the first time using this method and on a refresh has them available via the sync-style method.
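To smooth over both behaviours, the sync-style and async-style approaches can be combined in a Promise wrapper. The following is a sketch (voicesPromise is a hypothetical helper name; the synth parameter is injected so the function stays testable outside a browser, in a real app you'd pass window.speechSynthesis). Note that this still doesn't cover older Safari, which needs the interval approach from the next section:

```javascript
// Resolve with the voice list, trying sync-style first and
// falling back to the onvoiceschanged event (Chrome-style).
const voicesPromise = (synth) => new Promise((resolve) => {
  const voices = synth.getVoices()
  if (voices.length > 0) {
    // sync-style succeeded (Firefox/Safari Desktop)
    return resolve(voices)
  }
  // async-style fallback (Chrome Desktop / Chrome Android)
  synth.onvoiceschanged = () => resolve(synth.getVoices())
})
```

Usage in the browser would then be `voicesPromise(window.speechSynthesis).then(voices => { /* ... */ })`.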
Loading using interval
Some users of older Safari versions have reported that their voices are not available immediately (while onvoiceschanged is not available either). For this case we need to check for the voices at a constant interval:
let timeout = 0
const maxTimeout = 2000
const interval = 250

const loadVoices = (cb) => {
  const voices = speechSynthesis.getVoices()
  if (voices.length > 0) {
    return cb(undefined, voices)
  }
  if (timeout >= maxTimeout) {
    return cb(new Error('loadVoices max timeout exceeded'))
  }
  timeout += interval
  setTimeout(() => loadVoices(cb), interval)
}

loadVoices((err, voices) => {
  if (err) return console.error(err)
  voices // voices loaded and available
})
Speaking with a certain voice
There are use-cases where the default selected voice does not match the language of the text to be spoken. In that case we need to change the voice for the "utterance" to speak.
Step 1: get a voice by a given language
// assume voices are loaded, see previous section
const getVoicebyLang = lang => speechSynthesis
  .getVoices()
  .find(voice => voice.lang.startsWith(lang))
const german = getVoicebyLang('de')
Note: voices have standard language codes, like en-GB, en-US or de-DE. However, on Android's Samsung Browser or Android Chrome, voices have underscore-connected codes, like en_GB. On Firefox Android, voices have three characters before the separator, like deu-DEU-f00 or eng-GBR-f00. However, they all start with the language code, so passing a two-letter short-code should be sufficient.
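Since the separators differ between platforms, a small helper can normalize them before matching. This is a sketch; matchesLang and findVoice are hypothetical names, not part of any API:

```javascript
// Normalize the separator ('_' -> '-') and compare the leading
// language part case-insensitively, so 'en-GB', 'en_GB' and
// 'eng-GBR-f00' all match a two-letter short-code.
const matchesLang = (voiceLang, lang) => {
  const normalized = voiceLang.toLowerCase().replace(/_/g, '-')
  return normalized.startsWith(lang.toLowerCase())
}

// pick the first voice for a given short-code from a voice list
const findVoice = (voices, lang) =>
  voices.find(voice => matchesLang(voice.lang, lang))
```

In the browser you would call `findVoice(speechSynthesis.getVoices(), 'de')` once the voices are loaded.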
Step 2: create a new utterance
We can now pass the voice to a new SpeechSynthesisUtterance and, as your precognitive abilities may have foreseen, there are again some browser-specific issues to consider:
const text = 'Guten Tag!'
const utterance = new SpeechSynthesisUtterance(text)

if (utterance.text !== text) {
  // I found no browser yet that does not support text
  // as constructor arg but who knows!?
  utterance.text = text
}

utterance.voice = german // iOS required
utterance.lang = german.lang // Android Chrome required
utterance.voiceURI = german.voiceURI // Who knows if required?
utterance.pitch = 1
utterance.volume = 1
// API allows up to 10 but values > 2 break on all Chrome
utterance.rate = 1
We can now pass the utterance to the speak function as a preview:
speechSynthesis.speak(utterance) // speaks 'Guten Tag!' in German
Step 3: add events and speak
This is of course just half of it. We actually want deeper insight into what's happening and what's missing by tapping into some of the utterance's events:
const handler = e => console.debug(e.type)
utterance.onstart = handler
utterance.onend = handler
utterance.onerror = e => console.error(e)
// SSML markup is rarely supported
// See: https://www.w3.org/TR/speech-synthesis/
utterance.onmark = handler
// word boundaries are supported by
// Safari on macOS and on Windows but
// not on Linux and Android browsers
utterance.onboundary = handler
// not supported / fired
// on many browsers somehow
utterance.onpause = handler
utterance.onresume = handler
// finally speak and log all the events
speechSynthesis.speak(utterance)
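With these events in place, speak can also be wrapped in a Promise that resolves on end and rejects on error, a pattern many apps prefer over raw callbacks. A sketch (speakAsync is a hypothetical helper; the synth argument is injected for testability, pass window.speechSynthesis in the browser). Note that it overwrites any onend/onerror handlers set earlier:

```javascript
// Speak an utterance and settle when the browser reports the outcome.
const speakAsync = (synth, utterance) => new Promise((resolve, reject) => {
  utterance.onend = resolve   // resolves with the end event
  utterance.onerror = reject  // rejects with the error event
  synth.speak(utterance)
})
```

Usage: `speakAsync(speechSynthesis, utterance).then(() => console.debug('done speaking'))`.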
Step 4: Chrome-specific fix
Longer texts on Chrome Desktop are cancelled automatically after about 15 seconds. This can be fixed either by chunking the text or by using an interval of "zero-latency" pause/resume calls. At the same time, this fix breaks on Android, since Android browsers don't implement speechSynthesis.pause() as pause but as cancel:
let timer

const clear = () => { clearTimeout(timer) }

const resumeInfinity = (target) => {
  // prevent a memory leak in case the utterance is deleted while this is ongoing
  if (!target && timer) { return clear() }
  speechSynthesis.pause()
  speechSynthesis.resume()
  timer = setTimeout(function () {
    resumeInfinity(target)
  }, 5000)
}

utterance.onstart = () => {
  // detecting Android is up to you here,
  // as that's a huge topic of its own
  if (!isAndroid) {
    resumeInfinity(utterance)
  }
}

utterance.onerror = clear
utterance.onend = clear
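The other fix mentioned above, chunking the text, avoids the pause/resume trick entirely: split long text into pieces short enough to stay under the roughly 15-second limit and queue one utterance per chunk. A sketch (chunkText is a hypothetical helper, and the 200-character default is an assumption, not a documented limit):

```javascript
// Split text at sentence boundaries into chunks of at most maxLength
// characters (a single oversized sentence is kept whole).
const chunkText = (text, maxLength = 200) => {
  const chunks = []
  let current = ''
  // naive sentence split; a real app may want a smarter tokenizer
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    if ((current + ' ' + sentence).trim().length > maxLength && current) {
      chunks.push(current.trim())
      current = sentence
    } else {
      current = (current + ' ' + sentence).trim()
    }
  }
  if (current) chunks.push(current.trim())
  return chunks
}
```

Each chunk then gets its own SpeechSynthesisUtterance and is passed to speechSynthesis.speak(), which queues them in order.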
Furthermore, some browsers don't update the speechSynthesis.paused property when speechSynthesis.pause() is executed (even though speech is correctly paused). You then need to manage these states yourself.
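One way to manage that state is a thin wrapper that tracks it itself instead of trusting speechSynthesis.paused. A sketch with an injected synth object so it stays testable (createSpeechState is a hypothetical name; pass window.speechSynthesis in the browser):

```javascript
// Wrap pause/resume and keep our own paused flag, since the
// built-in speechSynthesis.paused property is unreliable.
const createSpeechState = (synth) => {
  let paused = false
  return {
    pause () {
      synth.pause()
      paused = true
    },
    resume () {
      synth.resume()
      paused = false
    },
    isPaused () {
      return paused
    }
  }
}
```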
Issues that can't be fixed with JavaScript
All the above fixes rely on JavaScript, but some issues are platform-specific. You need to design your app in a way that avoids these issues, where possible:
- All browsers on Android actually do a cancel/stop when calling speechSynthesis.pause; pause is simply not supported on Android 👎
- There are no voices on Chromium on Ubuntu and Ubuntu derivatives unless the browser is started with a flag 👎
- If, on Chromium Desktop on Ubuntu, the very first page wants to load speech synthesis, then no voices are ever loaded until the page is refreshed or a new page is entered. This can be fixed with JavaScript, but auto-refreshing the page can lead to very bad UX. 👎
- If voices are not installed on the host OS and the browser loads no voices from remote, then there are no voices and thus no speech synthesis 👎
- There is no way to instantly load custom voices from remote and use them as a shim in case there are no voices 👎
- If the installed voices are just bad, users have to manually install better voices 👎
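Since pause behaves as cancel on Android, apps usually branch on the platform, which is what the isAndroid check in the Chrome fix above alludes to. A common (if brittle) approach is a user-agent test; this sketch implements it as a function taking the UA string:

```javascript
// UA sniffing is fragile; treat this as an assumption-laden sketch,
// not a definitive detection method.
const isAndroid = (userAgent) => /android/i.test(userAgent)

// in the browser: isAndroid(navigator.userAgent)
```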
Making your life easier with EasySpeech
Now you have seen the worst and, believe me, it takes ages to implement all potential fixes.
Fortunately, I already did this and published a package to NPM with the intent to provide a common API that handles most issues internally and provides the same experience across browsers (that support speechSynthesis):
jankapunkt / easy-speech

Easy Speech: Cross browser Speech Synthesis

This project was created because it's always a struggle to get the synthesis part of the Web Speech API running on most major browsers.
Note: this is not a polyfill package; if your target browser does not support speech synthesis or the Web Speech API, this package is not usable.
Install
Install from npm via
$ npm install easy-speech
Usage
Import EasySpeech and first detect whether your browser is capable of TTS (text-to-speech):
import EasySpeech from 'easy-speech'
EasySpeech.detect()
It returns an Object with the following information:
{
  speechSynthesis: SpeechSynthesis|undefined,
  speechSynthesisUtterance: SpeechSynthesisUtterance|undefined,
  speechSynthesisVoice: SpeechSynthesisVoice|undefined,
  speechSynthesisEvent: SpeechSynthesisEvent|undefined,
  speechSynthesisErrorEvent: SpeechSynthesisErrorEvent|undefined,
  onvoiceschanged: Boolean,
  onboundary: Boolean,
  onend: Boolean,
  onerror: Boolean,
  onmark: Boolean,
  onpause:
  …
You should give it a try if you want to implement speech synthesis the next time. It also comes with a demo page, so you can easily test and debug your devices there: https://jankapunkt.github.io/easy-speech/
Let's take a look at how it works:
import EasySpeech from 'easy-speech'

// sync, returns Object with detected features
EasySpeech.detect()

EasySpeech.init()
  .then(() => {
    EasySpeech.speak({ text: 'Hello, world!' })
  })
  .catch(e => console.error('no speech synthesis:', e.message))
It will not only detect which features are available but also load an optimal default voice, based on a few heuristics.
Of course there is much more to use and the full API is also documented via JSDoc: https://github.com/jankapunkt/easy-speech/blob/master/API.md
If you like it, leave a star, and please file an issue if you find (yet another) browser-specific issue.
References
- https://wicg.github.io/speech-api/#tts-section
- https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis
- https://gist.github.com/alrra/6741915
- https://github.com/ubershmekel/audio-language-tests
- https://stackoverflow.com/questions/33889107/speech-synthesis-in-chrome-for-android-not-loading-voices
- https://stackoverflow.com/questions/49506716/speechsynthesis-getvoices-returns-empty-array-on-windows
- https://stackoverflow.com/questions/21947730/chrome-speech-synthesis-with-longer-texts
- https://stackoverflow.com/a/34130734
- https://stackoverflow.com/a/68060634
- https://stackoverflow.com/a/48056986
- https://bugs.chromium.org/p/chromium/issues/detail?id=582455
- https://stackoverflow.com/a/65883556
Jan Küster | Sciencx (2021-12-07T10:42:24+00:00) Cross browser speech synthesis – the hard way and the easy way. Retrieved from https://www.scien.cx/2021/12/07/cross-browser-speech-synthesis-the-hard-way-and-the-easy-way/