Get Live Speech Transcriptions In Your Browser

This content originally appeared on DEV Community and was authored by Kevin Lewis

There are so many projects you can build with Deepgram's streaming audio transcriptions. Today, we are going to get live transcriptions from a user's mic inside of your browser.

Before We Start

For this project, you will need a Deepgram API key, which you can get by signing up for a Deepgram account. That's it in terms of dependencies - this project is entirely browser-based.

Create a new index.html file, open it in a code editor, and add the following boilerplate code:

<!DOCTYPE html>
<html>
  <body>
    <p id="status">Connection status will go here</p>
    <p id="transcript">Deepgram transcript will go here</p>
    <script>
      // Further code goes here
    </script>
  </body>
</html>

Get User Microphone

You can request access to a user's media input devices (microphones and cameras) using the built-in navigator.mediaDevices.getUserMedia() method. It returns a Promise which, if the user grants access, resolves to a MediaStream that we can then prepare to send to Deepgram. Inside your <script> tag, add the following:

navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
  console.log({ stream })
  // Further code goes here
})

Load your index.html file in your browser, and you should immediately receive a prompt to access your microphone. Grant it, and then look at the console in your developer tools.

[Image: the browser asks for access to the microphone; once access is granted, the browser console shows an object containing a MediaStream.]
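The snippet above does not handle the case where the user blocks the prompt or no microphone is available; in that case the Promise returned by getUserMedia() rejects (for example with a NotAllowedError or NotFoundError DOMException). Below is a minimal sketch that catches the rejection instead. The name requestMicrophone is hypothetical, and mediaDevices is passed in as a parameter (in the browser you would pass navigator.mediaDevices) only so the logic can be exercised without a browser.

```javascript
// Hypothetical helper: request an audio-only MediaStream, resolving to null
// instead of rejecting when access is denied or no microphone exists.
// `mediaDevices` is expected to be navigator.mediaDevices in the browser.
async function requestMicrophone(mediaDevices) {
  try {
    return await mediaDevices.getUserMedia({ audio: true })
  } catch (err) {
    // e.g. NotAllowedError when the user blocks the prompt,
    // NotFoundError when there is no input device
    console.error(`Microphone unavailable: ${err.name}`)
    return null
  }
}
```

In the page you would call requestMicrophone(navigator.mediaDevices) and continue only when the resolved value is not null.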

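Now that we have a MediaStream, we must provide it to a MediaRecorder, which will prepare the data and, once it is available, emit it in a dataavailable event. Replace the console.log() inside the .then() callback with the following (the return exits that callback if the browser can't record WebM audio):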

if (!MediaRecorder.isTypeSupported('audio/webm'))
  return alert('Browser not supported')
const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' })
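The check above gives up as soon as audio/webm is unsupported. If you want to try more than one container, a small sketch like the one below picks the first supported entry from a candidate list. The function name and the candidate list are assumptions for illustration (the tutorial itself only uses 'audio/webm'), so confirm that Deepgram accepts any format you add. The predicate is passed in; in the browser it would be (type) => MediaRecorder.isTypeSupported(type).

```javascript
// Hypothetical helper: return the first MIME type in `candidates` that the
// supplied `isTypeSupported` predicate accepts, or null if none are.
function pickSupportedMimeType(candidates, isTypeSupported) {
  for (const type of candidates) {
    if (isTypeSupported(type)) return type
  }
  return null
}

// Example candidate list (an assumption, ordered from most to least specific)
const CANDIDATE_TYPES = ['audio/webm;codecs=opus', 'audio/webm']
```

A null result plays the same role as the failed isTypeSupported() check in the original snippet: show the "Browser not supported" message and stop.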

We now have everything we need to send audio to Deepgram.

Connect to Deepgram

To stream audio to Deepgram's Speech Recognition service, we must open a WebSocket connection and send data over it. First, establish the connection:

const socket = new WebSocket('wss://api.deepgram.com/v1/listen', [
  'token',
  'YOUR_DEEPGRAM_API_KEY',
])

Note: this key is used client-side and is therefore visible to your users. Please factor this into your real projects.
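One common way to keep your main API key out of the page is to have your own server hand the page a temporary or restricted key, which the page fetches before opening the socket. The sketch below covers only the browser side. The /api/deepgram-key endpoint and its { "key": "..." } response shape are hypothetical; you would need to build that server route yourself. fetchFn is passed in (normally the global fetch) so the function can be tried without a network.

```javascript
// Hypothetical endpoint on your own server returning JSON like { "key": "..." }
const KEY_ENDPOINT = '/api/deepgram-key'

// Fetch a key from your backend instead of hard-coding it in the page.
// `fetchFn` is expected to be the browser's global fetch.
async function fetchDeepgramKey(fetchFn, endpoint = KEY_ENDPOINT) {
  const response = await fetchFn(endpoint)
  if (!response.ok) {
    throw new Error(`Key request failed with status ${response.status}`)
  }
  const body = await response.json()
  if (typeof body.key !== 'string' || body.key.length === 0) {
    throw new Error('Key endpoint returned no key')
  }
  return body.key
}
```

Once the Promise resolves, you would open the socket as before, passing the fetched key in place of 'YOUR_DEEPGRAM_API_KEY'.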

Then, log when the socket's onopen, onmessage, onclose, and onerror events are triggered:

socket.onopen = () => {
  console.log({ event: 'onopen' })
}

socket.onmessage = (message) => {
  console.log({ event: 'onmessage', message })
}

socket.onclose = () => {
  console.log({ event: 'onclose' })
}

socket.onerror = (error) => {
  console.log({ event: 'onerror', error })
}
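The four handlers above differ only in their event name, so if you prefer, the same logging can be attached in a loop. This is a stylistic alternative rather than something the tutorial needs. attachLoggers is a hypothetical name; it works on any object with on* handler properties, which is how it can be tried without a real WebSocket. Note that it labels the argument payload for every event, where the handlers above use message and error.

```javascript
// Attach a logging handler to each of the socket's four event properties.
// `log` defaults to console.log, like the handlers written out by hand.
function attachLoggers(socket, log = console.log) {
  for (const name of ['onopen', 'onmessage', 'onclose', 'onerror']) {
    socket[name] = (payload) => log({ event: name, payload })
  }
  return socket
}
```

Calling attachLoggers(socket) right after creating the WebSocket gives the same console output shape as the four separate handlers.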

Refresh your browser and watch the console. You should see the socket connection open and then close. To keep the connection open, we must send some data promptly once the connection opens.

Sending Data to Deepgram

Inside the socket.onopen function, add the following to send audio to Deepgram in 250ms chunks:

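mediaRecorder.addEventListener('dataavailable', (event) => {
  // Only send non-empty chunks, and only while the socket is open
  if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
    socket.send(event.data)
  }
})
// Emit a dataavailable event roughly every 250ms
mediaRecorder.start(250)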
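The readyState check in the listener above compares against WebSocket.OPEN, whose numeric value is 1. Pulling the guard into a named function makes that explicit. In the sketch below the four readyState values are written out because they are fixed by the WebSocket standard; shouldSendChunk and READY_STATE are hypothetical names.

```javascript
// readyState values from the WebSocket standard
// (same numbers as WebSocket.CONNECTING, OPEN, CLOSING and CLOSED)
const READY_STATE = Object.freeze({ CONNECTING: 0, OPEN: 1, CLOSING: 2, CLOSED: 3 })

// Forward a recorded chunk only when it contains audio and the socket is open:
// send() throws while the socket is still CONNECTING, and data passed to
// send() once it is CLOSING or CLOSED is not transmitted.
function shouldSendChunk(chunkSize, readyState) {
  return chunkSize > 0 && readyState === READY_STATE.OPEN
}
```

Inside the listener the condition would then read shouldSendChunk(event.data.size, socket.readyState).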

Deepgram isn't fussy about the timeslice you provide (here it's 250ms), but bear in mind that the larger this number is, the longer the gap between words being spoken and the audio being sent, which slows down your transcription. A value between 100 and 250ms is ideal.
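To put rough numbers on that trade-off: with a timeslice of t milliseconds, MediaRecorder emits about 1000 / t chunks per second, and audio captured at the start of a slice can wait up to about t ms in the browser before it is sent. The sketch below only does that arithmetic (timesliceFigures is a hypothetical name). It ignores network and transcription time, so treat the figures as a lower bound on latency, not a measurement.

```javascript
// Rough figures for a MediaRecorder timeslice given in milliseconds.
// Network and transcription time are deliberately ignored.
function timesliceFigures(timesliceMs) {
  if (!(timesliceMs > 0)) throw new RangeError('timeslice must be positive')
  return {
    chunksPerSecond: 1000 / timesliceMs, // approximate dataavailable events per second
    maxBufferingMs: timesliceMs, // worst-case wait before a sound leaves the browser
  }
}

for (const t of [100, 250, 1000]) {
  const { chunksPerSecond, maxBufferingMs } = timesliceFigures(t)
  console.log(`${t}ms: about ${chunksPerSecond} chunks/s, up to ${maxBufferingMs}ms buffered`)
}
```

Smaller slices mean more, smaller WebSocket messages but less waiting in the browser, which is why a value in the 100 to 250ms range is a reasonable middle ground.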

Take a look at your console now while speaking into your mic - you should be seeing data come back from Deepgram!

[Image: the browser console shows four onmessage events. The last one is expanded to show a JSON object including a data property, which contains the words "how are you doing today."]

Handling the Deepgram Response

Inside the socket.onmessage function, parse the data sent from Deepgram, pull out only the transcript, and determine whether it's the final transcript for that phrase (the "utterance"):

const received = JSON.parse(message.data)
const transcript = received.channel.alternatives[0].transcript
if (transcript && received.is_final) {
  console.log(transcript)
}
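The lines above assume every message is JSON with a channel.alternatives array. A slightly more defensive version is sketched below, in case a message arrives without those fields; it returns the final transcript text or null. The function name is hypothetical, and it reads only the fields the tutorial already uses (channel.alternatives[0].transcript and is_final).

```javascript
// Return the transcript from a Deepgram message if it is non-empty and final,
// otherwise null. Messages that aren't JSON or lack the fields are ignored.
function finalTranscriptFrom(rawData) {
  let received
  try {
    received = JSON.parse(rawData)
  } catch {
    return null // not JSON, nothing to show
  }
  const transcript = received?.channel?.alternatives?.[0]?.transcript
  return transcript && received.is_final ? transcript : null
}
```

In socket.onmessage you would call finalTranscriptFrom(message.data) and log or display the result only when it is not null.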

You may have noticed that for each phrase you received several messages from Deepgram, each growing by a word (for example "hello", "hello how", "hello how are", etc.). Deepgram sends data back as each word is transcribed, which is great for getting a speedy response. For this simple project, we will only show the final version of each utterance, which is denoted by an is_final property in the response.
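If you later want to show the in-progress words as well, one approach is to keep the finalized text separate from the latest non-final result and render both. This goes beyond the tutorial, which only shows final results. createTranscriptView is a hypothetical name, and the sketch assumes, as described above, that each non-final message carries the whole current utterance so far and can simply replace the previous one.

```javascript
// Keeps finalized utterances separate from the latest interim text, so the
// page can show both without duplicating words.
function createTranscriptView() {
  let finalText = ''
  let interimText = ''
  return {
    // Record one message's transcript and return the text to display.
    update(transcript, isFinal) {
      if (isFinal) {
        if (transcript) finalText += transcript + ' '
        interimText = ''
      } else {
        interimText = transcript
      }
      return (finalText + interimText).trim()
    },
  }
}
```

With this in place, socket.onmessage would set the #transcript element's textContent to view.update(transcript, received.is_final) for every message instead of appending only final results.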

To neaten this up, remove the console.log({ event: 'onmessage', message }) from this function, and then test your code again.

[Image: the browser console shows two transcribed phrases written as plain text.]

That's it! That's the project. Before we wrap up, let's give the user some indication of progress in the web page itself.

Showing Status & Progress In Browser

Change the text inside <p id="status"> to 'Not Connected'. Then, at the top of your socket.onopen function, add this line:

document.querySelector('#status').textContent = 'Connected'
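The line above only ever sets the status to 'Connected'. If you would also like the page to reflect a closed connection or an error, a small lookup like the following could drive the status text from each socket event. statusTextFor is a hypothetical name, and the 'Connection error' label is an assumption; only 'Connected' and 'Not Connected' come from the tutorial.

```javascript
// Map a socket event handler name to the text shown in <p id="status">.
const STATUS_TEXT = new Map([
  ['onopen', 'Connected'],
  ['onclose', 'Not Connected'],
  ['onerror', 'Connection error'], // hypothetical label, not from the tutorial
])

// Unknown event names fall back to the initial 'Not Connected' text.
function statusTextFor(eventName) {
  return STATUS_TEXT.get(eventName) ?? 'Not Connected'
}
```

For example, socket.onclose could set document.querySelector('#status').textContent to statusTextFor('onclose'). A Map is used rather than a plain object so that names like 'toString' don't accidentally match inherited properties.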

Remove the text inside <p id="transcript">. Where you log the transcript in your socket.onmessage function, add this line:

document.querySelector('#transcript').textContent += transcript + ' '

Try your project once more, and your web page should show you when you're connected and what words you have spoken, thanks to Deepgram's Speech Recognition.

The final project code is available at https://github.com/deepgram-devs/browser-mic-streaming, and if you have any questions, please feel free to reach out on Twitter - we're @DeepgramDevs.


Kevin Lewis | Sciencx (2021-11-29T10:50:36+00:00) Get Live Speech Transcriptions In Your Browser. Retrieved from https://www.scien.cx/2021/11/29/get-live-speech-transcriptions-in-your-browser/