Introduction
Speech recognition is the process of enabling a computer to identify and respond to the sounds produced in human speech. It was first introduced in 1952 at Bell Laboratories, and that early version could only recognize spoken numbers, not words. Over the following years, speech recognition grew from recognizing numbers to recognizing text and even detecting noise. The technology was developed as an alternative to typing on a keyboard: you only have to talk to your computer, and your words appear on the screen.
Web Speech API
In 2012, the Web Speech API was introduced with the aim of enabling speech recognition and text-to-speech conversion in modern web browsers.
Note: Speech recognition is not currently supported in all browsers. Click here for a list of compatible browsers.
Getting Started
The first thing we need to do is check whether our browser supports speech recognition, which takes only a quick feature check.
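Here is a minimal sketch of such a check. The helper name `getSpeechRecognition` is my own, not part of the API, and I am assuming the constructor is exposed either as `SpeechRecognition` or under the `webkit` prefix that Chrome-based browsers use.

```javascript
// Returns the SpeechRecognition constructor if the environment exposes one,
// or null otherwise. `getSpeechRecognition` is a hypothetical helper name.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// In a browser, globalThis is the same object as window.
if (getSpeechRecognition(globalThis)) {
  console.log('Speech recognition is supported.');
} else {
  console.log('Speech recognition is not supported in this browser.');
}
```

Passing the scope in as an argument, rather than reading `window` directly, keeps the check easy to try outside a browser too.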
The next step is to create a new speech recognition object and detect when recording starts.
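A rough sketch of this step might look like the following. The function name `createRecognition` and the log message are assumptions of mine; `onstart` is the standard event handler property.

```javascript
// Creates a recognition object and attaches a handler that fires once the
// service begins listening. Returns null when the API is unavailable.
// `createRecognition` is a hypothetical helper name.
function createRecognition(scope) {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Recognition) {
    return null;
  }

  const recognition = new Recognition();
  recognition.onstart = () => {
    console.log('Voice recognition started. Speak into the microphone.');
  };
  return recognition;
}
```

In a browser you would call it as `createRecognition(window)` and keep the returned object around for the next step.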
Finally, we start speech recognition and do something with the output.
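One way to sketch this last step is shown here. The `listen` function and its `onTranscript` callback are names I made up for illustration; `onresult`, `start()`, `event.resultIndex`, and `event.results` come from the Web Speech API.

```javascript
// Wires up a result handler and starts listening.
// `listen` and `onTranscript` are hypothetical names for this sketch.
function listen(recognition, onTranscript) {
  recognition.onresult = (event) => {
    // event.results holds the results so far; each result contains one or
    // more alternatives, and the first alternative is the most likely one.
    const current = event.resultIndex;
    const transcript = event.results[current][0].transcript;
    onTranscript(transcript);
  };

  recognition.start();
}
```

For example, `listen(recognition, (text) => console.log(text))` would log whatever was recognized to the console.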
Code Explanation:
recognition.onstart
: This is an event handler that runs when the speech recognition service has begun listening to incoming audio.
recognition.onresult
: Another event handler that runs when the speech recognition service returns a result.
recognition.start()
: This method starts the speech recognition service and begins listening to incoming audio. Running this code for the first time will show a dialog asking for access to your device's microphone.
transcript
: This is the text output generated after the speech recognition service has stopped, and it's all we need from the code we've written so far. For now, we are just logging the output to the console; you can choose to do something else with it.
There are more properties, methods, and event handlers available on the speech recognition object, some of which include:
recognition.grammars
: Used to set the grammars that will be understood by the speech recognition service.
recognition.continuous
: Boolean that sets whether continuous results are returned for each recognition, or only a single result.
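As a small illustration, here is how a few of these settings could be applied before calling `start()`. The helper `configureRecognition` and its defaults (including `'en-US'`) are assumptions of mine; `continuous`, `interimResults`, and `lang` are standard properties of the recognition object.

```javascript
// Applies a few common settings to a recognition object.
// `configureRecognition` and its default values are hypothetical.
function configureRecognition(recognition, options = {}) {
  // Keep listening after the first result instead of stopping.
  recognition.continuous = options.continuous ?? false;
  // Also report results that are not yet final.
  recognition.interimResults = options.interimResults ?? false;
  // Language to recognize, as a BCP 47 tag.
  recognition.lang = options.lang ?? 'en-US';
  return recognition;
}
```

Calling `configureRecognition(recognition, { continuous: true })` would keep the service listening until you stop it yourself.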
Click here for the full list of supported methods, properties, and event handlers.
Sayit
I recently built a progressive web app (using speech recognition) that converts spoken words to text and provides a button to instantly share that text across various social media platforms. This project could be handy when you want to send a lengthy email or post on social media. View the project live here, and if you think it's cool, kindly give it a star on GitHub (contributions are also welcome).
Conclusion
+1 for Accessibility
Speech recognition has played a great role in accessibility over the past few years, especially for people who are visually impaired, people with an injured arm, and many more. Since they cannot use a keyboard for typing, they rely on their voice to control and navigate applications and web pages.
Project Idea
If you are really into speech recognition (like I am), how about building a web page that is fully controlled by voice rather than by clicking or swiping? For example, from the index page, I could just say "go to about page" and be redirected to the about page. Sounds cool? Yeah! I would love to see what you build. You can send me a message on Twitter, and I will gladly answer your questions.
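To get you started, here is a rough sketch of how a spoken phrase could be mapped to a page. Everything in it is hypothetical: the `voiceRoutes` table, the `resolveVoiceCommand` name, and the exact phrases it accepts are just one possible design.

```javascript
// Hypothetical mapping from spoken page names to URLs.
const voiceRoutes = { home: '/', about: '/about', contact: '/contact' };

// Returns the URL for a phrase like "go to about page", or null when the
// phrase is not a navigation command or the page is unknown.
function resolveVoiceCommand(transcript, routeTable) {
  const match = transcript
    .trim()
    .toLowerCase()
    .match(/^go to (?:the )?(\w+)(?: page)?[.!?]?$/);
  if (!match) {
    return null;
  }
  const page = match[1];
  return Object.prototype.hasOwnProperty.call(routeTable, page)
    ? routeTable[page]
    : null;
}
```

Inside an `onresult` handler, you could pass the transcript to `resolveVoiceCommand` and, when it returns a URL, navigate with `window.location.assign(url)`.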
P.S. I'm looking to make new dev friends. Let's connect on Twitter.
Thanks for reading!