Have you ever wondered how to add speech recognition to a Python script with just a little bit of code magic? Would you like to tell the computer what to search for on the Internet using only your voice, from your own script?

First of all, as usual, we need a few great modules that are just waiting to help us build great scripts. In speech recognition, the first job is to recognize our voice accurately. This is done by a class called Recognizer, which is the heart of the package we are about to use.

When we create an instance of this class, it takes on the task of recognizing speech. It of course has several settings that need to be configured so that it works more effectively when recognizing speech from audio sources.

But wait a minute, hold your horses, okay? Which package or module are we going to use in this article for speech recognition? Well, let me tell you right away, dear friend: it is called SpeechRecognition.

Each instance of the Recognizer class has seven methods that can recognize speech from audio sources using various APIs. These are:

  • recognize_bing: Microsoft Bing Speech

  • recognize_google: Google Web Speech API. This is the one we will use in this article.

  • recognize_google_cloud: Google Cloud Speech (requires installation of the google-cloud-speech package)

  • recognize_houndify: Houndify by SoundHound

  • recognize_ibm: IBM Speech to Text

  • recognize_sphinx: CMU Sphinx (requires installing PocketSphinx)

  • recognize_wit: Wit.ai

Among the seven recognition methods, only recognize_sphinx, which uses the CMU Sphinx engine, works offline. An Internet connection is required for everything else.
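The exact set of recognize_* methods can depend on the version of the package you have installed, so it may be worth checking what your copy actually offers. Here is a minimal sketch of such a check. The helper name list_recognize_methods is my own invention for illustration and is not part of the package; it works on any object, so a small stand-in class is used to demonstrate it.

```python
def list_recognize_methods(obj):
    """Return the sorted names of callable attributes starting with 'recognize_'.

    Hypothetical helper for illustration; not part of SpeechRecognition.
    """
    return sorted(
        name
        for name in dir(obj)
        if name.startswith("recognize_") and callable(getattr(obj, name))
    )


class FakeRecognizer:
    """Stand-in with the same naming pattern, used only to demo the helper."""

    def recognize_google(self, audio):
        return "hello"

    def recognize_sphinx(self, audio):
        return "hello"

    def listen(self, source):
        return b""


print(list_recognize_methods(FakeRecognizer()))
```

With the real package installed, you would pass an actual Recognizer instance instead of the stand-in.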

With that cleared up, let's install the speech recognition module. First, create a folder that will contain our script and all the dependencies we are going to install. I called mine voice recognition, but go with the best name you can think of, okay?

After creating the folder, change into it and type the following command to install this magical module:

pip install SpeechRecognition

In addition, since we will speak our query into the microphone so that the script can look it up in the browser, we need one more dependency to make that work. Let's install it:

PyAudio package: The process of installing PyAudio varies depending on your operating system. The irony is that the easiest installation is on Windows, which I find a bit weird, but I know, right?

Debian Linux: If you use Debian-based Linux (such as Ubuntu), you can install PyAudio using apt:

sudo apt-get install python3-pyaudio

After installation, you may still need to run pip install pyaudio, especially when working in a virtual environment.

macOS: For macOS, you first need to install PortAudio using Homebrew, and then install PyAudio using pip:

brew install portaudio
pip install pyaudio

Windows operating system: On Windows, you can install PyAudio using pip:

pip install pyaudio
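Once the installs finish, a quick check can confirm that Python actually sees both packages. This is only a small sketch using the standard library; the helper name missing_modules is made up for this post. Note that the import names differ from the pip names: SpeechRecognition is imported as speech_recognition and PyAudio as pyaudio.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not pip names: speech_recognition and pyaudio
missing = missing_modules(["speech_recognition", "pyaudio"])
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All set, both modules are installed.")
```

If something shows up as missing, make sure you ran pip inside the same environment that runs your script.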

Writing a script

Okay, guys! The setup is done, and now it is time for those bad boys to take action, right? Let's build our code.

Let's open our favorite code editor and create a new file called sp_recog.py. I hope you are using VS Code on your side of the screen like me, haha.

It's time to write the first lines of code. Let's import the knights that will perform all the magic of this script by typing the following in our new file:

# Importing the libraries that will do the magic part 🐵
import speech_recognition as sr
import webbrowser as wb

Now, let's create a function to hold our entire routine.

def fn_speech_recognition():

Now, let us set up the microphone by creating an instance of the Microphone class. Its device_index parameter selects an audio device by its position in the list of devices on your computer, so passing 0 picks the first one. If you leave device_index out, the system default microphone is used instead.

sr.Microphone(device_index=0)

If you are curious and want to know which microphones are available on your computer, you can use the following command:

print(f"MICs Found on this Computer: \n {sr.Microphone.list_microphone_names()}")
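The position of each name in that list matches the device_index you pass to Microphone, so you can look up a device by name instead of guessing the number. Below is a minimal sketch of that lookup. The helper find_microphone_index and the sample device names are made up for illustration; on your machine you would feed it the real list.

```python
def find_microphone_index(names, keyword):
    """Return the index of the first device whose name contains `keyword`.

    The match is case-insensitive; returns None when nothing matches.
    Hypothetical helper, not part of SpeechRecognition.
    """
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name is not None and keyword in name.lower():
            return index
    return None


# Made-up device names standing in for sr.Microphone.list_microphone_names()
sample_names = ["HDA Intel PCH: ALC3235 Analog", "USB Headset Mic", "default"]
print(find_microphone_index(sample_names, "usb"))
```

The number it returns is what you would pass as device_index when creating the Microphone.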

Now, let's create an instance of the recognizer and set some of the most important parameters to make it run smoothly:

# Creating a recognition object instance
r = sr.Recognizer()
r.energy_threshold = 4000
r.dynamic_energy_threshold = False
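To get a feel for what energy_threshold means: the recognizer compares the loudness of the incoming audio against this number and treats anything above it as speech. The following is only an illustration of that idea under my own assumptions (16-bit mono PCM audio, RMS as the loudness measure, and the helper names rms_energy and is_speech, which are made up for this post). It is not the package's actual code.

```python
import math
import struct


def rms_energy(pcm_bytes):
    """Root-mean-square amplitude of little-endian 16-bit mono PCM audio."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(pcm_bytes, energy_threshold=4000):
    """Treat a chunk as speech when its energy exceeds the threshold."""
    return rms_energy(pcm_bytes) > energy_threshold


# Two tiny hand-made chunks: a quiet hum and a loud voice
quiet_chunk = struct.pack("<4h", 120, -90, 60, -150)
loud_chunk = struct.pack("<4h", 9000, -8500, 9200, -8800)
print(is_speech(quiet_chunk), is_speech(loud_chunk))
```

A higher threshold makes the script less sensitive, which helps in noisy rooms; a lower one picks up softer voices.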

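Now the microphone will be the source from which we capture the commands given by the user. We do this with two Recognizer methods: adjust_for_ambient_noise, which calibrates the energy threshold against the background noise, and listen, which records what the user says: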

    with sr.Microphone() as source:
        print('Please Speak Loud and Clear:')
        #reduce noise
        r.adjust_for_ambient_noise(source)
        #take voice input from the microphone
        audio = r.listen(source)

As usual, I recommend using a try...except block to manage your errors. recognize_google raises sr.RequestError when the Google Web Speech API cannot be reached and sr.UnknownValueError when the speech is unintelligible, so those are the two exceptions we catch. Let's complete the code for this tutorial and integrate this block into it as follows:

        try:
            phrase = r.recognize_google(audio)
            print(f"Did you just say: {phrase} ?")
            url = "https://www.google.com/search?q="
            search_url = url+phrase
            wb.open(search_url)
        # the API could not be reached (for example, no Internet connection)
        except sr.RequestError as msg:
            print(msg)
        # speech is unintelligible
        except sr.UnknownValueError:
            print("Could not understand what you've requested.")
        else:
            print("Your results will appear in the default browser. Good bye for now...")
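The script glues the recognized phrase straight onto the search URL. That is fine for simple phrases, but spaces and characters such as & or + can confuse the query. Here is a small sketch of a safer way to build the URL with quote_plus from the standard library; the helper name build_search_url is my own and not part of the original script.

```python
from urllib.parse import quote_plus


def build_search_url(phrase, base="https://www.google.com/search?q="):
    """Return a search URL with `phrase` safely encoded as the query value."""
    return base + quote_plus(phrase)


print(build_search_url("best pizza & pasta near me"))
```

In the script, you could pass the result of this helper to wb.open instead of the plain concatenation.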

Finally, we call the function and start using our script:

fn_speech_recognition()

Final source code:

Now that we have broken down our script, let's put the final source code together so you can test and use it. Please leave a comment with other methods or better solutions, and I will update the post when needed.

import speech_recognition as sr
import webbrowser as wb


def fn_speech_recognition():
    print(f"MICs Found on this Computer: \n {sr.Microphone.list_microphone_names()}")
    # Creating a recognition object
    r = sr.Recognizer()
    r.energy_threshold = 4000
    r.dynamic_energy_threshold = False

    # sr.Microphone() opens the system default microphone;
    # pass device_index=N to pick a specific entry from the list printed above
    with sr.Microphone() as source:
        print('Please Speak Loud and Clear:')
        # reduce noise
        r.adjust_for_ambient_noise(source)
        # take voice input from the microphone
        audio = r.listen(source)
        try:
            phrase = r.recognize_google(audio)
            print(f"Did you just say: {phrase} ?")
            url = "https://www.google.com/search?q="
            search_url = url+phrase
            wb.open(search_url)
        # the API could not be reached (for example, no Internet connection)
        except sr.RequestError as msg:
            print(msg)
        # speech is unintelligible
        except sr.UnknownValueError:
            print("Could not understand what you've requested.")
        else:
            print("Your results will appear in the default browser. Good bye for now...")


fn_speech_recognition()