“Easy” Computer Speech Recognition with Azure Cognitive Services and Python

Azure Cognitive Speech Services let you add speech capabilities to your device or app with an API call. In this class I show how to use Python, but many other languages are supported, and the Azure features you can reach differ from one language's SDK to another.

Slide Notes:

Azure Speech

WARNING: Imposter Alert!!!

Azure Cognitive Services

What is Azure Cognitive Services?

Cognitive Services brings AI within reach of every developer—without requiring machine learning expertise. All it takes is an API call to embed the ability to see, hear, speak, search, understand, and accelerate decision-making into your apps. Enable developers of all skill levels to easily add AI capabilities to their apps.

Serverless Architecture

  • Compute in the cloud
  • Send Cloud Function/Azure Function file and requirements, get back results in JSON
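
To make the "get back results in JSON" point concrete, here is a minimal sketch of pulling the recognized text out of a JSON reply. The sample payload is illustrative only: its field names are modeled on the simple-format response of the Speech REST API, and display_text is a hypothetical helper, so check the actual response you get before relying on it.

```python
import json

# Illustrative payload (assumed shape, not captured from a real call)
sample = (
    '{"RecognitionStatus": "Success", '
    '"DisplayText": "Turn the fan on.", '
    '"Offset": 1200000, "Duration": 15000000}'
)

def display_text(payload):
    """Return the recognized text from a JSON reply, or None if nothing was recognized."""
    data = json.loads(payload)
    if data.get("RecognitionStatus") == "Success":
        return data.get("DisplayText", "")
    return None

print(display_text(sample))
```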

Azure Speech Services

Languages and SDK Compatibility

SSML – Language Style

  • Speech Synthesis Markup Language (SSML)
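
As a rough illustration of what SSML looks like, the sketch below wraps plain text in a minimal SSML document. The helper name build_ssml, the voice name "en-US-JennyNeural", and the rate value are assumptions for the example, not settings from the class. The resulting string is the kind of input you would pass to the synthesizer's speak_ssml_async method instead of speak_text_async.

```python
from xml.sax.saxutils import escape

def build_ssml(text, voice="en-US-JennyNeural", rate="medium"):
    """Wrap plain text in a minimal SSML document (hypothetical helper)."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        '<voice name="{voice}">'
        '<prosody rate="{rate}">{text}</prosody>'
        '</voice>'
        '</speak>'
    ).format(voice=voice, rate=rate, text=escape(text))  # escape &, <, > in the text

print(build_ssml("Play nice, everyone."))
```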

Speech to Text from Mic

Continuous Speech

  • Add while True:
  • Need Trigger to Turn On
  • Need Trigger to Turn Off
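
The three points above can be sketched without a microphone. In this minimal example, recognize_once is a stand-in for whatever returns the next recognized phrase, and the trigger words "start" and "stop" are assumptions chosen for the demo.

```python
def listen_loop(recognize_once, start_word="start", stop_word="stop"):
    """Loop forever, collecting phrases heard between the on and off triggers."""
    active = False
    heard = []
    while True:
        text = recognize_once().lower()
        if stop_word in text:
            break                      # off trigger: leave the loop
        if active:
            heard.append(text)         # only act on speech while switched on
        elif start_word in text:
            active = True              # on trigger: start paying attention
    return heard

# Stand-in for a real recognizer: yields canned phrases one at a time
phrases = iter(["hello", "Start listening.", "Fan on.", "Stop."])
print(listen_loop(lambda: next(phrases)))  # prints ['fan on.']
```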

Continuous Speech with Trigger

  • Add if statement and conditional on the text variable
  • Capitalization issues with text
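
The capitalization issue comes up because the service returns text like "Close." while the trigger is written as "close". One way to handle it, as a minimal sketch with hypothetical helper names normalize and has_trigger, is to lowercase both sides and strip punctuation before comparing.

```python
import string

def normalize(text):
    """Lowercase and strip punctuation so 'Close.' compares like 'close'."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).strip()

def has_trigger(text, trigger):
    """True if the trigger phrase appears anywhere in the recognized text."""
    return normalize(trigger) in normalize(text)

print(has_trigger("Close.", "close"))                     # prints True
print(has_trigger("Turn the fan ON, please.", "fan on"))  # prints True
```

This is still a substring match, so "close" would also fire on "closet". Splitting the text into words is one option if that matters for your triggers.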

Control Raspberry Pi with Azure/ Python

Speech to Text from File

  • Can be read out loud
  • Can be saved to .wav file to be used by other applications

Text to Speech to .wav
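
A minimal sketch of saving synthesized speech to a .wav file with the Speech SDK. The function name synthesize_to_wav and the environment variable names SPEECH_KEY and SPEECH_REGION are assumptions for this example. The SDK import sits inside the function only so the snippet can be loaded on a machine without the SDK installed.

```python
import os

def synthesize_to_wav(text, wav_path, key=None, region=None):
    """Speak text into wav_path; returns True if synthesis completed."""
    # Assumed variable names; set them in your shell rather than in the script
    key = key or os.environ["SPEECH_KEY"]
    region = region or os.environ.get("SPEECH_REGION", "eastus")

    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # Send audio to a file instead of the default speaker
    audio_config = speechsdk.audio.AudioOutputConfig(filename=wav_path)
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = synthesizer.speak_text_async(text).get()
    return result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted

# Example call (needs a valid key and network access):
# synthesize_to_wav("Goodbye. Thanks for playing.", "goodbye.wav")
```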

Pricing


Continuous Speech Recognition Code:

This Python code uses while True: to loop continuously. It also has trigger words, so that if someone says “hipster” they get a verbal warning from the script, and if they say “close” the script exits.

# This script continuously listens and turns speech to text in command line.  
# The script turns the input into a string variable, and that variable is tested against predefined trigger words
# in if statements

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
# Code modified by Eli Etherton/ Eli the Computer Guy for Silicon Dojo

import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "8a5562834e0a42429722b3ce3a464230", "eastus"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

trigger_word = "hipster"
exit_word = "close"

warning = "Play Nice Everyone. This has gone on your permanent record."
goodbye = "Goodbye.  Thanks for Playing."

while True:
    result = speech_recognizer.recognize_once()

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(result.text))
        # Lowercase the recognized text so that "Close." still matches "close"
        text_string = result.text.lower()
        if trigger_word in text_string:
            speech_synthesizer.speak_text_async(warning).get()
            print("WARNING")

        if exit_word in text_string:
            speech_synthesizer.speak_text_async(goodbye).get()
            print("Script Closed")
            break  # leave the while loop, which ends the script

    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(result.no_match_details))
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))

Trigger Fan Python Script:

This script sends HTTP GET requests to a Raspberry Pi when it hears trigger phrases, which lets you turn a fan on or off by voice.

With the right trigger phrase, the script also fetches a web page from the Raspberry Pi and reads the current value on that page aloud.
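
One way to think about the trigger-to-request mapping is as a lookup table. The sketch below is an alternative arrangement and not the script's own structure. The ACTIONS table and the urls_for helper are hypothetical, and the address and paths are copied from the script, so adjust them for your own network.

```python
BASE_URL = "http://10.0.1.8"  # Raspberry Pi address used in the script

# Trigger phrase -> path to request on the Raspberry Pi
ACTIONS = {
    "fan on": "/iot.php?command=on",
    "fan off": "/iot.php?command=off",
    "what is the temperature": "/current_temp.html",
}

def urls_for(text, actions=ACTIONS, base_url=BASE_URL):
    """Return the URLs whose trigger phrase appears in the recognized text."""
    text = text.lower()
    return [base_url + path for phrase, path in actions.items() if phrase in text]

print(urls_for("Turn the fan on."))  # prints ['http://10.0.1.8/iot.php?command=on']
```

Each URL returned could then be passed to requests.get, so adding a new voice command only means adding a line to the table.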

# This script continuously listens and turns speech to text in command line.  
# The script turns the input into a string variable, and that variable is tested against predefined trigger words
# in if statements

# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license. See LICENSE.md file in the project root for full license information.
# Code modified by Eli Etherton/ Eli the Computer Guy for Silicon Dojo

import azure.cognitiveservices.speech as speechsdk
import requests

speech_key, service_region = "8a5562834e0a42429722b3ce3a464230", "eastus"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")

speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

trigger_on = "fan on"
trigger_off = "fan off"
trigger_temp = "what is the temperature"

exit_word = "close"

alert_fan_on = "The fan has been turned on"
alert_fan_off = "The fan is now off"
goodbye = "Goodbye.  Thanks for Playing."

while True:
    result = speech_recognizer.recognize_once()

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(result.text))
        # Lowercase the recognized text so that "Fan on." still matches "fan on"
        text_string = result.text.lower()
        
        if trigger_on in text_string:
            speech_synthesizer.speak_text_async(alert_fan_on).get()
            requests.get("http://10.0.1.8/iot.php?command=on")

        if trigger_off in text_string:
            speech_synthesizer.speak_text_async(alert_fan_off).get() 
            requests.get("http://10.0.1.8/iot.php?command=off")

        if trigger_temp in text_string:
            alert = requests.get("http://10.0.1.8/current_temp.html")
            print(alert.text)
            speech_synthesizer.speak_text_async(alert.text).get() 

        if exit_word in text_string:
            speech_synthesizer.speak_text_async(goodbye).get()
            print("Script Closed")
            break  # leave the while loop, which ends the script

    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(result.no_match_details))
    elif result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))