Implement Azure AI speech for specially abled users
Wassup guys?
Here is a very popular AI service offered by Azure AI that can not only convert text to speech but also convert speech to text. Yeah, you heard me right: it does both with little effort and high accuracy.
Given its speed and accuracy, it can help a business in many areas, viz.,
a. Helping specially abled users communicate and interact with the outside world
b. Capturing any speech and saving it as a media (sound) file, for example as a Blob or a SharePoint attachment
c. Budget-friendly and auto-scalable
d. Selecting your preferred voice model (accent, gender, pace, emphasis) for the voice generator
e. Real-time or near-real-time integration
Create an Azure AI speech resource
Here is an example that walks you through the implementation.
Go to https://portal.azure.com and create a Speech service resource with the following details:
1. Select a proper resource group/create a new one
2. Give a suitable name
3. Select a suitable region
4. Choose a pricing tier; this determines what the service will cost you. For complete pricing details, see: https://go.microsoft.com/fwlink/?linkid=2100053
Make sure to enable the system-assigned managed identity:
This gives the resource its own identity, which helps when it has to work together with other Azure resources (for example, an Azure Logic App or Function that calls this service) without sharing keys around. Click Review + create >> Create to continue.
The resource will take a while to get deployed under the selected resource group.
This is the default landing page of the Azure AI Speech resource. You can scroll down to Speech Studio and try out the various manual utilities that are available out of the box:
Speech to text
Here is a small C# program that converts speech to text:
a. Start with a Windows console project, right-click the project, select Manage NuGet Packages, then search for and install Microsoft.CognitiveServices.Speech. This package also provides the Microsoft.CognitiveServices.Speech.Audio namespace used below.
b. Define two variables, speechKey and speechRegion, under LocalSettings.Json. You can get both values from the Keys and Endpoint page of the Speech resource in the Azure portal.
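If you would rather not keep the key in a settings file, one alternative is to read both values from environment variables. The following is only a sketch: the variable names SPEECH_KEY and SPEECH_REGION and the SpeechSettings helper are assumptions made for illustration, not something the Speech SDK requires.

```csharp
using System;

// Hypothetical helper (not part of the Speech SDK): reads a required setting
// from an environment variable and fails early with a clear message.
static class SpeechSettings
{
    public static string Require(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Missing setting '{name}'. Set it as an environment variable first.");
        }
        return value;
    }
}
```

The two variables could then be filled in with SpeechSettings.Require("SPEECH_KEY") and SpeechSettings.Require("SPEECH_REGION"), so a missing value shows up right at startup and not as a canceled API call later.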
c. The rest of the code is very straightforward:
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
// Prints the outcome of a speech recognition call.
static void OutputSpeechRecognitionResult(SpeechRecognitionResult speechRecognitionResult)
{
    switch (speechRecognitionResult.Reason)
    {
        // Speech was recognized: print the decoded text.
        case ResultReason.RecognizedSpeech:
            Console.WriteLine($"Decoded: Text= {speechRecognitionResult.Text}");
            break;
        // Audio was received, but no speech could be recognized in it.
        case ResultReason.NoMatch:
            Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            break;
        // The call was canceled: print the reason and, for errors, the details.
        case ResultReason.Canceled:
            var cancellation = CancellationDetails.FromResult(speechRecognitionResult);
            Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
            if (cancellation.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you set the speech resource key and region values?");
            }
            break;
    }
    // Keep the console window open until a key is pressed.
    Console.ReadKey();
}
The above method takes the result of a speech recognition call (a SpeechRecognitionResult) and prints the recognized text. If the speech cannot be recognized or the API call is canceled, it prints the reason and the error details.
Let us now call this method from the Main:
async static Task Main(string[] args)
{
    var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
    speechConfig.SpeechRecognitionLanguage = "en-US";
    using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    Console.WriteLine("Say something now:");
    var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
    OutputSpeechRecognitionResult(speechRecognitionResult);
}
This prompts the user to say something, captures the microphone input, converts it to text, and passes the result to the method shown earlier.
Let us run the code now:
It asked me to say something, then picked up what I said and converted it into text.
Text to speech
Just as above, we can define a method that reports the outcome of converting keyed-in text into speech.
static void OutputSpeechSynthesisResult(SpeechSynthesisResult speechSynthesisResult, string text)
{
    switch (speechSynthesisResult.Reason)
    {
        case ResultReason.SynthesizingAudioCompleted:
            Console.WriteLine($"Analyzing input for speech: [{text}]");
            break;
        case ResultReason.Canceled:
            var cancellation = SpeechSynthesisCancellationDetails.FromResult(speechSynthesisResult);
            Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
            if (cancellation.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                Console.WriteLine($"CANCELED: Ensure that you have correctly set the speech resource key and region values.");
            }
            break;
        default:
            break;
    }
}
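The method above only reports the result; the call that actually produces the speech is not listed here. A minimal sketch of what it might look like, mirroring the speech-to-text Main, is below. The class TextToSpeechSketch, its two methods, and the voice name en-US-JennyNeural are assumptions for illustration; pick whichever voice you selected for your own resource.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

// Hypothetical wrapper around the synthesis call (names are illustrative).
static class TextToSpeechSketch
{
    // Builds the speech configuration and selects the voice to read with.
    public static SpeechConfig BuildConfig(string speechKey, string speechRegion)
    {
        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
        // Assumed voice; replace with the voice you prefer.
        speechConfig.SpeechSynthesisVoiceName = "en-US-JennyNeural";
        return speechConfig;
    }

    // Reads the given text aloud on the default speaker and returns the result.
    public static async Task<SpeechSynthesisResult> SpeakAsync(
        string speechKey, string speechRegion, string text)
    {
        var speechConfig = BuildConfig(speechKey, speechRegion);
        using var speechSynthesizer = new SpeechSynthesizer(speechConfig);
        return await speechSynthesizer.SpeakTextAsync(text);
    }
}
```

From Main, you could read a line with Console.ReadLine(), pass it to TextToSpeechSketch.SpeakAsync(speechKey, speechRegion, text), and hand the returned result to OutputSpeechSynthesisResult together with the text.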
The program reads the text aloud in a female voice with a US accent, as I chose in the screenshot above.
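The voice and pace can also be set per request by sending SSML in place of plain text; SpeechSynthesizer has a SpeakSsmlAsync method that accepts such a string. Here is a rough sketch of building that SSML. The SsmlSketch helper is hypothetical, and the voice name and rate values are only examples.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: builds an SSML document for one voice and speaking rate.
static class SsmlSketch
{
    public static string Build(string voiceName, string text, string rate = "medium")
    {
        // Namespace of the W3C Speech Synthesis Markup Language.
        XNamespace ns = "http://www.w3.org/2001/10/synthesis";

        var speak = new XElement(ns + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"),
            new XElement(ns + "voice",
                new XAttribute("name", voiceName),
                new XElement(ns + "prosody",
                    new XAttribute("rate", rate),
                    text))); // XElement escapes characters such as & and < in the text

        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

For example, SsmlSketch.Build("en-US-JennyNeural", "Hello there", "slow") gives a string that could be passed to SpeakSsmlAsync on the synthesizer to read the text at a slower pace.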