
Saturday, September 27, 2025

Azure Cognitive Services — Step-by-step guide with a real-time example (for .NET Core + Angular)

Short summary: This article explains what Azure Cognitive Services (now part of Azure AI services) is, outlines the service families, gives a step-by-step setup and integration guide, and walks through a concrete real-time example: live speech → transcription → sentiment analysis using the Speech SDK and Text Analytics in a .NET Core app. Code samples and production best practices are included.


What is Azure Cognitive Services (Azure AI services)?

Azure Cognitive Services (now presented as Azure AI services) is a set of cloud-hosted, pre-built and customizable AI APIs that let developers add vision, speech, language, decisioning and search capabilities to apps without building complex ML models from scratch. Services are available as REST APIs and native SDKs for common languages. Use cases range from OCR and face/object detection to speech-to-text, text understanding and personalizers/recommenders.


Service families (quick overview)

  • Vision: image analysis, OCR (Read API), custom vision, object detection.

  • Speech: real-time and batch speech-to-text, text-to-speech, speech translation, speaker recognition.

  • Language / Text (Text Analytics): sentiment analysis, key phrase extraction, named entity recognition, summarization, custom classification.

  • Decision & Search: recommendations, anomaly detection, and Azure AI Search for “chat with your data” scenarios.


Why use Azure Cognitive Services?

  • Fast time-to-market: pre-trained models that work out of the box.

  • Scalable & managed: Microsoft hosts and manages the infrastructure.

  • Multiple access patterns: REST + SDKs + streaming SDKs (for real-time audio).


Step-by-step: getting started (high level)

Prerequisites

  • Azure subscription (free tier available for initial experiments).

  • .NET SDK (6 or later; .NET 8 LTS is the safer choice, since .NET 6 and 7 are out of support) and Visual Studio or VS Code (for the .NET sample).

  • A microphone (for the live speech demo) and basic familiarity with Azure Portal or Azure CLI.

Step 1 — Create an Azure AI / Cognitive Services resource

  1. In the Azure Portal click Create a resource → AI + Machine Learning → Azure AI services / Cognitive Services (or use the new Azure AI multi-service resource).

  2. Choose region, pricing tier (e.g., S0 or free if available), and resource group.

  3. After deployment, you’ll have an endpoint and keys you can use to call APIs. You can also create resources programmatically with az cognitiveservices account create (see the docs for the full CLI parameters).

Tip: for production, consider creating individual resources for heavy workloads (e.g., Speech in a region close to your users), or use the multi-service resource if you want one credential for multiple services.

Step 2 — Get keys and endpoint (or configure Azure AD)

  • From your resource in the portal, open Keys and Endpoint; copy the key and endpoint to your app's configuration (or store them in Azure Key Vault). For production, prefer Microsoft Entra ID (formerly Azure AD) with a managed identity, i.e. keyless authentication, to avoid long-lived keys.

Step 3 — Choose SDK (recommended for streaming) or REST (good for simple requests)

  • Streaming / real-time audio: use the Speech SDK (native support for microphone streaming).

  • Text NLP tasks: use the Text Analytics / Language SDKs (Azure.AI.TextAnalytics or the newer language SDKs) or REST.
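As a rough illustration of the REST path, the sketch below posts a single document to the Language service's sentiment endpoint with HttpClient. The route (/language/:analyze-text), the api-version value and the sample sentence are assumptions to verify against the current REST reference; the environment variable names match the ones used in the sample later in this article.

```csharp
// Sketch only: call the sentiment REST endpoint directly, without the SDK.
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class RestSentimentSketch
{
    static async Task Main()
    {
        var endpoint = Environment.GetEnvironmentVariable("AZURE_TEXT_ENDPOINT");
        var key = Environment.GetEnvironmentVariable("AZURE_TEXT_KEY");
        if (string.IsNullOrEmpty(endpoint) || string.IsNullOrEmpty(key))
        {
            Console.WriteLine("Set AZURE_TEXT_ENDPOINT and AZURE_TEXT_KEY.");
            return;
        }

        // Assumption: route and api-version; check the REST reference for current values.
        var url = $"{endpoint.TrimEnd('/')}/language/:analyze-text?api-version=2023-04-01";

        // Request body: one task kind plus a list of documents.
        var body = JsonSerializer.Serialize(new
        {
            kind = "SentimentAnalysis",
            analysisInput = new
            {
                documents = new[]
                {
                    new { id = "1", language = "en", text = "The support agent was very helpful." }
                }
            }
        });

        using var http = new HttpClient();
        using var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(body, Encoding.UTF8, "application/json")
        };
        // Key-based auth header used by Azure AI services REST APIs.
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);

        using var response = await http.SendAsync(request);
        Console.WriteLine($"HTTP {(int)response.StatusCode}");
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```

The SDK wraps this same kind of request and adds typed results and retries, which is why REST tends to suit one-off or simple calls.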

Step 4 — Install SDKs (example .NET)

dotnet add package Microsoft.CognitiveServices.Speech
dotnet add package Azure.AI.TextAnalytics

(These are the official NuGet package names for Speech and Text Analytics.)


Real-time example: Live speech → transcription → sentiment analysis

Goal: capture live audio (microphone), transcribe it in real time, and run sentiment on each recognized utterance — useful for customer support dashboards, call monitoring, or live captions with emotion/sentiment tagging.

Architecture (simple):
Browser (Angular) → microphone capture (WebRTC or browser media) → upload/stream audio to Backend (.NET Core) → backend uses Speech SDK for low-latency transcription → send transcript to Text Analytics for sentiment → results saved/displayed on UI.

For demo simplicity we'll show a .NET console app performing real-time local microphone transcription + sentiment. In a real app you’d move the logic into your backend Web API and stream audio from frontend.
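For the backend variant, one possible shape is to feed audio bytes into the Speech SDK through a push stream instead of the default microphone. The sketch below assumes the audio has already been converted to raw 16 kHz, 16-bit mono PCM (browser captures are often WebM/Opus and would need transcoding first). The TranscribeAsync helper and the file passed on the command line are hypothetical stand-ins for whatever transport carries audio from the Angular client.

```csharp
// Sketch only: transcribe audio pushed from an arbitrary byte stream.
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class PushStreamSketch
{
    // Hypothetical helper: 'incomingAudio' stands in for a WebSocket/HTTP upload body.
    static async Task TranscribeAsync(Stream incomingAudio, SpeechConfig speechConfig)
    {
        // The default push-stream format is 16 kHz, 16-bit, mono PCM.
        using var pushStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        var done = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"Transcript: {e.Result.Text}");
        };
        // Either event marks the end of the session (end of stream or an error).
        recognizer.Canceled += (s, e) => done.TrySetResult(true);
        recognizer.SessionStopped += (s, e) => done.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();

        // 3200 bytes is 100 ms of audio at 16 kHz * 2 bytes per sample.
        var buffer = new byte[3200];
        int read;
        while ((read = await incomingAudio.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            pushStream.Write(buffer, read);
        }
        pushStream.Close(); // signals end of audio to the recognizer

        await done.Task;    // wait until the remaining audio has been processed
        await recognizer.StopContinuousRecognitionAsync();
    }

    static async Task Main(string[] args)
    {
        var key = Environment.GetEnvironmentVariable("AZURE_SPEECH_KEY");
        var region = Environment.GetEnvironmentVariable("AZURE_SPEECH_REGION");
        if (string.IsNullOrEmpty(key) || string.IsNullOrEmpty(region) || args.Length == 0)
        {
            Console.WriteLine("Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION, then pass a raw PCM file path.");
            return;
        }

        // A local file plays the role of the network stream in this sketch.
        using var audio = File.OpenRead(args[0]);
        await TranscribeAsync(audio, SpeechConfig.FromSubscription(key, region));
    }
}
```

In a Web API, the same helper could be called with the request body or a WebSocket-backed stream, with each recognized utterance forwarded to sentiment analysis.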

Environment variables (set these before running)

  • AZURE_SPEECH_KEY — Speech resource key

  • AZURE_SPEECH_REGION — Speech resource region (e.g., eastus)

  • AZURE_TEXT_KEY — Text Analytics key

  • AZURE_TEXT_ENDPOINT — Text Analytics endpoint (e.g., https://<your-resource>.cognitiveservices.azure.com/)

Minimal C# sample (real-time transcription → sentiment)

// Program.cs
using System;
using System.Threading.Tasks;
using Azure;
using Azure.AI.TextAnalytics;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static async Task Main()
    {
        // Read credentials from the environment (use secure configuration or a managed identity in production).
        var speechKey = Environment.GetEnvironmentVariable("AZURE_SPEECH_KEY");
        var speechRegion = Environment.GetEnvironmentVariable("AZURE_SPEECH_REGION");
        var textKey = Environment.GetEnvironmentVariable("AZURE_TEXT_KEY");
        var textEndpoint = Environment.GetEnvironmentVariable("AZURE_TEXT_ENDPOINT");

        if (string.IsNullOrEmpty(speechKey) || string.IsNullOrEmpty(speechRegion) ||
            string.IsNullOrEmpty(textKey) || string.IsNullOrEmpty(textEndpoint))
        {
            Console.WriteLine("Set AZURE_SPEECH_KEY, AZURE_SPEECH_REGION, AZURE_TEXT_KEY and AZURE_TEXT_ENDPOINT.");
            return;
        }

        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
        // No AudioConfig is passed, so the recognizer listens on the default microphone.
        using var recognizer = new SpeechRecognizer(speechConfig);
        var textClient = new TextAnalyticsClient(new Uri(textEndpoint), new AzureKeyCredential(textKey));

        // Recognized fires once per finalized utterance; interim hypotheses arrive on the Recognizing event.
        recognizer.Recognized += async (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                var transcript = e.Result.Text;
                if (string.IsNullOrWhiteSpace(transcript)) return; // nothing to score
                Console.WriteLine($"Transcript: {transcript}");
                try
                {
                    // Call the Text Analytics sentiment API for this utterance.
                    var sentimentResponse = await textClient.AnalyzeSentimentAsync(transcript);
                    var sentiment = sentimentResponse.Value;
                    Console.WriteLine(
                        $"Sentiment: {sentiment.Sentiment} (pos: {sentiment.ConfidenceScores.Positive:0.00}, " +
                        $"neu: {sentiment.ConfidenceScores.Neutral:0.00}, neg: {sentiment.ConfidenceScores.Negative:0.00})");
                }
                catch (RequestFailedException ex)
                {
                    // An unhandled exception in an async event handler would take down the process.
                    Console.WriteLine($"Sentiment call failed: {ex.Message}");
                }
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine("No speech could be recognized.");
            }
        };

        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"Recognition canceled: {e.Reason}. ErrorDetails: {e.ErrorDetails}");
        };

        await recognizer.StartContinuousRecognitionAsync();
        Console.WriteLine("Listening. Press ENTER to stop.");
        Console.ReadLine();
        await recognizer.StopContinuousRecognitionAsync();
    }
}

This sample uses the Speech SDK for microphone streaming and the Azure.AI.TextAnalytics client for sentiment analysis. For more advanced control, see the Speech SDK quickstarts, which cover diarization, language detection and low-latency streaming.


Production considerations & best practices

Authentication & security

  • Prefer managed identities with Microsoft Entra ID (formerly Azure AD) for keyless authentication in production, so that keys are not embedded in code or configuration; use Azure Key Vault for any secrets you still need to store. Many AI services support Microsoft Entra authentication and managed identities.
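A minimal sketch of keyless authentication for the text client, assuming the Azure.Identity package is installed (dotnet add package Azure.Identity), the resource uses a custom-subdomain endpoint, and the calling identity has been granted a suitable role such as Cognitive Services User. The sample sentence is illustrative only.

```csharp
// Sketch only: authenticate TextAnalyticsClient with a token credential instead of a key.
using System;
using System.Threading.Tasks;
using Azure.AI.TextAnalytics;
using Azure.Identity;

class KeylessSketch
{
    static async Task Main()
    {
        var endpoint = Environment.GetEnvironmentVariable("AZURE_TEXT_ENDPOINT");
        if (string.IsNullOrEmpty(endpoint))
        {
            Console.WriteLine("Set AZURE_TEXT_ENDPOINT.");
            return;
        }

        // DefaultAzureCredential tries several sources in turn (environment variables,
        // managed identity, developer sign-in), so the same code can run locally and in Azure.
        var client = new TextAnalyticsClient(new Uri(endpoint), new DefaultAzureCredential());

        var response = await client.AnalyzeSentimentAsync("Thanks, that fixed my issue.");
        Console.WriteLine($"Sentiment: {response.Value.Sentiment}");
    }
}
```

With this approach no key appears in configuration; access is governed by role assignments on the resource.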

Scaling & architecture

  • For many concurrent audio streams, move recognition to an autoscaled backend (AKS / Azure Functions) and use message queues (Service Bus) to decouple ingestion from processing.

  • Batch text analytics calls when possible to reduce per-call overhead (Text Analytics supports batch input).
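Batching could look roughly like the sketch below, which groups buffered utterances and sends each group in one AnalyzeSentimentBatchAsync call. The batch size of 10 is an assumption about the per-request document limit and should be checked against the current service limits; the utterances are made-up sample data.

```csharp
// Sketch only: score several utterances per request instead of one call each.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Azure;
using Azure.AI.TextAnalytics;

class BatchSentimentSketch
{
    // Assumption: maximum documents per sentiment request; verify against service limits.
    const int MaxBatchSize = 10;

    static async Task Main()
    {
        var endpoint = Environment.GetEnvironmentVariable("AZURE_TEXT_ENDPOINT");
        var key = Environment.GetEnvironmentVariable("AZURE_TEXT_KEY");
        if (string.IsNullOrEmpty(endpoint) || string.IsNullOrEmpty(key))
        {
            Console.WriteLine("Set AZURE_TEXT_ENDPOINT and AZURE_TEXT_KEY.");
            return;
        }

        var client = new TextAnalyticsClient(new Uri(endpoint), new AzureKeyCredential(key));

        // Illustrative data: in practice these would be buffered transcripts.
        var utterances = new List<string>
        {
            "I have been waiting for forty minutes.",
            "Okay, that makes sense.",
            "Great, thank you so much for the help!"
        };

        // Enumerable.Chunk requires .NET 6 or later.
        foreach (string[] batch in utterances.Chunk(MaxBatchSize))
        {
            Response<AnalyzeSentimentResultCollection> response =
                await client.AnalyzeSentimentBatchAsync(batch);

            int i = 0;
            // Results are returned in the same order as the input documents.
            foreach (AnalyzeSentimentResult result in response.Value)
            {
                if (result.HasError)
                    Console.WriteLine($"'{batch[i]}': error {result.Error.ErrorCode}: {result.Error.Message}");
                else
                    Console.WriteLine($"'{batch[i]}': {result.DocumentSentiment.Sentiment}");
                i++;
            }
        }
    }
}
```

The trade-off is latency: waiting to fill a batch delays the sentiment for the first utterance in it, so a live dashboard may prefer small batches or a short flush timer.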

Cost & throttling

  • Monitor usage and set quotas and alerts. Speech-to-text is billed by audio duration and Text Analytics by the number of text records processed, so design batching and sampling accordingly. Refer to the pricing page for each service you use.

Privacy & compliance

  • If you process PII or health data, confirm the service’s compliance and region choices (Azure publishes certifications and regional availability). Configure data retention and use private network options as needed.

Reliability

  • Add retry policies and exponential backoff for transient network/API errors. Use SDK built-in retry policies where available.

  • Implement fallbacks (e.g., if speech streaming fails, fall back to short file uploads + batch transcription).
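For the text client, the built-in retry policy can be tuned through the client options, roughly as sketched here. The specific values (5 retries, 1 second base delay, 30 second cap) are illustrative assumptions, not recommendations from the service documentation. These options apply to the Azure.AI.TextAnalytics client only; the Speech SDK is configured separately.

```csharp
// Sketch only: configure exponential-backoff retries on TextAnalyticsClient.
using System;
using System.Threading.Tasks;
using Azure;
using Azure.AI.TextAnalytics;
using Azure.Core;

class RetrySketch
{
    static async Task Main()
    {
        var endpoint = Environment.GetEnvironmentVariable("AZURE_TEXT_ENDPOINT");
        var key = Environment.GetEnvironmentVariable("AZURE_TEXT_KEY");
        if (string.IsNullOrEmpty(endpoint) || string.IsNullOrEmpty(key))
        {
            Console.WriteLine("Set AZURE_TEXT_ENDPOINT and AZURE_TEXT_KEY.");
            return;
        }

        var options = new TextAnalyticsClientOptions();
        options.Retry.Mode = RetryMode.Exponential;         // delay grows between attempts
        options.Retry.MaxRetries = 5;                       // illustrative value
        options.Retry.Delay = TimeSpan.FromSeconds(1);      // base delay
        options.Retry.MaxDelay = TimeSpan.FromSeconds(30);  // upper bound on any single delay

        var client = new TextAnalyticsClient(new Uri(endpoint), new AzureKeyCredential(key), options);

        try
        {
            var response = await client.AnalyzeSentimentAsync("The call quality was poor today.");
            Console.WriteLine($"Sentiment: {response.Value.Sentiment}");
        }
        catch (RequestFailedException ex)
        {
            // Reached only after the retry policy has given up or the error is not retriable.
            Console.WriteLine($"Request failed with status {ex.Status} ({ex.ErrorCode}): {ex.Message}");
        }
    }
}
```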


Troubleshooting tips

  • No audio / no match: check microphone permissions, supported audio format, and sample rate.

  • Poor transcription: set SpeechConfig.SpeechRecognitionLanguage to the language actually spoken, or use a custom speech model for domain-specific vocabulary.

  • Auth errors: ensure the key and the endpoint (or region) belong to the same resource and, if you use Microsoft Entra ID (formerly Azure AD), that the managed identity has the required role assignments.
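When diagnosing these cases, it can help to run a single-shot recognition and print the detailed reason. The sketch below sets the recognition language explicitly ("en-US" is an assumed example locale) and reports the no-match or cancellation details that the Speech SDK exposes.

```csharp
// Sketch only: one-shot recognition that prints diagnostic details.
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class SpeechDiagnosticsSketch
{
    static async Task Main()
    {
        var key = Environment.GetEnvironmentVariable("AZURE_SPEECH_KEY");
        var region = Environment.GetEnvironmentVariable("AZURE_SPEECH_REGION");
        if (string.IsNullOrEmpty(key) || string.IsNullOrEmpty(region))
        {
            Console.WriteLine("Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION.");
            return;
        }

        var speechConfig = SpeechConfig.FromSubscription(key, region);
        // Assumed locale for illustration; use the language your speakers actually use.
        speechConfig.SpeechRecognitionLanguage = "en-US";

        using var recognizer = new SpeechRecognizer(speechConfig);
        Console.WriteLine("Say something...");
        var result = await recognizer.RecognizeOnceAsync();

        switch (result.Reason)
        {
            case ResultReason.RecognizedSpeech:
                Console.WriteLine($"Recognized: {result.Text}");
                break;

            case ResultReason.NoMatch:
                // Audio was received but could not be matched to speech.
                var noMatch = NoMatchDetails.FromResult(result);
                Console.WriteLine($"No match: {noMatch.Reason}");
                break;

            case ResultReason.Canceled:
                // Covers auth failures, connection problems and similar errors.
                var cancellation = CancellationDetails.FromResult(result);
                Console.WriteLine($"Canceled: {cancellation.Reason}");
                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"ErrorCode: {cancellation.ErrorCode}");
                    Console.WriteLine($"ErrorDetails: {cancellation.ErrorDetails}");
                }
                break;
        }
    }
}
```

An authentication problem typically surfaces in the Canceled branch, while a silent or muted microphone tends to show up as NoMatch.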


