Text Analysis & Speech with Foundry

Exam objectives:

  • Build a lightweight application that includes text analysis
  • Respond to spoken prompts using a deployed multimodal model
  • Build a lightweight application using Azure Speech in Foundry Tools

Overview

Azure AI Language and Azure AI Speech are both accessible through Foundry Tools, the Foundry portal's built-in integrations for Azure AI services. This section covers how to use these services within the Foundry ecosystem and how to build lightweight Python applications that call them.

Key Concepts

Text Analysis Capabilities

Capability | SDK method (TextAnalyticsClient) | What it returns
--- | --- | ---
Sentiment analysis | analyze_sentiment() | positive, negative, neutral, or mixed, plus confidence scores
Key phrase extraction | extract_key_phrases() | List of important phrases
Named Entity Recognition (NER) | recognize_entities() | Entities with category (Person, Location, Organization, etc.)
Personally Identifiable Information (PII) detection | recognize_pii_entities() | PII entities, plus a redacted copy of the text
Language detection | detect_language() | Detected language + confidence score
Text summarization | begin_abstract_summary() / begin_extract_summary() | Abstractive summary / extractive summary (long-running operations)
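Language detection and PII detection appear in the table but not in the app pattern that follows, so here is a minimal sketch of how the two could be combined. The helper name `detect_and_redact` and the shape of the returned dictionaries are illustrative assumptions, not part of the SDK; `client` is assumed to be a `TextAnalyticsClient` (or any object exposing the same two methods).

```python
def detect_and_redact(client, documents):
    """Return one dict per document: detected language plus PII-redacted text.

    `client` is assumed to expose detect_language() and
    recognize_pii_entities() the way TextAnalyticsClient does.
    The returned dict layout is a hypothetical convenience format.
    """
    languages = client.detect_language(documents)
    pii_results = client.recognize_pii_entities(documents)

    rows = []
    for lang, pii in zip(languages, pii_results):
        # Each result can be a per-document error instead of a real result.
        if lang.is_error or pii.is_error:
            rows.append({"error": True})
            continue
        rows.append({
            "error": False,
            "language": lang.primary_language.iso6391_name,
            "redacted_text": pii.redacted_text,
            # De-duplicate categories such as "PhoneNumber" or "Email".
            "pii_categories": sorted({e.category for e in pii.entities}),
        })
    return rows
```

Called as `detect_and_redact(client, ["Call me at 555-123-4567"])`, this would return the ISO 639-1 language code alongside the service's redacted version of each document.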

Text Analysis App Pattern

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

documents = ["Azure AI Foundry makes it easy to build AI solutions on Azure."]

# Sentiment
sentiment_result = client.analyze_sentiment(documents)
print(sentiment_result[0].sentiment)  # e.g. "positive"

# Key phrases
kp_result = client.extract_key_phrases(documents)
print(kp_result[0].key_phrases)  # e.g. ["Azure AI Foundry", "AI solutions", "Azure"]

# Named entities
ner_result = client.recognize_entities(documents)
for entity in ner_result[0].entities:
    print(f"{entity.text} → {entity.category}")
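The pattern above indexes `[0]` directly, which assumes every document was processed successfully. Each item in a result list can instead be a per-document error (with `is_error` set to `True`), so a slightly more defensive sketch might look like this. The helper name `summarize_sentiment` and its output format are hypothetical.

```python
def summarize_sentiment(results):
    """Convert analyze_sentiment() results into plain dicts.

    Keeps per-document errors instead of raising on them.
    The dict layout is an illustrative choice, not an SDK type.
    """
    summary = []
    for index, doc in enumerate(results):
        if doc.is_error:
            # Error items carry an `error` object with a code and message.
            summary.append({
                "index": index,
                "ok": False,
                "error": f"{doc.error.code}: {doc.error.message}",
            })
            continue
        scores = doc.confidence_scores
        summary.append({
            "index": index,
            "ok": True,
            "sentiment": doc.sentiment,
            "scores": {
                "positive": scores.positive,
                "neutral": scores.neutral,
                "negative": scores.negative,
            },
        })
    return summary
```

With the client from the pattern above, `summarize_sentiment(client.analyze_sentiment(documents))` would give one entry per input document, in input order.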

Speech Capabilities via Foundry

Capability | Description
--- | ---
Speech-to-text (STT) | Convert spoken audio (microphone or audio file) to text
Text-to-speech (TTS) | Convert text to natural-sounding audio using neural voices
Spoken prompts with multimodal model | Send audio directly to a multimodal model (e.g., GPT-4o audio)
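The text-to-speech row mentions neural voices. One way to choose a specific voice is to send SSML rather than plain text. The sketch below only builds the SSML string; the helper name `build_ssml` is hypothetical, and the default voice `en-US-JennyNeural` is used purely as an example value.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Build a minimal SSML document that selects one neural voice.

    The text is XML-escaped so characters such as & and < do not
    break the markup. Voice and language defaults are example values.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

The resulting string can be passed to a `SpeechSynthesizer` via `speak_ssml_async(...).get()`. For the simpler case of one voice for all output, setting `speech_config.speech_synthesis_voice_name` before creating the synthesizer achieves the same without SSML.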

Speech App Pattern

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="<your-key>",
    region="<your-region>",
)

# Speech to Text
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()
print(result.text)

# Text to Speech
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Hello from Azure AI Speech.").get()
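The pattern above prints `result.text` unconditionally, but `recognize_once()` can also come back with no match or a cancellation (for example, a bad key or region). A minimal sketch of checking the outcome follows. The helper name `describe_recognition` is hypothetical, and comparing on the enum member's name is a choice made here only so the sketch runs without the Speech SDK installed; in an application you would more likely compare `result.reason` against `speechsdk.ResultReason` members directly.

```python
def describe_recognition(result):
    """Return a human-readable description of a speech recognition result.

    `result` is assumed to look like the object returned by
    SpeechRecognizer.recognize_once(): it has a `reason` enum member,
    a `text` attribute, and `cancellation_details` when canceled.
    """
    reason = result.reason.name  # e.g. "RecognizedSpeech"
    if reason == "RecognizedSpeech":
        return f"Recognized: {result.text}"
    if reason == "NoMatch":
        return "No speech could be recognized."
    if reason == "Canceled":
        details = result.cancellation_details
        return f"Canceled ({details.reason}): {details.error_details}"
    return f"Unexpected result reason: {reason}"
```

To transcribe a file instead of the default microphone, the recognizer can be created with `audio_config=speechsdk.audio.AudioConfig(filename="question.wav")`; the same result check applies.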

Responding to Spoken Prompts with a Multimodal Model

Audio-capable GPT-4o deployments accept audio input directly. You can send spoken audio as a prompt, base64-encoded in the message content, and receive a text response:

# Using the Azure AI Inference SDK (azure-ai-inference) with audio input (multimodal)
import base64

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    AudioContentFormat,
    AudioContentItem,
    InputAudio,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

# Read the recorded question and base64-encode it for the request body
with open("question.wav", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode("utf-8")

client = ChatCompletionsClient(
    endpoint="<your-endpoint>",
    credential=AzureKeyCredential("<your-key>"),
)

response = client.complete(
    model="gpt-4o-audio",  # name of your audio-capable model deployment
    messages=[
        UserMessage(content=[
            # AudioContentItem wraps an InputAudio payload (base64 data + format)
            AudioContentItem(
                input_audio=InputAudio(data=audio_data, format=AudioContentFormat.WAV)
            )
        ])
    ],
)
print(response.choices[0].message.content)
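Before sending audio to a model, it can help to confirm that the file really is a readable WAV and to know roughly how long it is. The sketch below uses only the standard library; the helper name `encode_wav_for_prompt` and the fields of the returned `info` dictionary are illustrative assumptions.

```python
import base64
import wave


def encode_wav_for_prompt(path):
    """Validate a WAV file and return (base64_string, info_dict).

    wave.open() raises wave.Error if the file is not a valid WAV,
    which surfaces format problems before any request is made.
    """
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        info = {
            "channels": wav.getnchannels(),
            "sample_rate_hz": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }
    # Encode the whole file (header included), not just the raw frames.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return encoded, info
```

The returned string is the same kind of base64 payload built in the example above, and `info["duration_s"]` gives a quick way to reject recordings that are empty or longer than the application wants to send.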

Azure Services & Foundry Features

Service | Access via Foundry | Key use
--- | --- | ---
Azure AI Language | Foundry Tools → Language | Text analysis
Azure AI Speech | Foundry Tools → Speech | STT, TTS, audio understanding
GPT-4o (multimodal) | Foundry model catalog | Audio + vision + text

Study Resources