Text Analysis & Speech with Foundry
Exam objectives:
- Build a lightweight application that includes text analysis
- Respond to spoken prompts using a deployed multimodal model
- Build a lightweight application using Azure Speech in Foundry Tools
Overview
Azure AI Language and Azure AI Speech are both accessible through Foundry Tools, the Foundry portal's built-in integrations for Azure AI services. This section covers how to use these services within the Foundry ecosystem and how to build lightweight Python applications that call them.
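A lightweight application needs connection settings (an endpoint and key, or a key and region) for each service it calls. As one possible approach, not something the exam objectives prescribe, the sketch below reads those settings from environment variables rather than hardcoding them. The helper name `require_env` and the variable names in the usage note are hypothetical.

```python
import os


def require_env(*names):
    """Return {name: value} for the given environment variables.

    Raises RuntimeError listing every variable that is unset or empty,
    so a misconfigured app fails before it makes any service call.
    """
    values = {name: os.environ.get(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values
```

For example, `require_env("LANGUAGE_ENDPOINT", "LANGUAGE_KEY")` could supply the values for a text analysis client, and `require_env("SPEECH_KEY", "SPEECH_REGION")` the values for a speech configuration.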
Key Concepts
Text Analysis Capabilities
| Capability | Azure AI Language feature | What it returns |
|---|---|---|
| Sentiment analysis | analyze_sentiment() | positive, negative, neutral, mixed + confidence scores |
| Key phrase extraction | extract_key_phrases() | List of important phrases |
| Named Entity Recognition (NER) | recognize_entities() | Entities with category (Person, Location, Organization, etc.) |
| Personally Identifiable Information (PII) detection | recognize_pii_entities() | PII entities with redaction option |
| Language detection | detect_language() | Detected language + confidence score |
| Text summarization | begin_extract_summary(), begin_abstract_summary() | Extractive or abstractive summary (long-running operation; call .result() on the returned poller) |
Text Analysis App Pattern
```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

# Each call takes a batch (list) of documents and returns one result per document
documents = ["Azure AI Foundry makes it easy to build AI solutions on Azure."]

# Sentiment
sentiment_result = client.analyze_sentiment(documents)
print(sentiment_result[0].sentiment)  # e.g. "positive"

# Key phrases
kp_result = client.extract_key_phrases(documents)
print(kp_result[0].key_phrases)  # e.g. ["Azure AI Foundry", "AI solutions", "Azure"]

# Named entities
ner_result = client.recognize_entities(documents)
for entity in ner_result[0].entities:
    print(f"{entity.text} -> {entity.category}")
```
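The pattern above indexes straight into the result list, but each item returned by `analyze_sentiment()` can be either a successful result or a per-document error, distinguished by its `is_error` attribute. The sketch below shows one way to flatten a batch into plain dictionaries while handling both cases. The helper name `summarize_sentiment` is hypothetical, and the demo uses stand-in objects with made-up values so it runs without an Azure resource.

```python
from types import SimpleNamespace  # only needed for the offline demo


def summarize_sentiment(results):
    """Flatten analyze_sentiment() results into plain dicts.

    Each item is expected to expose `is_error`, plus either `error`
    or `sentiment` and `confidence_scores`.
    """
    rows = []
    for index, doc in enumerate(results):
        if doc.is_error:
            # Failed documents carry an error object instead of a sentiment.
            rows.append({"index": index, "ok": False, "error": doc.error.message})
            continue
        scores = doc.confidence_scores
        rows.append({
            "index": index,
            "ok": True,
            "sentiment": doc.sentiment,
            "scores": {
                "positive": scores.positive,
                "neutral": scores.neutral,
                "negative": scores.negative,
            },
        })
    return rows


# Offline demo with stand-in objects (made-up values, not real service output).
fake_results = [
    SimpleNamespace(
        is_error=False,
        sentiment="positive",
        confidence_scores=SimpleNamespace(positive=0.9, neutral=0.08, negative=0.02),
    ),
    SimpleNamespace(
        is_error=True,
        error=SimpleNamespace(code="InvalidDocument", message="Document text is empty."),
    ),
]
for row in summarize_sentiment(fake_results):
    print(row)
```

In a real application the same function would be called as `summarize_sentiment(client.analyze_sentiment(documents))`.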
Speech Capabilities via Foundry
| Capability | Description |
|---|---|
| Speech-to-text (STT) | Convert spoken audio (microphone or audio file) to text |
| Text-to-speech (TTS) | Convert text to natural-sounding audio using neural voices |
| Spoken prompts with multimodal model | Send audio directly to a multimodal model (e.g., GPT-4o audio) |
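Speech App Pattern
```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="<your-key>",
    region="<your-region>"
)

# Speech to text: with no audio_config, input comes from the default microphone
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()  # listens for a single utterance
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
else:
    print(f"No speech recognized: {result.reason}")

# Text to speech: with no audio_config, output plays on the default speaker
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Hello from Azure AI Speech.").get()
```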
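The text-to-speech call above uses whatever voice the service picks by default. One way to choose a specific neural voice is to send SSML instead of plain text. The sketch below only builds the SSML string, using the standard library, so it can be checked without a Speech resource. The helper name `build_ssml` is hypothetical, and `en-US-JennyNeural` is assumed here as an example voice name; substitute one available in your region.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Return a minimal SSML document that speaks `text` with `voice`."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < inside the spoken text
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


ssml = build_ssml("Q&A starts at 3 < 4 pm.")
print(ssml)
```

The resulting string can be passed to `synthesizer.speak_ssml_async(ssml).get()`. If only the voice needs to change, setting `speech_config.speech_synthesis_voice_name` before creating the synthesizer is a simpler alternative.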
Responding to Spoken Prompts with a Multimodal Model
Audio-capable GPT-4o deployments accept audio input directly, so you can send spoken audio as the prompt and receive a text response:
```python
# Using the Azure AI Inference SDK (azure-ai-inference) with audio input (multimodal)
import base64

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    AudioContentFormat,
    AudioContentItem,
    InputAudio,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

# Audio is sent to the model as a base64-encoded string
with open("question.wav", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode("utf-8")

client = ChatCompletionsClient(
    endpoint="<your-endpoint>",
    credential=AzureKeyCredential("<your-key>")
)

response = client.complete(
    model="gpt-4o-audio",  # name of your audio-capable model deployment
    messages=[
        UserMessage(content=[
            # The audio payload is wrapped in InputAudio (data + format)
            AudioContentItem(
                input_audio=InputAudio(data=audio_data, format=AudioContentFormat.WAV)
            )
        ])
    ]
)
print(response.choices[0].message.content)
```
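The example above reads `question.wav` and base64-encodes it without checking the file. As an optional safeguard, the sketch below uses the standard-library `wave` module to confirm the file is a readable PCM WAV and to report its duration before encoding. The helper name `load_wav_as_base64` is hypothetical.

```python
import base64
import wave


def load_wav_as_base64(path):
    """Validate a WAV file and return (base64_text, duration_seconds)."""
    # wave.open raises wave.Error if the file is not a readable PCM WAV.
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    # Encode the whole file (header included), as in the example above.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return encoded, duration
```

A caller could use the returned duration to reject very long recordings before sending them to the model, then pass the encoded text as the audio payload.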
Azure Services & Foundry Features
| Service | Access via Foundry | Key use |
|---|---|---|
| Azure AI Language | Foundry Tools → Language | Text analysis |
| Azure AI Speech | Foundry Tools → Speech | STT, TTS, audio understanding |
| GPT-4o (multimodal) | Foundry model catalog | Audio + vision + text |