Text Analysis & Speech with Foundry

Exam objectives:

Build a lightweight application that includes text analysis
Respond to spoken prompts using a deployed multimodal model
Build a lightweight application using Azure Speech in Foundry Tools

Overview

Azure AI Language and Azure AI Speech are both accessible through Foundry Tools — the Foundry portal's built-in integrations for Azure AI services. This section covers how to use these services within the Foundry ecosystem and how to build lightweight Python applications that call them.

Key Concepts

Text Analysis Capabilities

Capability	Azure AI Language feature	What it returns
Sentiment analysis	`analyze_sentiment()`	`positive`, `negative`, `neutral`, `mixed` + confidence scores
Key phrase extraction	`extract_key_phrases()`	List of important phrases
Named Entity Recognition (NER)	`recognize_entities()`	Entities with category (Person, Location, Organization, etc.)
Personally Identifiable Information (PII) detection	`recognize_pii_entities()`	PII entities with redaction option
Language detection	`detect_language()`	Detected language + confidence score
Text summarization	`begin_abstract_summary()`	Abstractive or extractive summary

Text Analysis App Pattern

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

documents = ["Azure AI Foundry makes it easy to build AI solutions on Azure."]

# Sentiment
sentiment_result = client.analyze_sentiment(documents)
print(sentiment_result[0].sentiment)  # "positive"

# Key phrases
kp_result = client.extract_key_phrases(documents)
print(kp_result[0].key_phrases)  # ["Azure AI Foundry", "AI solutions", "Azure"]

# Named entities
ner_result = client.recognize_entities(documents)
for entity in ner_result[0].entities:
    print(f"{entity.text} → {entity.category}")

Speech Capabilities via Foundry

Capability	Description
Speech-to-text (STT)	Convert spoken audio (microphone or audio file) to text
Text-to-speech (TTS)	Convert text to natural-sounding audio using neural voices
Spoken prompts with multimodal model	Send audio directly to a multimodal model (e.g., GPT-4o audio)

Speech App Pattern

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription="<your-key>",
    region="<your-region>"
)

# Speech to Text
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()
print(result.text)

# Text to Speech
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Hello from Azure AI Speech.").get()

Responding to Spoken Prompts with a Multimodal Model

GPT-4o supports audio input directly. You can send spoken audio as a prompt and receive a text response:

# Using the Foundry SDK with audio input (multimodal)
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage, AudioContentItem
from azure.core.credentials import AzureKeyCredential
import base64

with open("question.wav", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode("utf-8")

client = ChatCompletionsClient(
    endpoint="<your-endpoint>",
    credential=AzureKeyCredential("<your-key>")
)

response = client.complete(
    model="gpt-4o-audio",
    messages=[
        UserMessage(content=[
            AudioContentItem(audio=audio_data, format="wav")
        ])
    ]
)
print(response.choices[0].message.content)

Azure Services & Foundry Features

Service	Access via Foundry	Key use
Azure AI Language	Foundry Tools → Language	Text analysis
Azure AI Speech	Foundry Tools → Speech	STT, TTS, audio understanding
GPT-4o (multimodal)	Foundry model catalog	Audio + vision + text

Study Resources

📖 Azure AI Language documentation
📖 Azure AI Speech documentation
📖 Quickstart: Text analytics with Python
📖 Quickstart: Speech SDK with Python
🧪 Language Studio

Overview​

Key Concepts​

Text Analysis Capabilities​

Text Analysis App Pattern​

Speech Capabilities via Foundry​

Speech App Pattern​

Responding to Spoken Prompts with a Multimodal Model​

Azure Services & Foundry Features​

Study Resources​