AI Workloads & Capabilities
Exam objective: Identify AI workloads and their features
Overview
AI-901 tests your ability to match a real-world scenario to the appropriate AI workload type and Azure capability. The workloads covered are text analysis, speech, computer vision, information extraction, generative AI, and agentic AI.
Key Concepts
Common AI Workload Types
| Workload | What it does | Key Azure capability |
|---|---|---|
| Text analysis | Extract meaning from text | Azure AI Language |
| Speech | Convert between audio and text | Azure AI Speech (via Foundry Tools) |
| Computer vision | Interpret visual content | Multimodal models, Azure AI Vision |
| Information extraction | Pull structured data from documents, images, audio, video | Azure Content Understanding |
| Generative AI | Create new text, images, or code from prompts | Azure OpenAI (GPT, DALL-E) |
| Agentic AI | AI that takes multi-step actions using tools to complete a goal | Azure AI Agents (Foundry) |
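Exam questions on this objective usually describe a scenario and ask for the matching workload. The sketch below is a study aid only: it scores a scenario against a small set of cue words per workload. The cue lists and the `match_workload` helper are hypothetical and are not part of any Azure service.

```python
# Hypothetical cue words for each workload in the table above (illustrative only).
WORKLOAD_CUES = {
    "Text analysis": ["sentiment", "key phrases", "summarize", "entities"],
    "Speech": ["transcribe", "spoken", "voice", "text to speech"],
    "Computer vision": ["photo", "image", "caption", "detect objects"],
    "Information extraction": ["invoice", "form", "receipt", "fields"],
    "Generative AI": ["draft", "generate", "write", "create"],
    "Agentic AI": ["multi-step", "book", "use tools", "on my behalf"],
}


def match_workload(scenario: str) -> str:
    """Return the workload whose cue words appear most often in the scenario."""
    text = scenario.lower()
    # Count how many cues of each workload occur as substrings of the scenario.
    scores = {
        workload: sum(cue in text for cue in cues)
        for workload, cues in WORKLOAD_CUES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


if __name__ == "__main__":
    print(match_workload("Transcribe spoken customer calls"))
    print(match_workload("Extract fields from a scanned invoice"))
```

Substring matching is deliberately crude. The point is the habit of spotting the cue in the scenario (audio, document fields, a prompt, a multi-step goal) before picking the capability.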
Text Analysis Techniques
| Technique | Description | Example |
|---|---|---|
| Keyword extraction | Identify the most important words and phrases | "Azure, AI, certification" from an article |
| Entity detection (NER) | Recognize named entities such as people, places, dates, and organizations | "Microsoft" → Organization, "Seattle" → Location |
| Sentiment analysis | Determine the emotional tone: positive, negative, neutral, or mixed | Product review → "positive" |
| Summarization | Condense long text into a shorter version | Summarize a 5-page report into 3 sentences |
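To make two of these techniques concrete, here is a deliberately naive sketch of keyword extraction (word frequency after removing stopwords) and sentiment analysis (a tiny word lexicon). It only illustrates what the inputs and outputs look like. The word lists and function names are assumptions made up for this example, and Azure AI Language uses trained models rather than anything this simple.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumptions for this sketch, not a real lexicon).
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "for", "in", "it", "this", "but", "was"}
POSITIVE = {"great", "excellent", "love", "fast"}
NEGATIVE = {"poor", "slow", "broken", "hate"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stopword words."""
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(top_n)]


def sentiment(text):
    """Label text as positive, negative, neutral, or mixed using the word lists."""
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos and neg:
        return "mixed"
    if pos:
        return "positive"
    if neg:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(extract_keywords("Azure AI certification covers Azure AI workloads", top_n=2))
    print(sentiment("Great screen but slow shipping"))
```

The four labels returned by `sentiment` mirror the four tones listed in the table, including "mixed" for text that contains both positive and negative signals.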
Speech Capabilities
| Capability | Description |
|---|---|
| Speech recognition (STT) | Convert spoken audio to text |
| Speech synthesis (TTS) | Convert text to natural-sounding speech |
| Speaker recognition | Identify or verify a speaker from their voice |
| Real-time translation | Translate spoken language in near real-time |
Computer Vision Capabilities
| Capability | Description |
|---|---|
| Image analysis | Describe image content, detect objects, read text (OCR) |
| Image captioning | Generate a natural language description of an image |
| Image generation | Create a new image from a text prompt (DALL-E) |
| Visual Q&A | Answer questions about an image using a multimodal model |
Information Extraction (Azure Content Understanding)
| Source | What can be extracted |
|---|---|
| Documents & forms | Fields, tables, key-value pairs, signatures |
| Images | Text, objects, structured data in visual layout |
| Audio | Transcription, speaker turns, topics |
| Video | Transcription, scenes, faces, on-screen text |
Agentic AI
An agent is an AI system that can plan and take multi-step actions to complete a goal. Unlike a simple chatbot, an agent can:
- Use tools (e.g., search the web, run code, call an API)
- Maintain state across multiple steps
- Plan sequences of actions to achieve an objective
In Foundry, you can create a single-agent solution that combines a deployed model with tools.
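The three traits above (tools, state, planning) can be sketched as a toy agent loop. This is a conceptual illustration under stated assumptions, not the Foundry agent API: the tools are plain Python functions, the plan is hard-coded where a real agent would ask a deployed model to produce it, and every name here (`lookup_price`, `AgentState`, `run_agent`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Hypothetical tools: plain functions standing in for search, code, or API calls.
def lookup_price(item: str) -> float:
    prices = {"laptop": 1200.0, "mouse": 25.0}
    return prices[item]


def add(a: float, b: float) -> float:
    return a + b


TOOLS: Dict[str, Callable] = {"lookup_price": lookup_price, "add": add}


@dataclass
class AgentState:
    """State the agent keeps across steps: the goal, tool results, and a log."""
    goal: str
    results: List[float] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def plan(goal: str) -> List[Tuple[str, tuple]]:
    # Assumption: a real agent would have a model generate this plan from the goal.
    return [("lookup_price", ("laptop",)), ("lookup_price", ("mouse",)), ("add", ())]


def run_agent(goal: str) -> AgentState:
    state = AgentState(goal)
    for tool_name, args in plan(goal):
        if tool_name == "add":
            # Feed the two most recent results from state into the next tool call.
            args = tuple(state.results[-2:])
        result = TOOLS[tool_name](*args)
        state.results.append(result)
        state.log.append(f"{tool_name}{args} -> {result}")
    return state


if __name__ == "__main__":
    final = run_agent("Total cost of a laptop and a mouse")
    print(final.results[-1])
```

A simple chatbot would answer from the prompt alone. Here the final answer depends on earlier tool results held in `AgentState`, which is the distinction the exam objective is after.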
Study Resources