AI Model Components & Configuration
Exam objective: Identify AI model components and configurations
Overview
Understanding how generative AI models work and how to select and configure them is a core skill for AI-901. The exam tests your ability to choose the right model for a task and understand the key configuration parameters that affect model behavior.
Key Concepts
How Generative AI Models Work
| Concept | Description |
|---|---|
| Transformer architecture | The underlying architecture of most modern LLMs; it processes tokens using attention mechanisms |
| Token | The unit of text a model processes (roughly ¾ of a word). Each model has a context window, the maximum number of tokens it can process at once |
| Pre-training | Training on massive text datasets to learn general language patterns |
| Fine-tuning | Additional training on a smaller, task-specific dataset to specialize the model |
| Prompt | The input text you provide to a model to guide its output |
| Completion / Response | The text the model generates in response to a prompt |
| Embedding | A numeric vector representation of text, used for semantic search and similarity |
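To make the token and embedding rows more concrete, here is a minimal sketch in Python. It assumes the ¾-word-per-token rule of thumb from the table (real tokenizers vary by model and language), and it uses hand-written three-dimensional toy vectors in place of real embeddings, which typically have hundreds or thousands of dimensions. The function names `estimate_tokens`, `fits_context`, and `cosine_similarity` are illustrative, not part of any SDK.

```python
import math


def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate using the ~3/4-word-per-token rule of thumb."""
    word_count = len(text.split())
    return math.ceil(word_count / words_per_token)


def fits_context(prompt_tokens: int, max_response_tokens: int, context_window: int) -> bool:
    """Check that the prompt plus the requested response fits in the context window."""
    return prompt_tokens + max_response_tokens <= context_window


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; values near 1 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (assumed values, NOT produced by a real model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

if __name__ == "__main__":
    prompt = "Summarize the quarterly sales report in three bullet points"
    print("estimated prompt tokens:", estimate_tokens(prompt))
    print("cat vs kitten :", round(cosine_similarity(cat, kitten), 3))
    print("cat vs invoice:", round(cosine_similarity(cat, invoice), 3))
```

In a semantic search or RAG scenario, the same comparison is run between the embedding of a query and the embeddings of stored documents, and the highest-scoring documents are returned.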
Choosing the Right Model
| Scenario | Appropriate model type |
|---|---|
| Generate or summarize text | GPT-4o, GPT-4, GPT-3.5-turbo |
| Analyze images + generate text | Multimodal model (e.g., GPT-4o) |
| Convert text to images | DALL-E |
| Convert speech to text | Whisper |
| Semantic search, RAG | Embedding model (e.g., text-embedding-ada-002) |
Key Configuration Parameters
| Parameter | What it controls |
|---|---|
| Temperature | Randomness of output. Lower values (toward 0) make output more deterministic and predictable; higher values (toward 1) make it more creative and varied |
| Max tokens | Maximum length of the generated response, in tokens |
| Top-p (nucleus sampling) | Controls diversity: the model samples only from the smallest set of most-likely tokens whose cumulative probability reaches p |
| System prompt | Sets the model's behavior, persona, and constraints for the entire conversation |
| Deployment name | The name you give when deploying a model in Foundry; it is used when calling the API |
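The effect of temperature and top-p on token selection can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular service implements sampling: the three-token "vocabulary" and its scores are made up, the helper names are hypothetical, and treating temperature 0 as "always pick the highest-scoring token" is a convention chosen for this sketch.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 for softmax")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], top_p: float) -> dict[int, float]:
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_token(logits: list[float], temperature: float = 1.0,
                 top_p: float = 1.0, rng=random) -> int:
    """Pick a token index using temperature scaling followed by nucleus sampling."""
    if temperature == 0:
        # Sketch convention: temperature 0 means greedy (most likely token).
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    candidates = top_p_filter(probs, top_p)
    indices = list(candidates)
    weights = [candidates[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
    for t in (0.5, 1.0):
        rounded = [round(p, 3) for p in softmax_with_temperature(logits, t)]
        print(f"temperature={t}: {rounded}")
    print("kept with top_p=0.8:", sorted(top_p_filter(softmax_with_temperature(logits, 1.0), 0.8)))
```

Lowering either parameter narrows the set of likely outputs, which is why guidance commonly suggests adjusting one of the two rather than both at once.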
Model Deployment Options in Foundry
| Option | Description |
|---|---|
| Standard deployment | Pay-per-token, globally load-balanced |
| Provisioned throughput | Reserved capacity for predictable workloads |
| Serverless API | Pay-per-use for models from the Foundry model catalog |
Study Resources