Deep expertise in Azure AI workloads — provisioning and managing Azure OpenAI Service deployments, Azure AI Search indexes and skillsets, Azure AI Studio/Foundry projects, Cognitive Services endpoints, content filtering policies, quota management, and responsible AI governance via ARM REST API and service-specific data planes.
Install via the SkillsCat registry: `npx skillscat add markus41/claude-m/azure-ai-services`
Azure AI Services
This skill provides comprehensive knowledge for provisioning, managing, and governing Azure AI workloads. It covers the Azure OpenAI Service (deployments, quotas, fine-tuning, content filtering), Azure AI Search (indexes, indexers, skillsets, vector search), Azure AI Studio / AI Foundry (projects, connections, model catalog, evaluations), and Azure Cognitive Services (Language, Vision, Speech, Translator).
Integration Context Contract
- Canonical contract: `docs/integration-context.md`
| Workflow | tenantId | subscriptionId | environmentCloud | principalType | scopesOrRoles |
|---|---|---|---|---|---|
| Azure OpenAI provisioning and management | required | required | AzureCloud* | service-principal | Cognitive Services Contributor or Azure AI Administrator |
| Azure OpenAI data plane (completions, embeddings) | required | required | AzureCloud* | service-principal | Cognitive Services OpenAI User or Cognitive Services OpenAI Contributor |
| Azure AI Search management | required | required | AzureCloud* | service-principal | Search Service Contributor |
| Azure AI Search data plane (index/query) | required | required | AzureCloud* | service-principal | Search Index Data Contributor / Search Index Data Reader |
| AI Studio / Foundry project management | required | required | AzureCloud* | service-principal | Azure AI Developer or Owner on the AI hub |
| Cognitive Services (Language, Vision, Speech) | required | required | AzureCloud* | service-principal | Cognitive Services Contributor |
* Use sovereign cloud values from the canonical contract when applicable.
Fail fast when required context is missing. Redact tenant, subscription, and endpoint keys in outputs.
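The fail-fast and redaction rules can be sketched as a small pre-flight helper. This is a minimal illustration, not the contract's own implementation: the names `require_context`, `redact`, and `MissingContextError` are hypothetical, and the pattern assumes that sensitive values look like GUIDs (tenant and subscription IDs) or 32-character hex strings (one common key format).

```python
import re

# Assumption: secrets are GUID-shaped IDs or 32-character hex keys.
SECRET_RE = re.compile(
    r"\b(?:[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}"
    r"|[0-9a-fA-F]{32})\b"
)

# Fields marked "required" for every workflow in the table above.
REQUIRED_FIELDS = ("tenantId", "subscriptionId")


class MissingContextError(ValueError):
    """Raised before any API call when required context is absent."""


def require_context(context: dict) -> dict:
    """Return the context unchanged, or raise if a required field is empty."""
    missing = [name for name in REQUIRED_FIELDS if not context.get(name)]
    if missing:
        raise MissingContextError("missing required context: " + ", ".join(missing))
    return context


def redact(text: str) -> str:
    """Replace GUIDs and 32-character hex keys with a fixed placeholder."""
    return SECRET_RE.sub("<redacted>", text)
```

Calling `require_context` first and passing every output line through `redact` keeps the two rules in one place; keys in other formats would need additional patterns.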
Architecture Overview
Azure AI Services Ecosystem
├─ Azure OpenAI Service
│ ├─ Resource (Cognitive Services account, kind=OpenAI)
│ ├─ Deployments (model + SKU + quota)
│ ├─ Fine-tuning jobs
│ └─ Content filter policies
│
├─ Azure AI Search
│ ├─ Search service (SKU: free/basic/standard/storage-optimized)
│ ├─ Indexes (fields, vector fields, semantic configs)
│ ├─ Indexers + data sources
│ └─ Skillsets (built-in + custom)
│
├─ Azure AI Studio / AI Foundry
│ ├─ AI Hub (shared infra: storage, Key Vault, ACR, compute)
│ ├─ AI Projects (experiments, evals, deployments)
│ ├─ Model catalog (Azure OpenAI, HuggingFace, Meta Llama, etc.)
│ └─ Connections (OpenAI endpoint, search, blob, etc.)
│
└─ Cognitive Services (multi-service or single-service)
  ├─ Language (NER, sentiment, summarization, CLU)
  ├─ Vision (OCR, object detection, face)
  ├─ Speech (STT, TTS, speaker recognition)
  └─ Translator (text, document)
Azure OpenAI Service
ARM Resource Type
`Microsoft.CognitiveServices/accounts` with `kind: OpenAI`
Create Azure OpenAI resource:
PUT https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{accountName}?api-version=2023-10-01-preview
Authorization: Bearer {arm-token}
Content-Type: application/json
{
"kind": "OpenAI",
"sku": { "name": "S0" },
"location": "eastus",
"properties": {
"customSubDomainName": "{accountName}",
"networkAcls": { "defaultAction": "Allow" },
"publicNetworkAccess": "Enabled"
}
}
Deployments
List deployments:
GET https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments?api-version=2023-10-01-preview
Create deployment:
PUT https://management.azure.com/.../accounts/{accountName}/deployments/{deploymentName}?api-version=2023-10-01-preview
{
"sku": { "name": "Standard", "capacity": 120 },
"properties": {
"model": {
"format": "OpenAI",
"name": "gpt-4o",
"version": "2024-05-13"
},
"versionUpgradeOption": "OnceCurrentVersionExpired"
}
}
Capacity units: `capacity` is expressed in units of 1,000 tokens per minute (TPM), so a capacity of 120 means 120,000 TPM.
Available models (as of 2026): gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-35-turbo, text-embedding-3-large, text-embedding-3-small, text-embedding-ada-002, dall-e-3, whisper
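The deployment request and the capacity conversion above can be assembled in code before anything is sent. The sketch below only builds the method, URL, and body; the helper names `deployment_request` and `capacity_to_tpm` are hypothetical, and the full URL path is assumed to match the list-deployments path shown earlier.

```python
ARM = "https://management.azure.com"
API_VERSION = "2023-10-01-preview"  # same version as the requests above


def deployment_request(sub, rg, account, deployment, model, version, capacity):
    """Return (method, url, body) for the deployment PUT; nothing is sent."""
    if capacity <= 0:
        raise ValueError("capacity must be a positive number of 1,000-TPM units")
    url = (
        f"{ARM}/subscriptions/{sub}/resourceGroups/{rg}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account}"
        f"/deployments/{deployment}?api-version={API_VERSION}"
    )
    body = {
        "sku": {"name": "Standard", "capacity": capacity},
        "properties": {
            "model": {"format": "OpenAI", "name": model, "version": version},
            "versionUpgradeOption": "OnceCurrentVersionExpired",
        },
    }
    return "PUT", url, body


def capacity_to_tpm(capacity: int) -> int:
    """Convert capacity units to tokens per minute (1 unit = 1,000 TPM)."""
    return capacity * 1000
```

Keeping the request as plain data makes it easy to log (after redaction) or review before handing it to whichever HTTP client is in use.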
Quota Management
Get quota usage:
GET https://management.azure.com/subscriptions/{sub}/providers/Microsoft.CognitiveServices/locations/{location}/usages?api-version=2023-10-01-preview
The response includes the current value and the limit for each model family.
Quota limits by region: Quota is shared across all deployments in a subscription per region per model family. Monitor currentValue / limit to avoid throttling.
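Monitoring `currentValue / limit` can be sketched as a small function over the usages response. The response shape used here (a `value` array whose items carry `name.value`, `currentValue`, and `limit`) is an assumption about the payload, and the 0.8 default mirrors the 80% alert threshold in the governance checklist; `quota_report` is a hypothetical name.

```python
def quota_report(usages: dict, warn_at: float = 0.8) -> list:
    """Return one row per usage entry, highest utilization first."""
    rows = []
    for item in usages.get("value", []):
        limit = item.get("limit") or 0
        current = item.get("currentValue") or 0
        # Treat a zero or missing limit as 0% rather than dividing by zero.
        ratio = current / limit if limit else 0.0
        rows.append({
            "name": item.get("name", {}).get("value", "unknown"),
            "currentValue": current,
            "limit": limit,
            "utilization": round(ratio, 3),
            "warn": ratio >= warn_at,
        })
    return sorted(rows, key=lambda r: r["utilization"], reverse=True)
```

Rows with `warn` set are the model families most likely to hit throttling first.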
Content Filters
List content filter policies:
GET https://management.azure.com/.../accounts/{accountName}/raiPolicies?api-version=2023-10-01-preview
Create custom content filter policy:
PUT .../accounts/{accountName}/raiPolicies/{policyName}
{
"properties": {
"mode": "Default",
"contentFilters": [
{ "name": "Hate", "blocking": true, "enabled": true, "severityThreshold": "Medium", "source": "Prompt" },
{ "name": "Hate", "blocking": true, "enabled": true, "severityThreshold": "Medium", "source": "Completion" },
{ "name": "Violence", "blocking": true, "enabled": true, "severityThreshold": "Low", "source": "Prompt" },
{ "name": "Violence", "blocking": true, "enabled": true, "severityThreshold": "Low", "source": "Completion" },
{ "name": "SelfHarm", "blocking": true, "enabled": true, "severityThreshold": "Low", "source": "Prompt" },
{ "name": "SelfHarm", "blocking": true, "enabled": true, "severityThreshold": "Low", "source": "Completion" },
{ "name": "Sexual", "blocking": true, "enabled": true, "severityThreshold": "Medium", "source": "Prompt" },
{ "name": "Sexual", "blocking": true, "enabled": true, "severityThreshold": "Medium", "source": "Completion" }
]
}
}
Assign a content filter to a deployment by including `raiPolicyName` in the deployment properties.
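Because each category appears twice in the policy body (once for `Prompt`, once for `Completion`), the `contentFilters` array can be generated from a short mapping rather than written by hand. This is a sketch: `build_content_filters` and `rai_policy_body` are hypothetical helpers, and the accepted severity values are assumed to be `Low`, `Medium`, and `High`.

```python
ALLOWED_SEVERITIES = {"Low", "Medium", "High"}  # assumption, see lead-in


def build_content_filters(thresholds: dict) -> list:
    """Expand {category: severity} into Prompt and Completion entries."""
    filters = []
    for name, severity in thresholds.items():
        if severity not in ALLOWED_SEVERITIES:
            raise ValueError(f"unsupported severityThreshold for {name}: {severity}")
        for source in ("Prompt", "Completion"):
            filters.append({
                "name": name,
                "blocking": True,
                "enabled": True,
                "severityThreshold": severity,
                "source": source,
            })
    return filters


def rai_policy_body(thresholds: dict) -> dict:
    """Wrap the filters in the request body used for the raiPolicies PUT."""
    return {"properties": {"mode": "Default",
                           "contentFilters": build_content_filters(thresholds)}}
```

Passing `{"Hate": "Medium", "Violence": "Low", "SelfHarm": "Low", "Sexual": "Medium"}` reproduces the eight entries of the policy shown above.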
Data Plane (Completions)
POST https://{accountName}.openai.azure.com/openai/deployments/{deploymentName}/chat/completions?api-version=2024-02-01
Authentication: `api-key` header (key-based) or `Authorization: Bearer {entra-token}` (Microsoft Entra ID token, for example from a managed identity; preferred).
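The two authentication options can be illustrated with a request builder that uses only the Python standard library. This is a sketch under stated assumptions: `chat_request` is a hypothetical helper, token acquisition is out of scope, and the body carries only a `messages` array.

```python
import json
import urllib.request


def chat_request(account, deployment, messages, *, api_key=None,
                 entra_token=None, api_version="2024-02-01"):
    """Build (but do not send) a chat completions request."""
    if bool(api_key) == bool(entra_token):
        raise ValueError("provide exactly one of api_key or entra_token")
    url = (
        f"https://{account}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )
    headers = {"Content-Type": "application/json"}
    if entra_token:
        # Preferred: bearer token obtained for a managed identity.
        headers["Authorization"] = f"Bearer {entra_token}"
    else:
        headers["api-key"] = api_key
    data = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen` when a real call is intended. Requiring exactly one credential keeps key-based and token-based paths from being mixed by accident.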
Azure AI Search
ARM Resource Type
`Microsoft.Search/searchServices`
Create search service:
PUT https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Search/searchServices/{serviceName}?api-version=2023-11-01
{
"location": "eastus",
"sku": { "name": "standard" },
"properties": {
"replicaCount": 1,
"partitionCount": 1,
"hostingMode": "default",
"publicNetworkAccess": "enabled",
"semanticSearch": "standard"
}
}
SKU options: free (3 indexes, 50 MB), basic (15 indexes, 2 GB), standard (50 indexes, 25 GB/partition), standard2, standard3, storage_optimized_l1, storage_optimized_l2
Index Schema (Data Plane)
https://{serviceName}.search.windows.net
Create index with vector support:
{
"name": "products-index",
"fields": [
{ "name": "id", "type": "Edm.String", "key": true, "retrievable": true },
{ "name": "title", "type": "Edm.String", "searchable": true, "retrievable": true, "analyzer": "en.microsoft" },
{ "name": "content", "type": "Edm.String", "searchable": true, "retrievable": true },
{ "name": "category", "type": "Edm.String", "filterable": true, "facetable": true, "retrievable": true },
{
"name": "contentVector",
"type": "Collection(Edm.Single)",
"searchable": true,
"retrievable": false,
"dimensions": 1536,
"vectorSearchProfile": "hnsw-profile"
}
],
"vectorSearch": {
"profiles": [{ "name": "hnsw-profile", "algorithm": "hnsw-config" }],
"algorithms": [{ "name": "hnsw-config", "kind": "hnsw", "hnswParameters": { "m": 4, "efConstruction": 400, "efSearch": 500, "metric": "cosine" } }]
},
"semantic": {
"configurations": [{
"name": "semantic-config",
"prioritizedFields": {
"titleField": { "fieldName": "title" },
"contentFields": [{ "fieldName": "content" }]
}
}]
}
}
Indexer and Data Source
Create Blob data source:
{
"name": "blob-datasource",
"type": "azureblob",
"credentials": { "connectionString": "DefaultEndpointsProtocol=https;..." },
"container": { "name": "documents", "query": null }
}
Create indexer with skillset:
{
"name": "blob-indexer",
"dataSourceName": "blob-datasource",
"targetIndexName": "products-index",
"skillsetName": "enrichment-skillset",
"schedule": { "interval": "PT2H" },
"parameters": { "configuration": { "dataToExtract": "contentAndMetadata", "parsingMode": "default" } }
}
Skillsets
Built-in cognitive skills: OcrSkill, ImageAnalysisSkill, MergeSkill, SplitSkill, EntityRecognitionSkill, KeyPhraseExtractionSkill, LanguageDetectionSkill, SentimentSkill, TranslationSkill
Custom Web API skill for calling Azure OpenAI embeddings during indexing:
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"name": "embedding-skill",
"uri": "https://{functionApp}.azurewebsites.net/api/GenerateEmbedding",
"httpMethod": "POST",
"inputs": [{ "name": "text", "source": "/document/content" }],
"outputs": [{ "name": "embedding", "targetName": "contentVector" }]
}
Search Queries
Hybrid search (keyword + vector):
{
"search": "renewable energy storage",
"vectorQueries": [{
"kind": "vector",
"vector": [0.123, -0.456, ...],
"k": 5,
"fields": "contentVector"
}],
"queryType": "semantic",
"semanticConfiguration": "semantic-config",
"queryLanguage": "en-us",
"top": 10,
"select": "id,title,content,category",
"filter": "category eq 'technology'"
}
Azure AI Studio / AI Foundry
AI Studio is deployed as ARM resources under `Microsoft.MachineLearningServices/workspaces` with `kind: Hub` or `kind: Project`.
Create AI Hub:
PUT https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.MachineLearningServices/workspaces/{hubName}?api-version=2024-01-01-preview
{
"kind": "Hub",
"location": "eastus",
"identity": { "type": "SystemAssigned" },
"properties": {
"friendlyName": "My AI Hub",
"storageAccount": "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{sa}",
"keyVault": "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.KeyVault/vaults/{kv}",
"applicationInsights": "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Insights/components/{ai}"
}
}
Create AI Project under Hub:
PUT .../workspaces/{projectName}?api-version=2024-01-01-preview
{
"kind": "Project",
"location": "eastus",
"properties": {
"friendlyName": "My AI Project",
"hubResourceId": "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.MachineLearningServices/workspaces/{hubName}"
}
}
Add connection (Azure OpenAI endpoint):
PUT .../workspaces/{projectName}/connections/{connectionName}?api-version=2024-01-01-preview
{
"properties": {
"category": "AzureOpenAI",
"target": "https://{openaiAccountName}.openai.azure.com/",
"authType": "ApiKey",
"credentials": { "key": "{api-key}" }
}
}
Responsible AI Governance
Checklist for New Deployments
- Content filter policy reviewed and applied (not using `Microsoft.Default` in production)
- DALL-E image generation has `Jailbreak` filter enabled
- Model monitoring configured for hallucination and toxicity scoring
- No API keys stored in code — use managed identity or Key Vault reference
- Network access restricted to VNet or Private Endpoint for production
- Diagnostic logs exported to Log Analytics workspace
- Quota alerts set (80% threshold on TPM)
- Usage metering reviewed monthly
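Some checklist items can be checked mechanically against the ARM JSON of the account and deployment. The sketch below covers three of them and is only an illustration: `governance_gaps` is a hypothetical helper, a missing `raiPolicyName` is assumed to mean the default policy applies, and `disableLocalAuth` is assumed to be the account property that turns off key-based access.

```python
def governance_gaps(account: dict, deployment: dict) -> list:
    """Return human-readable gaps found in account and deployment JSON."""
    gaps = []
    acct = account.get("properties", {})
    dep = deployment.get("properties", {})
    # Assumption: no raiPolicyName means the default policy is in effect.
    if dep.get("raiPolicyName", "Microsoft.Default") == "Microsoft.Default":
        gaps.append("content filter: deployment uses Microsoft.Default")
    if acct.get("publicNetworkAccess", "Enabled") == "Enabled":
        gaps.append("network: public network access is enabled")
    # Assumption: disableLocalAuth=true switches off API-key access.
    if not acct.get("disableLocalAuth", False):
        gaps.append("auth: key-based (local) auth is still allowed")
    return gaps
```

An empty list means these three checks passed; the remaining items (monitoring, diagnostic logs, alerts, metering) need data from other resources and are not covered here.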
Throttling and Error Handling
| Status | Error code | Cause | Fix |
|---|---|---|---|
| 429 | RateLimitReached | TPM quota exhausted | Retry with exponential backoff; consider raising quota |
| 429 | TokensPerMinuteExceeded | Per-minute burst exceeded | Add jitter to retry; use streaming to reduce perceived latency |
| 400 | content_filter | Content filter triggered | Review prompt for policy-violating content |
| 404 | DeploymentNotFound | Deployment name typo or not deployed | List deployments and confirm name |
| 503 | ServiceUnavailable | Regional outage | Check Azure status; retry with backoff |
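The retry guidance in the table (exponential backoff with jitter for 429 and 503, no retry for 400 and 404) can be sketched as a small wrapper. `ApiError` and `call_with_backoff` are hypothetical names, and the delay values are illustrative defaults rather than service recommendations.

```python
import random
import time

RETRYABLE = {429, 503}  # statuses the table above marks as retryable


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status and error code."""

    def __init__(self, status, code=""):
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code


def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Run call(), retrying retryable errors with capped, jittered delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ApiError as err:
            # Give up on non-retryable statuses or on the final attempt.
            if err.status not in RETRYABLE or attempt == max_attempts:
                raise
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random fraction of the current ceiling.
            sleep(ceiling * rng())
```

The `sleep` and `rng` parameters exist so the wrapper can be exercised without real waiting. If a response carries a `Retry-After` header, honoring it is usually a better choice than a computed delay.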
Output Convention
Every operation produces a structured markdown report:
- Header: operation, timestamp, resource location
- Resource summary: resource ID, SKU, status
- Deployment table: model, version, capacity (TPM), status
- Quota table: current usage / limit by model family
- Recommendations: quota headroom, governance gaps, cost observations
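The five report sections above could be rendered along these lines. This is a sketch only: `render_report` and the input dictionary keys are hypothetical, and the exact heading levels and column names are assumptions rather than a fixed format.

```python
from datetime import datetime, timezone


def render_report(operation, resource, deployments, quotas, recommendations, now=None):
    """Render the five report sections as one markdown string."""
    now = now or datetime.now(timezone.utc)
    lines = [
        f"# {operation}",
        f"- Timestamp: {now.isoformat(timespec='seconds')}",
        f"- Location: {resource['location']}",
        "",
        "## Resource summary",
        # The resource ID is expected to arrive already redacted.
        f"- Resource ID: {resource['id']}",
        f"- SKU: {resource['sku']}",
        f"- Status: {resource['status']}",
        "",
        "## Deployments",
        "| Model | Version | Capacity (TPM) | Status |",
        "|---|---|---|---|",
    ]
    for d in deployments:
        # Capacity units are multiplied by 1,000 to show TPM.
        lines.append(
            f"| {d['model']} | {d['version']} | {d['capacity'] * 1000:,} | {d['status']} |"
        )
    lines += ["", "## Quota", "| Model family | Current | Limit |", "|---|---|---|"]
    for q in quotas:
        lines.append(f"| {q['name']} | {q['currentValue']} | {q['limit']} |")
    lines += ["", "## Recommendations"]
    lines += [f"- {r}" for r in recommendations]
    return "\n".join(lines)
```

Accepting `now` as a parameter keeps the output reproducible when the same report needs to be regenerated or compared.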
Reference Files
| Reference | Path | Topics |
|---|---|---|
| Azure OpenAI Reference | `references/azure-openai-reference.md` | Deployments, models, quotas, fine-tuning, content filters, data plane, monitoring |
| AI Search Reference | `references/ai-search-reference.md` | Index schema, vector fields, indexers, skillsets, semantic ranking, hybrid search |
Progressive Disclosure — Reference Files
| Topic | File |
|---|---|
| Azure OpenAI REST API, deployments, chat completions, embeddings, function calling, streaming, content filtering, BYOD, rate limits | `references/azure-openai-reference.md` |
| AI Search index schema, vector fields, indexers, skillsets, semantic ranking, hybrid search queries | `references/ai-search-reference.md` |
| Language Service (sentiment, NER, PII, summarization, CLU), Translator, Text Analytics for Health, QnA → Custom QA migration | `references/cognitive-services.md` |
| Document Intelligence REST API v4.0, prebuilt models, custom/neural models, composed models, confidence scores, bounding polygons | `references/document-intelligence.md` |
| Speech SDK (STT/TTS/translation/speaker recognition), Computer Vision 4.0 (analyze/OCR), Video Indexer, Face API retirement | `references/speech-vision.md` |