Overview
This reference implementation demonstrates an enterprise AI gateway built using Azure API Management and Azure OpenAI (Azure AI Foundry).
The platform introduces a centralized governance layer for AI services, allowing organizations to securely expose AI capabilities while maintaining control over authentication, cost management, and observability.
Key aspects include:
- Secretless authentication using Azure Managed Identity
- Subscription-based access control with rate limiting and quotas
- Comprehensive observability via Application Insights and Azure Monitor
- Infrastructure provisioned through modular Terraform
- Automated deployment via a multi-stage Azure DevOps pipeline
- Policy-driven request transformation and security controls
The project reflects how platform teams design and operate AI service gateways in enterprise Azure environments.
Context
Organizations adopting AI services frequently encounter platform-level challenges:
- controlling access and preventing unauthorized usage
- managing costs and preventing runaway token consumption
- establishing observability for AI request patterns
- securing API keys and credentials
- providing consistent API contracts across multiple AI models
- tracking usage and costs per team or application
This platform addresses these concerns using Azure-native gateway patterns and platform engineering practices, prioritizing operational control, governance, and security over AI model complexity.
Architecture
High-Level Architecture
```
Client Applications
        │
        │ HTTPS (Subscription Key)
        ▼
Azure API Management (Gateway)
  • Subscription Key Validation
  • Rate Limiting (100 req/min)
  • Daily Quotas (10,000 req/day)
  • Request/Response Transformation
  • Logging to Application Insights
        │
        │ Managed Identity
        ▼
Azure OpenAI (Azure AI Foundry)
  • GPT-4o Model
  • Cognitive Services Resource
  • 128K Context Window
        │
        └── Observability Layer
              • Application Insights
              • Azure Monitor Alerts
              • Log Analytics
              • Key Vault
```
The architecture prioritizes managed Azure services and eliminates credential management by using Managed Identity authentication between services.
Infrastructure Organization
Infrastructure is implemented using a modular Terraform structure.
```
terraform/
├── modules/
│   ├── api-management
│   ├── ai-foundry
│   ├── key-vault
│   ├── managed-identity
│   ├── monitoring
│   └── monitoring-alerts
├── main.tf
├── variables.tf
└── outputs.tf
```
Design principles:
- modular resource components
- clear separation of concerns
- consistent naming conventions
- remote state using an Azure Storage backend
- environment-specific variable groups
This structure enables infrastructure reuse across development, staging, and production environments.
Security Model
Managed Identity Authentication
All service-to-service communication uses Azure Managed Identity with RBAC.
API Management uses a User-Assigned Managed Identity to authenticate to:
- Azure OpenAI (Cognitive Services User role)
- Azure Key Vault (Secrets User role)
- Application Insights (Metrics Publisher role)
Azure OpenAI is configured with:
```hcl
local_auth_enabled = false
```
This disables key-based authentication and enforces identity-based access only.
Authentication flow:
Client → [Subscription Key] → APIM → [Managed Identity] → Azure OpenAI
This approach enables zero-trust service authentication without storing credentials in code or configuration.
Client Authentication
API consumers authenticate using subscription keys managed by API Management.
Security controls include:
- subscription key validation (401 on failure)
- per-subscription rate limiting (100 requests/minute)
- daily quotas (10,000 requests/day)
- request ID tracking for audit
- TLS 1.2+ enforcement with certificate validation
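As an illustration of the rate-limit semantics, the per-subscription limit behaves like a fixed-window counter. The sketch below is a simplified Python model of that behavior, not APIM's actual implementation:

```python
import time

class FixedWindowLimiter:
    """Illustrative fixed-window limiter: at most `limit` calls per `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # New window: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # Caller should receive 429 with a Retry-After hint.

# 100 requests/minute, mirroring the gateway's per-subscription limit.
limiter = FixedWindowLimiter(limit=100, window_seconds=60)
```

The daily quota works the same way with a 24-hour window and a 10,000-request limit.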
Network Security
Security layers include:
- HTTPS / TLS 1.2+ enforced for all connections
- legacy SSL/TLS protocols explicitly disabled
- backend certificate chain validation enabled
- Key Vault network ACLs (default deny)
- public endpoints (VNet integration available as a future enhancement)
Resource Protection
Production resources are protected with:
- CanNotDelete locks on critical resources
- Key Vault soft delete enabled (90 days)
- purge protection for sensitive data stores
- centralized audit logging to Log Analytics
API Operations
The gateway exposes a simplified REST API that abstracts Azure OpenAI complexity.
Text Summarization
POST /ai/summarize
Accepts text and style parameters and returns a concise summary.
Request
```json
{
  "text": "Long article text...",
  "max_length": 500,
  "style": "concise"
}
```
Response
```json
{
  "summary": "This article discusses...",
  "tokens_used": 1234,
  "request_id": "550e8400-...",
  "model": "gpt-4o"
}
```
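A client call to this operation can be sketched as below. The gateway hostname is hypothetical; `Ocp-Apim-Subscription-Key` is APIM's standard subscription key header. The function constructs the request without sending it:

```python
import json
import urllib.request

def build_summarize_request(gateway_host: str, subscription_key: str,
                            text: str, max_length: int = 500,
                            style: str = "concise") -> urllib.request.Request:
    """Construct (but do not send) a POST to the gateway's summarize operation."""
    payload = {"text": text, "max_length": max_length, "style": style}
    return urllib.request.Request(
        url=f"https://{gateway_host}/ai/summarize",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # APIM's standard subscription key header.
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )
```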
Information Extraction
POST /ai/extract
Extracts structured data from unstructured text using JSON Schema.
Request
```json
{
  "text": "INVOICE #12345\nDate: March 11, 2026...",
  "schema": {
    "type": "object",
    "properties": {
      "invoice_number": { "type": "string" },
      "total": { "type": "number" }
    }
  }
}
```
Response
```json
{
  "extracted_data": {
    "invoice_number": "12345",
    "total": 2450.00
  },
  "confidence": 0.98,
  "tokens_used": 234
}
```
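A consumer can sanity-check the extracted data against the schema it submitted. The sketch below handles only the `string`/`number` subset shown above, not full JSON Schema validation:

```python
def matches_schema(data: dict, schema: dict) -> bool:
    """Check dict values against a minimal JSON Schema subset (string/number types)."""
    type_map = {"string": str, "number": (int, float)}
    for name, spec in schema.get("properties", {}).items():
        if name in data and not isinstance(data[name], type_map[spec["type"]]):
            return False
    return True

invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
}
```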
Health Check
GET /ai/health
Returns operational status of the gateway and backend services.
Request Transformation
APIM policies transform client requests into Azure OpenAI request format.
Client Request → APIM Transformation → Azure OpenAI Request
Transformations include:
- converting simple input to OpenAI message format
- injecting system prompts based on operation type
- applying token limits and model parameters
- authenticating via Managed Identity
- simplifying backend responses
This abstraction allows backend AI providers to change without impacting client applications.
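The transformation step can be sketched as follows. The system prompts and parameter defaults here are hypothetical stand-ins; the real mapping lives in APIM policy XML:

```python
# Hypothetical per-operation system prompts; the real ones live in APIM policies.
SYSTEM_PROMPTS = {
    "summarize": "Summarize the user's text concisely.",
    "extract": "Extract structured data matching the provided schema.",
}

def to_openai_request(operation: str, client_body: dict,
                      model: str = "gpt-4o", max_tokens: int = 1000) -> dict:
    """Map a simplified gateway request onto the OpenAI chat-completions shape."""
    return {
        "model": model,
        # Apply the platform token ceiling, tightened by the client's max_length.
        "max_tokens": min(max_tokens, client_body.get("max_length", max_tokens)),
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPTS[operation]},
            {"role": "user", "content": client_body["text"]},
        ],
    }
```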
Observability
Observability is implemented using Azure-native monitoring services.
Application Insights Integration
Collected metrics include:
- request/response logs with payload tracking
- custom metrics (token usage, cost per request)
- latency metrics (P50 / P95 / P99)
- error rates by status code
- quota exhaustion events
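For reference, a P95 figure is computed from raw latency samples roughly as below (nearest-rank method; Application Insights applies its own server-side aggregation):

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: smallest sample covering at least p% of the data."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# Example latency samples in milliseconds.
latencies_ms = [120, 135, 150, 180, 210, 250, 300, 420, 800, 1500]
p95 = percentile(latencies_ms, 95)
```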
Azure Monitor Alerts
Configured alerts:
Critical
- error rate >10%
- API Management availability <99%
Warning
- error rate >5%
- P95 latency >5 seconds
- quota usage >90%
Informational
- unusual traffic patterns
- cost anomalies
Cost Visibility
Custom telemetry enables tracking of:
- token usage trends per subscription
- cost allocation per API consumer
- model usage distribution
- rate-limit hits
This visibility enables data-driven capacity planning and cost optimization.
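Cost allocation from the token metrics can be sketched as below. The per-1K-token prices are placeholders; actual Azure OpenAI pricing varies by model, region, and agreement:

```python
# Placeholder per-1K-token prices; real Azure OpenAI pricing varies by model and region.
PRICE_PER_1K = {"gpt-4o": {"input": 0.0025, "output": 0.01}}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Attribute a dollar cost to one request from its token counts."""
    prices = PRICE_PER_1K[model]
    return (input_tokens / 1000 * prices["input"]
            + output_tokens / 1000 * prices["output"])
```

Summing this per subscription over a billing period yields the chargeback figures.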
CI/CD Automation
The platform uses a multi-stage Azure DevOps pipeline for infrastructure deployment.
Pipeline Stages
Validate
- Terraform validation and formatting
- APIM policy XML validation
- security scanning with Checkov
- parallel execution for fast feedback
Plan
- Terraform plan generation
- plan artifact publishing
- cost estimation with Infracost
- environment-specific configuration injection
Deploy
- manual approval gates (staging and production)
- Terraform apply using saved plan
- output extraction for verification
- post-deployment health checks
Deployment Flow
```
main branch
    │
    ▼
Validate
    │
    ▼
Plan Dev → Deploy Dev (automatic)
    │
    ▼
Plan Staging → Deploy Staging (manual approval)
    │
    ▼
Plan Prod → Deploy Prod (manual approval)
```
This workflow provides progressive validation while maintaining fast development feedback loops.
Security Scanning
Pipeline checks include:
- Checkov policy scanning
- XML schema validation
- secret detection via pre-commit hooks
- Terraform formatting enforcement
Governance with APIM Policies
The gateway enforces policy-driven governance across all APIs.
Global Policies
Applied to all operations:
- subscription validation
- CORS configuration
- base rate limiting
- centralized error handling
Operation-Specific Policies
Each API operation includes dedicated policies for:
- request validation and transformation
- backend routing to the appropriate AI model
- response transformation
- custom logging and metadata enrichment
Error Handling
Standardized error responses include:
- sanitized error messages
- request ID for traceability
- appropriate HTTP status codes
- retry guidance with Retry-After headers
This policy architecture enables centralized governance without application code changes.
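A well-behaved consumer honors the Retry-After hint on 429 responses. The sketch below uses a generic response dict rather than a specific HTTP library:

```python
import time

def call_with_retry(send, max_attempts: int = 3):
    """Call send() until it returns a non-429 response, honoring Retry-After."""
    for attempt in range(max_attempts):
        response = send()
        if response["status"] != 429:
            return response
        # Fall back to 1s if the header is missing; cap the wait for safety.
        delay = min(float(response["headers"].get("Retry-After", 1)), 30)
        if attempt < max_attempts - 1:
            time.sleep(delay)
    return response
```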
Platform vs Consumer Responsibilities
A clear separation of responsibilities enables scalable platform operations.
Platform Team
Responsible for:
- Azure infrastructure provisioning with Terraform
- APIM configuration and policy management
- AI model deployment and capacity planning
- observability stack configuration
- security baselines and compliance
- CI/CD pipeline maintenance
- subscription management and access control
API Consumers
Responsible for:
- application integration with the gateway API
- subscription key protection
- request rate management
- cost monitoring and budgeting
- application-level error handling
This model enables self-service AI consumption while maintaining platform governance.
Cost Management
Cost Drivers
Typical baseline costs for a development environment:
- API Management (Developer tier): ~$50/month
- Azure OpenAI: pay-per-token usage
- Application Insights: ~$10–20/month depending on ingestion
- Key Vault: ~$1/month
- Log Analytics: ~$5–10/month
Estimated baseline cost: ~$75–100/month plus token consumption.
Production environments typically use Standard or Premium APIM tiers with higher capacity and SLA guarantees.
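Token consumption dominates at scale, so a rough planning calculation is useful. The blended per-1K-token price below is a placeholder, not a published rate:

```python
def monthly_ai_cost(requests_per_day: int, avg_tokens_per_request: int,
                    price_per_1k_tokens: float) -> float:
    """Estimate monthly token spend, assuming a 30-day month."""
    tokens_per_month = requests_per_day * avg_tokens_per_request * 30
    return tokens_per_month / 1000 * price_per_1k_tokens

# e.g. 1,000 requests/day at ~500 tokens each, $0.01 per 1K tokens blended.
estimate = monthly_ai_cost(1000, 500, 0.01)
```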
Cost Controls
Implemented controls include:
- per-subscription rate limiting
- daily request quotas
- token limits per request
- Azure Monitor budget alerts
- usage tracking metrics
These controls prevent runaway token usage and unexpected cost spikes.
Possible Extensions
The platform can evolve with additional capabilities:
- OAuth 2.0 / Microsoft Entra ID authentication
- private endpoints and VNet integration
- multi-region API Management deployment
- semantic caching for LLM responses
- PII detection and masking
- fine-tuned model deployments
- developer portal self-service onboarding
- cost chargeback by business unit
Scope
This project intentionally focuses on platform architecture and governance rather than AI application complexity.
The implementation demonstrates:
- Azure-native AI gateway platform architecture
- secretless authentication via Managed Identity
- modular infrastructure automation with Terraform
- policy-driven API governance using APIM
- full observability from day one
- enterprise CI/CD deployment practices
- cost control and usage governance
Minimal AI operations are implemented only to validate the platform capabilities. The architecture supports adding new AI capabilities without infrastructure redesign.
Last Updated: March 2026