Overview
This reference implementation demonstrates an Azure Kubernetes Service (AKS) platform baseline designed around platform engineering principles.
The focus is not application complexity, but the design of a secure, observable, and operationally stable container platform using Azure-native services.
Key aspects include:
- Secretless workload authentication using Azure Workload Identity
- Day-1 observability with managed Prometheus and Azure Managed Grafana
- Infrastructure provisioning through modular Terraform
- Automated infrastructure and application deployment via CI/CD pipelines
- Policy-driven governance with Azure Policy for Kubernetes
The project reflects how platform teams provision and operate Kubernetes infrastructure in enterprise Azure environments.
Context
Teams adopting Kubernetes frequently encounter similar platform challenges:
- managing credentials and secrets across workloads
- establishing observability before applications deploy
- controlling operational complexity of cluster tooling
- enforcing security and compliance early
- structuring infrastructure code for reuse and automation
This platform baseline addresses these concerns through Azure-native services and platform engineering practices, prioritizing operational clarity over application complexity.
Architecture
High-Level Architecture
Azure Subscription
┌───────────────────────────────────────────────┐
│                                               │
│  AKS Cluster                                  │
│  • OIDC Issuer + Workload Identity            │
│  • Azure CNI networking                       │
│  • Application Routing (NGINX Ingress)        │
│  • Secrets Store CSI Driver                   │
│  • Azure Policy Add-on                        │
│  • Container Insights                         │
│                                               │
│  Azure Monitor Workspace                      │
│  • Managed Prometheus metrics                 │
│  • Data Collection Rules                      │
│                                               │
│  Azure Managed Grafana                        │
│  • Entra ID authentication                    │
│  • Platform dashboards                        │
│                                               │
│  Azure Key Vault                              │
│  • Secret storage                             │
│  • Workload Identity authentication           │
│                                               │
│  Azure Container Registry                     │
│  • Image storage and pull permissions         │
│                                               │
└───────────────────────────────────────────────┘
The architecture prioritizes managed Azure services to reduce operational overhead while maintaining strong security and observability capabilities.
Infrastructure Organization
Infrastructure is implemented using a modular Terraform structure.
infra/terraform/
├── modules/
│   ├── aks
│   ├── networking
│   ├── monitoring
│   ├── keyvault
│   ├── acr
│   └── workload-identity
└── envs/dev
Design principles:
- modular infrastructure components
- clear dependency ordering
- consistent resource naming
- remote Terraform state with locking
This structure enables reusable infrastructure patterns and supports future multi-environment deployments.
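Terraform infers apply order from references between module inputs and outputs, so clear dependency ordering mostly means keeping that graph acyclic and easy to read. The Python sketch below makes an assumed module graph explicit and orders it with the standard-library graphlib module; the dependency edges are illustrative guesses, not read from this repository.

```python
from graphlib import TopologicalSorter

# Hypothetical module graph: each module maps to the modules whose outputs
# it consumes. These edges are assumptions for illustration only.
MODULE_DEPENDENCIES: dict[str, set[str]] = {
    "networking": set(),
    "monitoring": set(),
    "acr": set(),
    "keyvault": set(),
    "aks": {"networking", "monitoring"},
    "workload-identity": {"aks", "keyvault"},
}


def apply_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid provisioning order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


def respects_dependencies(order: list[str], graph: dict[str, set[str]]) -> bool:
    """Check that every module comes after all of the modules it depends on."""
    position = {name: index for index, name in enumerate(order)}
    return all(
        position[dependency] < position[module]
        for module, dependencies in graph.items()
        for dependency in dependencies
    )


if __name__ == "__main__":
    order = apply_order(MODULE_DEPENDENCIES)
    print(" -> ".join(order))
    assert respects_dependencies(order, MODULE_DEPENDENCIES)
```

Independent modules (here networking, monitoring, acr and keyvault) may appear in any relative order; Terraform is free to create them in parallel.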
Security Model
Workload Identity
Authentication between Kubernetes workloads and Azure services uses Azure Workload Identity with OIDC federation.
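Kubernetes ServiceAccounts are mapped to user-assigned managed identities through federated identity credentials. The cluster's OIDC issuer signs projected ServiceAccount tokens, and Microsoft Entra ID exchanges such a token for an access token when its issuer, subject and audience match a federated credential.
Applications can therefore access Azure resources such as Key Vault without storing credentials inside pods or pipelines.
This approach replaces the deprecated Microsoft Entra pod-managed identity (aad-pod-identity) and aligns with zero-trust authentication patterns.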
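A minimal sketch of the three pieces that have to agree, written as Python dictionaries that mirror the Kubernetes and Azure objects. The ServiceAccount annotation, pod label, subject format and token audience follow the Azure Workload Identity conventions; the namespace, ServiceAccount name, client ID and issuer URL are hypothetical placeholders.

```python
import json

# Audience that Microsoft Entra ID expects on Kubernetes-issued tokens.
TOKEN_EXCHANGE_AUDIENCE = "api://AzureADTokenExchange"


def service_account(name: str, namespace: str, client_id: str) -> dict:
    """ServiceAccount annotated with the user-assigned identity's client ID."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"azure.workload.identity/client-id": client_id},
        },
    }


def federated_credential(issuer_url: str, namespace: str, sa_name: str) -> dict:
    """Fields of the federated identity credential on the managed identity.

    The subject must name this exact namespace and ServiceAccount; any
    mismatch makes the token exchange fail.
    """
    return {
        "issuer": issuer_url,
        "subject": f"system:serviceaccount:{namespace}:{sa_name}",
        "audiences": [TOKEN_EXCHANGE_AUDIENCE],
    }


def pod_labels(app: str) -> dict:
    """Pod template labels; the workload identity webhook only mutates labelled pods."""
    return {"app": app, "azure.workload.identity/use": "true"}


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    sa = service_account("demo-app", "demo", "00000000-0000-0000-0000-000000000000")
    fic = federated_credential("https://oidc.example.invalid/issuer/", "demo", "demo-app")
    print(json.dumps({"serviceAccount": sa, "federatedCredential": fic,
                      "podLabels": pod_labels("demo-app")}, indent=2))
```

Assuming the azurerm Terraform provider, the credential fields would correspond to an azurerm_federated_identity_credential resource in the workload-identity module.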
Key Vault Integration
Secrets are retrieved through the Secrets Store CSI Driver.
- secrets stored in Azure Key Vault
- mounted into pods as read-only files
- authentication handled through Workload Identity
The Key Vault uses Azure RBAC authorization, avoiding legacy access policy models.
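The Key Vault wiring can be sketched as two objects: a SecretProviderClass naming the vault, identity and secrets, and a CSI volume in the pod spec. The API group, driver name and parameter keys follow the Azure Key Vault provider for the Secrets Store CSI Driver; the vault name, secret name and mount path are hypothetical, and the pod is assumed to run under the federated ServiceAccount.

```python
import json


def objects_parameter(secret_names: list[str]) -> str:
    """Render the provider's 'objects' parameter, a YAML document stored as a string."""
    lines = ["array:"]
    for name in secret_names:
        lines += ["  - |", f"    objectName: {name}", "    objectType: secret"]
    return "\n".join(lines) + "\n"


def secret_provider_class(name: str, namespace: str, vault_name: str,
                          client_id: str, tenant_id: str,
                          secret_names: list[str]) -> dict:
    """SecretProviderClass for the Azure Key Vault provider using Workload Identity."""
    return {
        "apiVersion": "secrets-store.csi.x-k8s.io/v1",
        "kind": "SecretProviderClass",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "provider": "azure",
            "parameters": {
                "usePodIdentity": "false",  # legacy pod identity disabled
                "clientID": client_id,      # user-assigned identity used via Workload Identity
                "keyvaultName": vault_name,
                "tenantId": tenant_id,
                "objects": objects_parameter(secret_names),
            },
        },
    }


def csi_volume(provider_class: str, mount_path: str = "/mnt/secrets-store") -> tuple[dict, dict]:
    """Pod volume plus container volumeMount that expose secrets as read-only files."""
    volume = {
        "name": "secrets-store",
        "csi": {
            "driver": "secrets-store.csi.k8s.io",
            "readOnly": True,
            "volumeAttributes": {"secretProviderClass": provider_class},
        },
    }
    mount = {"name": "secrets-store", "mountPath": mount_path, "readOnly": True}
    return volume, mount


if __name__ == "__main__":
    spc = secret_provider_class("app-secrets", "demo", "kv-demo-dev",
                                "<client-id>", "<tenant-id>", ["db-password"])
    print(json.dumps(spc, indent=2))
    print(spc["spec"]["parameters"]["objects"])
```

Each listed secret then appears as a file named after its objectName under the mount path.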
Network Security
The cluster is deployed within a dedicated virtual network.
Security controls include:
- Azure CNI networking
- network security groups
- Kubernetes network policies
- isolated node subnet
Private endpoints can be introduced later for Key Vault and Container Registry.
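Kubernetes network policies are usually layered as a default deny plus narrow allow rules. The sketch below generates three NetworkPolicy objects as Python dictionaries. It assumes a network policy engine (Azure Network Policy Manager, Calico or Cilium) is enabled on the cluster, that application pods carry an app label, and that the application routing ingress controller runs in the app-routing-system namespace; the namespace, app name and port are hypothetical.

```python
import json

DNS_PORTS = [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]


def default_deny(namespace: str) -> dict:
    """Select every pod in the namespace and allow no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_dns(namespace: str) -> dict:
    """Re-allow DNS lookups to kube-system, which a default egress deny would block."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [{"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}}}],
                "ports": DNS_PORTS,
            }],
        },
    }


def allow_from_namespace(namespace: str, app: str, port: int,
                         source_namespace: str = "app-routing-system") -> dict:
    """Allow one source namespace (by default the ingress controller's) to reach one app port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{source_namespace}-to-{app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": source_namespace}}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    policies = [default_deny("demo"), allow_dns("demo"), allow_from_namespace("demo", "demo-app", 8080)]
    print(json.dumps(policies, indent=2))
```

Because network policies are additive, each allow rule opens only the traffic it names on top of the default deny.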
Observability
Observability is implemented using Azure-native managed services:
- Azure Monitor managed service for Prometheus
- Azure Managed Grafana
- Container Insights for logs and cluster diagnostics
Metrics are collected from:
- Kubernetes nodes
- kube-state-metrics
- application endpoints
- the NGINX ingress controller
Grafana dashboards provide operational visibility for:
- cluster health and resource usage
- ingress traffic and error rates
- application performance indicators
Dashboards follow the RED method (Rate, Errors, Duration) for service monitoring.
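For the ingress traffic dashboard, the RED signals can be expressed as PromQL over the NGINX ingress controller's request counter and latency histogram. The sketch assumes the application routing add-on exposes the standard ingress-nginx metric names and that managed Prometheus scrapes them with the ingress label intact; the Ingress name, 5-minute window and 95th percentile are illustrative choices.

```python
def red_queries(ingress: str, window: str = "5m", quantile: float = 0.95) -> dict[str, str]:
    """Build PromQL for Rate, Errors and Duration of one Ingress.

    Metric names come from the ingress-nginx controller; the 'ingress'
    label is assumed to survive the managed Prometheus scrape unchanged.
    """
    selector = f'ingress="{ingress}"'
    requests = "nginx_ingress_controller_requests"
    buckets = "nginx_ingress_controller_request_duration_seconds_bucket"

    # Rate: requests per second averaged over the window.
    rate = f"sum(rate({requests}{{{selector}}}[{window}]))"
    # Errors: share of responses with a 5xx status code.
    errors = f'sum(rate({requests}{{{selector},status=~"5.."}}[{window}])) / {rate}'
    # Duration: chosen latency quantile estimated from histogram buckets.
    duration = (
        f"histogram_quantile({quantile}, "
        f"sum by (le) (rate({buckets}{{{selector}}}[{window}])))"
    )
    return {"rate": rate, "errors": errors, "duration": duration}


if __name__ == "__main__":
    for signal, query in red_queries("demo-app").items():
        print(f"{signal:>8}: {query}")
```

If the scrape configuration relabels metric labels (for example renaming namespace to exported_namespace), the selector has to be adjusted to match.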
CI/CD Automation
The platform uses a two-pipeline model.
Infrastructure Pipeline
Responsible for provisioning and updating infrastructure.
Stages:
- Terraform validation
- Terraform plan generation
- Manual approval gate
- Terraform apply
Publishing the plan as an artifact allows review before infrastructure changes are applied.
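Because the plan is published as an artifact, the review step can be supported by a short summary of what the plan would change. The sketch below reads the machine-readable plan that terraform show -json emits for a saved plan file (for example one written by terraform plan -out=tfplan) and counts actions by type. The flagging rule and the resource addresses are hypothetical, and the manual approval gate itself stays a human decision.

```python
import json
from collections import Counter


def summarize_plan(plan_json: str) -> Counter:
    """Count planned actions in the JSON produced by `terraform show -json <planfile>`."""
    plan = json.loads(plan_json)
    counts: Counter = Counter()
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing will be modified
        if "delete" in actions and "create" in actions:
            counts["replace"] += 1  # ["delete", "create"] or ["create", "delete"]
        else:
            counts[actions[0]] += 1  # "create", "update" or "delete"
    return counts


def needs_extra_review(counts: Counter) -> bool:
    """Hypothetical rule: highlight plans that destroy or replace resources."""
    return counts["delete"] > 0 or counts["replace"] > 0


if __name__ == "__main__":
    # Trimmed, hypothetical plan; a real one comes from `terraform show -json tfplan`.
    sample = json.dumps({"resource_changes": [
        {"address": "module.aks.azurerm_kubernetes_cluster.this", "change": {"actions": ["update"]}},
        {"address": "module.acr.azurerm_container_registry.this", "change": {"actions": ["delete", "create"]}},
        {"address": "module.keyvault.azurerm_key_vault.this", "change": {"actions": ["no-op"]}},
    ]})
    counts = summarize_plan(sample)
    print(dict(counts), "extra review:", needs_extra_review(counts))
```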
Application Pipeline
Responsible for building and deploying container workloads.
Stages include:
- container image build
- image push to Azure Container Registry
- Kubernetes deployment
- rolling updates with health checks
Approval gates provide governance while maintaining delivery speed.
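A rolling update with health checks can be sketched as a Deployment whose strategy never drops below the desired replica count and whose probes decide when new pods receive traffic. The registry, repository, /healthz path, port and resource values are hypothetical; tagging images with the commit SHA also keeps workloads compatible with a policy against :latest tags.

```python
import json


def image_reference(registry: str, repository: str, commit_sha: str) -> str:
    """Immutable ACR image reference tagged with the short commit SHA instead of :latest."""
    return f"{registry}.azurecr.io/{repository}:{commit_sha[:12]}"


def deployment(name: str, image: str, port: int, replicas: int = 2) -> dict:
    """Deployment that rolls out gradually and gates traffic on health probes."""
    probe = {"httpGet": {"path": "/healthz", "port": port}}  # assumed health endpoint
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "strategy": {
                "type": "RollingUpdate",
                # Start one extra pod at a time and never go below the desired count.
                "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
            },
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Readiness decides when a new pod receives traffic;
                        # liveness restarts a container that stops responding.
                        "readinessProbe": {**probe, "periodSeconds": 10},
                        "livenessProbe": {**probe, "initialDelaySeconds": 15, "periodSeconds": 20},
                        "resources": {
                            "requests": {"cpu": "100m", "memory": "128Mi"},
                            "limits": {"cpu": "500m", "memory": "256Mi"},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    image = image_reference("acrdemo", "demo-app", "3f2c9a1b7d4e8f60a1b2c3d4e5f6a7b8c9d0e1f2")
    print(json.dumps(deployment("demo-app", image, 8080), indent=2))
```

After the manifest is applied, a pipeline step such as kubectl rollout status deployment/demo-app --timeout=300s can wait for the rollout and fail the job if new pods never become ready.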
Governance with Azure Policy
The AKS cluster runs the Azure Policy add-on, which extends OPA Gatekeeper v3 to apply Azure Policy definitions inside the cluster.
Policies enforce baseline Kubernetes security practices.
Examples include:
- blocking privileged containers
- requiring resource limits and requests
- preventing use of the :latest image tag
- enforcing health probes
- restricting dangerous Linux capabilities
Policies initially run in audit mode, allowing teams to identify violations before enforcement is enabled.
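Gatekeeper evaluates these rules as constraints inside the cluster. For earlier feedback, the same intent can be approximated before a manifest is deployed. The sketch below is a simplified Python stand-in, not the Azure Policy or Gatekeeper logic, and its capability deny list is a hypothetical example. Like audit mode, it reports violations instead of blocking.

```python
# Hypothetical deny list; the actual policy assignment defines its own allowed set.
DANGEROUS_CAPABILITIES = {"SYS_ADMIN", "NET_ADMIN", "NET_RAW", "SYS_PTRACE", "SYS_MODULE"}


def uses_latest(image: str) -> bool:
    """True if the image is untagged or tagged :latest; digest-pinned images pass."""
    if "@" in image:
        return False
    last_segment = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    if ":" not in last_segment:
        return True  # an untagged image resolves to :latest
    return last_segment.rsplit(":", 1)[1] == "latest"


def audit_pod_spec(pod_spec: dict) -> list[str]:
    """Report baseline violations without rejecting anything, similar in spirit to audit mode."""
    violations: list[str] = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        security = container.get("securityContext", {})
        if security.get("privileged"):
            violations.append(f"{name}: privileged container")
        resources = container.get("resources", {})
        for kind in ("requests", "limits"):
            missing = {"cpu", "memory"} - set(resources.get(kind, {}))
            if missing:
                violations.append(f"{name}: missing {kind} for {', '.join(sorted(missing))}")
        if uses_latest(container.get("image", "")):
            violations.append(f"{name}: image uses the :latest tag")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                violations.append(f"{name}: missing {probe}")
        added = set(security.get("capabilities", {}).get("add", []))
        risky = sorted(added & DANGEROUS_CAPABILITIES)
        if risky:
            violations.append(f"{name}: adds capabilities {', '.join(risky)}")
    return violations


if __name__ == "__main__":
    pod = {"containers": [{
        "name": "web",
        "image": "acrdemo.azurecr.io/web:latest",
        "securityContext": {"privileged": True, "capabilities": {"add": ["NET_ADMIN"]}},
    }]}
    for violation in audit_pod_spec(pod):
        print(violation)
```

Running a check like this in the application pipeline surfaces findings before deployment, while the in-cluster policies remain the authoritative control.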
Platform vs Application Responsibilities
A clear separation of responsibilities enables scalable platform operations.
Platform Team
Responsible for:
- infrastructure provisioning with Terraform
- AKS lifecycle management
- ingress controller configuration
- observability stack
- security baselines and policies
- CI/CD pipeline templates
Application Teams
Responsible for:
- Kubernetes manifests
- application container images
- application-level dashboards
- namespace-specific configuration
- service routing and ingress definitions
This model enables self-service application deployment while maintaining platform consistency.
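The boundary is visible in the ingress definition: the platform team operates the controller and its ingress class, while an application team writes only an Ingress in its own namespace that references that class. A minimal sketch, assuming the application routing add-on's default ingress class name; the host, namespace and service names are hypothetical.

```python
import json

# Ingress class provided by the AKS application routing add-on.
APP_ROUTING_CLASS = "webapprouting.kubernetes.azure.com"


def ingress(name: str, namespace: str, host: str, service: str, port: int) -> dict:
    """Ingress an application team owns in its namespace, on the platform's ingress class."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "ingressClassName": APP_ROUTING_CLASS,
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(ingress("demo-app", "demo", "demo.example.com", "demo-app", 80), indent=2))
```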
Possible Extensions
The platform can evolve with additional capabilities:
- multiple environments (dev / staging / production)
- GitOps deployment model using Flux or Argo CD
- private AKS cluster and private endpoints
- cluster autoscaling and cost optimization
- advanced traffic management via Gateway API
Scope
This project intentionally focuses on platform architecture rather than application complexity.
The implementation demonstrates:
- Azure-native Kubernetes platform engineering
- secure workload identity and secret management
- infrastructure automation with Terraform
- observability from day one
- policy-based Kubernetes governance
- CI/CD automation for infrastructure and workloads
A minimal demo application is used only to validate the platform capabilities.
Last Updated: March 2026
Estimated Monthly Cost (dev environment): ~$200



