Short summary (TL;DR):
Azure Kubernetes Service (AKS) is Microsoft’s managed Kubernetes offering: Azure manages the control plane and operational plumbing while you run containerized workloads on node pools (VMs/VMSS). You configure networking, identity, storage, monitoring and autoscaling, then deploy apps the same way you would on vanilla Kubernetes. This article explains AKS architecture, how it works, and shows a practical step-by-step cluster configuration (CLI + example manifests).
1) What AKS is and how it works (high level)
AKS is a managed Kubernetes service: Azure hosts and operates the Kubernetes control plane (API server, scheduler, etc.), automatically patches and upgrades it, and provides integrations (monitoring, ingress, identity, networking). Your responsibility is the node pools (worker VMs), cluster configuration, and the apps that run in Kubernetes. This separation means you focus on apps while Azure handles the control plane's operational burden.
Core runtime model
- Control plane (managed by Azure): API server, etcd, controller-manager; Azure maintains HA, scaling and upgrades.
- Node pools (you manage): groups of worker VMs (Virtual Machine Scale Sets by default) that run kubelet/kube-proxy and your pods. AKS supports multiple node pools (Linux/Windows, GPU, Spot).
- Add-ons / extensions: monitoring agent (Azure Monitor/Container Insights), ingress controllers, virtual nodes (ACI), and cluster extensions.
Common AKS features you’ll encounter
- Managed identities and Microsoft Entra (Azure AD) integration for authentication.
- Multiple networking models (Azure CNI, kubenet, and Azure CNI overlay). Note: kubenet and legacy CNI behaviour are evolving, so plan your networking model early.
2) Key AKS components (quick reference)
- Control plane (managed): runs in Azure's tenant, invisible to you; Azure guarantees availability and handles upgrades.
- Node pools: system vs user pools; each pool is a Virtual Machine Scale Set (VMSS) and can be sized and typed independently. Use system pools for critical system pods and user pools for workloads.
- Networking: choose between azure (Azure CNI), kubenet (basic), or overlay modes; each has IP-planning and scale tradeoffs. Kubenet is being phased out, so plan a migration if you rely on it.
- Identity: use managed identities + Microsoft Entra for user auth, and Workload Identity (OIDC + federated credentials) for pod-level access to Azure resources (preferred over the older AAD Pod Identity).
- Monitoring & logging: Azure Monitor / Container Insights (Log Analytics + Managed Prometheus) hooks into AKS for cluster and workload telemetry.
- Autoscaling: the cluster autoscaler and node-pool autoscaling let AKS scale nodes up and down automatically; pod autoscaling (HPA/VPA) still operates inside Kubernetes.
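The pod-side autoscaling mentioned above is plain Kubernetes. A minimal HPA sketch targeting the echo-api Deployment used later in this article (the 70% CPU target is an arbitrary illustration, not a recommendation):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: echo-api-hpa
spec:
  scaleTargetRef:            # the workload the HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: echo-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```

Note the HPA computes utilization against the container's CPU requests, so the Deployment must set resource requests for this to work.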
3) When to use AKS
Use AKS when you want a production-ready Kubernetes environment with:
- Reduced control-plane ops burden (patching, HA, upgrades).
- Deep Azure integrations (Microsoft Entra, Key Vault, ACR, Azure Monitor, Load Balancer).
- The ability to run mixed workloads (Linux/Windows/GPU/Spot) with fine-grained control over node pools.
4) Planning checklist (before you create a cluster)
- Azure subscription & region: pick a region that supports the VM SKUs and AKS features you need.
- Network design: will you use Azure CNI (pods get VNet IPs) or kubenet? Azure CNI requires careful IP planning; kubenet consumes fewer IPs but is more limited. Consider API server VNet integration and private clusters for higher isolation.
- Identity & auth: plan Microsoft Entra (Azure AD) integration and decide whether you'll use Workload Identity for pod access to Key Vault and other resources.
- Monitoring & logging: create or select a Log Analytics workspace to enable Container Insights.
- Node sizing: choose VM sizes for the system node pool (>= 4 vCPU / 4 GB is recommended for system pools) and for user pools (on-demand, Spot, GPU).
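The IP-planning point above can be put into numbers. A back-of-the-envelope sketch, assuming classic Azure CNI where each node pre-reserves its `--max-pods` IPs plus one IP for itself (the node counts here are illustrative):

```shell
# Rough subnet sizing for classic Azure CNI (assumption: every pod slot
# reserves a VNet IP up front, plus one IP per node for the node itself).
NODES=3        # max node count you expect after autoscaling
SURGE=1        # extra node created temporarily during upgrades (max-surge)
MAX_PODS=30    # --max-pods per node (the AKS default for Azure CNI)

# IPs needed = (nodes + surge nodes) * (pod slots per node + the node's own IP)
IPS_NEEDED=$(( (NODES + SURGE) * (MAX_PODS + 1) ))
echo "Plan a subnet with at least $IPS_NEEDED free IPs"
```

Even this small demo cluster wants well over a hundred free IPs, which is why Azure CNI overlay (pods get IPs from a separate overlay CIDR) exists as an IP-saving alternative.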
5) Step-by-step: create and configure an AKS cluster (CLI focused)
Below is a practical set of commands you can copy/paste and adapt. These follow Azure docs patterns (Azure CLI). Replace variables with your values.
Prerequisites
az login
az account set --subscription <YOUR-SUBSCRIPTION-ID>
az extension add --name aks-preview # optional if you need preview features
1) Create a resource group
az group create --name rg-aks-demo --location eastus
2) (Optional but recommended) Create a Log Analytics workspace for monitoring
az monitor log-analytics workspace create \
  --resource-group rg-aks-demo \
  --workspace-name aks-logs-demo
You can also let AKS create a workspace automatically when enabling monitoring.
3) Create an AKS cluster (example — Azure CNI, monitoring, managed identity, OIDC + workload identity enabled)
This example creates a 3-node cluster, enables monitoring, sets Azure CNI networking (pods take VNet IPs), and enables OIDC + workload identity to allow secure pod → Azure resource access.
# variables
RG=rg-aks-demo
CLUSTER=aks-demo
LOC=eastus
WORKSPACE_ID=$(az monitor log-analytics workspace show -g $RG -n aks-logs-demo --query id -o tsv)
az aks create \
--resource-group $RG \
--name $CLUSTER \
--location $LOC \
--node-count 3 \
--node-vm-size Standard_D2s_v3 \
--network-plugin azure \
--enable-managed-identity \
--enable-addons monitoring \
--workspace-resource-id $WORKSPACE_ID \
--generate-ssh-keys \
--enable-oidc-issuer \
--enable-workload-identity
Notes: --network-plugin azure chooses Azure CNI; --enable-addons monitoring wires up Container Insights; --enable-oidc-issuer --enable-workload-identity prepares the cluster for Microsoft Entra Workload Identity (pod to Azure resource federation). See the AKS CLI quickstarts for full parameter details.
4) Get kubeconfig (connect to cluster)
az aks get-credentials --resource-group $RG --name $CLUSTER
kubectl get nodes
5) Add a new user node pool (example: spot / autoscaling)
az aks nodepool add \
--resource-group $RG \
--cluster-name $CLUSTER \
--name spotpool \
--node-count 1 \
--priority Spot \
--spot-max-price -1 \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 3
You can also enable the cluster autoscaler cluster-wide or on individual node pools; AKS supports both.
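AKS automatically taints Spot node pools with kubernetes.azure.com/scalesetpriority=spot:NoSchedule, so only pods that explicitly tolerate the taint are scheduled there. A pod-spec fragment that opts a fault-tolerant workload onto the Spot pool (add it under the pod template's spec):

```yaml
# Pod spec fragment: tolerate the taint AKS applies to Spot node pools,
# and pin the pod to Spot nodes via the matching node label.
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"
nodeSelector:
  kubernetes.azure.com/scalesetpriority: spot
```

Without the toleration, pods simply avoid the Spot nodes; without the nodeSelector, they may also land on on-demand nodes.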
6) Enable/adjust cluster autoscaler after creation
# enable cluster autoscaler on the cluster (cluster-wide)
az aks update --resource-group $RG --name $CLUSTER --enable-cluster-autoscaler --min-count 1 --max-count 5
# or update a specific node pool
az aks nodepool update \
--resource-group $RG \
--cluster-name $CLUSTER \
--name spotpool \
--update-cluster-autoscaler \
--min-count 0 \
--max-count 3
Autoscaler docs explain min/max constraints and tuning.
6) Deploy a simple .NET Core app to AKS (example manifests)
Assume you have a container image myregistry.azurecr.io/echo-api:1.0 pushed to ACR (or Docker Hub).
Deployment + Service
# deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: echo-api
  template:
    metadata:
      labels:
        app: echo-api
    spec:
      containers:
      - name: echo-api
        image: myregistry.azurecr.io/echo-api:1.0
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: echo-api-svc
spec:
  selector:
    app: echo-api
  ports:
  - port: 80
    targetPort: 80
  type: ClusterIP
Ingress (NGINX) — assumes NGINX ingress controller is installed
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-api-ingress
spec:
  ingressClassName: nginx   # the kubernetes.io/ingress.class annotation is deprecated
  rules:
  - host: echo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echo-api-svc
            port:
              number: 80
AKS supports several ingress choices (NGINX, Application Gateway, Istio/Envoy); choose based on features and Azure integrations.
7) Identity: Workload Identity (pod → Azure resources) — short how-to
- Ensure the cluster has the OIDC issuer and workload identity enabled (--enable-oidc-issuer --enable-workload-identity).
- Create a user-assigned managed identity (UAMI) and note its clientId.
- Create a Kubernetes ServiceAccount annotated with azure.workload.identity/client-id: "<clientId>".
- Create a federated identity credential connecting the UAMI to the cluster's OIDC issuer and the service account subject.
- Grant the UAMI RBAC rights on the Azure resource (Key Vault, Storage, etc.).
- Your pod can now acquire tokens via the workload identity flow, with no secrets stored in Kubernetes. See the AKS workload identity tutorials for full commands and examples.
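The Kubernetes side of the flow above can be sketched as a ServiceAccount plus the pod label the workload identity webhook looks for (the names and the client-ID placeholder are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: workload-identity-sa            # illustrative name
  namespace: default
  annotations:
    azure.workload.identity/client-id: "<UAMI-CLIENT-ID>"   # clientId of your UAMI
---
apiVersion: v1
kind: Pod
metadata:
  name: workload-identity-demo          # illustrative name
  labels:
    azure.workload.identity/use: "true" # required: tells the webhook to inject token config
spec:
  serviceAccountName: workload-identity-sa
  containers:
  - name: app
    image: myregistry.azurecr.io/echo-api:1.0
```

With the label set, the mutating webhook injects the federated token file and AZURE_* environment variables, which the Azure Identity SDKs pick up automatically.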
8) Monitoring, logging & observability
- Enable Container Insights (Log Analytics) when creating the cluster (--enable-addons monitoring) or afterwards with az aks enable-addons. The Container Insights agent (a DaemonSet) collects logs and metrics; you can add Managed Prometheus and Managed Grafana for metrics and dashboards.
- Use alerts & dashboards: push critical metrics (node pressure, pod restarts, app errors) into log alerts and dashboards in Azure Monitor.
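Once Container Insights data is flowing into Log Analytics, you can query it with KQL. A sketch against the Container Insights KubePodInventory table (the cluster name comes from the earlier example; adapt the filter to your own):

```kusto
// Top pods by restart count over the query window (pod restarts are a
// common early signal of crash loops or memory pressure).
KubePodInventory
| where ClusterName == "aks-demo"
| summarize Restarts = max(PodRestartCount) by Name, Namespace
| top 10 by Restarts desc
```

Queries like this can be saved as log alert rules so a restart spike pages you instead of waiting to be noticed on a dashboard.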
9) Security & production hardening (top checklist)
- Use managed identities / Microsoft Entra instead of long-lived service principals. Enable Azure RBAC for Kubernetes authorization if you want unified Azure role management.
- Use private clusters / API server VNet integration for sensitive environments.
- Apply network policies (Calico / Azure) to isolate namespaces and pods.
- Enforce pod security / admission (use policies to limit capabilities).
- Scan images and secure the supply chain (the CI pipeline should scan images before push).
- Test control plane & node upgrades in staging; use node pool rolling upgrades and max-surge settings.
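As an example of the network-policy item above, a minimal policy locking down the echo-api pods from the earlier manifests, assuming an NGINX ingress controller running in an ingress-nginx namespace (adjust the namespace label to match your install):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: echo-api-allow-ingress-only
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: echo-api            # applies only to the echo-api pods
  policyTypes:
  - Ingress                    # all ingress not matched below is denied
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx   # traffic from the ingress controller only
    ports:
    - protocol: TCP
      port: 80
```

Remember that NetworkPolicy objects are only enforced when the cluster was created with a network policy engine (--network-policy azure or calico); on a cluster without one they are silently ignored.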
10) Operational topics: upgrades, autoscale, cost, and limits
- Upgrades: AKS supports in-place control plane upgrades; node pools are upgraded separately to control disruption. Use multiple node pools and max-surge to avoid downtime.
- Autoscaler: the cluster autoscaler handles node scaling (respecting min/max), while the HPA handles pod scaling. Tune the autoscaler profile to match your workload pattern.
- Cost control: use Spot node pools for fault-tolerant tasks, right-size VM SKUs, and monitor consumption (Log Analytics + Cost Management).
- Quotas & IP planning: Azure CNI consumes a VNet IP per pod, so plan subnets accordingly (or use Azure CNI overlay to reduce IP consumption). Kubenet may be simpler but has limitations and legacy behaviour to consider.
11) Useful AKS CLI snippets (cheat sheet)
# Show AKS versions available in region
az aks get-versions -l eastus -o table
# Add node pool (Linux)
az aks nodepool add -g rg-aks-demo -c aks-demo -n pool1 --node-count 2 --node-vm-size Standard_D4s_v3
# Scale node pool
az aks nodepool scale -g rg-aks-demo --cluster-name aks-demo --name pool1 --node-count 4
# Enable monitoring addon
az aks enable-addons --addons monitoring -g rg-aks-demo -n aks-demo --workspace-resource-id $WORKSPACE_ID
# Update cluster to enable OIDC + workload identity
az aks update -g rg-aks-demo -n aks-demo --enable-oidc-issuer --enable-workload-identity
Refer to the Azure CLI az aks and az aks nodepool docs for complete parameter lists.