ADR-002: Azure Container Apps as Container Runtime
Status
Accepted
Date
2023-Q2
Context
Following ADR-001 (microservices architecture), we needed a container runtime to host 14+ services. The choices evaluated were:
| Option | Description |
|---|---|
| Azure Kubernetes Service (AKS) | Full Kubernetes cluster, maximum control |
| Azure App Service (Containers) | Managed PaaS, simple, limited scaling |
| Azure Container Apps (ACA) | Serverless containers, KEDA built-in, VNet integration |
| Azure Container Instances (ACI) | Simple containers, no orchestration |
AKS vs ACA Evaluation
| Criterion | AKS | ACA |
|---|---|---|
| Operational overhead | High (cluster management, upgrades) | Low (fully managed) |
| Scaling | Full KEDA + HPA + VPA | KEDA built-in |
| VNet integration | Full | Supported (with limitations) |
| Service mesh | BYO (Istio/Linkerd) or OSM | Built-in Dapr / Envoy mTLS |
| Cost (idle) | Node pool minimum | Scales to zero |
| Learning curve | High (Kubernetes expertise needed) | Low |
| Control | Full | Limited to ACA abstractions |
| Debugging | kubectl + full k8s tooling | ACA logs + metrics |
App Service vs ACA
App Service was evaluated and rejected because:
- No KEDA for event-driven scaling
- No built-in VNet-private service-to-service communication
- Higher cost for multiple services (separate App Service plan per service)
Decision
Use Azure Container Apps (ACA) as the container runtime for all Microtec ERP services.
Specific configuration choices:
- Two Container Apps Environments (CAEs) per environment: Public CAE (Gateway + Keycloak, internet-facing) and Private CAE (all other services, VNet-internal only)
- mTLS enabled on Private CAE for service-to-service communication
- KEDA HTTP scaler for all services
- Managed identity for ACR image pulls (no registry credentials stored in secrets)
- Scale to zero in dev/stage; minimum 1 replica in prod
Dual-CAE Architecture
Services in the Private CAE have no public ingress — they can only be reached from within the VNet.
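The placement and replica rules above can be sketched as a small lookup. This is only an illustration, not our Bicep templates: the lowercase service keys, the `Placement` type, and the `place` function are hypothetical names, and only the replica floors the ADR actually states (dev, stage, production) are encoded.

```python
from dataclasses import dataclass

# Services the ADR places in the Public CAE; everything else is private.
# The lowercase keys are an assumed naming convention.
PUBLIC_SERVICES = {"gateway", "keycloak"}

# Replica floors stated in the ADR. Other environments are not specified
# there, so they are deliberately left out rather than guessed.
MIN_REPLICAS = {"dev": 0, "stage": 0, "production": 1}


@dataclass(frozen=True)
class Placement:
    cae: str                # "public" or "private"
    external_ingress: bool  # True only for internet-facing services
    min_replicas: int       # 0 means the service may scale to zero


def place(service: str, environment: str) -> Placement:
    """Return the CAE placement and replica floor for a service."""
    if environment not in MIN_REPLICAS:
        raise ValueError(f"no replica floor stated for {environment!r}")
    is_public = service.lower() in PUBLIC_SERVICES
    return Placement(
        cae="public" if is_public else "private",
        external_ingress=is_public,
        min_replicas=MIN_REPLICAS[environment],
    )
```

For example, `place("gateway", "production")` yields a public placement with external ingress and one replica, while any other service name in `dev` lands in the private CAE and may scale to zero.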
Consequences
Positive
- No Kubernetes ops: No cluster upgrades, node pool management, or Kubernetes API versioning concerns
- KEDA built-in: Event-driven scaling (HTTP, Azure Service Bus queue depth) without custom configuration
- Scale to zero: Dev/stage environments idle at zero cost
- Built-in mTLS: Service-to-service encryption without a service mesh deployment
- Managed identity: ACR pull authentication without secrets
- Simpler deployments: `az containerapp update --image ...` vs complex Helm charts
- Cost efficiency: Pay per use; idle stage services cost nothing
Negative
- ACA abstractions: Cannot use arbitrary Kubernetes features (custom CRDs, advanced networking)
- VNet integration limitations: ACA VNet integration has specific subnet size requirements (minimum /23); IP exhaustion risk in complex topologies
- Cold start latency: Scale-to-zero services have ~10–30s cold start on the first request after an idle period (dev/stage only; prod has min 1 replica)
- Limited pod-level debugging: No `kubectl exec`; must use ACA console or log streaming
- Workload profiles: Dedicated workload profiles required for higher memory/CPU; added cost complexity
- Dapr vs direct HTTP: We chose direct HTTP (YARP gateway) over Dapr sidecars — Dapr's additional complexity not justified for our patterns
Neutral
- Azure-specific: We are committed to Azure; ACA is an Azure-only offering
- CAE per environment: 5 environments × 2 CAEs = 10 CAEs total (manageable via Bicep)
- Log streaming: ACA log streaming via `az containerapp logs show` is functional but less ergonomic than `kubectl logs`
VNet CIDR Allocation
Each environment uses a dedicated VNet to ensure isolation:
| Environment | VNet CIDR |
|---|---|
| dev | 10.0.0.0/16 |
| stage | 10.1.0.0/16 |
| preprod | 10.6.0.0/16 |
| uat | 10.5.0.0/16 |
| production | 10.2.0.0/16 |
| shared-sql | 10.100.0.0/16 |
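A quick way to keep this table honest is to check that no two address spaces overlap. The sketch below uses only Python's standard `ipaddress` module; the `VNETS` dictionary is copied from the table, and `find_overlaps` is a hypothetical helper name. Overlap matters mainly if VNets are ever peered, which is an assumption here rather than something this ADR states.

```python
import ipaddress
from itertools import combinations

# CIDRs copied from the table above.
VNETS = {
    "dev": "10.0.0.0/16",
    "stage": "10.1.0.0/16",
    "preprod": "10.6.0.0/16",
    "uat": "10.5.0.0/16",
    "production": "10.2.0.0/16",
    "shared-sql": "10.100.0.0/16",
}


def find_overlaps(vnets: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of environment names whose address spaces overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in vnets.items()}
    return [
        (a, b)
        for a, b in combinations(parsed, 2)
        if parsed[a].overlaps(parsed[b])
    ]
```

Calling `find_overlaps(VNETS)` on the current allocation returns an empty list; adding a new environment whose range collides with an existing one would surface the offending pair.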
Known Issues
Init Container VNet Gotcha
ACA init containers may fail on first deployment after VNet changes because the VNet integration is not fully established when the init container is scheduled. Mitigation: retry logic in init containers or separate migration jobs (see also ADR-009 / Fooj documentation).
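The retry mitigation could look roughly like the following. This is a minimal sketch, assuming the init container runs a Python entry point and that the failure shows up as a network-level error (`OSError` covers connection refusals, DNS failures, and timeouts in the standard library). The `retry` helper, its defaults, and the injectable `sleep` parameter are illustrative, not taken from our services.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            # Network not ready yet; give up only after the final attempt.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With the defaults, a persistently failing operation waits 2, 4, 8, and 16 seconds between its five attempts before the last error propagates and the container exits non-zero.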
Subnet Sizing
ACA requires a minimum /23 subnet for the CAE infrastructure. Attempting to use smaller subnets causes deployment failures with misleading error messages.
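Because the platform error is misleading, it may be worth validating the subnet before deployment. The sketch below encodes the /23 minimum stated above and checks that the subnet sits inside its VNet; `check_cae_subnet` and the example subnet `10.0.0.0/23` are hypothetical, since this ADR does not list the actual subnet ranges.

```python
import ipaddress

# Longest prefix (smallest subnet) this ADR allows for CAE infrastructure.
MAX_PREFIX_LEN = 23


def check_cae_subnet(subnet_cidr: str, vnet_cidr: str) -> ipaddress.IPv4Network:
    """Validate a CAE infrastructure subnet and return it as a network object."""
    subnet = ipaddress.IPv4Network(subnet_cidr)
    vnet = ipaddress.IPv4Network(vnet_cidr)
    if subnet.prefixlen > MAX_PREFIX_LEN:
        raise ValueError(
            f"{subnet} is smaller than /{MAX_PREFIX_LEN}; "
            "the CAE deployment would fail"
        )
    if not subnet.subnet_of(vnet):
        raise ValueError(f"{subnet} is not inside VNet {vnet}")
    return subnet
```

A /23 holds 512 addresses, and a /16 VNet has room for 128 such subnets, so `check_cae_subnet("10.0.0.0/23", "10.0.0.0/16")` passes while a /24 in the same VNet is rejected up front.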
Related ADRs
- ADR-001: Microservices architecture (this decision serves ADR-001)
- ADR-009: Fooj shared NAT (a consequence of ACA VNet integration)