ADR-002: Azure Container Apps as Container Runtime

Status

Accepted

Date

2023-Q2


Context

Following ADR-001 (microservices architecture), we needed a container runtime to host 14+ services. The choices evaluated were:

| Option | Description |
| --- | --- |
| Azure Kubernetes Service (AKS) | Full Kubernetes cluster, maximum control |
| Azure App Service (Containers) | Managed PaaS, simple, limited scaling |
| Azure Container Apps (ACA) | Serverless containers, KEDA built-in, VNet integration |
| Azure Container Instances (ACI) | Simple containers, no orchestration |

AKS vs ACA Evaluation

| Criterion | AKS | ACA |
| --- | --- | --- |
| Operational overhead | High (cluster management, upgrades) | Low (fully managed) |
| Scaling | Full KEDA + HPA + VPA | KEDA built-in |
| VNet integration | Full | Supported (with limitations) |
| Service mesh | BYO (Istio/Linkerd) or OSM | Built-in Dapr / Envoy mTLS |
| Cost (idle) | Node pool minimum | Scales to zero |
| Learning curve | High (Kubernetes expertise needed) | Low |
| Control | Full | Limited to ACA abstractions |
| Debugging | kubectl + full k8s tooling | ACA logs + metrics |

App Service vs ACA

App Service was evaluated and rejected because:

  • No KEDA for event-driven scaling
  • No built-in VNet-private service-to-service communication
  • Higher cost for multiple services (a separate App Service plan per service, which independent scaling would require)

Decision

Use Azure Container Apps (ACA) as the container runtime for all Microtec ERP services.

Specific configuration choices:

  1. Two Container Apps Environments (CAEs) per environment: Public CAE (Gateway + Keycloak, internet-facing) and Private CAE (all other services, VNet-internal only)
  2. mTLS enabled on Private CAE for service-to-service communication
  3. KEDA HTTP scaler for all services
  4. Managed identity for ACR image pulls (no registry credentials stored in secrets)
  5. Scale to zero in dev/stage; minimum 1 replica in prod
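The replica policy in item 5 can be sketched as a small lookup that builds an `az containerapp update` invocation. This is a minimal illustration, not the team's deployment tooling: the environment keys, the function name, and the app, resource group, and image names in the usage line are all hypothetical. The ADR records a policy only for dev, stage, and prod, so any other environment is rejected rather than guessed.

```python
# Replica floor per environment, taken from configuration choice 5.
# Only dev, stage, and production are stated in the ADR; the key names
# themselves are an assumption made for this sketch.
MIN_REPLICAS = {
    "dev": 0,         # scale to zero
    "stage": 0,       # scale to zero
    "production": 1,  # keep one replica warm
}


def update_command(app: str, resource_group: str, image: str, env: str) -> list[str]:
    """Build (but do not run) an `az containerapp update` argument list."""
    if env not in MIN_REPLICAS:
        # No policy is recorded for this environment, so refuse to guess.
        raise ValueError(f"no replica policy recorded for environment {env!r}")
    return [
        "az", "containerapp", "update",
        "--name", app,
        "--resource-group", resource_group,
        "--image", image,
        "--min-replicas", str(MIN_REPLICAS[env]),
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(" ".join(update_command(
        "orders-api", "rg-example", "example.azurecr.io/orders-api:1.0", "production")))
```

Returning the argument list instead of executing it keeps the sketch free of side effects; a pipeline step could pass the list to `subprocess.run` once the names are real.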

Dual-CAE Architecture

Services in the Private CAE have no public ingress — they can only be reached from within the VNet.


Consequences

Positive

  • No Kubernetes ops: No cluster upgrades, node pool management, or Kubernetes API versioning concerns
  • KEDA built-in: Event-driven scaling (HTTP, Azure Service Bus queue depth) without installing or operating KEDA ourselves
  • Scale to zero: Dev/stage environments idle at zero cost
  • Built-in mTLS: Service-to-service encryption without a service mesh deployment
  • Managed identity: ACR pull authentication without secrets
  • Simpler deployments: az containerapp update --image ... vs complex Helm charts
  • Cost efficiency: Pay per use; idle stage services cost nothing

Negative

  • ACA abstractions: Cannot use arbitrary Kubernetes features (custom CRDs, advanced networking)
  • VNet integration limitations: ACA VNet integration has specific subnet size requirements (minimum /23); IP exhaustion risk in complex topologies
  • Cold start latency: Scale-to-zero services have a ~10–30s cold start on the first request after an idle period (dev/stage only; prod keeps a minimum of 1 replica)
  • Limited pod-level debugging: No kubectl exec; must use ACA console or log streaming
  • Workload profiles: Higher memory/CPU allocations require Dedicated workload profiles, which adds cost complexity
  • Dapr vs direct HTTP: We chose direct HTTP (YARP gateway) over Dapr sidecars; Dapr's additional complexity was not justified for our patterns

Neutral

  • Azure-specific: We are committed to Azure; ACA is an Azure-only offering
  • CAE per environment: 5 environments × 2 CAEs = 10 CAEs total (manageable via Bicep)
  • Log streaming: ACA log streaming via az containerapp logs show is functional but less ergonomic than kubectl logs

VNet CIDR Allocation

Each environment uses a dedicated VNet to ensure isolation:

| Environment | VNet CIDR |
| --- | --- |
| dev | 10.0.0.0/16 |
| stage | 10.1.0.0/16 |
| preprod | 10.6.0.0/16 |
| uat | 10.5.0.0/16 |
| production | 10.2.0.0/16 |
| shared-sql | 10.100.0.0/16 |
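An allocation like this is easy to check mechanically. The sketch below uses Python's standard `ipaddress` module to confirm that no two of the listed ranges overlap, which matters if any of these VNets are ever peered, since peered VNets cannot share address space. The dictionary and function names are illustrative; only the CIDR values come from the table.

```python
from ipaddress import ip_network
from itertools import combinations

# CIDR values copied from the allocation table above.
VNET_CIDRS = {
    "dev": "10.0.0.0/16",
    "stage": "10.1.0.0/16",
    "preprod": "10.6.0.0/16",
    "uat": "10.5.0.0/16",
    "production": "10.2.0.0/16",
    "shared-sql": "10.100.0.0/16",
}


def overlapping_pairs(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of environment names whose address ranges overlap."""
    nets = {name: ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    print(overlapping_pairs(VNET_CIDRS))  # prints [] for the table above
```

Running a check like this in CI before applying network changes would catch an accidental reuse of a range when a new environment is added.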

Known Issues

Init Container VNet Gotcha

ACA init containers may fail on first deployment after VNet changes because the VNet integration is not fully established when the init container is scheduled. Mitigation: retry logic in init containers or separate migration jobs (see also ADR-009 / Fooj documentation).
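The retry mitigation can be sketched as a small exponential-backoff helper wrapped around whatever the init container does first, such as opening a database connection. This is an assumption-laden illustration rather than the team's implementation: the function name, the attempt count, and the delays are hypothetical, and it retries only `OSError` (the base class of Python's network errors) on the assumption that an unready VNet surfaces as a connection or DNS failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds, doubling the wait after each failure.

    Only OSError (including ConnectionError, TimeoutError, and DNS failures)
    is retried; any other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: surface the last network error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

With the defaults, the helper waits 2, 4, 8, and 16 seconds between the five attempts. The `sleep` parameter exists so the backoff can be exercised in tests without real delays.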

Subnet Sizing

ACA requires a /23 or larger subnet for the CAE infrastructure in Consumption-only environments (workload profiles environments accept subnets as small as /27). Deploying into a subnet that is too small fails with misleading error messages.
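Because the deployment errors are misleading, a pre-flight check on the subnet size can fail earlier and more clearly. The sketch below assumes the /23 minimum stated in this ADR; the constant and function names are hypothetical, and the limit is a parameter so it can be adjusted if a different environment type applies.

```python
from ipaddress import ip_network

# Longest prefix (smallest subnet) accepted, per the /23 minimum in this ADR.
MAX_PREFIX_LEN = 23


def subnet_large_enough(cidr: str, max_prefix_len: int = MAX_PREFIX_LEN) -> bool:
    """Return True if `cidr` is at least as large as a /max_prefix_len subnet.

    A numerically smaller prefix length means a larger subnet, so /21
    passes a /23 minimum while /24 does not.
    """
    return ip_network(cidr).prefixlen <= max_prefix_len


if __name__ == "__main__":
    for candidate in ("10.0.0.0/21", "10.0.0.0/23", "10.0.0.0/24"):
        print(candidate, subnet_large_enough(candidate))
```

A /23 holds 512 addresses, so each /16 VNet in the allocation table has room for 128 such subnets.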


Related ADRs

  • ADR-001: Microservices architecture (this decision serves ADR-001)
  • ADR-009: Fooj shared NAT (a consequence of ACA VNet integration)

Internal Documentation — Microtec Platform Team