Runbook: Scale a Service
Configure KEDA autoscaling rules for Azure Container Apps services in the Microtec ERP platform.
Audience: DevOps, backend engineers
Prerequisites: az CLI authenticated, Contributor access to the target resource group
Overview
Azure Container Apps uses KEDA (Kubernetes-based Event-Driven Autoscaling) natively. Scaling rules are defined in services-config.json and applied via pipeline or az CLI.
Scaling concepts
| Concept | Description |
|---|---|
| minReplicas | Minimum running instances (set to 0 for scale-to-zero) |
| maxReplicas | Maximum running instances (hard cap) |
| trigger | The metric that drives scaling decisions |
| cooldownPeriod | Seconds to wait before scaling down (default 300 s) |
[WARNING] Setting minReplicas: 0 enables scale-to-zero. The first request after a cold start will experience 2–10 s of latency while a new replica starts. Use minReplicas: 1 for latency-sensitive services.
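As a quick check on the bounds described above, a small shell function can flag invalid pairs and remind you when scale-to-zero is in effect. This is a minimal sketch: `check_replica_bounds` is a hypothetical helper name, not part of the platform tooling.

```shell
# Hypothetical helper: sanity-check a minReplicas/maxReplicas pair before
# committing it to services-config.json.
# Prints "ok", "ok-cold-start" (scale-to-zero enabled), or "invalid".
check_replica_bounds() (
  min="$1"; max="$2"
  status="ok"
  # Reject empty or non-numeric input.
  for v in "$min" "$max"; do
    case "$v" in
      ''|*[!0-9]*) status="invalid" ;;
    esac
  done
  if [ "$status" = "ok" ]; then
    if [ "$max" -lt 1 ] || [ "$min" -gt "$max" ]; then
      status="invalid"          # no usable range
    elif [ "$min" -eq 0 ]; then
      status="ok-cold-start"    # valid, but expect cold-start latency
    fi
  fi
  echo "$status"
)

check_replica_bounds 1 10   # prints: ok
check_replica_bounds 0 5    # prints: ok-cold-start
check_replica_bounds 5 3    # prints: invalid
```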
Scaling Rule Types
1. CPU-Based Scaling
Scale out when average CPU across all replicas exceeds a threshold.
jsonc
// services-config.json — inside the service entry
{
  "minReplicas": 1,
  "maxReplicas": 10,
  "triggers": [
    {
      "name": "cpu-trigger",
      "type": "cpu",
      "metadata": {
        "type": "Utilization",
        "value": "70" // Scale out when CPU > 70%
      }
    }
  ]
}
Recommended CPU thresholds by service type:
| Service Type | Scale-Out Threshold | minReplicas | maxReplicas |
|---|---|---|---|
| API (light) | 70% | 1 | 5 |
| API (heavy) | 60% | 2 | 10 |
| Background worker | 80% | 1 | 8 |
| Reporting service | 60% | 1 | 6 |
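To see how a CPU threshold translates into replica counts, the sketch below applies the standard Kubernetes HPA formula that KEDA's CPU scaler relies on: desired = ceil(current replicas × current utilization / target utilization), clamped to minReplicas and maxReplicas. It assumes Azure Container Apps follows the same calculation and ignores HPA tolerance and stabilization windows. `cpu_desired_replicas` is a hypothetical helper name.

```shell
# Hypothetical helper: estimate the replica count the HPA formula would ask for.
# Args: current_replicas current_cpu_percent target_cpu_percent min max
cpu_desired_replicas() (
  current="$1"; util="$2"; target="$3"; min="$4"; max="$5"
  # Integer ceiling of (current * util / target).
  desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
)

# "API (heavy)" row: 4 replicas averaging 90% CPU against a 60% target.
cpu_desired_replicas 4 90 60 2 10   # prints: 6
# "API (light)" row: same load against a 70% target, capped at maxReplicas 5.
cpu_desired_replicas 4 90 70 1 5    # prints: 5
```

The second call shows the hard cap in action: the formula asks for 6 replicas, but maxReplicas limits the result to 5.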
2. Memory-Based Scaling
Scale out when average memory consumption exceeds a threshold.
jsonc
{
  "minReplicas": 1,
  "maxReplicas": 8,
  "triggers": [
    {
      "name": "memory-trigger",
      "type": "memory",
      "metadata": {
        "type": "Utilization",
        "value": "75" // Scale out when memory > 75%
      }
    }
  ]
}
[INFO] Memory scaling is useful for services that hold in-memory caches or process large documents (e.g., Reporting.Apis, Import.Apis).
3. HTTP Request–Based Scaling
Scale based on concurrent HTTP requests per replica. Best for API services with unpredictable traffic bursts.
jsonc
{
  "minReplicas": 1,
  "maxReplicas": 20,
  "triggers": [
    {
      "name": "http-trigger",
      "type": "http",
      "metadata": {
        "concurrentRequests": "100" // Add replica when > 100 concurrent requests
      }
    }
  ]
}
Recommended HTTP thresholds:
| Scenario | concurrentRequests |
|---|---|
| Lightweight CRUD APIs | 150 |
| Medium-complexity APIs | 100 |
| Heavy processing APIs | 50 |
| Reporting / export | 20 |
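When choosing maxReplicas for an HTTP-scaled service, it helps to estimate how many replicas a given peak load implies at the thresholds above. The sketch below assumes the scaler targets roughly ceil(concurrent requests / concurrentRequests) replicas. The scenario keys (`crud`, `medium`, `heavy`, `reporting`) and both function names are hypothetical.

```shell
# Hypothetical lookup of the recommended concurrentRequests value per scenario.
threshold_for() (
  case "$1" in
    crud)      echo 150 ;;
    medium)    echo 100 ;;
    heavy)     echo 50 ;;
    reporting) echo 20 ;;
    *)         echo 0 ;;   # unknown scenario
  esac
)

# Args: scenario peak_concurrent_requests
# Prints the estimated replica count, or "unknown-scenario".
replicas_for_load() (
  threshold=$(threshold_for "$1")
  concurrent="$2"
  if [ "$threshold" -eq 0 ]; then
    echo "unknown-scenario"
  else
    # Integer ceiling of concurrent / threshold.
    echo $(( (concurrent + threshold - 1) / threshold ))
  fi
)

replicas_for_load crud 450      # prints: 3
replicas_for_load reporting 90  # prints: 5
```

If the estimate for your expected peak is above the configured maxReplicas, either raise the cap or accept that requests will queue at peak.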
4. Cron Warmup (Schedule-Based Minimum Replicas)
Pre-warm replicas before expected traffic peaks (e.g., business hours in KSA).
jsonc
{
  "minReplicas": 1,
  "maxReplicas": 10,
  "triggers": [
    {
      "name": "http-trigger",
      "type": "http",
      "metadata": {
        "concurrentRequests": "100"
      }
    },
    {
      "name": "cron-warmup",
      "type": "cron",
      "metadata": {
        "timezone": "Asia/Riyadh",
        "start": "0 7 * * 0-4",   // 07:00 Sun–Thu KSA
        "end": "0 20 * * 0-4",    // 20:00 Sun–Thu KSA
        "desiredReplicas": "3"    // Minimum 3 replicas during business hours
      }
    }
  ]
}
Common cron expressions (Asia/Riyadh):
| Schedule | Expression |
|---|---|
| Business hours (Sun–Thu, 07:00–20:00) | start: 0 7 * * 0-4 / end: 0 20 * * 0-4 |
| Extended hours (07:00–23:00) | start: 0 7 * * 0-4 / end: 0 23 * * 0-4 |
| Night batch window (01:00–05:00) | start: 0 1 * * * / end: 0 5 * * * |
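The business-hours window and its interaction with other triggers can be sketched as follows. This assumes the cron trigger requests desiredReplicas between start and end and nothing outside that window, and that when several triggers are defined the largest request wins (the usual KEDA/HPA behaviour), clamped to minReplicas and maxReplicas. Both function names are hypothetical.

```shell
# Hypothetical model of the Sun–Thu 07:00–20:00 cron window.
# Args: day_of_week (0 = Sunday) hour (0-23) desired_replicas
cron_replicas() (
  dow="$1"; hour="$2"; desired="$3"
  if [ "$dow" -ge 0 ] && [ "$dow" -le 4 ] && [ "$hour" -ge 7 ] && [ "$hour" -lt 20 ]; then
    echo "$desired"   # inside the window
  else
    echo 0            # outside the window the cron trigger asks for nothing
  fi
)

# Combine two trigger requests: take the larger, then clamp to [min, max].
# Args: http_request cron_request min max
effective_replicas() (
  http="$1"; cron="$2"; min="$3"; max="$4"
  desired="$http"
  if [ "$cron" -gt "$desired" ]; then desired="$cron"; fi
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
)

cron_replicas 1 9 3    # Monday 09:00  -> prints: 3
cron_replicas 5 9 3    # Friday 09:00  -> prints: 0
# Quiet Monday morning: HTTP asks for 1, cron asks for 3.
effective_replicas 1 "$(cron_replicas 1 9 3)" 1 10   # prints: 3
```

To evaluate the current moment instead of fixed values, the day and hour could be taken from `TZ=Asia/Riyadh date +%w` and `TZ=Asia/Riyadh date +%H`, assuming time zone data is installed on the host.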
5. Combined Triggers (Recommended Pattern)
Use HTTP + cron together for production services — cron ensures availability during peak hours, HTTP handles unexpected bursts:
jsonc
{
  "name": "apps-portal",
  "minReplicas": 1,
  "maxReplicas": 15,
  "triggers": [
    {
      "name": "http-trigger",
      "type": "http",
      "metadata": { "concurrentRequests": "100" }
    },
    {
      "name": "cron-warmup",
      "type": "cron",
      "metadata": {
        "timezone": "Asia/Riyadh",
        "start": "0 7 * * 0-4",
        "end": "0 20 * * 0-4",
        "desiredReplicas": "2"
      }
    }
  ]
}
Where to Configure
Option A — services-config.json (pipeline-managed, recommended)
File: Devops/azure/config/container-backend/services-config.json
Changes here are applied on the next pipeline run. This is the source of truth — always update this file first.
jsonc
{
  "services": [
    {
      "name": "apps-portal",
      // ... other config ...
      "minReplicas": 1,
      "maxReplicas": 10,
      "triggers": [
        // ... trigger definitions from above ...
      ]
    }
  ]
}
Option B — az containerapp update (immediate, hotfix only)
Use this only for urgent scaling changes. Always back-port the change to services-config.json afterward.
bash
export ENV="dev"
export SVC="apps-portal"
export RG="mic-erp-be-${ENV}-containers-rg"
export APP="mic-erp-be-${ENV}-${SVC}"
# [ACTION] Update min/max replicas immediately
az containerapp update \
  --name "${APP}" \
  --resource-group "${RG}" \
  --min-replicas 2 \
  --max-replicas 15
# [ACTION] Add/replace a CPU trigger
az containerapp update \
  --name "${APP}" \
  --resource-group "${RG}" \
  --scale-rule-name "cpu-trigger" \
  --scale-rule-type "cpu" \
  --scale-rule-metadata "type=Utilization" "value=70"
# [ACTION] Add/replace an HTTP trigger
az containerapp update \
  --name "${APP}" \
  --resource-group "${RG}" \
  --scale-rule-name "http-trigger" \
  --scale-rule-type "http" \
  --scale-rule-metadata "concurrentRequests=100"
Apply via Pipeline Redeploy
After updating services-config.json, trigger a pipeline run:
bash
# [ACTION] Trigger pipeline via Azure DevOps CLI
az pipelines run \
  --name "deploy-container-backend" \
  --parameters environment=dev \
  --org https://dev.azure.com/microtec \
  --project ERP
Or manually via the Azure DevOps UI: Pipelines → Select pipeline → Run pipeline → Choose environment.
Verify Scaling is Working
bash
# [VERIFY] Check current replica count
az containerapp replica list \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --query "[].name" -o tsv
# [VERIFY] Check scale rules applied to the app
az containerapp show \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --query "properties.template.scale" -o json
Expected output
json
{
  "minReplicas": 1,
  "maxReplicas": 10,
  "rules": [
    {
      "name": "http-trigger",
      "http": {
        "metadata": { "concurrentRequests": "100" }
      }
    }
  ]
}
Monitor Autoscaling Activity
Navigate to App Insights or Seq to observe scaling events:
kusto
// App Insights — KQL: replica count over time
customMetrics
| where name == "ContainerAppReplicaCount"
| where customDimensions["ContainerAppName"] == "mic-erp-be-dev-apps-portal"
| summarize avg(value) by bin(timestamp, 5m)
| render timechart
bash
# [INFO] Watch live logs during a scaling event
az containerapp logs show \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --follow --tail 50
Scaling Limits by Environment
| Environment | Typical maxReplicas | Notes |
|---|---|---|
| dev | 3–5 | Cost optimised; scale-to-zero acceptable |
| stage | 5–10 | Mirror prod behaviour; minReplicas: 1 |
| preprod | 5–10 | Load testing may exceed this temporarily |
| uat | 3–5 | Matches stage configuration |
| production | 10–30 | Set based on capacity planning |
[WARNING] Increasing maxReplicas above the subscription quota limit will silently cap scaling. Check az containerapp env show for the current workload profile limits.
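The typical ranges in the table above can be encoded as a small guard that runs before a change is committed. This is a sketch only: the function names are hypothetical, the ranges are copied from the table, and it does not query real subscription quotas or workload profile limits.

```shell
# Hypothetical lookup: typical maxReplicas range ("low high") per environment.
typical_range() (
  case "$1" in
    dev|uat)       echo "3 5" ;;
    stage|preprod) echo "5 10" ;;
    production)    echo "10 30" ;;
    *)             echo "" ;;
  esac
)

# Args: environment proposed_max_replicas
# Prints "within", "outside", or "unknown-env".
check_max_replicas() (
  range=$(typical_range "$1")
  if [ -z "$range" ]; then
    echo "unknown-env"
  else
    low=${range% *}    # text before the space
    high=${range#* }   # text after the space
    if [ "$2" -ge "$low" ] && [ "$2" -le "$high" ]; then
      echo "within"
    else
      echo "outside"
    fi
  fi
)

check_max_replicas dev 4          # prints: within
check_max_replicas production 40  # prints: outside
```

An "outside" result is not necessarily wrong (load testing in preprod may exceed the range temporarily), but it is a useful prompt to double-check the value.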
Rollback Scaling Changes
bash
# [ROLLBACK] If a scaling change causes instability, revert to safe defaults
# NOTE: confirm that your az CLI version accepts --remove-scale-rule
# (see `az containerapp update --help`) before relying on this in an incident.
az containerapp update \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --min-replicas 1 \
  --max-replicas 3 \
  --remove-scale-rule "http-trigger" \
  --remove-scale-rule "cron-warmup"
Related Runbooks
- Deploy New Service — initial service setup before scaling configuration
- Incident Response — handle outages caused by scaling misconfiguration