
Runbook: Scale a Service

Configure KEDA autoscaling rules for Azure Container Apps services in the Microtec ERP platform.

Audience: DevOps, backend engineers
Prerequisites: az CLI authenticated, contributor access to the target resource group


Overview

Azure Container Apps uses KEDA (Kubernetes-based Event-Driven Autoscaling) natively. Scaling rules are defined in services-config.json and applied via pipeline or az CLI.

Scaling concepts

| Concept | Description |
| --- | --- |
| minReplicas | Minimum running instances (set to 0 for scale-to-zero) |
| maxReplicas | Maximum running instances (hard cap) |
| trigger | The metric that drives scaling decisions |
| cooldownPeriod | Seconds to wait after the last scaling event before scaling in to minReplicas (default 300 s) |

[WARNING] Setting minReplicas: 0 enables scale-to-zero. The first request after a cold start will experience 2–10 s latency while a new replica starts. Use minReplicas: 1 for latency-sensitive services. A service that scales only on CPU or memory triggers cannot scale to zero in Azure Container Apps.


Scaling Rule Types

1. CPU-Based Scaling

Scale out when average CPU across all replicas exceeds a threshold.

jsonc
// services-config.json — inside the service entry
{
  "minReplicas": 1,
  "maxReplicas": 10,
  "triggers": [
    {
      "name": "cpu-trigger",
      "type": "cpu",
      "metadata": {
        "type": "Utilization",
        "value": "70"    // Scale out when CPU > 70%
      }
    }
  ]
}

Recommended CPU thresholds by service type:

| Service Type | Scale-Out Threshold | minReplicas | maxReplicas |
| --- | --- | --- | --- |
| API (light) | 70% | 1 | 5 |
| API (heavy) | 60% | 2 | 10 |
| Background worker | 80% | 1 | 8 |
| Reporting service | 60% | 1 | 6 |
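
For intuition about how a Utilization target turns into a replica count, the sketch below applies the standard Kubernetes Horizontal Pod Autoscaler formula, desired = ceil(current × observed / target), which KEDA's CPU and memory scalers rely on. The function name hpa_desired_replicas and its argument order are illustrative assumptions, not part of the platform tooling. The real autoscaler also applies a tolerance band and stabilization windows, which are ignored here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the replica count the autoscaler would target.
# Usage: hpa_desired_replicas <current_replicas> <observed_pct> <target_pct> <min> <max>
hpa_desired_replicas() {
  local current=$1 observed=$2 target=$3 min=$4 max=$5
  # Integer ceiling of (current * observed / target)
  local desired=$(( (current * observed + target - 1) / target ))
  # Clamp to the configured replica bounds
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

# Example: 4 replicas averaging 90% CPU against a 70% target, bounds 1..10
hpa_desired_replicas 4 90 70 1 10   # prints 6
```

With a lower threshold (for example 60% for heavy APIs) the same load yields more replicas, which is why heavier services in the table pair a lower threshold with a higher maxReplicas.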

2. Memory-Based Scaling

Scale out when average memory consumption exceeds a threshold.

jsonc
{
  "minReplicas": 1,
  "maxReplicas": 8,
  "triggers": [
    {
      "name": "memory-trigger",
      "type": "memory",
      "metadata": {
        "type": "Utilization",
        "value": "75"    // Scale out when memory > 75%
      }
    }
  ]
}

[INFO] Memory scaling is useful for services that hold in-memory caches or process large documents (e.g., Reporting.Apis, Import.Apis).


3. HTTP Request–Based Scaling

Scale based on concurrent HTTP requests per replica. Best for API services with unpredictable traffic bursts.

jsonc
{
  "minReplicas": 1,
  "maxReplicas": 20,
  "triggers": [
    {
      "name": "http-trigger",
      "type": "http",
      "metadata": {
        "concurrentRequests": "100"  // Add replica when > 100 concurrent requests
      }
    }
  ]
}

Recommended HTTP thresholds:

| Scenario | concurrentRequests |
| --- | --- |
| Lightweight CRUD APIs | 150 |
| Medium-complexity APIs | 100 |
| Heavy processing APIs | 50 |
| Reporting / export | 20 |
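
As a rough sizing aid for these thresholds, the sketch below estimates how many replicas a given number of in-flight requests would call for: ceil(concurrent / concurrentRequests), clamped to the replica bounds. http_replicas_needed is a hypothetical helper, and the model is deliberately simplified: the platform samples request counts over a polling window rather than reacting to an instantaneous value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replicas needed for a given concurrent-request load.
# Usage: http_replicas_needed <concurrent_requests> <threshold> <min> <max>
http_replicas_needed() {
  local load=$1 threshold=$2 min=$3 max=$4
  # Integer ceiling of (load / threshold)
  local needed=$(( (load + threshold - 1) / threshold ))
  if (( needed < min )); then needed=$min; fi
  if (( needed > max )); then needed=$max; fi
  echo "$needed"
}

# 450 in-flight requests at the medium-complexity threshold (100), bounds 1..20
http_replicas_needed 450 100 1 20   # prints 5
# The same load at the reporting/export threshold (20) hits the maxReplicas cap
http_replicas_needed 450 20 1 20    # prints 20
```

If the estimate regularly lands on the cap, either maxReplicas is too low for the expected load or the threshold is more conservative than the service needs.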

4. Cron Warmup (Schedule-Based Minimum Replicas)

Pre-warm replicas before expected traffic peaks (e.g., business hours in KSA).

jsonc
{
  "minReplicas": 1,
  "maxReplicas": 10,
  "triggers": [
    {
      "name": "http-trigger",
      "type": "http",
      "metadata": {
        "concurrentRequests": "100"
      }
    },
    {
      "name": "cron-warmup",
      "type": "cron",
      "metadata": {
        "timezone": "Asia/Riyadh",
        "start": "0 7 * * 0-4",    // 07:00 Sun–Thu KSA
        "end":   "0 20 * * 0-4",   // 20:00 Sun–Thu KSA
        "desiredReplicas": "3"     // Minimum 3 replicas during business hours
      }
    }
  ]
}

Common cron expressions (Asia/Riyadh):

| Schedule | Expression |
| --- | --- |
| Business hours (Sun–Thu, 07:00–20:00) | start: 0 7 * * 0-4 / end: 0 20 * * 0-4 |
| Extended hours (07:00–23:00) | start: 0 7 * * 0-4 / end: 0 23 * * 0-4 |
| Night batch window (01:00–05:00) | start: 0 1 * * * / end: 0 5 * * * |
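
To sanity-check a schedule before deploying it, the sketch below tests whether a given weekday and hour fall inside a same-day start/end window such as the business-hours rule. It is a simplified illustration, not a cron parser: in_cron_window is a hypothetical helper that handles only whole-hour boundaries and a contiguous day-of-week range, with days numbered as in cron (0 = Sunday).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when (dow, hour) lies inside the warmup window.
# Usage: in_cron_window <dow 0-6> <hour 0-23> <first_dow> <last_dow> <start_hour> <end_hour>
in_cron_window() {
  local dow=$1 hour=$2 first=$3 last=$4 start=$5 end=$6
  # The window is active from start_hour (inclusive) to end_hour (exclusive)
  (( dow >= first && dow <= last && hour >= start && hour < end ))
}

# Business hours: Sun-Thu (0-4), 07:00-20:00
if in_cron_window 2 9 0 4 7 20; then echo "Tue 09:00: warmup active"; fi
if ! in_cron_window 5 9 0 4 7 20; then echo "Fri 09:00: warmup inactive"; fi

# Check the current moment in KSA time; 10# forces base 10 so "08" is not read as octal
now_dow=$(TZ=Asia/Riyadh date +%w)
now_hour=$(( 10#$(TZ=Asia/Riyadh date +%H) ))
if in_cron_window "$now_dow" "$now_hour" 0 4 7 20; then
  echo "now: warmup active"
else
  echo "now: warmup inactive"
fi
```

Outside the window the cron trigger contributes nothing, so the replica count falls back to whatever minReplicas and the other triggers dictate.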

Use HTTP + cron together for production services — cron ensures availability during peak hours, HTTP handles unexpected bursts:

jsonc
{
  "name": "apps-portal",
  "minReplicas": 1,
  "maxReplicas": 15,
  "triggers": [
    {
      "name": "http-trigger",
      "type": "http",
      "metadata": { "concurrentRequests": "100" }
    },
    {
      "name": "cron-warmup",
      "type": "cron",
      "metadata": {
        "timezone": "Asia/Riyadh",
        "start": "0 7 * * 0-4",
        "end":   "0 20 * * 0-4",
        "desiredReplicas": "2"
      }
    }
  ]
}

Where to Configure

Option A: services-config.json (preferred, applied by pipeline)

File: Devops/azure/config/container-backend/services-config.json

Changes here are applied on the next pipeline run. This is the source of truth — always update this file first.

jsonc
{
  "services": [
    {
      "name": "apps-portal",
      // ... other config ...
      "minReplicas": 1,
      "maxReplicas": 10,
      "triggers": [
        // ... trigger definitions from above ...
      ]
    }
  ]
}

Option B — az containerapp update (immediate, hotfix only)

Use this only for urgent scaling changes. Always back-port the change to services-config.json afterward.

bash
export ENV="dev"
export SVC="apps-portal"
export RG="mic-erp-be-${ENV}-containers-rg"
export APP="mic-erp-be-${ENV}-${SVC}"

# [ACTION] Update min/max replicas immediately
az containerapp update \
  --name "${APP}" \
  --resource-group "${RG}" \
  --min-replicas 2 \
  --max-replicas 15

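# [ACTION] Set a CPU trigger. Each call defines a single rule and may replace rules
# already on the app, so confirm the result with the [VERIFY] commands afterwards.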
az containerapp update \
  --name "${APP}" \
  --resource-group "${RG}" \
  --scale-rule-name "cpu-trigger" \
  --scale-rule-type "cpu" \
  --scale-rule-metadata "type=Utilization" "value=70"

# [ACTION] Add/replace an HTTP trigger
az containerapp update \
  --name "${APP}" \
  --resource-group "${RG}" \
  --scale-rule-name "http-trigger" \
  --scale-rule-type "http" \
  --scale-rule-metadata "concurrentRequests=100"
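
Because the resource group and app names follow a fixed pattern, a small wrapper can derive them and reject unknown environments before any az call is made. The sketch below is illustrative only: resolve_app_names is a hypothetical helper, the naming pattern is copied from the export lines above, and the environment list is taken from the limits table in this runbook. Whether production resources use the token production or a shorter form is not stated here, so adjust the list to match the real names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set RG and APP for a given environment and service.
# Usage: resolve_app_names <environment> <service>
resolve_app_names() {
  local env=$1 svc=$2
  case "$env" in
    dev|stage|preprod|uat|production) ;;   # assumed environment tokens
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac
  RG="mic-erp-be-${env}-containers-rg"
  APP="mic-erp-be-${env}-${svc}"
}

resolve_app_names dev apps-portal && echo "${APP} in ${RG}"
# prints: mic-erp-be-dev-apps-portal in mic-erp-be-dev-containers-rg
```

Failing fast on a mistyped environment keeps a hotfix from being applied to the wrong resource group.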

Apply via Pipeline Redeploy

After updating services-config.json, trigger a pipeline run:

bash
# [ACTION] Trigger pipeline via Azure DevOps CLI
az pipelines run \
  --name "deploy-container-backend" \
  --parameters environment=dev \
  --org https://dev.azure.com/microtec \
  --project ERP

Or manually via Azure DevOps UI: Pipelines → Select pipeline → Run pipeline → Choose environment.


Verify Scaling is Working

bash
# [VERIFY] List running replicas (one name per line; the line count is the current replica count)
az containerapp replica list \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --query "[].name" -o tsv

# [VERIFY] Check scale rules applied to the app
az containerapp show \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --query "properties.template.scale" -o json

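Expected output (abridged; additional or null fields may appear). HTTP rules are returned under an "http" key, while CPU, memory, and cron rules appear under "custom":

json
{
  "minReplicas": 1,
  "maxReplicas": 10,
  "rules": [
    {
      "name": "http-trigger",
      "http": {
        "metadata": { "concurrentRequests": "100" }
      }
    }
  ]
}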

Monitor Autoscaling Activity

Navigate to App Insights or Seq to observe scaling events:

kusto
// App Insights — KQL: replica count over time
customMetrics
| where name == "ContainerAppReplicaCount"
| where customDimensions["ContainerAppName"] == "mic-erp-be-dev-apps-portal"
| summarize avg(value) by bin(timestamp, 5m)
| render timechart

bash
# [INFO] Watch live logs during a scaling event
az containerapp logs show \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --follow --tail 50

Scaling Limits by Environment

| Environment | Typical maxReplicas | Notes |
| --- | --- | --- |
| dev | 3–5 | Cost optimised; scale-to-zero acceptable |
| stage | 5–10 | Mirror prod behaviour; minReplicas: 1 |
| preprod | 5–10 | Load testing may exceed this temporarily |
| uat | 3–5 | Matches stage configuration |
| production | 10–30 | Set based on capacity planning |

[WARNING] Increasing maxReplicas above the subscription quota limit will silently cap scaling. Check az containerapp env show for the current workload profile limits.
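
The typical ranges in the table can be turned into a pre-flight check so that an unusually high maxReplicas is caught before it reaches the pipeline. The following is a minimal sketch: typical_max_for_env and check_max_replicas are hypothetical helpers, the ceilings are the upper ends of the ranges in the table, and nothing here queries the actual subscription quota.

```shell
#!/usr/bin/env bash
# Upper end of the typical maxReplicas range per environment (values from the table)
typical_max_for_env() {
  case "$1" in
    dev|uat)       echo 5 ;;
    stage|preprod) echo 10 ;;
    production)    echo 30 ;;
    *)             return 1 ;;
  esac
}

# Usage: check_max_replicas <environment> <maxReplicas>
# Returns 0 if within the typical range, 1 if above it, 2 for an unknown environment.
check_max_replicas() {
  local env=$1 requested=$2 limit
  limit=$(typical_max_for_env "$env") || { echo "unknown environment: $env" >&2; return 2; }
  if (( requested > limit )); then
    echo "maxReplicas=$requested exceeds the typical $env ceiling of $limit" >&2
    return 1
  fi
}

check_max_replicas stage 8 && echo "stage: ok"
check_max_replicas dev 12 || echo "dev: needs review"
```

A value above the typical ceiling is not necessarily wrong (preprod load tests may exceed it temporarily), so the check is best treated as a prompt for review rather than a hard gate.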


Rollback Scaling Changes

bash
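# [ROLLBACK] If a scaling change causes instability, revert replica bounds to safe defaults
az containerapp update \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --min-replicas 1 \
  --max-replicas 3

# [ROLLBACK] az containerapp update has no documented flag for removing a named scale rule
# (check `az containerapp update --help`). To drop http-trigger or cron-warmup, remove them
# from services-config.json and re-run the pipeline.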

Internal Documentation — Microtec Platform Team