Runbook: Scale a Service

Configure KEDA autoscaling rules for Azure Container Apps services in the Microtec ERP platform.

Audience: DevOps, backend engineers
Prerequisites: az CLI authenticated, contributor access to the target resource group

Overview

Azure Container Apps uses KEDA (Kubernetes Event-driven Autoscaling) natively. Scaling rules are defined in services-config.json and applied via the pipeline or the az CLI.

Scaling Concepts

Concept	Description
`minReplicas`	Minimum running instances (set to 0 for scale-to-zero)
`maxReplicas`	Maximum running instances (hard cap)
`triggers`	The metrics that drive scaling decisions (one entry per scaling rule)
`cooldownPeriod`	Seconds to wait before scaling down (default 300 s)

[WARNING] Setting minReplicas: 0 enables scale-to-zero. The first request after a cold start will experience 2–10 s latency while a new replica starts. Use minReplicas: 1 for latency-sensitive services.

Scaling Rule Types

1. CPU-Based Scaling

Scale out when average CPU across all replicas exceeds a threshold.

jsonc

// services-config.json — inside the service entry
{
  "minReplicas": 1,
  "maxReplicas": 10,
  "triggers": [
    {
      "name": "cpu-trigger",
      "type": "cpu",
      "metadata": {
        "type": "Utilization",
        "value": "70"    // Scale out when CPU > 70%
      }
    }
  ]
}

Recommended CPU thresholds by service type:

Service Type	Scale-Out Threshold	`minReplicas`	`maxReplicas`
API (light)	70%	1	5
API (heavy)	60%	2	10
Background worker	80%	1	8
Reporting service	60%	1	6
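
To sanity-check a threshold before committing it, the arithmetic behind utilization triggers can be worked through by hand. Assuming Container Apps follows the standard Kubernetes Horizontal Pod Autoscaler behaviour that KEDA relies on for CPU and memory, the proposed replica count is ceil(currentReplicas × currentUtilization / targetUtilization). The sketch below implements only that formula; desired_replicas_cpu is a hypothetical helper, and the autoscaler's tolerance band and stabilization window are ignored.

```bash
#!/usr/bin/env bash
# Sketch only: desired_replicas_cpu is a hypothetical helper, not part of any tool.
# Usage: desired_replicas_cpu <current_replicas> <avg_util_%> <target_util_%> <min> <max>
desired_replicas_cpu() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  # Integer ceiling of (current * util / target).
  local desired=$(( (current * util + target - 1) / target ))
  # Clamp to the configured replica bounds.
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

# 4 replicas averaging 90% CPU against a 70% threshold, bounds 1..10
desired_replicas_cpu 4 90 70 1 10   # prints 6
```

With the "API (light)" bounds from the table (maxReplicas 5), the same load would be capped at 5 replicas, which suggests either a higher cap or a lower threshold for that service.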

2. Memory-Based Scaling

Scale out when average memory consumption exceeds a threshold.

jsonc

{
  "minReplicas": 1,
  "maxReplicas": 8,
  "triggers": [
    {
      "name": "memory-trigger",
      "type": "memory",
      "metadata": {
        "type": "Utilization",
        "value": "75"    // Scale out when memory > 75%
      }
    }
  ]
}

[INFO] Memory scaling is useful for services that hold in-memory caches or process large documents (e.g., Reporting.Apis, Import.Apis).

3. HTTP Request–Based Scaling

Scale based on concurrent HTTP requests per replica. Best for API services with unpredictable traffic bursts.

jsonc

{
  "minReplicas": 1,
  "maxReplicas": 20,
  "triggers": [
    {
      "name": "http-trigger",
      "type": "http",
      "metadata": {
        "concurrentRequests": "100"  // Add replica when > 100 concurrent requests
      }
    }
  ]
}

Recommended HTTP thresholds:

Scenario	`concurrentRequests`
Lightweight CRUD APIs	150
Medium-complexity APIs	100
Heavy processing APIs	50
Reporting / export	20
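
As a rough sizing aid, the replica count an HTTP rule settles on can be approximated as ceil(totalConcurrentRequests / concurrentRequests), bounded by minReplicas and maxReplicas. The sketch below applies that approximation to the thresholds in the table. The load of 600 concurrent requests and the helper name replicas_for_load are assumptions made for illustration; real behaviour also depends on polling interval and cooldown.

```bash
#!/usr/bin/env bash
# Sketch only: replicas_for_load is a hypothetical helper.
# Usage: replicas_for_load <total_concurrent_requests> <concurrentRequests> <min> <max>
replicas_for_load() {
  local load=$1 per_replica=$2 min=$3 max=$4
  local desired=$(( (load + per_replica - 1) / per_replica ))  # ceiling division
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

# Compare the table's thresholds for an assumed 600 concurrent requests,
# with minReplicas 1 and maxReplicas 20.
for threshold in 150 100 50 20; do
  echo "concurrentRequests=${threshold} -> $(replicas_for_load 600 "$threshold" 1 20) replicas"
done
```

For this load the thresholds yield 4, 6, 12, and 20 replicas. The last value is the maxReplicas cap rather than the computed 30, a sign that the cap or the threshold needs revisiting for a reporting workload of that size.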

4. Cron Warmup (Schedule-Based Minimum Replicas)

Pre-warm replicas before expected traffic peaks (e.g., business hours in KSA).

jsonc

{
  "minReplicas": 1,
  "maxReplicas": 10,
  "triggers": [
    {
      "name": "http-trigger",
      "type": "http",
      "metadata": {
        "concurrentRequests": "100"
      }
    },
    {
      "name": "cron-warmup",
      "type": "cron",
      "metadata": {
        "timezone": "Asia/Riyadh",
        "start": "0 7 * * 0-4",    // 07:00 Sun–Thu KSA
        "end":   "0 20 * * 0-4",   // 20:00 Sun–Thu KSA
        "desiredReplicas": "3"     // Minimum 3 replicas during business hours
      }
    }
  ]
}

Common cron expressions (Asia/Riyadh):

Schedule	Expression
Business hours (Sun–Thu, 07:00–20:00)	start: `0 7 * * 0-4` / end: `0 20 * * 0-4`
Extended hours (07:00–23:00)	start: `0 7 * * 0-4` / end: `0 23 * * 0-4`
Night batch window (01:00–05:00)	start: `0 1 * * *` / end: `0 5 * * *`
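
To check whether a given moment falls inside the business-hours window, the sketch below mirrors the `0 7 * * 0-4` / `0 20 * * 0-4` pair in plain shell. It only illustrates the window semantics: inside the window the cron trigger asks for desiredReplicas, and outside it the other triggers and minReplicas apply. in_warmup_window is a hypothetical helper, not something KEDA or the az CLI provides.

```bash
#!/usr/bin/env bash
# Sketch only: in_warmup_window is a hypothetical helper.
# Usage: in_warmup_window <day_of_week 0-6, 0=Sunday> <hour 0-23>
# Returns 0 (true) inside Sun-Thu 07:00-19:59, 1 (false) otherwise.
in_warmup_window() {
  # 10# forces base 10 so zero-padded hours such as "08" are not read as octal.
  local dow=$((10#$1)) hour=$((10#$2))
  (( dow >= 0 && dow <= 4 && hour >= 7 && hour < 20 ))
}

if in_warmup_window 1 08; then echo "warm"; else echo "idle"; fi   # Monday 08:00 -> warm
if in_warmup_window 5 10; then echo "warm"; else echo "idle"; fi   # Friday 10:00 -> idle
```

To evaluate the current time in KSA, assuming the host has the Asia/Riyadh time zone data installed, call it as `in_warmup_window "$(TZ=Asia/Riyadh date +%w)" "$(TZ=Asia/Riyadh date +%H)"`.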

5. Combined Triggers (Recommended Pattern)

Use HTTP + cron together for production services — cron ensures availability during peak hours, HTTP handles unexpected bursts:

jsonc

{
  "name": "apps-portal",
  "minReplicas": 1,
  "maxReplicas": 15,
  "triggers": [
    {
      "name": "http-trigger",
      "type": "http",
      "metadata": { "concurrentRequests": "100" }
    },
    {
      "name": "cron-warmup",
      "type": "cron",
      "metadata": {
        "timezone": "Asia/Riyadh",
        "start": "0 7 * * 0-4",
        "end":   "0 20 * * 0-4",
        "desiredReplicas": "2"
      }
    }
  ]
}
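
When several triggers are active, each proposes a replica count and the largest proposal wins, bounded by minReplicas and maxReplicas. The sketch below assumes that standard KEDA/HPA behaviour and combines the HTTP and cron triggers using the apps-portal values above. effective_replicas is a hypothetical helper and the request counts are invented for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: effective_replicas is a hypothetical helper.
# Usage: effective_replicas <concurrent_requests> <cron_active 0|1>
effective_replicas() {
  local load=$1 cron_active=$2
  # Values copied from the apps-portal example configuration.
  local min=1 max=15 per_replica=100 cron_desired=2
  local http_desired=$(( (load + per_replica - 1) / per_replica ))  # ceiling division
  local cron_proposal=0
  if (( cron_active )); then cron_proposal=$cron_desired; fi
  # The largest proposal wins, then the result is clamped to [min, max].
  local desired=$(( http_desired > cron_proposal ? http_desired : cron_proposal ))
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

effective_replicas 40 1     # quiet period inside the window: cron floor, prints 2
effective_replicas 750 1    # burst inside the window: HTTP proposal, prints 8
effective_replicas 40 0     # quiet period outside the window: minReplicas, prints 1
```

The cron trigger acts as a floor rather than a fixed count: it never prevents the HTTP trigger from scaling higher during a burst.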

Where to Configure

Option A — `services-config.json` (pipeline-managed, recommended)

File: Devops/azure/config/container-backend/services-config.json

Changes here are applied on the next pipeline run. This is the source of truth — always update this file first.

jsonc

{
  "services": [
    {
      "name": "apps-portal",
      // ... other config ...
      "minReplicas": 1,
      "maxReplicas": 10,
      "triggers": [
        // ... trigger definitions from above ...
      ]
    }
  ]
}

Option B — `az containerapp update` (immediate, hotfix only)

Use this only for urgent scaling changes. Always back-port the change to services-config.json afterward.

bash

export ENV="dev"
export SVC="apps-portal"
export RG="mic-erp-be-${ENV}-containers-rg"
export APP="mic-erp-be-${ENV}-${SVC}"

# [ACTION] Update min/max replicas immediately
az containerapp update \
  --name "${APP}" \
  --resource-group "${RG}" \
  --min-replicas 2 \
  --max-replicas 15

# [ACTION] Add/replace a CPU trigger
az containerapp update \
  --name "${APP}" \
  --resource-group "${RG}" \
  --scale-rule-name "cpu-trigger" \
  --scale-rule-type "cpu" \
  --scale-rule-metadata "type=Utilization" "value=70"

# [ACTION] Add/replace an HTTP trigger
az containerapp update \
  --name "${APP}" \
  --resource-group "${RG}" \
  --scale-rule-name "http-trigger" \
  --scale-rule-type "http" \
  --scale-rule-metadata "concurrentRequests=100"
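
Because hotfix commands are typed by hand, it can help to validate the inputs and print the command before running it. The following dry-run sketch builds the same min/max update shown above. build_scale_cmd is a hypothetical helper, and the list of accepted environments is taken from the environments table in this runbook on the assumption that resource names use those exact values.

```bash
#!/usr/bin/env bash
# Sketch only: build_scale_cmd is a hypothetical helper that prints, but does not run,
# the az command for a min/max replica change.
# Usage: build_scale_cmd <env> <service> <min_replicas> <max_replicas>
build_scale_cmd() {
  local env=$1 svc=$2 min=$3 max=$4
  # Assumption: resource names use exactly these environment values.
  case "$env" in
    dev|stage|preprod|uat|production) ;;
    *) echo "unknown environment: ${env}" >&2; return 1 ;;
  esac
  if (( min > max )); then
    echo "minReplicas (${min}) exceeds maxReplicas (${max})" >&2
    return 1
  fi
  local rg="mic-erp-be-${env}-containers-rg"
  local app="mic-erp-be-${env}-${svc}"
  echo "az containerapp update --name ${app} --resource-group ${rg} --min-replicas ${min} --max-replicas ${max}"
}

# Review the printed command, then paste it into the shell to apply it.
build_scale_cmd dev apps-portal 2 15
```

Printing instead of executing keeps the helper safe to run anywhere; the reviewed line can then be pasted into the terminal as-is.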

Apply via Pipeline Redeploy

After updating services-config.json, trigger a pipeline run:

bash

# [ACTION] Trigger pipeline via Azure DevOps CLI
az pipelines run \
  --name "deploy-container-backend" \
  --parameters environment=dev \
  --org https://dev.azure.com/microtec \
  --project ERP

Or manually via Azure DevOps UI: Pipelines → Select pipeline → Run pipeline → Choose environment.

Verify Scaling is Working

bash

# [VERIFY] List running replicas (one name per line; the number of lines is the replica count)
az containerapp replica list \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --query "[].name" -o tsv

# [VERIFY] Check scale rules applied to the app
az containerapp show \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --query "properties.template.scale" -o json

Expected Output

json

{
  "minReplicas": 1,
  "maxReplicas": 10,
  "rules": [
    {
      "name": "http-trigger",
      "custom": {
        "type": "http",
        "metadata": { "concurrentRequests": "100" }
      }
    }
  ]
}
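
The replica names returned by the first verify command can be reduced to a count and compared with the configured bounds. A minimal sketch, reading one replica name per line from standard input; check_replica_count is a hypothetical helper and the replica names in the example are invented.

```bash
#!/usr/bin/env bash
# Sketch only: check_replica_count is a hypothetical helper.
# Reads one replica name per line on stdin and compares the count with the bounds.
# Usage: <command printing replica names> | check_replica_count <min> <max>
check_replica_count() {
  local min=$1 max=$2 count=0 line
  while IFS= read -r line; do
    if [[ -n "$line" ]]; then count=$(( count + 1 )); fi   # ignore blank lines
  done
  if (( count < min || count > max )); then
    echo "replicas=${count} outside [${min}, ${max}]"
    return 1
  fi
  echo "replicas=${count} within [${min}, ${max}]"
}

# Invented replica names standing in for the az output.
printf '%s\n' "apps-portal--rev1-aaa" "apps-portal--rev1-bbb" | check_replica_count 1 10
```

In practice the input would come from the replica list command, for example `az containerapp replica list --name "mic-erp-be-dev-apps-portal" --resource-group "mic-erp-be-dev-containers-rg" --query "[].name" -o tsv | check_replica_count 1 10`.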

Monitor Autoscaling Activity

Navigate to App Insights or Seq to observe scaling events:

kusto

// App Insights — KQL: replica count over time
customMetrics
| where name == "ContainerAppReplicaCount"
| where customDimensions["ContainerAppName"] == "mic-erp-be-dev-apps-portal"
| summarize avg(value) by bin(timestamp, 5m)
| render timechart

bash

# [INFO] Watch live logs during a scaling event
az containerapp logs show \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --follow --tail 50

Scaling Limits by Environment

Environment	Typical maxReplicas	Notes
dev	3–5	Cost optimised; scale-to-zero acceptable
stage	5–10	Mirror prod behaviour; `minReplicas: 1`
preprod	5–10	Load testing may exceed this temporarily
uat	3–5	Matches stage configuration
production	10–30	Set based on capacity planning

[WARNING] Setting maxReplicas above the subscription quota does not raise an error; scaling is silently capped at the quota. Check az containerapp env show for the current workload profile limits.

Rollback Scaling Changes

bash

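# [ROLLBACK] If a scaling change causes instability, revert to safe defaults.
# Before relying on this, confirm with `az containerapp update --help` that your
# CLI version supports --remove-scale-rule.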
az containerapp update \
  --name "mic-erp-be-dev-apps-portal" \
  --resource-group "mic-erp-be-dev-containers-rg" \
  --min-replicas 1 \
  --max-replicas 3 \
  --remove-scale-rule "http-trigger" \
  --remove-scale-rule "cron-warmup"

Related Runbooks

Deploy New Service: initial service setup before scaling configuration
Incident Response: handle outages caused by scaling misconfiguration
