Appearance
Runbook: Database Backup and Restore
Audience: DevOps engineers, on-call engineers
Prerequisites: Azure CLI, SQL Server Management Studio or Azure Data Studio, Owner access on the SQL resource group
Overview
Microtec ERP uses Azure SQL with the database-per-tenant model. Each tenant has an independently named database on the shared SQL server (mic-backend-shared-sql). Backups are managed at two levels:
- Automated Azure SQL backups — full, differential, and log backups managed by Azure (default: 7–35 day retention)
- Manual exports —
.bacpacfiles for tenant migration or long-term archival
Automated Azure SQL Backups
Azure SQL automatically performs:
| Backup Type | Frequency | Retention |
|---|---|---|
| Full backup | Weekly | Configurable (7–35 days) |
| Differential backup | Every 12–24 hours | Included in retention window |
| Transaction log backup | Every 5–12 minutes | Included in retention window |
These backups enable point-in-time restore (PITR) to any second within the retention window.
Current Retention Configuration
| Environment | Retention Days |
|---|---|
dev | 7 |
stage | 14 |
preprod | 14 |
uat | 14 |
production | 35 |
Verify Backup Configuration
bash
ENV=production
SQL_SERVER="mic-backend-shared-sql"
RG="mic-backend-shared-sql-rg"
# List all databases and their backup retention
az sql db list \
--server $SQL_SERVER \
--resource-group $RG \
--query "[].{name:name, pitrDays:backupRetentionDays}" \
-o tableAdjust Retention (production change)
bash
# Example: extend retention for a specific tenant database
az sql db update \
--server $SQL_SERVER \
--resource-group $RG \
--name "tenant_acme_production" \
--backup-storage-redundancy Geo \
--retention-days 35Point-in-Time Restore
Use PITR when a tenant's data was accidentally corrupted or deleted.
Step 1 — Identify the restore point
Determine the timestamp (UTC) before the data corruption occurred. Get this from:
- The incident report / tenant complaint
- Azure Monitor audit logs
- Application
Seqstructured logs
Step 2 — Restore to a new database
Never restore over the production database
Always restore to a new database name (e.g., tenant_acme_production_restore_20260530). Validate the restored data before switching the tenant's connection string.
bash
SQL_SERVER="mic-backend-shared-sql"
RG="mic-backend-shared-sql-rg"
SOURCE_DB="tenant_acme_production"
TARGET_DB="tenant_acme_production_restore_$(date +%Y%m%d%H%M)"
RESTORE_POINT="2026-05-30T08:00:00Z" # UTC timestamp before corruption
az sql db restore \
--server $SQL_SERVER \
--resource-group $RG \
--dest-name $TARGET_DB \
--name $SOURCE_DB \
--time $RESTORE_POINT \
--edition GeneralPurpose \
--family Gen5 \
--capacity 2Expected duration: 5–30 minutes depending on database size.
Step 3 — Validate the restored database
Connect to the restored database and verify the expected data state:
bash
# Using sqlcmd (on any server with sqlcmd installed)
sqlcmd \
-S "mic-backend-shared-sql.database.windows.net" \
-d "$TARGET_DB" \
-U sqladmin \
-P '<password>' \
-Q "SELECT COUNT(*) AS invoices FROM Invoices; SELECT MAX(CreatedAt) AS latestInvoice FROM Invoices;"Step 4 — Switch the tenant's connection string
Once the restored database is validated:
- Update the tenant's connection string in the admin database:
sql
-- Run on the admin_db database
UPDATE Tenants
SET ConnectionString = 'Server=mic-backend-shared-sql.database.windows.net;Database=tenant_acme_production_restore_20260530;...'
WHERE Code = 'acme';- Clear the Redis cache for the tenant's connection string resolution:
bash
# Redis CLI (use the correct Redis instance)
redis-cli DEL "tenant:connstr:acme"- Verify the tenant can log in and see correct data.
Step 5 — Clean up the corrupted database
After confirming the restore is good and the tenant is operational:
bash
# Archive the corrupted database first (rename)
az sql db rename \
--server $SQL_SERVER \
--resource-group $RG \
--name "tenant_acme_production" \
--new-name "tenant_acme_production_corrupted_$(date +%Y%m%d)"
# Delete it after 14 days if no further investigation is needed
az sql db delete \
--server $SQL_SERVER \
--resource-group $RG \
--name "tenant_acme_production_corrupted_20260530" \
--yesManual Export (.bacpac)
Use .bacpac exports for:
- Tenant offboarding (hand over their data)
- Long-term cold archival beyond the 35-day retention window
- Database migration to a different SQL server
Export to Azure Blob Storage
bash
STORAGE_ACCOUNT="micerpbeproductionsa"
CONTAINER="db-exports"
EXPORT_FILE="tenant_acme_$(date +%Y%m%d).bacpac"
az sql db export \
--server $SQL_SERVER \
--resource-group $RG \
--name "tenant_acme_production" \
--admin-user "sqladmin" \
--admin-password "<password>" \
--storage-key-type StorageAccessKey \
--storage-key "$(az storage account keys list --account-name $STORAGE_ACCOUNT --query '[0].value' -o tsv)" \
--storage-uri "https://${STORAGE_ACCOUNT}.blob.core.windows.net/${CONTAINER}/${EXPORT_FILE}"Expected duration: 5–60 minutes depending on database size. The command returns an operation ID; track progress:
bash
# The export command prints an operation URL — poll it:
az sql db operation list \
--server $SQL_SERVER \
--resource-group $RG \
--database "tenant_acme_production" \
--query "[?operationType=='Export'].{state:state, percentComplete:percentComplete}" \
-o tableImport from .bacpac
bash
TARGET_DB="tenant_acme_restored_from_bacpac"
BACPAC_URL="https://${STORAGE_ACCOUNT}.blob.core.windows.net/${CONTAINER}/tenant_acme_20260530.bacpac"
az sql db import \
--server $SQL_SERVER \
--resource-group $RG \
--name $TARGET_DB \
--admin-user "sqladmin" \
--admin-password "<password>" \
--storage-key-type StorageAccessKey \
--storage-key "$(az storage account keys list --account-name $STORAGE_ACCOUNT --query '[0].value' -o tsv)" \
--storage-uri $BACPAC_URLAdmin Database Backup
The admin database (admin_db on the shared SQL server) contains the tenant registry and connection strings. It must be restored before any tenant can log in after a disaster recovery scenario.
Priority restore target
In a disaster recovery event, restore the admin database first. Without it, no tenant connection string can be resolved and the entire platform is non-functional.
Admin database restore follows the same PITR process above. Restore target: admin_db_restore_$(date).
After restoring:
- Run all pending EF Core migrations on the restored database
- Update the
BusinessOwnersservice connection string in Key Vault to point to the restored database - Restart the
BusinessOwnerscontainer app - Flush Redis cache:
FLUSHDBon the tenant cache Redis instance
Long-Term Retention (LTR)
For production databases, configure Long-Term Retention beyond the 35-day PITR window:
bash
# Enable weekly LTR backup for up to 1 year
az sql db ltr-policy set \
--server $SQL_SERVER \
--resource-group $RG \
--database "tenant_acme_production" \
--weekly-retention P52W # 52 weeks = 1 yearList LTR backups:
bash
az sql db ltr-backup list \
--location uksouth \
--server $SQL_SERVER \
--database "tenant_acme_production" \
--query "[].{name:name, backupTime:backupTime}" \
-o tableBackup Monitoring and Alerting
Azure Monitor alerts are configured to fire when:
- PITR backup health degrades for any production database
- A database enters an error state
Check backup health via Azure Portal → SQL Server → Manage Backups → PITR tab. All databases should show a green status and a backup time within the last 6 hours.
Checklist (Restore Operation)
- [ ] Incident timeline documented; corruption timestamp identified
- [ ] Restore target name chosen (with timestamp suffix, NOT overwriting production)
- [ ] PITR restore initiated with correct timestamp
- [ ] Restored database validated by querying key tables
- [ ] Tenant connection string updated in admin_db
- [ ] Redis cache cleared for affected tenant(s)
- [ ] Tenant logged in and confirmed correct data state
- [ ] Corrupted database renamed (not deleted) for 14-day hold
- [ ] Incident report filed with root cause