Observability Technology Reference
Detailed reference for every observability tool used in the Microtec ERP platform: logging, tracing, metrics, and health checks.
Three-Pillar Model
Microtec ERP implements the three pillars of observability — Logs, Traces, and Metrics — using OpenTelemetry as the unifying collection layer.
Logs
Serilog
Version: 3.x
Role: Structured logging framework — all backend microservices log via Serilog
Configuration: Centralised in Microtec.Web.Hosting NuGet package — no per-service Serilog setup required
Sinks configured by environment:
| Environment | Sinks |
|---|---|
| Local dev | Console (coloured), Seq via OTLP |
| Cloud (all envs) | Application Insights (via Serilog.Sinks.ApplicationInsights) |
Log enrichers applied globally:
| Enricher | Data added |
|---|---|
| WithCorrelationId | X-Correlation-ID request header value |
| WithTenantId | Current tenant from ITenantContextManager |
| WithUserId | Authenticated user sub from JWT |
| WithServiceName | Container App name from environment variable |
| WithEnvironmentName | dev / stage / preprod / uat / production |
| WithMachineName | Replica host name (useful in multi-replica debugging) |
Structured logging conventions:
csharp
// CORRECT: structured, properties are searchable
Log.Information(
    "Invoice {InvoiceId} created for tenant {TenantId} in {ElapsedMs}ms",
    invoice.Id, tenantId, stopwatch.ElapsedMilliseconds);
// WRONG: string interpolation loses structure, not searchable in Seq
Log.Information($"Invoice {invoice.Id} created"); // Never do this
// CORRECT: exception with context
Log.Error(ex, "Failed to submit ZATCA invoice {InvoiceId}", invoiceId);
Minimum log levels by environment:
| Environment | Minimum Level | Microsoft/System |
|---|---|---|
| dev | Debug | Warning |
| stage | Information | Warning |
| preprod/uat | Information | Error |
| production | Warning | Error |
Log Aggregation (Development)
Seq
Version: 2024.x
Role: Structured log viewer for local development and stage environment
Protocol: OTLP over HTTP (Serilog OTLP sink → Seq ingestion)
Ports:
| Port | Purpose |
|---|---|
| 1234 | Seq web UI + ingestion (local dev via Docker) |
| 80/443 | Seq (stage — cloud-hosted or eg-sv-ai) |
Local dev access: http://localhost:1234
Stage access: Contact the platform team for the stage Seq URL.
Docker Compose service (from dev/docker-compose.yml):
yaml
seq:
  image: datalust/seq:latest
  environment:
    ACCEPT_EULA: "Y"
  ports:
    - "1234:80"
    - "5341:5341"
  volumes:
    - seq-data:/data
Useful Seq queries:
# All errors in the last hour
@Level = 'Error' and @Timestamp > Now() - 1h
# All log events for a specific tenant
TenantId = '00000000-0000-0000-0000-000000000001'
# Slow requests (> 500 ms)
ElapsedMs > 500 and SourceContext like 'PerformanceBehavior%'
# Specific correlation ID (trace a request across services)
CorrelationId = 'abc-123-xyz'
Correlation ID tracing
Every request from the Angular frontend includes an X-Correlation-ID header. All Serilog log events are enriched with this value. Search by CorrelationId in Seq to reconstruct the full request journey across services.
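The paragraph above relies on every request carrying an X-Correlation-ID value. The following is a minimal sketch of a reuse-or-generate rule for that header, using only the .NET base class library. The `CorrelationIdResolver` type and its fallback to a new GUID are assumptions made for illustration; this reference does not show how Microtec.Web.Hosting actually resolves the value.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: illustrates the reuse-or-generate rule only.
public static class CorrelationIdResolver
{
    public const string HeaderName = "X-Correlation-ID";

    // Returns the incoming header value when it is present and non-blank;
    // otherwise generates a new GUID so that log events are never uncorrelated.
    public static string Resolve(IReadOnlyDictionary<string, string> headers)
    {
        foreach (var pair in headers)
        {
            // HTTP header names are case-insensitive, so compare accordingly.
            if (string.Equals(pair.Key, HeaderName, StringComparison.OrdinalIgnoreCase)
                && !string.IsNullOrWhiteSpace(pair.Value))
            {
                return pair.Value.Trim();
            }
        }

        return Guid.NewGuid().ToString();
    }
}
```

In a real service the resolved value would typically be attached to the logging context by middleware, so that the CorrelationId property appears on every event written during the request.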
Distributed Tracing & Metrics
OpenTelemetry .NET SDK
Version: 1.x (OpenTelemetry.* packages)
Role: Distributed tracing and metrics collection — auto-instruments all major libraries
Configuration: Centralised in Microtec.Web.Hosting — no per-service OTel setup required
Auto-instrumented libraries:
| Library | Signal | What is traced |
|---|---|---|
| ASP.NET Core | Traces + Metrics | Incoming HTTP requests, response codes, duration |
| EF Core | Traces | Database queries (SQL text in dev only, redacted in prod) |
| HttpClient | Traces + Metrics | Outbound HTTP calls, status codes, duration |
| Azure Service Bus (MassTransit) | Traces | Message publish/consume with message IDs |
| StackExchange.Redis | Traces | Cache commands, keys, duration |
| Hangfire | Traces | Background job execution |
Custom spans:
csharp
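// Declared once per class. The source name is illustrative; whichever name is used
// must also be registered with the tracer provider (AddSource) for spans to be recorded.
private static readonly ActivitySource ActivitySource = new("Microtec.Zatca");

// Add a custom span to the current trace
using var activity = ActivitySource.StartActivity("ProcessZatcaSubmission");
activity?.SetTag("invoice.id", invoiceId);
activity?.SetTag("tenant.id", tenantId);
try
{
    var result = await zatcaClient.SubmitAsync(invoice);
    activity?.SetStatus(ActivityStatusCode.Ok);
}
catch (Exception ex)
{
    // Mark the span as failed, then let the exception propagate
    activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
    throw;
}
OTLP exporter configuration: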
| Environment | OTLP endpoint | Protocol |
|---|---|---|
| Local dev | http://localhost:4318 | HTTP/protobuf |
| Cloud (all) | Azure Monitor (built-in endpoint) | Azure Monitor exporter |
Local dev OTel stack (Docker Compose):
| Service | Port | Purpose |
|---|---|---|
| OTel Collector | 4317 (gRPC), 4318 (HTTP) | Collector pipeline |
| Jaeger UI | 16686 | Trace viewer |
| Prometheus | 9090 | Metrics scrape + query |
APM (Cloud)
Application Insights
Version: Azure Monitor workspace-based (2024 schema)
Role: Cloud APM — request traces, exception tracking, custom metrics, availability tests, Live Metrics
Integration: OpenTelemetry → Azure.Monitor.OpenTelemetry.AspNetCore package
Connection string: Per-environment, stored in Key Vault as the secret ApplicationInsights--ConnectionString
Key views used:
| Application Insights view | What it shows |
|---|---|
| Transaction search | End-to-end trace for a single request |
| Application map | Live dependency graph — services and their error rates |
| Failures | Grouped exception occurrences with stack traces |
| Performance | p50/p95/p99 duration by operation name |
| Live Metrics | Real-time request rate, failure rate, server count |
| Availability | Synthetic health probe results per environment |
Log Analytics workspace (shared per environment):
kusto
// Find all 500 errors in the last 24 hours
requests
| where timestamp > ago(24h)
| where resultCode == "500"
| project timestamp, name, url, duration, cloud_RoleName
| order by timestamp desc
Sampling: Adaptive sampling enabled in production (targets 5 traces/second per service). All failed requests are always captured regardless of sampling rate.
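The sampling rule above combines a per-second cap with an unconditional keep for failures. The sketch below illustrates that decision with a simple fixed one-second window. It is not the Application Insights sampling algorithm, and `TraceSampler` and its parameters are hypothetical names chosen for this example.

```csharp
using System;

// Hypothetical illustration: a fixed one-second window limiter.
// Not thread-safe; a real sampler would need synchronisation.
public sealed class TraceSampler
{
    private readonly int _maxPerSecond;
    private long _windowSecond = long.MinValue;
    private int _keptInWindow;

    public TraceSampler(int maxPerSecond) => _maxPerSecond = maxPerSecond;

    // Decides whether to keep a trace observed at the given time.
    // Failed requests are always kept and do not consume the per-second budget.
    public bool ShouldKeep(bool requestFailed, DateTimeOffset observedAt)
    {
        if (requestFailed)
        {
            return true;
        }

        long second = observedAt.ToUnixTimeSeconds();
        if (second != _windowSecond)
        {
            // A new one-second window begins: reset the budget.
            _windowSecond = second;
            _keptInWindow = 0;
        }

        if (_keptInWindow < _maxPerSecond)
        {
            _keptInWindow++;
            return true;
        }

        return false;
    }
}
```

With `new TraceSampler(5)`, twenty successful requests inside the same second keep five traces, while any failed request in that second is still kept.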
Health Checks
ASP.NET Core Health Checks
Package: Microsoft.Extensions.Diagnostics.HealthChecks
Role: Liveness and readiness probes for every microservice, consumed by Azure Container Apps (ACA) and Azure Front Door (AFD)
Endpoints exposed on every service:
| Endpoint | Purpose | Checks included |
|---|---|---|
| /health/live | Liveness: is the process alive? | None (always 200 if process running) |
| /health/ready | Readiness: can the service accept traffic? | DB connectivity, Redis, ASB, Key Vault |
| /health | Aggregated (Gateway) | Polls all downstream services |
Checks registered:
csharp
builder.Services.AddHealthChecks()
    .AddSqlServer(connectionString, name: "sql", tags: ["ready"])
    .AddRedis(redisConnection, name: "redis", tags: ["ready"])
    .AddAzureServiceBusTopic(sbConnection, topicName, name: "asb", tags: ["ready"])
    .AddCheck<KeyVaultHealthCheck>("keyvault", tags: ["ready"]);
// Expose endpoints
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false // No checks: liveness is process-alive only
});
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});
ACA probe configuration (services-config.json):
json
{
  "probes": {
    "liveness": { "path": "/health/live", "initialDelaySeconds": 10, "periodSeconds": 30 },
    "readiness": { "path": "/health/ready", "initialDelaySeconds": 15, "periodSeconds": 15 },
    "startup": { "path": "/health/live", "initialDelaySeconds": 5, "failureThreshold": 10 }
  }
}
Gateway health aggregation: Gateway.API exposes a combined /health endpoint that fans out to all private CAE services, aggregates responses, and returns a single health status. Azure Front Door's health probe targets this endpoint.
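The fan-out and aggregation described above can be sketched as a "worst status wins" reduction over concurrent probes. This is an illustration under stated assumptions, not the Gateway.API implementation: `ServiceHealth`, `HealthAggregator`, and the rule that a throwing probe counts as unhealthy are all hypothetical. The enum ordering mirrors the Unhealthy < Degraded < Healthy ordering of the ASP.NET Core `HealthStatus` enum.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Ordered so that the numerically smallest value is the worst status.
public enum ServiceHealth { Unhealthy = 0, Degraded = 1, Healthy = 2 }

public static class HealthAggregator
{
    // Runs every downstream probe concurrently and returns the worst status seen.
    // With no probes registered, the aggregate is reported as Healthy (an assumption).
    public static async Task<ServiceHealth> AggregateAsync(
        IReadOnlyDictionary<string, Func<Task<ServiceHealth>>> probes)
    {
        ServiceHealth[] results =
            await Task.WhenAll(probes.Values.Select(probe => SafeProbeAsync(probe)));

        return results.Length == 0 ? ServiceHealth.Healthy : results.Min();
    }

    // A probe that throws (timeout, connection refused, ...) counts as Unhealthy
    // instead of failing the whole aggregation.
    private static async Task<ServiceHealth> SafeProbeAsync(Func<Task<ServiceHealth>> probe)
    {
        try
        {
            return await probe();
        }
        catch (Exception)
        {
            return ServiceHealth.Unhealthy;
        }
    }
}
```

In a gateway, each probe delegate would presumably wrap an HTTP call to a downstream /health/ready endpoint and map the response to a `ServiceHealth` value.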
Alerting
Azure Monitor Alerts
Channels: Email to on-call team + Microsoft Teams webhook
Alert rules (production):
| Alert name | Metric | Threshold | Severity |
|---|---|---|---|
| High error rate | HTTP 5xx / total requests | > 1% for 5 min | Sev 1 |
| Slow P99 latency | Request duration p99 | > 3 s for 10 min | Sev 2 |
| Scale limit hit | Replica count = max | Sustained 5 min | Sev 2 |
| SQL DTU high | DTU utilisation | > 85% for 5 min | Sev 2 |
| Redis evictions | Cache eviction count | > 0 | Sev 3 |
| Service Bus DLQ | Dead-letter count | > 10 | Sev 2 |
| Health probe fail | Availability % | < 99% | Sev 1 |
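To make the "High error rate" threshold concrete, the sketch below works through the arithmetic for a five-minute window of per-minute request counts. Azure Monitor evaluates the real rule; `MinuteBucket` and `ErrorRateRule` are hypothetical names, and aggregating the whole window into one ratio is an assumption about how the rule is configured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One minute of traffic: total requests and how many returned HTTP 5xx.
public readonly record struct MinuteBucket(long TotalRequests, long ServerErrors);

public static class ErrorRateRule
{
    // True when the 5xx ratio across the whole window is strictly above the threshold
    // (0.01 corresponds to the 1% threshold in the table above).
    public static bool IsBreached(IReadOnlyList<MinuteBucket> window, double threshold = 0.01)
    {
        long total = window.Sum(bucket => bucket.TotalRequests);
        long errors = window.Sum(bucket => bucket.ServerErrors);

        if (total == 0)
        {
            return false; // No traffic in the window, nothing to alert on.
        }

        return (double)errors / total > threshold;
    }
}
```

For example, five buckets of 1,000 requests with 20 errors each give a 2% ratio and breach the rule, while 5 errors per bucket give 0.5% and do not.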
Correlation: End-to-End Request Trace
A complete request from browser to database is traceable using a single CorrelationId:
- Angular frontend sets X-Correlation-ID: {uuid} on every HTTP request
- Gateway.API propagates the header downstream via HttpContext.Request.Headers
- All Serilog log events include the CorrelationId property added by the enricher
- OpenTelemetry trace context (traceparent) is propagated via W3C headers
- Azure Service Bus messages include CorrelationId as a message property
- Application Insights and Seq allow filtering by CorrelationId or TraceId
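The traceparent header in the list above follows the W3C Trace Context layout `version-traceid-parentid-flags`. As a small sketch, the trace ID can be read from a raw header value with the base class library's `ActivityContext.TryParse`. `TraceParentReader` is a hypothetical helper for illustration; the OpenTelemetry instrumentation normally does this parsing itself.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for inspecting a raw traceparent value, e.g. when debugging.
public static class TraceParentReader
{
    // Returns true and the 32-character hex trace ID when the header is well-formed;
    // returns false and an empty string otherwise.
    public static bool TryGetTraceId(string traceParent, out string traceId)
    {
        if (ActivityContext.TryParse(traceParent, null, out ActivityContext context))
        {
            traceId = context.TraceId.ToHexString();
            return true;
        }

        traceId = string.Empty;
        return false;
    }
}
```

The returned trace ID is the value to search for as TraceId in Application Insights or Seq, alongside the CorrelationId.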
Browser → AFD → Gateway (logs + trace: abc-123) →
  AppsPortal (logs + trace: abc-123) →
    ASB message (CorrelationId: abc-123) →
      Notification.Apis (logs: abc-123) → Email sent
Local Observability Stack (Docker Compose)
Full local observability from dev/docker-compose.yml:
| Service | Image | Port | Purpose |
|---|---|---|---|
| Seq | datalust/seq:latest | 1234 | Structured log viewer |
| OTel Collector | otel/opentelemetry-collector-contrib | 4317/4318 | Trace/metric collector |
| Jaeger | jaegertracing/all-in-one | 16686 | Distributed trace viewer |
| Prometheus | prom/prometheus | 9090 | Metrics storage + query |
| Grafana | grafana/grafana | 3000 | Metrics dashboards |
Start with: docker-compose -f dev/docker-compose.yml up seq otel-collector jaeger prometheus grafana