Observability Technology Reference

Detailed reference for every observability tool used in the Microtec ERP platform: logging, tracing, metrics, and health checks.


Three-Pillar Model

Microtec ERP implements the three pillars of observability — Logs, Traces, and Metrics — using OpenTelemetry as the unifying collection layer.


Logs

Serilog

Version: 3.x
Role: Structured logging framework — all backend microservices log via Serilog
Configuration: Centralised in Microtec.Web.Hosting NuGet package — no per-service Serilog setup required

Sinks configured by environment:

| Environment | Sinks |
| --- | --- |
| Local dev | Console (coloured), Seq via OTLP |
| Cloud (all envs) | Application Insights (via Serilog.Sinks.ApplicationInsights) |

Log enrichers applied globally:

| Enricher | Data added |
| --- | --- |
| WithCorrelationId | X-Correlation-ID request header value |
| WithTenantId | Current tenant from ITenantContextManager |
| WithUserId | Authenticated user sub from JWT |
| WithServiceName | Container App name from environment variable |
| WithEnvironmentName | dev / stage / preprod / uat / production |
| WithMachineName | Replica host name (useful in multi-replica debugging) |
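For illustration, an enricher such as WithTenantId could be built on Serilog's ILogEventEnricher interface. This is a minimal sketch, not the code shipped in Microtec.Web.Hosting: the class name TenantIdEnricher and the Func<string?> accessor are assumptions standing in for whatever ITenantContextManager actually exposes.

```csharp
using System;
using Serilog.Core;
using Serilog.Events;

// Hypothetical enricher: attaches a TenantId property to every log event.
public sealed class TenantIdEnricher : ILogEventEnricher
{
    // Assumed accessor; the real platform resolves the tenant via ITenantContextManager.
    private readonly Func<string?> _currentTenantId;

    public TenantIdEnricher(Func<string?> currentTenantId) =>
        _currentTenantId = currentTenantId;

    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var tenantId = _currentTenantId();
        if (tenantId is null)
        {
            return; // No tenant resolved for this event; add nothing.
        }

        // AddPropertyIfAbsent keeps a TenantId that was set explicitly on the event.
        logEvent.AddPropertyIfAbsent(
            propertyFactory.CreateProperty("TenantId", tenantId));
    }
}
```

An enricher like this would be attached with Enrich.With(new TenantIdEnricher(...)) on a LoggerConfiguration, after which TenantId becomes a searchable property on every event.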

Structured logging conventions:

csharp
// CORRECT — structured, properties are searchable
Log.Information(
    "Invoice {InvoiceId} created for tenant {TenantId} in {ElapsedMs}ms",
    invoice.Id, tenantId, stopwatch.ElapsedMilliseconds);

// WRONG — string interpolation loses structure, not searchable in Seq
Log.Information($"Invoice {invoice.Id} created"); // Never do this

// CORRECT — exception with context
Log.Error(ex, "Failed to submit ZATCA invoice {InvoiceId}", invoiceId);

Minimum log levels by environment:

| Environment | Minimum level | Minimum level for Microsoft/System sources |
| --- | --- | --- |
| dev | Debug | Warning |
| stage | Information | Warning |
| preprod/uat | Information | Error |
| production | Warning | Error |
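The level table could be expressed in Serilog configuration roughly as follows. This is a sketch under assumptions: the LogLevelDefaults helper and the environment-name strings it accepts are illustrative, and the actual wiring inside Microtec.Web.Hosting is not shown in this reference.

```csharp
using System;
using Serilog;
using Serilog.Events;

// Hypothetical helper that maps an environment name to the levels in the table.
public static class LogLevelDefaults
{
    // Returns (service minimum, Microsoft/System override).
    public static (LogEventLevel Default, LogEventLevel Framework) For(string environment) =>
        environment switch
        {
            "dev" => (LogEventLevel.Debug, LogEventLevel.Warning),
            "stage" => (LogEventLevel.Information, LogEventLevel.Warning),
            "preprod" or "uat" => (LogEventLevel.Information, LogEventLevel.Error),
            "production" => (LogEventLevel.Warning, LogEventLevel.Error),
            _ => throw new ArgumentOutOfRangeException(
                nameof(environment), environment, "Unknown environment")
        };

    // Applies the default level plus overrides for framework log sources.
    public static LoggerConfiguration Apply(LoggerConfiguration config, string environment)
    {
        var (defaultLevel, frameworkLevel) = For(environment);
        return config
            .MinimumLevel.Is(defaultLevel)
            .MinimumLevel.Override("Microsoft", frameworkLevel)
            .MinimumLevel.Override("System", frameworkLevel);
    }
}
```

Overrides match on the SourceContext prefix, so a logger created for a Microsoft.AspNetCore type picks up the stricter framework level while application code keeps the environment default.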

Log Aggregation (Development)

Seq

Version: 2024.x
Role: Structured log viewer for local development and stage environment
Protocol: OTLP over HTTP (Serilog OTLP sink → Seq ingestion)

Ports:

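| Port | Purpose |
| --- | --- |
| 1234 | Seq web UI + ingestion (local dev via Docker; mapped to container port 80) |
| 5341 | Seq ingestion-only endpoint (local dev via Docker) |
| 80/443 | Seq (stage: cloud-hosted or eg-sv-ai) |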

Local dev access: http://localhost:1234
Stage access: Contact the platform team for the stage Seq URL.

Docker Compose service (from dev/docker-compose.yml):

yaml
seq:
  image: datalust/seq:latest
  environment:
    ACCEPT_EULA: "Y"
  ports:
    - "1234:80"
    - "5341:5341"
  volumes:
    - seq-data:/data

Useful Seq queries:

# All errors in the last hour
@Level = 'Error' and @Timestamp > Now() - 1h

# All log events for a specific tenant
TenantId = '00000000-0000-0000-0000-000000000001'

# Slow requests (> 500 ms)
ElapsedMs > 500 and SourceContext like 'PerformanceBehavior%'

# Specific correlation ID (trace a request across services)
CorrelationId = 'abc-123-xyz'

Correlation ID tracing

Every request from the Angular frontend includes an X-Correlation-ID header. All Serilog log events are enriched with this value. Search by CorrelationId in Seq to reconstruct the full request journey across services.


Distributed Tracing & Metrics

OpenTelemetry .NET SDK

Version: 1.x (OpenTelemetry.* packages)
Role: Distributed tracing and metrics collection — auto-instruments all major libraries
Configuration: Centralised in Microtec.Web.Hosting — no per-service OTel setup required

Auto-instrumented libraries:

| Library | Signal | What is traced |
| --- | --- | --- |
| ASP.NET Core | Traces + Metrics | Incoming HTTP requests, response codes, duration |
| EF Core | Traces | Database queries (SQL text in dev only, redacted in prod) |
| HttpClient | Traces + Metrics | Outbound HTTP calls, status codes, duration |
| Azure Service Bus (MassTransit) | Traces | Message publish/consume with message IDs |
| StackExchange.Redis | Traces | Cache commands, keys, duration |
| Hangfire | Traces | Background job execution |

Custom spans:

csharp
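// Add a custom span to the current trace.
// ActivitySource below is a static field on the containing class, for example:
//   private static readonly ActivitySource ActivitySource = new("Microtec.Zatca"); // illustrative name
using var activity = ActivitySource.StartActivity("ProcessZatcaSubmission");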
activity?.SetTag("invoice.id", invoiceId);
activity?.SetTag("tenant.id", tenantId);

try
{
    var result = await zatcaClient.SubmitAsync(invoice);
    activity?.SetStatus(ActivityStatusCode.Ok);
}
catch (Exception ex)
{
    activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
    throw;
}
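The null-conditional calls (activity?.SetTag) in the snippet above are needed because StartActivity returns null when nothing is listening to the source. The sketch below shows that behaviour using only System.Diagnostics. The source name "Microtec.Demo" is hypothetical, and in a real service the listener is registered by the OpenTelemetry SDK rather than by hand.

```csharp
using System;
using System.Diagnostics;

// Hypothetical source name, used only for this demonstration.
var source = new ActivitySource("Microtec.Demo");

// No listener yet: StartActivity returns null, hence the `activity?.` pattern.
Activity? before = source.StartActivity("ProcessZatcaSubmission");
bool createdWithoutListener = before is not null;

// Register a listener that records everything from this source.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Microtec.Demo",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded
};
ActivitySource.AddActivityListener(listener);

bool createdWithListener;
using (Activity? after = source.StartActivity("ProcessZatcaSubmission"))
{
    createdWithListener = after is not null;
    after?.SetTag("invoice.id", "INV-1"); // illustrative tag value
}

Console.WriteLine($"Without listener: {createdWithoutListener}");
Console.WriteLine($"With listener: {createdWithListener}");
```

A practical consequence is that a custom ActivitySource produces no spans unless its name is registered with the tracing pipeline, so a missing span is usually a registration issue rather than an exporter fault.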

OTLP exporter configuration:

| Environment | OTLP endpoint | Protocol |
| --- | --- | --- |
| Local dev | http://localhost:4318 | HTTP/protobuf |
| Cloud (all) | Azure Monitor (built-in endpoint) | Azure Monitor exporter |
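For orientation, the local-dev row corresponds to OpenTelemetry SDK calls along these lines. This is a sketch only: the platform configures the exporter inside Microtec.Web.Hosting, which is not shown here, and the names "Microtec.Demo" and "demo-service" are placeholders. It assumes the OpenTelemetry and OpenTelemetry.Exporter.OpenTelemetryProtocol packages are referenced.

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// Build a tracer provider that exports spans over OTLP HTTP/protobuf.
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .ConfigureResource(resource => resource.AddService("demo-service")) // placeholder service name
    .AddSource("Microtec.Demo")                                         // placeholder ActivitySource name
    .AddOtlpExporter(options =>
    {
        // When the endpoint is set in code for HTTP/protobuf, the URL includes
        // the signal path (/v1/traces) in addition to the host and port.
        options.Endpoint = new Uri("http://localhost:4318/v1/traces");
        options.Protocol = OtlpExportProtocol.HttpProtobuf;
    })
    .Build();
```

Port 4318 is the collector's HTTP receiver. Switching to gRPC would mean port 4317 and OtlpExportProtocol.Grpc, without the signal path.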

Local dev OTel stack (Docker Compose):

| Service | Port | Purpose |
| --- | --- | --- |
| OTel Collector | 4317 (gRPC), 4318 (HTTP) | Collector pipeline |
| Jaeger UI | 16686 | Trace viewer |
| Prometheus | 9090 | Metrics scrape + query |

APM (Cloud)

Application Insights

Version: Azure Monitor workspace-based (2024 schema)
Role: Cloud APM — request traces, exception tracking, custom metrics, availability tests, Live Metrics
Integration: OpenTelemetry → Azure.Monitor.OpenTelemetry.AspNetCore package
Connection string: Per-environment, stored in Key Vault as ApplicationInsights--ConnectionString

Key views used:

| Application Insights view | What it shows |
| --- | --- |
| Transaction search | End-to-end trace for a single request |
| Application map | Live dependency graph: services and their error rates |
| Failures | Grouped exception occurrences with stack traces |
| Performance | p50/p95/p99 duration by operation name |
| Live Metrics | Real-time request rate, failure rate, server count |
| Availability | Synthetic health probe results per environment |

Log Analytics workspace (shared per environment):

kusto
// Find all 500 errors in the last 24 hours
requests
| where timestamp > ago(24h)
| where resultCode == "500"
| project timestamp, name, url, duration, cloud_RoleName
| order by timestamp desc

Sampling: Adaptive sampling enabled in production (targets 5 traces/second per service). All failed requests are always captured regardless of sampling rate.


Health Checks

ASP.NET Core Health Checks

Package: Microsoft.Extensions.Diagnostics.HealthChecks
Role: Liveness and readiness probes for every microservice, consumed by Azure Container Apps (ACA) and Azure Front Door (AFD)

Endpoints exposed on every service:

| Endpoint | Purpose | Checks included |
| --- | --- | --- |
| /health/live | Liveness: is the process alive? | None (always 200 if process running) |
| /health/ready | Readiness: can the service accept traffic? | DB connectivity, Redis, ASB, Key Vault |
| /health | Aggregated (Gateway) | Polls all downstream services |

Checks registered:

csharp
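using HealthChecks.UI.Client;                         // UIResponseWriter
using Microsoft.AspNetCore.Diagnostics.HealthChecks;  // HealthCheckOptions

// Register dependency checks; the "ready" tag selects them for /health/ready
builder.Services.AddHealthChecks()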
    .AddSqlServer(connectionString, name: "sql", tags: ["ready"])
    .AddRedis(redisConnection, name: "redis", tags: ["ready"])
    .AddAzureServiceBusTopic(sbConnection, topicName, name: "asb", tags: ["ready"])
    .AddCheck<KeyVaultHealthCheck>("keyvault", tags: ["ready"]);

// Expose endpoints
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = _ => false  // No checks — liveness is process-alive only
});

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});
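The registration above references KeyVaultHealthCheck, whose implementation is not shown in this reference. As a rough idea of what a custom check involves, here is a generic sketch built on the IHealthCheck interface. ProbeHealthCheck and its probe delegate are hypothetical; a Key Vault variant would presumably run a cheap read against the vault inside the probe.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical stand-in for a dependency check such as KeyVaultHealthCheck:
// runs a caller-supplied probe and maps the outcome to a health status.
public sealed class ProbeHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task> _probe;

    public ProbeHealthCheck(Func<CancellationToken, Task> probe) => _probe = probe;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            await _probe(cancellationToken);
            return HealthCheckResult.Healthy("Dependency reachable");
        }
        catch (Exception ex)
        {
            // Report the failure status configured at registration time
            // (Unhealthy unless the registration says otherwise).
            return new HealthCheckResult(
                context.Registration.FailureStatus,
                "Dependency unreachable",
                ex);
        }
    }
}
```

Because this sketch takes a delegate in its constructor, it would be registered as an instance through the AddCheck overload that accepts an IHealthCheck, with the "ready" tag so that it is included in /health/ready.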

ACA probe configuration (services-config.json):

json
{
  "probes": {
    "liveness":  { "path": "/health/live",  "initialDelaySeconds": 10, "periodSeconds": 30 },
    "readiness": { "path": "/health/ready", "initialDelaySeconds": 15, "periodSeconds": 15 },
    "startup":   { "path": "/health/live",  "initialDelaySeconds": 5,  "failureThreshold": 10 }
  }
}

Gateway health aggregation: Gateway.API exposes a combined /health endpoint that fans out to all private services in the Container Apps Environment (CAE), aggregates their responses, and returns a single health status. Azure Front Door's health probe targets this endpoint.
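The fan-out and aggregation step can be sketched as below. This is an illustration under assumptions, not the Gateway.API implementation: the HealthFanOut class, the name-to-URL map, and the rule that any failing service makes the whole result unhealthy are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class HealthFanOut
{
    // Probes every URL in parallel. A service counts as healthy only if it
    // answers with a 2xx status before the timeout elapses.
    public static async Task<IReadOnlyDictionary<string, bool>> ProbeAsync(
        HttpClient client,
        IReadOnlyDictionary<string, Uri> services, // service name -> readiness URL (hypothetical)
        TimeSpan timeout)
    {
        var probes = services.Select(async pair =>
        {
            using var cts = new CancellationTokenSource(timeout);
            try
            {
                using var response = await client.GetAsync(pair.Value, cts.Token);
                return (Name: pair.Key, Healthy: response.IsSuccessStatusCode);
            }
            catch (Exception ex) when (ex is HttpRequestException or OperationCanceledException)
            {
                // Connection failures and timeouts both count as unhealthy.
                return (Name: pair.Key, Healthy: false);
            }
        });

        var results = await Task.WhenAll(probes);
        return results.ToDictionary(r => r.Name, r => r.Healthy);
    }

    // Overall status is healthy only when every downstream service is healthy.
    public static bool AllHealthy(IReadOnlyDictionary<string, bool> results) =>
        results.Values.All(healthy => healthy);
}
```

Probing in parallel with a per-call timeout keeps the aggregated endpoint's latency close to that of the slowest single service, which matters when an external health probe polls it on a fixed interval.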


Alerting

Azure Monitor Alerts

Channels: Email to on-call team + Microsoft Teams webhook
Alert rules (production):

| Alert name | Metric | Threshold | Severity |
| --- | --- | --- | --- |
| High error rate | HTTP 5xx / total requests | > 1% for 5 min | Sev 1 |
| Slow P99 latency | Request duration p99 | > 3 s for 10 min | Sev 2 |
| Scale limit hit | Replica count = max | Sustained 5 min | Sev 2 |
| SQL DTU high | DTU utilisation | > 85% for 5 min | Sev 2 |
| Redis evictions | Cache eviction count | > 0 | Sev 3 |
| Service Bus DLQ | Dead-letter count | > 10 | Sev 2 |
| Health probe fail | Availability % | < 99% | Sev 1 |

Correlation: End-to-End Request Trace

A complete request from browser to database is traceable using a single CorrelationId:

  1. The Angular frontend sets X-Correlation-ID: {uuid} on every HTTP request.
  2. Gateway.API reads the header from HttpContext.Request.Headers and forwards it on downstream calls.
  3. All Serilog log events carry a CorrelationId property, added by the enricher.
  4. OpenTelemetry trace context is propagated via the W3C traceparent header.
  5. Azure Service Bus messages carry the CorrelationId as a message property.
  6. Application Insights and Seq allow filtering by CorrelationId or TraceId.

Example flow:

Browser → AFD → Gateway (logs + trace: abc-123) →
  AppsPortal (logs + trace: abc-123) →
    ASB message (CorrelationId: abc-123) →
      Notification.Apis (logs: abc-123) → Email sent
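Forwarding the header on downstream calls is commonly done with a DelegatingHandler on the outbound HttpClient. The sketch below shows that technique using only System.Net.Http. It is not the Gateway.API code: CorrelationIdHandler and its accessor delegate are hypothetical, and generating a new ID when none is present is an assumption about the desired fallback.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler: copies the current correlation ID onto outbound calls.
public sealed class CorrelationIdHandler : DelegatingHandler
{
    public const string HeaderName = "X-Correlation-ID";

    // Assumed accessor; in a web service this would read the inbound request header.
    private readonly Func<string?> _currentCorrelationId;

    public CorrelationIdHandler(Func<string?> currentCorrelationId) =>
        _currentCorrelationId = currentCorrelationId;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (!request.Headers.Contains(HeaderName))
        {
            // Reuse the inbound ID; create one only if the caller sent none.
            var id = _currentCorrelationId() ?? Guid.NewGuid().ToString();
            request.Headers.TryAddWithoutValidation(HeaderName, id);
        }

        return base.SendAsync(request, cancellationToken);
    }
}
```

The handler would sit in front of the real transport, for example new HttpClient(new CorrelationIdHandler(accessor) { InnerHandler = new HttpClientHandler() }). The W3C traceparent header from step 4 needs no equivalent code in this sketch, since .NET's HttpClient adds it when an Activity is current.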

Local Observability Stack (Docker Compose)

Full local observability from dev/docker-compose.yml:

| Service | Image | Port | Purpose |
| --- | --- | --- | --- |
| Seq | datalust/seq:latest | 1234 | Structured log viewer |
| OTel Collector | otel/opentelemetry-collector-contrib | 4317/4318 | Trace/metric collector |
| Jaeger | jaegertracing/all-in-one | 16686 | Distributed trace viewer |
| Prometheus | prom/prometheus | 9090 | Metrics storage + query |
| Grafana | grafana/grafana | 3000 | Metrics dashboards |

Start with: docker-compose -f dev/docker-compose.yml up seq otel-collector jaeger prometheus grafana


Internal Documentation — Microtec Platform Team