
0005 — Observability model ​

Status: Accepted (tooling TBD; stack fork resolved by ADR 0024 — API compute = Azure Container Apps, DB = Postgres on Azure DB for PostgreSQL Flexible Server)

Date: 2026-06-17

ADO work item: AB#3073

Deciders: Kristopher Turner (platform owner)


Context ​

The platform needs monitoring, alerting, and logging across three surfaces: infrastructure, backend API, and web/mobile clients. The community also requires an application-level audit log (who logged in, from where, how long) for security compliance and for the minister's oversight of a closed community.

The four-layer observability model is locked; the specific tooling for layers 2–3 is coupled to the compute/DB stack fork (ADR 0004) and is decided once that fork closes.

Decision ​

We will implement a four-layer observability model. Layer 1 (audit log) is implemented regardless of stack choice. Layers 2–3 tooling follows the stack: Azure Application Insights + Azure Monitor for Azure-native; OpenTelemetry + Grafana for non-Azure. Layer 4 is the Clerk dashboard (ADR 0003).

Four layers:

  1. App audit/activity log: AuditLog table in the platform database. Tracks every security-relevant event: login, logout, session duration, approval actions, content publish. This layer is implemented in Phase 2 regardless of stack.

  2. App telemetry: SDK instrumentation in apps/web, apps/api, and apps/mobile. Captures errors, performance traces, and custom events.

    • Azure-native stack: Azure Application Insights (~5 GB/month free).
    • Non-Azure stack: OpenTelemetry + Grafana Cloud free tier + Sentry for error tracking.

  3. Infrastructure monitoring + alerts: resource health, availability, cost alerts.

    • Azure-native stack: Azure Monitor + Log Analytics + action groups (~5 GB/month free).
    • Non-Azure stack: provider-native dashboards + OpenTelemetry collector.

  4. Auth-provider logs: Clerk dashboard (session events, sign-in attempts, suspicious activity). See ADR 0003.
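To make Layer 1 concrete, here is a minimal sketch of what one AuditLog row and a write helper could look like. The ADR does not define the schema, so every field name, the `AuditLogStore` interface, and `recordAuditEvent` are hypothetical; an in-memory store stands in for the platform database so the example runs on its own.

```typescript
// Hypothetical event kinds, taken from the events listed for Layer 1.
type AuditAction = "login" | "logout" | "approval" | "content_publish";

// Hypothetical shape of one AuditLog row; column names are illustrative only.
interface AuditLogEntry {
  id: number;
  userId: string;
  action: AuditAction;
  occurredAt: Date;
  ipAddress?: string; // the "from where" part of the requirement
  metadata?: { [key: string]: string };
}

// Minimal storage contract so the sketch does not depend on a real database.
interface AuditLogStore {
  append(entry: AuditLogEntry): void;
  all(): readonly AuditLogEntry[];
}

class InMemoryAuditLogStore implements AuditLogStore {
  private rows: AuditLogEntry[] = [];

  append(entry: AuditLogEntry): void {
    this.rows.push(entry);
  }

  all(): readonly AuditLogEntry[] {
    return this.rows;
  }
}

let nextAuditId = 1;

// Builds a row, stamps it with the current time unless one is given, and stores it.
function recordAuditEvent(
  store: AuditLogStore,
  userId: string,
  action: AuditAction,
  details: { ipAddress?: string; metadata?: { [key: string]: string }; occurredAt?: Date } = {},
): AuditLogEntry {
  const entry: AuditLogEntry = {
    id: nextAuditId++,
    userId,
    action,
    occurredAt: details.occurredAt ?? new Date(),
    ipAddress: details.ipAddress,
    metadata: details.metadata,
  };
  store.append(entry);
  return entry;
}

const auditStore = new InMemoryAuditLogStore();
recordAuditEvent(auditStore, "user-1", "login", { ipAddress: "203.0.113.7" });
recordAuditEvent(auditStore, "user-1", "content_publish", { metadata: { postId: "p-42" } });
```

Session duration is left out of the row on purpose in this sketch: it can be derived from paired login and logout rows rather than stored as its own event.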

Alternatives considered ​

| Option | Pros | Cons | Why not chosen |
| --- | --- | --- | --- |
| Azure App Insights + Monitor (chosen for Azure path) | Native Azure integration; free tier; no extra setup | Coupled to Azure stack; not portable | Chosen if ADR 0004 → Azure-native |
| OpenTelemetry + Grafana (chosen for non-Azure path) | Stack-agnostic; OSS; vendor-neutral | Ops overhead; Grafana Cloud free tier limits | Chosen if ADR 0004 → Supabase/non-Azure |
| Sentry alone | Strong error tracking + session replay | Not a full observability solution; cost at scale | Used as a complement, not replacement |
| Datadog | Best-in-class unified observability | Expensive; not justified at ~200 members | Revisit if community grows significantly |
| No structured observability | Nothing to build | Unacceptable: the audit log is a security and compliance requirement for a closed community with minors | Hard requirement |

Consequences ​

Positive ​

  • The AuditLog table (Layer 1) is independent of tooling choice — it ships in Phase 2 regardless.
  • The four-layer model is budget-friendly: free-tier telemetry and monitoring are viable at 200 members.
  • Minister/admin can audit who accessed the platform and when — required for a closed community with parent-managed child accounts.
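As an illustration of the oversight use case, the sketch below derives session durations by pairing each login with the next logout for the same user. The `SessionEvent` shape and `summarizeSessions` are assumptions made for this example, not part of the ADR; a real report would read these rows from the AuditLog table.

```typescript
// Hypothetical subset of audit rows needed for a session report.
interface SessionEvent {
  userId: string;
  action: "login" | "logout";
  occurredAt: Date;
}

interface SessionSummary {
  userId: string;
  loginAt: Date;
  logoutAt: Date;
  durationMinutes: number;
}

// Pairs each login with the next logout for the same user, in time order.
function summarizeSessions(events: SessionEvent[]): SessionSummary[] {
  const sorted = [...events].sort(
    (a, b) => a.occurredAt.getTime() - b.occurredAt.getTime(),
  );
  const openLogins: { [userId: string]: Date | undefined } = {};
  const sessions: SessionSummary[] = [];

  for (const event of sorted) {
    if (event.action === "login") {
      openLogins[event.userId] = event.occurredAt;
      continue;
    }
    const loginAt = openLogins[event.userId];
    if (loginAt === undefined) continue; // logout with no recorded login: skip it
    delete openLogins[event.userId];
    sessions.push({
      userId: event.userId,
      loginAt,
      logoutAt: event.occurredAt,
      durationMinutes: (event.occurredAt.getTime() - loginAt.getTime()) / 60000,
    });
  }
  return sessions;
}

const sampleSessions = summarizeSessions([
  { userId: "user-1", action: "login", occurredAt: new Date("2026-06-17T09:00:00Z") },
  { userId: "user-2", action: "login", occurredAt: new Date("2026-06-17T09:10:00Z") },
  { userId: "user-1", action: "logout", occurredAt: new Date("2026-06-17T09:45:00Z") },
]);
// sampleSessions holds one closed session: user-1, 45 minutes.
// user-2 has not logged out yet, so no duration is reported for that session.
```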

Negative / trade-offs ​

  • Layers 2–3 tooling cannot be finalized until ADR 0004 (stack fork) closes. Do not instrument before that.
  • If the stack changes post-Phase 1, telemetry SDK calls must be updated throughout apps/api, apps/web, and apps/mobile. Mitigation: wrap the telemetry calls behind a thin packages/shared-utils/telemetry facade so the SDK is swappable.
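The facade mitigation could look roughly like the sketch below. The ADR names only the packages/shared-utils/telemetry location, so the `TelemetryBackend` interface, `configureTelemetry`, `trackEvent`, and `trackError` are all hypothetical. No vendor SDK is called here; a real adapter for Application Insights, OpenTelemetry, or Sentry would implement the same interface.

```typescript
// Hypothetical facade for packages/shared-utils/telemetry.
type TelemetryProperties = { [key: string]: string | number | boolean };

// The only surface app code depends on; vendor SDKs stay behind it.
interface TelemetryBackend {
  trackEvent(eventName: string, properties?: TelemetryProperties): void;
  trackError(error: Error, properties?: TelemetryProperties): void;
}

// Default backend drops everything, so calls are safe before tooling is chosen.
const noopBackend: TelemetryBackend = {
  trackEvent() {},
  trackError() {},
};

let activeBackend: TelemetryBackend = noopBackend;

// Swap the backend once at app start-up; call sites never change.
function configureTelemetry(backend: TelemetryBackend): void {
  activeBackend = backend;
}

function trackEvent(eventName: string, properties?: TelemetryProperties): void {
  activeBackend.trackEvent(eventName, properties);
}

function trackError(error: Error, properties?: TelemetryProperties): void {
  activeBackend.trackError(error, properties);
}

// Stand-in backend that records calls in memory, useful for tests.
class RecordingBackend implements TelemetryBackend {
  readonly calls: string[] = [];

  trackEvent(eventName: string): void {
    this.calls.push(`event:${eventName}`);
  }

  trackError(error: Error): void {
    this.calls.push(`error:${error.message}`);
  }
}

const recorder = new RecordingBackend();
trackEvent("ignored_before_configuration"); // goes to the no-op backend
configureTelemetry(recorder);
trackEvent("content_published", { postId: "p-42" });
trackError(new Error("upload failed"));
// recorder.calls is now ["event:content_published", "error:upload failed"]
```

With this shape, a stack change means writing one new backend adapter and changing the `configureTelemetry` call, rather than touching every call site in apps/api, apps/web, and apps/mobile.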

Risks ​

  • Free-tier data retention: Application Insights free tier retains data for 90 days; Log Analytics free tier retains data for 7 days. Retention depends on the workspace pricing tier, so confirm both figures against current Azure pricing. Document the retention limits and set up export or archival if compliance requires longer retention.
  • Mobile telemetry: Application Insights does not have an official React Native SDK; Sentry or OpenTelemetry React Native is required for apps/mobile regardless of stack choice.
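The retention risk can be checked mechanically once a compliance requirement is known. The sketch below uses the retention figures stated in this ADR and a hypothetical 30-day requirement; `RetentionPolicy` and `sourcesNeedingArchival` are illustrative names, and the figures should be re-verified before anyone relies on them.

```typescript
interface RetentionPolicy {
  source: string;
  retentionDays: number;
}

// Figures as stated in this ADR; confirm against current Azure limits.
const freeTierRetention: RetentionPolicy[] = [
  { source: "Application Insights", retentionDays: 90 },
  { source: "Log Analytics", retentionDays: 7 },
];

// Returns the sources whose retention is shorter than the required number of days.
function sourcesNeedingArchival(
  policies: RetentionPolicy[],
  requiredDays: number,
): string[] {
  return policies
    .filter((policy) => policy.retentionDays < requiredDays)
    .map((policy) => policy.source);
}

// Hypothetical requirement of 30 days: only Log Analytics falls short.
const needsArchival = sourcesNeedingArchival(freeTierRetention, 30);
```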

References ​

Heritage Community Hub — Internal. Access restricted via Cloudflare Access + Entra ID.