Distributed Medical IoT Audit Logs

A production-grade, multi-tenant audit logging platform for medical IoT devices. Designed for hospitals running thousands of infusion pumps, patient monitors, ventilators, and smart beds that emit continuous streams of safety-critical events.

  • Scale target: 10,000 devices per hospital, ~10 events/sec/device, ~100,000 events/sec/hospital, multiplied across many tenants.
  • Event taxonomy: ~20 event types (infusion rate change, alarm raised, vitals, heartbeat, admin actions, firmware update, etc.) with 5 common envelope fields.
  • Deployment: Runs locally on k3s for development and demo, portable to AWS/Azure/GCP with zero application code changes via the abstraction layer documented in docs/portability-matrix.md.
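
The scale target above is simple multiplication, and a small sketch makes the derived volumes explicit. The per-device and per-hospital figures come from the bullet above; the `fleetEventsPerSec` helper and the hospital count passed to it are illustrative assumptions, since the README only says "many tenants".

```typescript
// Figures taken from the scale target above.
const devicesPerHospital = 10000;
const eventsPerSecPerDevice = 10;

// 10,000 devices * 10 events/sec = 100,000 events/sec per hospital.
const eventsPerSecPerHospital = devicesPerHospital * eventsPerSecPerDevice;

// 86,400 seconds per day gives the sustained daily volume for one hospital.
const eventsPerDayPerHospital = eventsPerSecPerHospital * 86400;

// Hypothetical helper: aggregate ingest rate for a given number of hospitals.
// The hospital count is an assumption supplied by the caller.
function fleetEventsPerSec(hospitals: number): number {
  return hospitals * eventsPerSecPerHospital;
}

console.log(eventsPerSecPerHospital); // 100000
console.log(eventsPerDayPerHospital); // 8640000000
```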

High-Level Architecture

See docs/architecture.md for the full control-plane / data-plane breakdown.


Features

  • Canonical envelope with tenantId, hospitalId, deviceId, eventType, eventVersion, eventId (UUIDv7), eventTime, ingestionTime, sequenceNumber, schemaRef, payload, SHA-256 checksum, and criticality (see schemas/envelope.schema.json).
  • Schema Registry (Git-backed, signed commits) with backward/forward compatibility enforcement per event type. See docs/schema-validation.md.
  • 3-stage validation: edge (fast-fail, best-effort), ingestion (authoritative, blocking), consumer (defensive). The final recommendation and performance analysis are in docs/schema-validation.md.
  • Hash-chained integrity: every event carries SHA-256 over canonicalized payload; a per-device sequenceNumber + prevHash chain makes undetected tampering infeasible.
  • 3-tier storage: HOT OpenSearch (fast search, 0-30d), WARM cold nodes (30-90d, force-merged), COLD Parquet on MinIO/S3 with object lock (90d-2y). Details: docs/storage-tiering.md.
  • Multi-tenant isolation with logical-by-default, physical upgrade path for premium tenants. See docs/multi-tenancy.md.
  • Replay & re-validation pipeline for schema evolution and post-incident reprocessing.
  • Full observability stack: Prometheus, Grafana, Loki, Tempo, with SLOs on ingestion p99, consumer lag, DLQ rate, and archive freshness. See docs/observability.md.
  • Security: mTLS for service-to-service, OIDC for humans, per-tenant RBAC, AES-256-at-rest via KMS, WORM archive, ExternalSecrets/Vault. HIPAA control mapping (not a compliance claim) in docs/security.md.
  • Portable across local k3s and all three major clouds with a clean abstraction boundary per docs/portability-matrix.md.
  • Event-driven autoscaling via KEDA: Kafka-lag triggers for consumers (archive-writer, search-indexer, report-service, replay-service), Prometheus RPS + CPU triggers for HTTP services, ScaledJob with scale-to-zero for on-demand investigation bundle generation. Full design in docs/autoscaling.md.
  • REST API with async ingest, paginated query with opaque cursors, device timelines, investigation case builder, and export. See docs/api.md.
  • Investigator UI with provenance badges (hot/warm/cold), saved searches, audit-of-audit-access trail. See docs/ui.md.
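
To make the hash-chained integrity feature concrete, here is a minimal sketch of a per-device chain. It is not the platform's implementation. The field names follow the envelope list above, but the field types, the sorted-key `canonicalize` function, the all-zero `GENESIS` value, and the `linkHash` rule (hashing the previous event's sequence number, checksum, and prevHash) are all assumptions made for illustration; the authoritative definitions are in schemas/envelope.schema.json and docs/security.md.

```typescript
import { createHash } from "node:crypto";

// Subset of the envelope; types are assumed, not taken from the real schema.
interface ChainedEvent {
  tenantId: string;
  deviceId: string;
  eventId: string;
  sequenceNumber: number;
  payload: unknown;
  checksum: string; // hex SHA-256 over the canonicalized payload
  prevHash: string; // link to the previous event from the same device
}

type NewEvent = Pick<ChainedEvent, "tenantId" | "deviceId" | "eventId" | "payload">;

// Assumed canonical form: JSON with object keys sorted recursively.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

const sha256Hex = (s: string): string =>
  createHash("sha256").update(s, "utf8").digest("hex");

// Assumed link rule: each event commits to its predecessor's identity.
const linkHash = (e: ChainedEvent): string =>
  sha256Hex(`${e.sequenceNumber}:${e.checksum}:${e.prevHash}`);

const GENESIS = "0".repeat(64); // assumed prevHash for a device's first event

function appendEvent(chain: ChainedEvent[], base: NewEvent): ChainedEvent {
  const prev = chain[chain.length - 1];
  const event: ChainedEvent = {
    ...base,
    sequenceNumber: prev ? prev.sequenceNumber + 1 : 0,
    checksum: sha256Hex(canonicalize(base.payload)),
    prevHash: prev ? linkHash(prev) : GENESIS,
  };
  chain.push(event);
  return event;
}

// Returns the index of the first event that fails verification, or -1 if intact.
function verifyChain(chain: ChainedEvent[]): number {
  for (let i = 0; i < chain.length; i++) {
    const e = chain[i]!;
    if (e.checksum !== sha256Hex(canonicalize(e.payload))) return i;
    const prev = i > 0 ? chain[i - 1]! : undefined;
    if (e.prevHash !== (prev ? linkHash(prev) : GENESIS)) return i;
    if (prev && e.sequenceNumber !== prev.sequenceNumber + 1) return i;
  }
  return -1;
}
```

In this sketch, editing a payload breaks that event's checksum, and rewriting the checksum to match breaks the next event's prevHash, so a change in the middle of the chain surfaces at or just after the modified event. The newest event has no successor, so a real deployment would still need an external anchor for the chain head.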

Quickstart

Prerequisites

No Docker anywhere. Builds use nerdctl+buildkit; the runtime is containerd (embedded in k3s); Helm charts, JSON schemas, and SBOMs travel as OCI artifacts via helm push, oras push, and cosign. See docs/oci-nerdctl-oras.md.

| OS | Path | Notes |
| --- | --- | --- |
| Linux (Ubuntu 22.04+, Fedora 39+) | Native k3s + nerdctl + ORAS | `curl -sfL https://get.k3s.io \| sh -` is the fastest path; nerdctl wired to /run/k3s/containerd/containerd.sock |
| macOS (13+, Apple Silicon or Intel) | Lima + k3s + nerdctl + ORAS (scripted) OR Rancher Desktop with containerd backend (GUI) | `brew install lima nerdctl helm helmfile kubectl oras jq` |
| Windows 11 | WSL2 (Ubuntu 22.04) + native k3s + nerdctl + ORAS (scripted) OR Rancher Desktop with containerd backend (GUI) | bootstrap.ps1 installs the WSL distro and delegates to bootstrap.sh inside it |

Minimum laptop: 8 GB RAM for minimal profile (Kafka + 1 OpenSearch node + 1 MinIO + ingest + validator). Recommended 16 GB RAM for full profile including Grafana/Loki/Tempo and the UI.

Full per-OS walkthrough in docs/local-setup.md.

One-command bootstrap

```bash
# Linux / macOS / WSL
./scripts/bootstrap.sh --profile minimal       # or --profile full

# Windows (PowerShell, outside WSL)
./scripts/bootstrap.ps1 -Profile minimal
```

The bootstrap script will:

  1. Create a k3s cluster (native on Linux, inside a Lima VM on macOS, inside a WSL2 distro on Windows).
  2. Install Helmfile-managed releases: Kafka (Bitnami, KRaft mode), OpenSearch, MinIO, Postgres (metadata), Redis (rate limits), kube-prometheus-stack, Loki, Tempo.
  3. Install the platform Helm chart (charts/audit-platform).
  4. Load demo schemas from schemas/ into the Git-backed registry.
  5. Seed 15 minutes of synthetic events across 3 tenants.
  6. Port-forward the UI to http://localhost:8080 and Grafana to http://localhost:3000.
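
As an illustration of step 5, the sketch below generates a 15-minute window of synthetic heartbeat events across three tenants. It is not the seeding code used by the bootstrap script. The `seedEvents` function, its options, the device naming, the heartbeat payload shape, and the event interval are all hypothetical, and `randomUUID` produces UUIDv4 as a stand-in for the UUIDv7 eventId the envelope specifies.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical, trimmed-down event shape; see schemas/envelope.schema.json
// for the real envelope.
interface SyntheticEvent {
  tenantId: string;
  deviceId: string;
  eventType: string;
  eventId: string;
  eventTime: string; // ISO 8601
  sequenceNumber: number;
  payload: { batteryPct: number };
}

interface SeedOptions {
  tenants: string[];
  devicesPerTenant: number;
  windowMinutes: number;
  intervalSeconds: number;
  endTime: Date;
}

// Emits one heartbeat per device per interval, ending just before endTime.
function seedEvents(opts: SeedOptions): SyntheticEvent[] {
  const events: SyntheticEvent[] = [];
  const startMs = opts.endTime.getTime() - opts.windowMinutes * 60 * 1000;
  const ticks = Math.floor((opts.windowMinutes * 60) / opts.intervalSeconds);
  for (const tenantId of opts.tenants) {
    for (let d = 0; d < opts.devicesPerTenant; d++) {
      const deviceId = `${tenantId}-pump-${d}`; // assumed naming scheme
      for (let seq = 0; seq < ticks; seq++) {
        events.push({
          tenantId,
          deviceId,
          eventType: "heartbeat",
          eventId: randomUUID(), // UUIDv4 stand-in for UUIDv7
          eventTime: new Date(startMs + seq * opts.intervalSeconds * 1000).toISOString(),
          sequenceNumber: seq,
          payload: { batteryPct: 100 - (seq % 10) },
        });
      }
    }
  }
  return events;
}

// Usage: 3 tenants, 2 devices each, one heartbeat per minute for 15 minutes.
const demo = seedEvents({
  tenants: ["tenant-a", "tenant-b", "tenant-c"],
  devicesPerTenant: 2,
  windowMinutes: 15,
  intervalSeconds: 60,
  endTime: new Date("2024-01-01T00:15:00.000Z"),
});
console.log(demo.length); // 90
```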

Teardown

```bash
./scripts/teardown.sh
```

Repository Layout

| Path | Purpose |
| --- | --- |
| README.md | This file |
| docs/ | Architecture, schema, storage, security, ops docs |
| docs/architecture.md | Control plane + data plane deep dive |
| docs/schema-validation.md | Envelope, schema registry, 3-stage validation, performance |
| docs/kafka-design.md | Topic naming, partitioning, consumer groups, KRaft choice |
| docs/storage-tiering.md | HOT/WARM/COLD design, ILM, retrieval SLAs |
| docs/multi-tenancy.md | Isolation model, quotas, noisy neighbor |
| docs/failure-scenarios.md | Failure mode catalog |
| docs/observability.md | Metrics, logs, traces, SLOs, alerting |
| docs/security.md | mTLS, OIDC, RBAC, KMS, tamper detection, HIPAA controls |
| docs/portability-matrix.md | Local <-> AWS/Azure/GCP mapping |
| docs/trade-offs.md | Design decisions and alternatives |
| docs/cost.md | Cost model, tiering savings, self-host vs managed |
| docs/api.md | REST API reference |
| docs/ui.md | UI screens and UX flows |
| docs/local-setup.md | Per-OS setup, resource sizing |
| docs/oci-nerdctl-oras.md | OCI-native build + distribution (nerdctl / oras / cosign / helm OCI) |
| docs/autoscaling.md | KEDA event-driven autoscaling design and scaler inventory |
| docs/recon-dev-plan.md | RECON Dev Plan: design inputs feeding CSV (FR/NFR, tenancy, UX, sprints, traceability) |
| docs/roadmap.md | 7-day MVP + 2-month production plan |
| schemas/ | JSON Schema registry (envelope + per-event types) |
| schemas/envelope.schema.json | Canonical envelope (Draft 2020-12) |
| schemas/events/*.json | Per-event-type schemas |
| schemas/README.md | Versioning and registry contribution flow |
| charts/audit-platform/ | Helm chart (referenced, not in this tree yet) |
| scripts/bootstrap.sh / bootstrap.ps1 | One-command cluster + stack bring-up |

Documentation Index


Status & Scope

This repository currently focuses on design artifacts: architecture, schemas, and documentation sufficient to drive a 7-day MVP and a 2-month hardening plan (see docs/roadmap.md). The Helm charts and service code live alongside this tree and are referenced from the bootstrap scripts.

Nothing in this repo is a claim of formal compliance (HIPAA, HITRUST, SOC 2). The security doc enumerates controls that align with those frameworks to make an external audit tractable.