Distributed Medical IoT Audit Logs
A production-grade, multi-tenant audit logging platform for medical IoT devices. Designed for hospitals running thousands of infusion pumps, patient monitors, ventilators, and smart beds that emit continuous streams of safety-critical events.
- Scale target: 10,000 devices per hospital, ~10 events/sec/device, ~100,000 events/sec/hospital, multiplied across many tenants.
- Event taxonomy: ~20 event types (infusion rate change, alarm raised, vitals, heartbeat, admin actions, firmware update, etc.) with 5 common envelope fields.
- Deployment: Runs locally on k3s for development and demo, portable to AWS/Azure/GCP with zero application code changes via the abstraction layer documented in docs/portability-matrix.md.
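As a rough sizing aid, the scale target above can be turned into per-hospital throughput and daily volume. The sketch below uses only the figures stated in the list; the average event size is a hypothetical planning input, not a number taken from the platform docs.

```python
# Figures from the scale target above.
DEVICES_PER_HOSPITAL = 10_000
EVENTS_PER_SEC_PER_DEVICE = 10
SECONDS_PER_DAY = 86_400

# Hypothetical planning input; not taken from the platform docs.
ASSUMED_AVG_EVENT_BYTES = 1_024


def events_per_sec(devices: int = DEVICES_PER_HOSPITAL,
                   rate: int = EVENTS_PER_SEC_PER_DEVICE) -> int:
    """Sustained events per second for one hospital."""
    return devices * rate


def events_per_day(devices: int = DEVICES_PER_HOSPITAL,
                   rate: int = EVENTS_PER_SEC_PER_DEVICE) -> int:
    """Events emitted by one hospital over 24 hours at the sustained rate."""
    return events_per_sec(devices, rate) * SECONDS_PER_DAY


def raw_bytes_per_day(avg_event_bytes: int = ASSUMED_AVG_EVENT_BYTES) -> int:
    """Uncompressed daily volume for one hospital under the assumed event size."""
    return events_per_day() * avg_event_bytes


if __name__ == "__main__":
    print(f"events/sec/hospital: {events_per_sec():,}")
    print(f"events/day/hospital: {events_per_day():,}")
    print(f"raw bytes/day at {ASSUMED_AVG_EVENT_BYTES} B/event: {raw_bytes_per_day():,}")
```

At the stated rates a single hospital produces 8,640,000,000 events per day, which is the figure the tiered storage and retention settings have to absorb per tenant.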
High-Level Architecture
See docs/architecture.md for the full control-plane / data-plane breakdown.
Features
- Canonical envelope with tenantId, hospitalId, deviceId, eventType, eventVersion, eventId (UUIDv7), eventTime, ingestionTime, sequenceNumber, schemaRef, payload, SHA-256 checksum, and criticality (see schemas/envelope.schema.json).
- Schema Registry (Git-backed, signed commits) with backward/forward compatibility enforcement per event type. See docs/schema-validation.md.
- 3-stage validation: edge (fast-fail, best-effort), ingestion (authoritative, blocking), consumer (defensive). Final recommendation and performance analysis in the validation doc.
- Hash-chained integrity: every event carries SHA-256 over canonicalized payload; a per-device sequenceNumber + prevHash chain makes undetected tampering infeasible.
- 3-tier storage: HOT OpenSearch (fast search, 0-30d), WARM cold nodes (30-90d, force-merged), COLD Parquet on MinIO/S3 with object lock (90d-2y). Details: docs/storage-tiering.md.
- Multi-tenant isolation with logical-by-default, physical upgrade path for premium tenants. See docs/multi-tenancy.md.
- Replay & re-validation pipeline for schema evolution and post-incident reprocessing.
- Full observability stack: Prometheus, Grafana, Loki, Tempo, with SLOs on ingestion p99, consumer lag, DLQ rate, and archive freshness. See docs/observability.md.
- Security: mTLS for service-to-service, OIDC for humans, per-tenant RBAC, AES-256-at-rest via KMS, WORM archive, ExternalSecrets/Vault. HIPAA control mapping (not a compliance claim) in docs/security.md.
- Portable across local k3s and all three major clouds with a clean abstraction boundary per docs/portability-matrix.md.
- Event-driven autoscaling via KEDA: Kafka-lag triggers for consumers (archive-writer, search-indexer, report-service, replay-service), Prometheus RPS + CPU triggers for HTTP services, ScaledJob with scale-to-zero for on-demand investigation bundle generation. Full design in docs/autoscaling.md.
- REST API with async ingest, paginated query with opaque cursors, device timelines, investigation case builder, and export. See docs/api.md.
- Investigator UI with provenance badges (hot/warm/cold), saved searches, audit-of-audit-access trail. See docs/ui.md.
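The hash-chained integrity feature in the list above can be illustrated with a minimal sketch. It assumes sorted-key compact JSON as the canonical form, an all-zero genesis prevHash for a device's first event, and a hypothetical chainHash field that links each event to its predecessor; the platform's actual canonicalization rules and field layout are defined in schemas/envelope.schema.json and docs/security.md and may differ.

```python
import hashlib
import json
from typing import List, Optional

# Assumed placeholder prevHash for the first event of a device (hypothetical).
GENESIS_HASH = "0" * 64


def canonicalize(payload: dict) -> bytes:
    """Assumed canonical form: sorted keys, compact separators, UTF-8."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def payload_checksum(payload: dict) -> str:
    """SHA-256 over the canonicalized payload, hex-encoded."""
    return hashlib.sha256(canonicalize(payload)).hexdigest()


def chain_hash(prev_hash: str, sequence_number: int, checksum: str) -> str:
    """Hash binding this event to its predecessor (hypothetical field)."""
    material = f"{prev_hash}|{sequence_number}|{checksum}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def append_event(chain: List[dict], payload: dict) -> dict:
    """Append an event to one device's chain and return it."""
    prev = chain[-1]["chainHash"] if chain else GENESIS_HASH
    seq = len(chain) + 1
    checksum = payload_checksum(payload)
    event = {
        "sequenceNumber": seq,
        "prevHash": prev,
        "checksum": checksum,
        "chainHash": chain_hash(prev, seq, checksum),
        "payload": payload,
    }
    chain.append(event)
    return event


def verify_chain(chain: List[dict]) -> Optional[int]:
    """Return the index of the first inconsistent event, or None if intact."""
    prev = GENESIS_HASH
    for index, event in enumerate(chain):
        expected_seq = index + 1
        checksum = payload_checksum(event["payload"])
        if (event["sequenceNumber"] != expected_seq
                or event["prevHash"] != prev
                or event["checksum"] != checksum
                or event["chainHash"] != chain_hash(prev, expected_seq, checksum)):
            return index
        prev = event["chainHash"]
    return None
```

In this sketch, editing a stored payload changes its recomputed checksum, and dropping or reordering events breaks the sequenceNumber and prevHash links, so `verify_chain` reports the first affected position. An attacker who can rewrite every later event can still rebuild the chain, which is one reason to anchor it in WORM storage as the security feature describes.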
Quickstart
Prerequisites
No Docker anywhere. Builds use nerdctl + buildkit; the runtime is containerd (embedded in k3s); Helm charts, JSON schemas, and SBOMs travel as OCI artifacts via helm push, oras push, and cosign. See docs/oci-nerdctl-oras.md.
| OS | Path | Notes |
|---|---|---|
| Linux (Ubuntu 22.04+, Fedora 39+) | Native k3s + nerdctl + ORAS | curl -sfL https://get.k3s.io \| sh - (fastest path); nerdctl wired to /run/k3s/containerd/containerd.sock |
| macOS (13+, Apple Silicon or Intel) | Lima + k3s + nerdctl + ORAS (scripted) OR Rancher Desktop with containerd backend (GUI) | brew install lima nerdctl helm helmfile kubectl oras jq |
| Windows 11 | WSL2 (Ubuntu 22.04) + native k3s + nerdctl + ORAS (scripted) OR Rancher Desktop with containerd backend (GUI) | bootstrap.ps1 installs the WSL distro and delegates to bootstrap.sh inside it |
Minimum laptop: 8 GB RAM for minimal profile (Kafka + 1 OpenSearch node + 1 MinIO + ingest + validator). Recommended 16 GB RAM for full profile including Grafana/Loki/Tempo and the UI.
Full per-OS walkthrough in docs/local-setup.md.
One-command bootstrap
# Linux / macOS / WSL
./scripts/bootstrap.sh --profile minimal # or --profile full
# Windows (PowerShell, outside WSL)
./scripts/bootstrap.ps1 -Profile minimal
The bootstrap script will:
- Create a k3s cluster (native on Linux, inside a Lima VM on macOS, inside a WSL2 distro on Windows).
- Install Helmfile-managed releases: Kafka (Bitnami, KRaft mode), OpenSearch, MinIO, Postgres (metadata), Redis (rate limits), kube-prometheus-stack, Loki, Tempo.
- Install the platform Helm chart (charts/audit-platform).
- Load demo schemas from schemas/ into the Git-backed registry.
- Seed 15 minutes of synthetic events across 3 tenants.
- Port-forward the UI to http://localhost:8080 and Grafana to http://localhost:3000.
Teardown
./scripts/teardown.sh
Repository Layout
| Path | Purpose |
|---|---|
| README.md | This file |
| docs/ | Architecture, schema, storage, security, ops docs |
| docs/architecture.md | Control plane + data plane deep dive |
| docs/schema-validation.md | Envelope, schema registry, 3-stage validation, performance |
| docs/kafka-design.md | Topic naming, partitioning, consumer groups, KRaft choice |
| docs/storage-tiering.md | HOT/WARM/COLD design, ILM, retrieval SLAs |
| docs/multi-tenancy.md | Isolation model, quotas, noisy neighbor |
| docs/failure-scenarios.md | Failure mode catalog |
| docs/observability.md | Metrics, logs, traces, SLOs, alerting |
| docs/security.md | mTLS, OIDC, RBAC, KMS, tamper detection, HIPAA controls |
| docs/portability-matrix.md | Local <-> AWS/Azure/GCP mapping |
| docs/trade-offs.md | Design decisions and alternatives |
| docs/cost.md | Cost model, tiering savings, self-host vs managed |
| docs/api.md | REST API reference |
| docs/ui.md | UI screens and UX flows |
| docs/local-setup.md | Per-OS setup, resource sizing |
| docs/oci-nerdctl-oras.md | OCI-native build + distribution (nerdctl / oras / cosign / helm OCI) |
| docs/autoscaling.md | KEDA event-driven autoscaling design and scaler inventory |
| docs/recon-dev-plan.md | RECON Dev Plan: design inputs feeding CSV (FR/NFR, tenancy, UX, sprints, traceability) |
| docs/roadmap.md | 7-day MVP + 2-month production plan |
| schemas/ | JSON Schema registry (envelope + per-event types) |
| schemas/envelope.schema.json | Canonical envelope (Draft 2020-12) |
| schemas/events/*.json | Per-event-type schemas |
| schemas/README.md | Versioning and registry contribution flow |
| charts/audit-platform/ | Helm chart (referenced, not in this tree yet) |
| scripts/bootstrap.sh / bootstrap.ps1 | One-command cluster + stack bringup |
Documentation Index
- Architecture
- Schema & Validation
- Kafka Design
- Storage Tiering
- Multi-Tenancy
- Failure Scenarios
- Observability
- Security
- Portability Matrix
- Trade-offs
- Cost
- API Reference
- UI
- Local Setup
- OCI / nerdctl / ORAS Workflow
- Roadmap
Status & Scope
This repository currently focuses on design artifacts: architecture, schemas, and documentation sufficient to drive a 7-day MVP and a 2-month hardening plan (see docs/roadmap.md). The Helm charts and service code live alongside this tree and are referenced from the bootstrap scripts.
Nothing in this repo is a claim of formal compliance (HIPAA, HITRUST, SOC 2). The security doc enumerates controls that align with those frameworks to make an external audit tractable.