Architecture Decision Records
ADRs document significant architectural decisions and their context.
Agents
| ADR | Decision |
|---|---|
| 001 - Background Agents | Kubernetes-native agent execution with sandbox isolation |
| 002 - OpenHands Agent Sandbox | OpenHands as the agent runtime framework |
| 003 - Context Forge | IBM Context Forge as the MCP gateway |
| 004 - Autonomous Agents | Design for fully autonomous agent workflows |
| 005 - Role-Based MCP Access | Role-based access control for MCP tool servers |
| 006 - OIDC Auth MCP Gateway | OAuth 2.1 / OIDC authentication for remote MCP access |
| 007 - Agent Run Orchestration Service | Dedicated service for dispatching and tracking agent job runs |
| 008 - Cluster Patrol Loop Resilience | Crash recovery and per-sweep supervision for cluster_agents loops |
| 009 - Automated Test Generation Bots | Agent-driven test generation pipeline |
| 010 - Recipe-Driven Agent Registry | Goose recipe YAML as the source of truth for agent definitions |
| 011 - Agent MCP v1 Follow-ons | Deferred self-improvement loop scope after v1 MCP surface shipped |
| 011 - Cloudflare Managed OAuth | Cloudflare-managed OAuth for the MCP gateway (duplicate number) |
| 012 - Knowledge Gardener Model Pipeline | Two-tier model pipeline for the knowledge gardener |
| 013 - Knowledge Gardener Gemma4-Only | Single-model pipeline replacement for the gardener |
| 014 - AX + Substrate Agent Runtime | Split-roles adoption of google/ax + agent-substrate, retiring orchestrator + cluster_agents |
| 015 - Temporal as Orchestration Substrate | Adopt Temporal for workflow execution + scheduling; supersedes ADR 014 |
| 016 - NATS as Canonical Event Stream | NATS JetStream as the system-wide event bus between independently-owned components |
| 017 - Domain Event Schema | Event envelope schema + tombstone semantics across the system |
| 018 - Event-Driven Gardener Triggering | Monolith pushes gardening sessions via remote-trigger run on note edits; drops cron + queue |
| 019 - Substrate Executor + AgentWorkflow over Argo | Thin Substrate executor interface (agent-sandbox warm pool impl #1) under Argo; revisits 015's warm-pool dismissal for caller-blocked dispatch |
| 020 - Deprecate Context Forge | Remove the MCP gateway; serve the monolith's MCP directly (auth stays at the Cloudflare edge). Supersedes 003. Validated plan, deferred execution |
| 021 - Discord-Triggered AgentWorkflow with Fast Hosted Model | Discord bot (qwen gate) as a new AgentWorkflow consumer riding 019's submit path; fast hosted model (Gemini 3.5 Flash) over an OpenAI-compatible seam; snapshot/resume for smooth many-thread work. Draft |
| 022 - Firecracker Snapshot/Restore Controller for AgentWorkflow | Build a thin k8s controller (no turnkey OSS exists); FC-direct, porting E2B's architecture; controller owns idle-detect/snapshot/GC/restore-routing/affinity/reconnect, delegates snapshot to Firecracker. Feasibility derisked (28ms restore) |
| 023 - Egress Secret Proxy for Agent Sandboxes | Guest holds only placeholders; real secrets swapped in at the vsock 1025 egress hop (not eBPF: microVM has its own kernel, Go has no OpenSSL). Per-secret allowlist is the exfil control. v1 = TLS-terminating sidecar in the fc-agentd DaemonSet; CRD/operator deferred. Draft |
| 024 - Discord Agent, Hosted-Model Tiers, and Isolated Live Artifacts | Productive Discord loop on the 022/023 substrate: hosted model via OpenRouter (per-thread knob, key swapped via 023) to beat the 32k Qwen ceiling; two tiers (coding with a gh token vs zero-secret artifact); agent publishes HTML to S3 served in a sandboxed-iframe opaque origin on jomcgi.dev (no subdomain) with ETag hot-reload. Supersedes 021's Argo framing. Draft |
| 025 - Three-Layer Agent Stack (firecracker-substrate, goosecracker, discord-agent) | Name and separate the agent stack into projects/agents/{firecracker-substrate,goosecracker,discord}: firecracker-substrate (bare microVM primitives, goose-agnostic, opaque Exec per 019), goosecracker (generic agent manager owning goose recipes + AgentThread snapshot/resume + a generic config surface; depends on the substrate), and discord (the Discord agent's workload image/source only). goosecracker stays consumer-agnostic; the Discord agent deploys as a generic goosecracker values config and its runtime glue stays in the monolith bot. Re-bins 022/024 components and re-homes 024 Task 1 tier env into the manager layer; no mechanism change. Draft |
| 026 - Hot Git Mirror for goosecracker Agent Workspaces | Hot in-cluster git mirror on node-4 feeding goosecracker agent workspaces: warm base stays repo-agnostic, guests partial-fetch a local always-fresh mirror instead of cold-cloning GitHub on every spin-up; a goosecracker-layer concern per ADR 025, keeping GitHub off the spin-up hot path with no token needed to clone. Draft |
| 027 - Agent GitHub App Roles: Implementer and Reviewer | Split the agent's single gh identity into two GitHub Apps mapped to roles: a claude/*-scoped, no-merge implementer (lower-trust model) and an approve-and-merge reviewer (Opus or better, adversarial spec-vs-impl). Merge gated by a reviewer-controlled agent-review/gate status check since App bots cannot be CODEOWNERS; @jomcgi stays the human owner. Role token injected per-thread via the 023 egress swap. Draft |
| 028 - Elastic Agent-MicroVM Capacity and State-Preserving Reclaim | Run the disposable agent-microVM tier elastically on a shared node: reaffirm FC-direct (sub-100ms golden-template restore; kata-fc has no CRI snapshot); honest burst via dynamically-sized Guaranteed in-place pod resize + a headroom watermark controller (k8s off the hot path, Infeasible resize = spill signal); critical-tenant-safe reclaim via an idle-first victim ladder + graceful drain (checkpoint -> park -> rehydrate) driven by a free-allocatable cushion + node-pinned-Pending watch; and a filesystem-as-durability-contract state model (/repos git branches to a state remote, /data to S3, rest recomputed) checkpointed as a consistent tuple. Drops per-thread VM snapshots; ballast pods + sharding considered and deferred. Draft |
| 030 - fc-invoke, a Single Configurable Surface for Running Workloads in Firecracker | One host daemon that implements ADR 019's stubbed Substrate.Exec seam plus the orchestration around it, absorbing semgrep-scand and fc-agentd's VM lifecycle. HTTP ingress -> HTTP-over-vsock reverse proxy into a guest; workloads are named Helm-values entries (seven generic knobs: image/resources/concurrency/egress/warmBase/sessioned/requestTimeout); warm-base (platform VM snapshot) and state hydration (guest-side shim capabilities) split on the litmus line; all durable state owned by orchestrators (stateless daemon, 2-way door); a shared bazel-baked guest shim (HTTP server + git/object-store capabilities) keeps the rich guest toolkit out of the thin host contract. Home projects/firecracker/{substrate,goosecracker,semgrep}. Supersedes ADR 025 in part. Draft |
| 031 - Control-Plane / Data-Plane Split for the Agent Substrate (cluster + node) | Split the substrate Go code into a cluster/ control plane (ingress, catalog, future placement + fleet/in-place-resize) and a node/ data plane (invoker, fcvm driver, vsockhttp, egress), bracketed by a neutral substrate.NodeExecutor seam so neither plane imports the other. One binary/Deployment now (in-process local executor); the physical split to a central agent plus per-node DaemonSet later is a wiring change behind the seam, not a rewrite. Accepted |
Docs
| ADR | Decision |
|---|---|
| 001 - Static Docs Site | VitePress for architecture documentation (superseded by 002) |
| 002 - Retire Standalone Web Frontends, Docs into Monolith | Delete projects/websites/ + trips/hikes Pages frontends; serve docs from the monolith at jomcgi.dev/docs/* |
Networking
| ADR | Decision |
|---|---|
| 001 - Cloudflare Envoy Gateway | Cloudflare Tunnel + Envoy Gateway for ingress |
Platform
| ADR | Decision |
|---|---|
| 001 - Obsidian Vault Monolith Migration | Migrate Obsidian vault into the monolith on TigerFS |
| 002 - CDN-Cached Data Fetching | Public JSON endpoints cache at the Cloudflare edge; clients poll the cached responses |
| 003 - CDN Cache Rule Scoped to public.jomcgi.dev | Scope CDN cache rule to public.jomcgi.dev (supersedes 002 partially) |
| 004 - Iceberg-on-SeaweedFS Lakehouse with Hot-Swap Quack Serving | Event-sourced lakehouse; NATS → Iceberg → Quack hot-swap; partially evolves 001 |
| 005 - Per-PR Preview Environments | Ephemeral monolith previews: CoW Postgres clone, muted side effects, ApplicationSet PR generator |
| 006 - Decommission Obsidian via a Postgres Interim | Kill Obsidian now: note body authoritative in Postgres, web UI editor; interim ahead of 004 |
| 007 - SeaweedFS Bucket Provisioning via COSI | Declarative buckets + lifecycle + per-app creds via COSI; replaces create-only weed-shell Jobs |
| 008 - Monolith Module Boundaries | Internal module boundaries within the monolith |
| 009 - Post-Merge Chart Versioning and Kargo Promotion | Bump versions post-merge on main, not on branches; Kargo dev->prod promotion with synthetic gates |
| 010 - Memory Oversubscription via Burstable QoS + PriorityClasses | Reserve steady-state via requests and peaks via limits; PriorityClass hierarchy with agent microVMs as designated OOM victims |
Security
| ADR | Decision |
|---|---|
| 001 - Bazel Semgrep | Semgrep SAST integrated via Bazel rules |
| 002 - Semgrep Rule Generation via RL | RL-finetuned Qwen 3.5 9B for generating Semgrep rules from CVEs |
| 003 - gVisor RuntimeClass | User-space kernel isolation for agent sandbox pods via runsc |
| 004 - Public Read-Only Service Isolation | Separate read-only public service on a replica, isolated from private data and secrets |
| 005 - Public Chat Adversarial Hardening | Defense-in-depth for anonymous GPU-backed chat: Turnstile sessions, reserved-headroom semaphore, server-side limits, DB-confined retrieval |
Services
| ADR | Decision |
|---|---|
| 001 - Discord History Backfill | One-time backfill of Discord channel history into pgvector |
| 002 - Discord Chat Automation | Scheduling, triggers, and proactive posting for the Discord bot |
| 010 - FastMonolith Modular Framework | Privilege-typed, data-isolated domain modules composed into per-tier binaries |
| 011 - Grimoire Hot-Tier Schema | Typed CTI schema + grant-overlay visibility on monolith Postgres, checked out from Loom |
Tooling
| ADR | Decision |
|---|---|
| 001 - OCI Tool Distribution | Multi-arch OCI image for developer tools, eliminating local Bazel |
| 002 - Service Deployment Tooling | Copier template to scaffold new services, eliminating per-service boilerplate |
| 003 - Spec-First CLI and Skills | OpenAPI as source of truth; CLI commands and Claude skills are derived |
| 004 - OCaml Rules for Semgrep | Scale the custom bazel/ocaml ruleset (not obazl); ppx first, per-arch native toolchains |
| 005 - tOyCaml Demonstrator | Engine-shaped demonstrator exercising the ruleset before Semgrep lands |
| 006 - Multi-arch OCaml Toolchains | Data-driven arch registry; per-arch toolchain registration gated on pool verification |
| 007 - OCaml BUILD Generation | Gazelle-based BUILD file generation for OCaml sources |
| 008 - CLI Multi-platform Distribution | One Bazel graph, native execution platforms (cloud arm64, self-hosted darwin); no cross-compilation, QEMU, or wasm |
| 009 - Bazel-native Package Classification | Tag/visibility per-package over central globs and gazelle:exclude; lint the old pattern out |
| 010 - Hermetic Visual Regression | Move public-page screenshot capture/diff into cached Bazel actions on an apko chromium image; non-frontend PRs become cache hits |