ADR 001: OCI-Based Tool Distribution
Author: Joe McGinley
Status: Draft
Created: 2026-03-07
Problem
The homelab repo uses Bazel for everything — builds, tests, formatting, image pushing, and developer tool management. The `bazel_env` rule in `tools/BUILD` builds ~20 tools (helm, crane, go, node, pnpm, buildifier, etc.) and symlinks them into a `bin/` directory that `.envrc` adds to `$PATH`. This creates three problems:
1. **Bazel is required for basic development** — Even viewing Helm templates or running `format` requires a working Bazel installation, toolchain download, and repository rule resolution. On an M-series Mac, `bazel run //tools:bazel_env` takes ~45 seconds on a warm cache and minutes on a cold one.
2. **No tool parity between environments** — Local development (macOS/aarch64), CI (Linux/x86_64), and Goose agent sandboxes (Linux/x86_64) each resolve tool versions independently. The Goose agent image (`charts/goose-agent/image/apko.yaml`) packages its own copies of `go`, `node`, `pnpm`, etc. — there's no guarantee these match what `bazel_env` provides locally.
3. **Local Bazel execution is inefficient** — Running `bazel test //...` or `bazel build //...` locally downloads the full Bazel toolchain, resolves all external dependencies, and executes on a single machine. BuildBuddy remote execution is faster (parallelism, shared caching, beefy runners) and already runs on every PR and push to main via `buildbuddy.yaml`.
Proposal
Eliminate local Bazel entirely. Distribute developer tools as a multi-arch OCI image built in CI and pulled locally via `crane export`. All builds, tests, and formatting run remotely via BuildBuddy — triggered by pushing code rather than executing locally.
Before and After
| Aspect | Today | Proposed |
|---|---|---|
| Developer tool setup | `bazel run //tools:bazel_env` (~45s warm) | `crane export` + extract (~5s) |
| Tool versions | Resolved independently per environment | Single multi-arch OCI image, identical everywhere |
| Running tests | `bazel test //...` (local execution) | Push → BuildBuddy remote execution → MCP to observe results |
| Formatting | `bazel run //bazel/tools/format:fast_format` (local) | Push → CI format job → auto-commit fixes back |
| Build graph queries | `bazel query` (local) | `bb query` (remote via BuildBuddy) or BuildBuddy MCP tools |
| Goose tool availability | Separate apko image with its own tool versions | Same OCI tools image, shared versions |
| Claude Code version | Installed independently per machine | Pinned in tools image, identical across all environments |
| Bazel on dev machine | Required | Not required |
Architecture
OCI Tools Image
A single multi-arch (x86_64 + aarch64) OCI image containing all developer tools. Wolfi packages install to standard paths (/usr/bin/, /usr/lib/), and the full image filesystem is extracted locally — no symlink indirection layer.
ghcr.io/jomcgi/homelab/bazel/tools/image:latest
/usr/bin/
├── helm # Helm CLI (from multitool)
├── crane # OCI image tool (from multitool)
├── kind # Local K8s clusters (from multitool)
├── argocd # ArgoCD CLI (from multitool)
├── buildifier # Starlark formatter (from multitool)
├── buildozer # BUILD file editor (from multitool)
├── op # 1Password CLI (from multitool)
├── go # Go toolchain
├── node # Node.js runtime
├── pnpm # Node package manager
├── python3 # Python runtime (+ stdlib in /usr/lib/python3.x/)
├── prettier # Code formatter (symlink → /usr/local/lib/node_modules/prettier/)
├── bb # BuildBuddy CLI (for remote query/execution)
├── scaffold # Scaffold code generator
├── copier # Project templating
└── claude # Claude Code CLI (via npm, @anthropic-ai/claude-code)

Built via apko (consistent with all other images in the repo), pushed to GHCR on every merge to main.
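As a rough illustration, the apko config for this image could look like the sketch below. The package names and keyring URL are assumptions — exact Wolfi package names must be verified against the Wolfi repositories, and tools not packaged in Wolfi (like `bb`) would need a separate build step:

```yaml
# tools/image/apko.yaml — illustrative sketch only, not the actual config.
contents:
  repositories:
    - https://packages.wolfi.dev/os
  keyring:
    - https://packages.wolfi.dev/os/wolfi-signing.rsa.pub
  packages:
    - ca-certificates-bundle
    - helm
    - crane
    - go
    - nodejs
    - pnpm
    - python3
accounts:
  run-as: 65532    # non-root, per docs/security.md
archs:
  - x86_64
  - aarch64
```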
Why full extraction instead of a symlink layer: Earlier designs created a /tools/bin/ directory with symlinks to /usr/bin/ and extracted only that subtree. This broke on macOS — the symlinks resolved to the host's /usr/bin/ (e.g., Xcode shims) instead of the image's binaries. Extracting the full filesystem avoids this entirely. Tools like Python also depend on their stdlib (/usr/lib/python3.x/), which only works when the full image is present.
Claude Code is included as a first-class tool — it's installed via `pnpm add -g @anthropic-ai/claude-code` during the image build. This gives:
- Pinned versions across local dev, CI, and in-cluster agents
- Goose replacement path — in-cluster agents could run Claude Code directly instead of Goose, using the same skills, hooks, and CLAUDE.md from the repo
- Consistent MCP configuration — the `.mcp.json` and `.claude/` configs ship with the repo, and the CLI version matches what was tested against them
Tool Pull Mechanism
Local development is macOS-only, so the bootstrap assumes crane is available (installable via brew install crane). crane export extracts a filesystem tarball from an OCI image without needing a container runtime — no Docker, no Podman, no platform mapping.
Local `.envrc` replaces the `bazel_env` integration:

```sh
TOOLS_DIR="$PWD/.tools"
if [[ ! -d "$TOOLS_DIR/usr/bin" ]]; then
  log_error "Run './bootstrap.sh' to install dev tools"
else
  PATH_add "$TOOLS_DIR/usr/bin"
fi
```

`bootstrap.sh` extracts the full image filesystem into `.tools/`:

```sh
crane export "$TOOLS_IMAGE" - | tar -xf - -C "$TOOLS_DIR"
```

No `--strip-components`, no path filtering. The image's `/usr/bin/go` becomes `.tools/usr/bin/go`, and the full dependency tree (stdlib, shared libs) is preserved at relative paths.
The `.tools/` directory is gitignored. Tools are refreshed daily or on demand.
Bootstrap: Run `./bootstrap.sh` on first clone. The script is macOS-only — it checks for Homebrew, installs `crane` if missing, and pulls the tools image. After that, `.envrc` (authorized once via `direnv allow`) handles automatic refreshes.
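A minimal sketch of the bootstrap staleness logic, assuming the image reference above and a 24-hour TTL. The `.pulled-at` stamp file is an invention for illustration — the real script's Homebrew and `crane` checks are omitted here:

```shell
#!/usr/bin/env bash
# bootstrap sketch: defines helpers only; a real bootstrap.sh would run
# `tools_stale && pull_tools` after verifying Homebrew and crane exist.
set -euo pipefail

TOOLS_IMAGE="${TOOLS_IMAGE:-ghcr.io/jomcgi/homelab/bazel/tools/image:latest}"
TOOLS_DIR="${TOOLS_DIR:-$PWD/.tools}"
STAMP="$TOOLS_DIR/.pulled-at"   # hypothetical marker recording the last pull

# Stale when the marker is missing or older than 24h (86400s).
tools_stale() {
  [[ -f "$STAMP" ]] || return 0
  local now mtime
  now=$(date +%s)
  if stat --version >/dev/null 2>&1; then
    mtime=$(stat -c %Y "$STAMP")   # GNU stat (Linux)
  else
    mtime=$(stat -f %m "$STAMP")   # BSD stat (macOS)
  fi
  (( now - mtime > 86400 ))
}

# Full-filesystem extraction: no --strip-components, no path filtering.
pull_tools() {
  mkdir -p "$TOOLS_DIR"
  crane export "$TOOLS_IMAGE" - | tar -xf - -C "$TOOLS_DIR"
  touch "$STAMP"
}
```

The mtime-based check keeps the refresh logic self-contained in the stamp file, so `.envrc` never needs network access to decide freshness.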
Agent Integration
The tools image includes Claude Code, making it a viable base for in-cluster agents. Two integration paths:
Option A — Tools image as agent base: The homelab-tools image already contains `node`, `pnpm`, `go`, `git`, `gh`, and `claude`. An in-cluster agent pod mounts the repo, injects a `CLAUDE_AUTH_TOKEN` via `OnePasswordItem`, and runs `claude` directly. This could replace Goose entirely — Claude Code natively understands the repo's skills, hooks, and CLAUDE.md.
Option B — Shared tools, separate agent image: Keep a separate agent image but copy tools from homelab-tools to guarantee version parity. Agent-specific packages (e.g., Goose runtime) are added on top.
Option A is preferred — it eliminates the Goose ↔ LiteLLM ↔ Claude indirection and lets in-cluster agents use the exact same tool that local development uses.
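For Option A, the container portion of an agent pod might look like the sketch below. Resource names (`claude-agent`, `claude-auth`) and the repo volume are illustrative, not existing manifests; the secret is assumed to be materialized by a `OnePasswordItem`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: claude-agent           # illustrative name
spec:
  containers:
    - name: agent
      image: ghcr.io/jomcgi/homelab/bazel/tools/image:latest
      command: ["claude"]      # headless/non-interactive flags TBD (Phase 5)
      workingDir: /workspace/homelab
      env:
        - name: CLAUDE_AUTH_TOKEN
          valueFrom:
            secretKeyRef:
              name: claude-auth   # assumed OnePasswordItem-backed secret
              key: token
      volumeMounts:
        - name: repo
          mountPath: /workspace/homelab
  volumes:
    - name: repo
      emptyDir: {}             # a real pod would clone or mount the repo here
```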
Remote Execution Workflow
All Bazel operations execute remotely. The developer workflow becomes:
Edit code
│
▼
git push + open/update PR
│
├──▶ BuildBuddy CI triggers on PR (or push to main)
│ ├── Format check (formatters + gazelle)
│ ├── Test suite (bazel test //...)
│ └── Image push (main branch only)
│
▼
Observe results via:
├── BuildBuddy MCP tools (buildbuddy-mcp-get-invocation, get-log, get-target)
├── GitHub PR checks (gh pr checks)
  └── BuildBuddy web UI (links in BES output)

Remote Format Workflow
Today, `format` runs locally via `bazel run //bazel/tools/format:fast_format`. In the new model:
- Developer pushes to a branch
- CI "Format check" job runs formatters + gazelle
- If changes are needed, CI commits them back to the branch
- Developer pulls the format fixes
This is slower than local formatting but eliminates the need for local Bazel entirely. The CI format job already exists in `buildbuddy.yaml` — it just needs to commit fixes back instead of failing.
Build Graph Queries
For `bazel query` (dependency inspection, target discovery):

- `bb query` — The BuildBuddy CLI can execute queries remotely via BuildBuddy RBE. Include `bb` in the tools image for this.
- BuildBuddy MCP tools — `buildbuddy-mcp-get-target` provides target-level information from recent invocations.
- `bazel query` in CI — For complex queries, push a script that runs the query in CI and outputs results.

Most query needs are covered by `bb query` running remotely.
Missing Tools & Workflow Gaps
The OCI tools image is an opportunity to close gaps in the current bazel_env setup. These tools are either used in workflows but not distributed, or represent workflow gaps that should be filled:
Currently missing from bazel_env
| Tool | Current status | Why it should be in the tools image |
|---|---|---|
| `gh` | In goose-agent apko image, not in `bazel_env` | Essential for PR workflow (`gh pr create`, `gh pr merge --auto --rebase`) |
| `ruff` | Only runs via Bazel lint aspect | Useful for quick local Python linting without remote execution |
| `shellcheck` | Only runs via Bazel lint aspect | Useful for quick local shell script linting |
| `eslint` | Only runs via Bazel lint aspect | Useful for quick local JS linting |
| `agent-run` | Custom Go binary in `tools/agent-run/` | CLI for triggering Goose agent tasks — needs to be available locally |
| `hf2oci` | Custom Go binary in `tools/hf2oci/` | CLI for HuggingFace model → OCI conversion |
| `claude` | Installed independently per machine via npm | Claude Code CLI — pinning the version ensures skills, hooks, and MCP config are tested against a known version. Enables in-cluster agents to run Claude Code directly. |
Workflow gaps (new tooling needed)
| Workflow | Gap | Proposed solution |
|---|---|---|
| ArgoCD app diffing | `rules_helm/app.bzl` has a diff rule referencing `argocd-live-diff.sh`, but the script doesn't exist. No way to preview what a `values.yaml` change will do to the live cluster. | Create the diff script. Include the `argocd` CLI in the tools image (already in multitool). The script should: render the local Helm template → diff against live ArgoCD app manifests. Can also be exposed as an MCP tool via Context Forge. |
| Manifest preview | `render_manifests` requires Bazel to run `helm template`. Without local Bazel, developers can't preview rendered manifests before pushing. | `helm` is already in the tools image. Create a standalone render script that calls `helm template` directly with the right flags, without Bazel wrapping. |
| Lint without Bazel | Linters (ruff, shellcheck, eslint) only run via Bazel aspects. Can't lint a single file quickly. | Include linter binaries in the tools image. Add a lint script that runs them directly on changed files. |
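The lint-script dispatch could be sketched as below. The `linter_for` helper and its extension-to-linter mapping are inventions for illustration; per-linter flags would need tuning against the repo's existing Bazel aspect configs:

```shell
# lint sketch: run the right linter directly on files changed since HEAD,
# assuming ruff/shellcheck/eslint are on PATH via .tools/usr/bin.
linter_for() {
  case "$1" in
    *.py)      echo "ruff check" ;;
    *.sh)      echo "shellcheck" ;;
    *.js|*.ts) echo "eslint" ;;
    *)         echo "" ;;
  esac
}

lint_changed() {
  local f cmd
  # Added/copied/modified tracked files relative to HEAD.
  git diff --name-only --diff-filter=ACM HEAD | while read -r f; do
    cmd=$(linter_for "$f")
    [ -n "$cmd" ] && $cmd "$f"
  done
}
```

Because the dispatch is a plain function, the same mapping can back a pre-push hook or a "lint this one file" invocation without any Bazel involvement.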
Implementation
Phase 1: OCI Tools Image
- [x] Create `tools/image/apko.yaml` with all tools currently in `bazel_env` plus missing tools (`gh`, `ruff`, `shellcheck`, `eslint`)
- [ ] Include custom Go binaries (`agent-run`, `hf2oci`) — built in CI, copied into image
- [x] Create `tools/image/BUILD` with apko build + push rules (no symlink layer — full image extraction)
- [ ] Add `homelab-tools` image push to the `buildbuddy.yaml` CI pipeline (push on main)
- [ ] Add ArgoCD Image Updater config for automatic digest updates
- [ ] Verify multi-arch (x86_64 + aarch64) build works
Phase 2: Local Bootstrap
- [x] Update `.envrc` to `PATH_add` `.tools/usr/bin` (full image extraction, no symlink indirection)
- [ ] Add `.tools/` to `.gitignore`
- [x] Create `bootstrap.sh` — extracts the full image filesystem via `crane export` (no path filtering)
- [ ] Update `README.bazel.md` to reflect the new workflow
- [ ] Remove the `bazel_env` rule from `tools/BUILD` (or deprecate it)
Phase 3: Standalone Workflow Scripts
- [ ] Create `argocd-live-diff.sh` — renders the local Helm template, diffs against live ArgoCD app manifests
- [ ] Create a standalone `render` script — calls `helm template` directly without Bazel
- [ ] Create a standalone `lint` script — runs ruff/shellcheck/eslint on changed files
- [ ] Wire the diff script into `rules_helm/app.bzl` (completing the existing `generate_diff` infrastructure)
Phase 4: Remote-Only Execution
- [ ] Update the `buildbuddy.yaml` format check to auto-commit fixes back to the branch
- [ ] Update the CLAUDE.md "Essential Commands" section — remove local `bazel build`/`test` references
- [ ] Update `.claude/skills/bazel/SKILL.md` — rewrite for the remote-first workflow
- [ ] Update `.claude/settings.json` — remove the `Bash(bazelisk:*)` permission, add `Bash(bb query:*)` if needed
- [ ] Add a PreToolUse hook to block local `bazel build`/`test`/`run` commands (redirect to push + trigger)
- [ ] Verify `bb query` works remotely for common query patterns
Phase 5: In-Cluster Agent Convergence
- [ ] Create an agent sandbox template that uses the `homelab-tools` image directly with `claude` as the entrypoint
- [ ] Inject `CLAUDE_AUTH_TOKEN` via `OnePasswordItem` into agent pods
- [ ] Configure `.claude/` settings for headless/non-interactive mode
- [ ] Validate Claude Code runs in-cluster with MCP access to Context Forge (ClusterIP, no auth needed)
- [ ] Evaluate whether the Goose agent can be deprecated in favor of direct Claude Code execution
Security
No deviations from docs/security.md:
- OCI image — Built with apko (non-root uid 65532, minimal base, no shell in final image unless needed)
- GHCR auth — Uses the existing `GHCR_TOKEN` in BuildBuddy secrets for push; pull is public (the homelab repo is public)
- No secrets in image — The tools image contains only binaries, no credentials
- Remote execution — BuildBuddy auth via `bb login` (existing setup), no credential changes
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| GHCR outage blocks tool pull | Low | Medium | Cache .tools/ locally with 24h TTL. Tools persist across outages. |
| Format auto-commit race | Medium | Low | CI format job creates a separate commit. Developer must pull before pushing again. Standard git workflow. |
| Remote query latency | Medium | Low | bb query adds network round-trip (~1-2s). Acceptable for infrequent queries. BuildBuddy MCP covers common cases. |
| Tool version drift | Low | Medium | Single source of truth (apko.yaml). ArgoCD Image Updater pins digests. Version changes are tracked in git. |
| `crane` not installed | Low | Low | Single prerequisite: `brew install crane`. Documented in README and the `.envrc` error message. Once tools are pulled, the image itself contains `crane` for future updates. |
| apko can't package all tools | Medium | Medium | Some tools (like bb itself) may not be in Wolfi repos. Fallback: download binary in a build step and copy into image. |
Open Questions
- `format` auto-commit strategy — Should CI push format fixes directly to the branch, or create a separate fixup PR? Direct push is simpler but may surprise developers who have local uncommitted changes.
- Tool staleness check — The 24h TTL in `.envrc` is simple but coarse. Should we pin to a specific digest in a lockfile (e.g., `.tools.lock`) and only update when the lockfile changes? This would give reproducible tool versions but adds a manual update step.
- `bb` packaging — The BuildBuddy CLI is distributed as a standalone binary, not a Wolfi package. How should it be included in the apko image — download in a pre-build step, or maintain a local apko package?
- Transition period — Should we support both `bazel_env` and OCI tools during migration, or cut over atomically? Parallel support avoids breakage but doubles maintenance.
- Claude Code in-cluster auth — Claude Code authenticates directly to Anthropic via a token from `claude setup-token`, stored in a `OnePasswordItem`. Including Claude Code in the tools image enables in-cluster agents to run it directly (potentially replacing Goose), using the same token.
- Claude Code version pinning — Claude Code releases frequently. Should the tools image pin to a specific version (e.g., `@anthropic-ai/claude-code@1.x.y`), or track latest? Pinning avoids surprises but requires manual bumps. ArgoCD Image Updater can't help here since it's an npm package, not an image tag.
References
| Resource | Relevance |
|---|---|
| BuildBuddy CLI | bb CLI for remote query and execution |
| apko | OCI image build tool used throughout the repo |
| rules_apko | Bazel rules for apko image builds |
| `tools/BUILD` `bazel_env` rule | Current tool distribution mechanism being replaced |
| BuildBuddy Workflows | CI pipeline definition in buildbuddy.yaml |
| docs/security.md | Cluster security model (this ADR is fully compliant) |