
ADR 006: Stars grid ingest via a dedicated job writing to the monolith DB

Author: Joe McGinley
Status: Accepted
Created: 2026-06-13


Problem

The monolith stars domain currently plots a hand-curated list of ~30 Scottish dark-sky destinations (projects/monolith/stars/seed.py). That list is useful (named, drivable spots) but it is sparse: it cannot answer "is it dark and clear near me right now" for an arbitrary location, and it under-represents large swathes of the country.

The original stargazer pipeline produced a dense grid of ~500 sample points by combining the DJ Lorenz light-pollution atlas, OSM road data, and a DEM, then filtering to genuinely dark, reachable locations. That grid lived only in the standalone stargazer service. When the dynamic half of stargazer (weather fetch plus astronomy scoring) was absorbed into the monolith and the rest was slated for retirement, the grid was dropped: porting its geospatial stack (rasterio/GDAL, geopandas, osmium) into the monolith API image would balloon the image and bring 2Gi+ memory spikes into the request-serving pod, which is unacceptable for a latency-sensitive API.

We want the grid's coverage back without paying that cost in the API pod, and without resurrecting the whole standalone pipeline as a runtime dependency.


Decision

Reintroduce the grid as data ingested into the monolith Postgres by a dedicated job that is separate from the API pod. The monolith remains the source of truth: a small stars.sites table holds the grid points, and the existing in-monolith weather-refresh job scores them into stars.site_hours exactly as it scores the curated seed today (subject to the adaptive-darkness threshold work).
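Because the job may be re-run whenever the source datasets change, its write step probably wants to be idempotent. The sketch below shows one way to do that with an upsert keyed on the site id. It is illustrative only: the column set (id, lat, lon, lp_zone) is an assumption since the table shape is still open, and an in-memory SQLite database with an attached `stars` schema stands in for the monolith Postgres so the example runs anywhere. The `ON CONFLICT ... DO UPDATE` form shown is also accepted by Postgres.

```python
import sqlite3

# Hypothetical column set; the real table shape is still an open question.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stars.sites (
    id      TEXT PRIMARY KEY,
    lat     REAL NOT NULL,
    lon     REAL NOT NULL,
    lp_zone INTEGER NOT NULL
)
"""

UPSERT = """
INSERT INTO stars.sites (id, lat, lon, lp_zone)
VALUES (?, ?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
    lat = excluded.lat,
    lon = excluded.lon,
    lp_zone = excluded.lp_zone
"""


def open_demo_db() -> sqlite3.Connection:
    """In-memory SQLite with a `stars` schema attached, standing in for Postgres."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS stars")
    conn.execute(SCHEMA)
    return conn


def write_grid(conn: sqlite3.Connection, points) -> int:
    """Upsert (id, lat, lon, lp_zone) tuples in one transaction.

    Returns the number of rows in stars.sites afterwards.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, points)
    return conn.execute("SELECT COUNT(*) FROM stars.sites").fetchone()[0]
```

Re-running `write_grid` with the same ids updates rows in place rather than duplicating them. The sketch does not remove points that have dropped out of a newer grid; whether stale rows are deleted or retained is a separate choice.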

The heavy geospatial computation (light-pollution atlas plus roads plus DEM into a filtered point set) is near-static: light pollution and road networks change on the order of months to years, not nightly. So it does not belong on the hot weather-refresh path. It runs in its own Bazel-built image, on its own infrequent cadence, and its only output is a few hundred rows written to the monolith DB. The API pod never imports a geospatial dependency.
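To make the "filtered point set" concrete, the last stage of the job can be thought of as a plain predicate over candidate points once the raster and road lookups are done. The sketch below assumes each candidate already carries a light-pollution zone and a distance to the nearest road. The field names, the zone scale (lower is darker), and both cutoffs are hypothetical placeholders, not values taken from the original pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    site_id: str
    lat: float
    lon: float
    lp_zone: int            # hypothetical scale: lower means darker
    road_distance_m: float  # hypothetical reachability attribute


# Placeholder cutoffs; the real values are an open question in this ADR.
MAX_LP_ZONE = 2
MAX_ROAD_DISTANCE_M = 1000.0


def filter_grid(
    candidates: Iterable[Candidate],
    max_lp_zone: int = MAX_LP_ZONE,
    max_road_distance_m: float = MAX_ROAD_DISTANCE_M,
) -> List[Candidate]:
    """Keep only points that are both dark enough and close enough to a road."""
    return [
        c
        for c in candidates
        if c.lp_zone <= max_lp_zone and c.road_distance_m <= max_road_distance_m
    ]
```

Only the survivors of this step would be written to the database, which is why the job's output stays at a few hundred rows regardless of how large the input rasters are.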

| Aspect | Today | Decided |
| --- | --- | --- |
| Site source | ~30 curated points in seed.py | ~500 grid points in stars.sites (DB) |
| Coverage | Named destinations | Dense Scotland-wide grid |
| Geospatial deps (rasterio/osmium/geopandas) | Not present anywhere in the monolith | Only in the separate ingest job image |
| Who computes the grid | N/A (static list) | Dedicated job, infrequent (grid is near-static) |
| Who scores weather | In-monolith refresh job (every 3h) | Unchanged: in-monolith refresh job |
| Source of truth | Monolith Postgres | Monolith Postgres (unchanged) |
| Map rendering | All ~30 points at once | Viewport-scoped loading (grid is too dense for one payload) |

Architecture

```mermaid
graph LR
    subgraph offline["Grid ingest job (separate image, infrequent)"]
        LP[LP atlas + OSM roads + DEM] --> GRID[Filter to dark, reachable points]
    end
    GRID -->|write ~500 rows| SITES[(stars.sites)]
    subgraph api["Monolith API pod (light, no geospatial deps)"]
        REFRESH[weather-refresh job every 3h] -->|read sites| SITES
        REFRESH -->|fetch + score met.no| HOURS[(stars.site_hours)]
        PRUNE[hourly prune] --> HOURS
        EP["/api/stars/sites?bbox="] -->|read future hours| HOURS
    end
    EP --> FE["/app/stars map: viewport-scoped"]
```

The ingest job writes site metadata (id, lat, lon, light-pollution zone, and any reachability attributes); the refresh job joins that against met.no forecasts and writes scored future hours; the read endpoint serves viewport-scoped slices to the map.
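The viewport-scoped read can be sketched as a bounding-box filter with a response cap. This is a minimal illustration, not the endpoint's contract: the `min_lon,min_lat,max_lon,max_lat` parameter order, the default limit of 200, and the `(id, lat, lon)` tuple shape are all assumptions, and a real implementation would push the filter into the SQL query rather than filter in Python.

```python
from typing import Iterable, List, NamedTuple, Tuple

Site = Tuple[str, float, float]  # (id, lat, lon); hypothetical row shape


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def parse_bbox(raw: str) -> BBox:
    """Parse 'min_lon,min_lat,max_lon,max_lat' (an assumed parameter order)."""
    parts = [float(p) for p in raw.split(",")]
    if len(parts) != 4:
        raise ValueError("bbox needs four comma-separated numbers")
    box = BBox(*parts)
    if box.min_lon > box.max_lon or box.min_lat > box.max_lat:
        raise ValueError("bbox minimums must not exceed maximums")
    return box


def sites_in_bbox(sites: Iterable[Site], box: BBox, limit: int = 200) -> List[Site]:
    """Return at most `limit` sites whose coordinates fall inside the box."""
    hits = [
        s
        for s in sites
        if box.min_lon <= s[2] <= box.max_lon and box.min_lat <= s[1] <= box.max_lat
    ]
    return hits[:limit]
```

For example, `sites_in_bbox(rows, parse_bbox("-4.5,56.5,-3.5,57.5"))` would return only the grid points inside that viewport, so the map never receives the full ~500-point set in one payload.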


Alternatives Considered

  • Keep the curated ~30 sites only. Rejected: the coverage gap is the whole reason for this ADR. (It remains a fine fallback if the grid never ships.)
  • Port the geospatial stack into the monolith API image. Rejected: image bloat and 2Gi+ memory spikes do not belong in the request-serving pod. This is the exact cost the original absorption avoided.
  • Hybrid: pull ranked output from a still-running standalone stargazer over the cluster network. Rejected (also rejected when the domain was first built): it keeps the entire standalone pipeline, its CronJob, PVC, and NGINX alive as a permanent cross-namespace runtime dependency, the opposite of retiring it.
  • Expand the curated seed list to ~150 sites by hand. Not the endgame (still manual, still not a true grid), but a low-effort interim that needs no new infrastructure if the job slips.

Security

Baseline per docs/security.md. The ingest job runs non-root (uid 65532), needs write credentials to the monolith database (via the 1Password operator, never hardcoded), and reads only public datasets (the light-pollution atlas and OSM). It adds no new external exposure: its sole output is rows in the existing Postgres, reached over the in-cluster network. Cross-namespace DB access, if the job runs outside the monolith namespace, must follow the Linkerd-meshed policy rules (no raw Kubernetes NetworkPolicies in meshed namespaces).


Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| ~500 points multiplies met.no fetch volume on the refresh job | High | Medium | The refresh already rate-limits (~15/s under met.no's 20/s cap); ~500 points is ~33s per 3h run, well within budget |
| Serving all grid points in one payload bloats the page | High | Medium | Viewport-scoped loading on the read endpoint (bbox query), mirroring the ships global-feed roadmap |
| Geospatial job image is large and slow to build | Medium | Low | Separate image built infrequently; not on the API build/deploy hot path |
| Static grid goes stale as light pollution shifts | Low | Low | Re-run the job on dataset updates; cadence measured in months |
| Migration-as-data: a bulk seed in the chart migrations ConfigMap would blow the 256 KiB annotation cap | Medium | High | The grid is loaded out-of-band by the job, never embedded in chart/migrations/*.sql (see the monolith anti-patterns) |
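The fetch-budget figure in the first risk row can be checked with a few lines of arithmetic. The sketch assumes one met.no request per grid point per run, which is what the ~33s figure implies; the function names are illustrative.

```python
def fetch_seconds(points: int, requests_per_second: float) -> float:
    """Time to fetch forecasts at a fixed rate, assuming one request per point."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return points / requests_per_second


def budget_fraction(
    points: int, requests_per_second: float, interval_hours: float = 3.0
) -> float:
    """Share of the refresh interval spent fetching."""
    return fetch_seconds(points, requests_per_second) / (interval_hours * 3600)


# 500 points at 15 requests/s is about 33.3 seconds of fetching,
# which is roughly 0.3% of a 3-hour refresh interval.
print(round(fetch_seconds(500, 15), 1))
print(round(budget_fraction(500, 15) * 100, 2))
```

At the same rate, the curated ~30 points take about 2 seconds, so the grid raises fetch time by roughly 17x while leaving the run far inside the 3-hour window.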

Open Questions

  1. Grid resolution and the light-pollution-zone cutoff (how dark is "dark enough" to include a point), and whether to also gate on reachability (road proximity) as the original pipeline did.
  2. Table shape: a new stars.sites table joined to stars.site_hours, versus folding site metadata into the hours rows. (The current domain keeps metadata in seed.py; the grid forces it into the DB.)
  3. Whether to retain the curated ~30 as a "featured / named" subset layered over the grid, or replace them entirely.
  4. Job cadence and trigger: a low-frequency CronJob versus a one-shot plus manual re-run on dataset refresh.
  5. The viewport read API contract (bbox parameters, max points per response, clustering/zoom behaviour) and how it interacts with the CDN cache key.
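One reason question 5 matters: if the raw viewport coordinates go straight into the URL, nearly every pan or zoom produces a distinct cache key and the CDN rarely gets a hit. A possible option, shown here purely as an illustration and not as a decided design, is to snap the bbox outward to a coarse grid before it forms the key. The 0.5 degree step and the key format are arbitrary assumptions.

```python
import math
from typing import Tuple


def snap_bbox(
    min_lon: float, min_lat: float, max_lon: float, max_lat: float, step: float = 0.5
) -> Tuple[float, float, float, float]:
    """Expand a bbox outward to the nearest multiples of `step` degrees."""
    if step <= 0:
        raise ValueError("step must be positive")
    return (
        math.floor(min_lon / step) * step,
        math.floor(min_lat / step) * step,
        math.ceil(max_lon / step) * step,
        math.ceil(max_lat / step) * step,
    )


def cache_key(
    min_lon: float, min_lat: float, max_lon: float, max_lat: float, step: float = 0.5
) -> str:
    """Build a stable string key from the snapped bbox."""
    snapped = snap_bbox(min_lon, min_lat, max_lon, max_lat, step)
    return ",".join(f"{v:.3f}" for v in snapped)
```

With this scheme, two slightly different viewports such as `(-4.30, 56.90, -3.90, 57.20)` and `(-4.26, 56.93, -3.86, 57.23)` map to the same key, at the cost of returning some points just outside the visible area.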

References

| Resource | Relevance |
| --- | --- |
| docs/decisions/platform/002-cdn-cached-data-fetching.md | The read-endpoint caching model the viewport API must fit |
| projects/monolith/stars/seed.py | The curated list this grid supersedes/augments |
| projects/stargazer/backend/ | The original geospatial grid computation to lift into the job |
| docs/plans/2026-06-13-stars-into-monolith.md | The self-contained stars design this extends |