Platform · 12 min read

How Axiom Locus Works

80+ public-record sources, H3 hexagons, and causal scoring — the architecture behind the platform

Axiom Intelligence · 2026-05-14

Most CRE intelligence tools either resell the same proprietary feeds everyone else has, or scrape a single dataset (permits, demographics, traffic counts) and call it intelligence. Locus is built on a different premise: that hundreds of public-record streams, normalized and joined at the block level, contain a richer signal than any private dataset can match — if you can do the spatial joins, dedupe the entities, and resolve the messy strings.

This is the architecture post. If you've ever opened the Explorer and wondered what is actually under the score, this is the answer.

The data backbone

Locus ingests from 80+ public sources across federal, state, and municipal systems. The volume is meaningful, but the breadth is what matters — the goal is triangulation across independent signal sources, not depth on any single one.

Records by domain (selected sources)

| Domain | Records |
| --- | --- |
| Building permits | 929K |
| POI / business venues | 612K |
| Commuter flows (LEHD) | 454K |
| EPA cleanups + grants | 287K |
| 311 complaints (daily) | 198K |
| Liquor licenses | 72K |
| Job postings | 61K |
| FDA / NIH / CT.gov | 10K |

Every record lands in Supabase Postgres with PostGIS for geometry and the APRS envelope from Axiom Codex (record_id URN, source_uri, schema_version, acl_tier, occurred_at). The envelope is what lets us tell — three pipelines downstream — that a permit, an FDA letter, and a Section 108 grant all reference the same address.
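For concreteness, here is the envelope rendered as a type. The field names are the five listed above; the exact types, comments, and URN shape are illustrative assumptions, not Codex's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class AprsEnvelope:
    """Sketch of the APRS envelope; field names from the text, types assumed."""
    record_id: str         # URN identifying the record (exact format assumed)
    source_uri: str        # link back to the originating public record
    schema_version: str    # version of the source schema at ingest time
    acl_tier: str          # access-control tier for the record
    occurred_at: datetime  # when the underlying event happened, not ingest time
```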

Why H3 hexagons (and not census tracts)

Tracts are political artifacts: variable in area, drawn for population counts, and frozen between decennial updates. They're terrible for spatial analytics. We score on H3 hexagons — Uber's hierarchical hex index — at resolution 8 (≈0.74 km² per cell, a few city blocks) for primary analysis, and resolution 6 for metro summaries.

Hexagons solve three problems tracts can't: (1) uniform area, so densities are comparable across cities; (2) clean hierarchical aggregation, so a metro view is just a cellToParent() away; (3) consistent neighbor relationships, which matter when you're running density clustering.

Every cell-level table in the database carries an h3_index TEXT column computed app-side with h3-js v4. The h3 Postgres extension isn't available on Supabase, so we precompute and index instead — fast lookups, no PL/pgSQL dependency.
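As a minimal sketch of that precompute step, here is the same logic via the Python h3 bindings (the platform does this app-side with h3-js v4; the v4 function names are identical in snake_case). The permits table in the comment is hypothetical.

```python
import h3  # pip install h3 (v4 API)

lat, lng = 36.1627, -86.7816  # example point: downtown Nashville

cell = h3.latlng_to_cell(lat, lng, 8)  # res-8 cell: the primary scoring unit
metro = h3.cell_to_parent(cell, 6)     # res-6 parent: the metro-summary rollup
ring = h3.grid_disk(cell, 1)           # the cell plus its six neighbors

# The resulting 15-character hex string is what lands in the h3_index TEXT
# column, so cell lookups are ordinary B-tree hits:
#   SELECT * FROM permits WHERE h3_index = $1;
```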

Permit clustering: HDBSCAN on 929K records

Building permits are the rawest signal Locus has. Every cosmetic remodel, restaurant build-out, and ground-up tower generates one. The challenge is that individual permits, as point data, are noise. The intelligence is in spatio-temporal clusters: where are permits arriving in dense bursts that don't fit the city's existing pattern?

We run HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) on the permit corpus, parameterized per metro to account for very different baseline densities. HDBSCAN was chosen over k-means and DBSCAN for two reasons: it doesn't require pre-specifying cluster count, and it handles wildly variable cluster densities — which is exactly what real urban geography looks like.
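Here is a hedged sketch of that per-metro step using the open-source hdbscan package; the parameter values are illustrative placeholders, not Locus's production tuning.

```python
import numpy as np
import hdbscan  # pip install hdbscan

# Per-metro parameters (assumed values for illustration only).
# Denser metros need larger minimum cluster sizes to avoid splintering.
METRO_PARAMS = {
    "nyc":       dict(min_cluster_size=40, min_samples=15),
    "nashville": dict(min_cluster_size=12, min_samples=5),
}

def cluster_permits(latlng_deg: np.ndarray, metro: str) -> np.ndarray:
    """Return a cluster id per permit point; -1 marks noise."""
    clusterer = hdbscan.HDBSCAN(metric="haversine", **METRO_PARAMS[metro])
    # Haversine distance expects radians. Note what is NOT here:
    # no pre-specified cluster count, no single fixed density threshold.
    return clusterer.fit_predict(np.radians(latlng_deg))
```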

| Metro | Permits processed | Clusters detected | Density vs metro mean |
| --- | --- | --- | --- |
| NYC | 187,402 | 1,243 | 4.1× |
| LA | 143,818 | 978 | 3.8× |
| Chicago | 92,640 | 712 | 3.5× |
| Houston | 81,294 | 654 | 3.2× |
| Phoenix | 68,117 | 541 | 3.6× |
| Nashville | 24,930 | 287 | 3.9× |

Causal scoring across 8 dimensions

A cluster of permits tells you something is happening. The scoring layer tells you whether what's happening matters and to whom. Every H3 cell gets evaluated across 8 dimensions, each backed by independent signal:

| Dimension | Primary signals | Update cadence |
| --- | --- | --- |
| Demographic momentum | IRS SOI migration, ACS 5-yr, USPS NCOA | Annual + quarterly |
| Commercial activity | Building permits, BLS QCEW, liquor licenses | Weekly |
| Environmental | EPA brownfields, FEMA flood, Sentinel-2 | Monthly |
| Safety & livability | 311 complaints, FBI NIBRS, crash data | Daily–weekly |
| Mobility | LEHD origin-destination, GTFS transit | Quarterly |
| Education | GreatSchools, NCES enrollment | Annual |
| Walkability | OSM density, POI clustering, sidewalks | Monthly |
| Job market | Job postings, salary gap, BLS LAUS | Weekly |

The scores aren't a black-box average. Each dimension produces an explainable signal vector — a list of the specific records (with source_uri back to the original) that moved the needle on that cell this quarter. When the Explorer shows you that a hex's commercial score jumped 18 points, you can click through to the 47 permits and 9 liquor licenses that caused it.
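As a sketch, a dimension's output for one cell might look like the structure below. The shape is inferred from the description above; every name except source_uri is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Contribution:
    source_uri: str  # link back to the original public record
    delta: float     # how much this record moved the dimension score

@dataclass
class SignalVector:
    h3_index: str
    dimension: str   # e.g. "commercial_activity" (label assumed)
    contributions: list[Contribution]

    @property
    def score_delta(self) -> float:
        # The quarter-over-quarter move is just the sum of its receipts.
        return sum(c.delta for c in self.contributions)
```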

Self-healing pipelines

Public records APIs break constantly. Schemas drift. Endpoints rate-limit without warning. NYC OpenData changes a column name. Phoenix moves to a new ArcGIS feature server. The scout agent — a 57-source autonomous collector running on Railway — is built around this reality.

Every loader carries: (1) a schema fingerprint computed at last successful run; (2) row-count tolerance bands; (3) a quarantine path for records that fail validation; (4) an LLM-mediated triage that classifies failures as transient (retry), structural (alert + auto-PR), or definitional (escalate to a human). The result is that source breakage rarely propagates to scoring — we get a Resend digest on Monday with what's drifted and a draft fix already in review.
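A sketch of the first two mechanisms, assuming a fingerprint over sorted (column, type) pairs and a simple relative tolerance band; neither is the scout agent's actual implementation.

```python
import hashlib
import json

def schema_fingerprint(columns: dict[str, str]) -> str:
    """Stable digest over sorted (name, type) pairs; any drift changes it."""
    canon = json.dumps(sorted(columns.items()))
    return hashlib.sha256(canon.encode()).hexdigest()[:16]

def within_tolerance(row_count: int, baseline: int, band: float = 0.3) -> bool:
    """Flag runs whose row count deviates beyond the band from the last good run."""
    return abs(row_count - baseline) <= band * baseline

# A run that changes the fingerprint or breaks the band goes to triage
# rather than to scoring; records that fail validation go to quarantine.
```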

The unified intelligence timeline

The data backbone's most valuable output isn't any single score — it's axiom_events: a 517K-row unified timeline where every permit, FDA letter, vessel arrival, sanctions designation, and infrastructure grant lives in one temporally ordered table with consistent geometry. Locus and Overwatch (our maritime product) both read from it. Codex normalizes it. Drift will run causal inference on it.
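To make that concrete, here is a hedged sketch of reading the timeline. The table name and envelope fields come from the text; event_type, geom, and the query shape are assumptions.

```python
# Query sketch against axiom_events. Column names beyond the envelope
# fields (record_id, source_uri, occurred_at) are illustrative assumptions.
TIMELINE_QUERY = """
SELECT record_id, source_uri, event_type, occurred_at,
       ST_AsGeoJSON(geom) AS geometry
FROM axiom_events
WHERE occurred_at >= now() - interval '90 days'
  AND ST_DWithin(geom::geography,
                 ST_MakePoint(%(lng)s, %(lat)s)::geography, 1000)
ORDER BY occurred_at;
"""
# Everything within 1 km of a point, in time order. The same query shape
# serves Locus (facility context) and Overwatch (port-of-arrival context).
```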

If you've ever wondered what 'platform leverage' actually means: it's that adding USPTO patent grants to Locus's facility intelligence also made Overwatch's port-of-arrival enrichment more accurate. Same backbone, two products, one event stream.