Axiom Locus currently tracks over 929,000 building permits across 147 jurisdictions. But simply plotting nearly a million points on a map produces a heatmap that is visually interesting but analytically useless. To give our users a distinct edge, we needed to define commercial development hot zones algorithmically. We accomplished this with our new HDBSCAN-powered Permit Clustering Pipeline.
Why HDBSCAN?
Traditional algorithms like k-means require you to know the number of clusters in advance. DBSCAN drops that requirement but relies on a single global distance threshold, so it struggles when cluster densities vary. HDBSCAN finds clusters of varying densities automatically and labels sparse points as noise, which matches how real-world construction activity is distributed.
Commercial development is uniquely dense and occurs over shifting timescales. By applying HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) to our dataset, we filter out isolated residential renovations and purely speculative filings. What remains are the contiguous commercial corridors where capital is actively concentrating.
The Pipeline Architecture
| Component | Technology | Function |
|---|---|---|
| Ingestion Layer | Groundswell / Edge | Extracts and standardizes feeds |
| Geospatial DB | PostGIS | Stores geometries and distances |
| Clustering | Python / HDBSCAN | Computes density hierarchy |
| Materialization | Supabase | Caches multipolygons for client |
The compute overhead of building a pairwise distance matrix for 929K points is non-trivial. Our data engineering team implemented a two-pass approach: first applying H3 indexing (resolution 8) as a spatial pre-filter to chunk the permit space, then running HDBSCAN inside and across those chunks.
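The arithmetic behind that overhead, and the effect of chunking, can be sketched in a few lines. This is an illustration under stated assumptions rather than our implementation: `grid_cell` is a plain square lat/lng bucket standing in for an H3 cell id, the coordinates are hypothetical, and only the within-chunk pass is shown.

```python
from collections import defaultdict
from math import floor


def pairwise_count(n: int) -> int:
    """Number of unordered point pairs in a full distance matrix."""
    return n * (n - 1) // 2


def grid_cell(lat: float, lng: float, cell_deg: float = 0.01):
    """Stand-in for an H3 cell id: a square lat/lng bucket, not a hexagon."""
    return (floor(lat / cell_deg), floor(lng / cell_deg))


def chunk_permits(permits, cell_of=grid_cell):
    """Group (lat, lng) permits by spatial cell so each chunk clusters separately."""
    chunks = defaultdict(list)
    for lat, lng in permits:
        chunks[cell_of(lat, lng)].append((lat, lng))
    return dict(chunks)


# Hypothetical coordinates, for illustration only.
permits = [
    (30.2672, -97.7431),
    (30.2675, -97.7436),
    (30.2679, -97.7440),
    (32.7767, -96.7970),
    (32.7769, -96.7975),
]
chunks = chunk_permits(permits)

full = pairwise_count(929_000)
chunked = sum(pairwise_count(len(pts)) for pts in chunks.values())
print(f"pairs for 929,000 points in one matrix: {full:,}")
print(f"pairs across {len(chunks)} chunks of the toy sample: {chunked}")
```

A single matrix over 929,000 points holds roughly 4.3 × 10^11 pairs, while the per-chunk total grows with the square of each chunk's size instead. To use real H3 cells, a function such as `lambda lat, lng: h3.latlng_to_cell(lat, lng, 8)` from version 4 of the `h3` Python package could be passed as `cell_of`. The cross-chunk pass, which has to reconnect clusters that straddle cell boundaries, is not covered here.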
Looking Ahead
Our Permit Clustering Pipeline is now fully integrated into the 'Development Pipeline' signal in Axiom Locus. This isn't just a backend update; it fundamentally increases the predictive power and signal-to-noise ratio of our platform. Next up: applying the same clustering logic to real-time SEC filing disclosures.