By the time the third specialty coffee shop opens, the institutional money is already in. By the time housing prices spike on Zillow, retail rents have re-quoted twice. The gentrification trade is won 18 months before the first national news article, and the signals are public — they're just scattered across federal cleanup databases, municipal permit portals, satellite imagery archives, and 311 call logs that nobody reads together.
The Environmental + Social + Growth Index (ESGI) is the part of Locus designed to read them together. This is the methodology piece — what goes in, how it's combined, what we've backtested, and where it has been wrong.
The four signal streams
| Stream | Source | What it captures | Lead time |
|---|---|---|---|
| Environmental remediation | EPA brownfields, Superfund, ECHO | Public capital flowing in | 18–24 mo |
| Permit density velocity | Municipal building permits + HDBSCAN | Construction commitment | 9–18 mo |
| Satellite ground truth | Sentinel-2 NDVI + impervious surface | Actual physical change | 3–9 mo |
| Crime velocity inversion | FBI NIBRS + 311 disorder complaints | Behavioral shift | 6–12 mo |
The trick isn't any one stream. EPA cleanup grants on their own are noisy — most never become anything. Permits alone get gamed by speculative developers who file and never break ground. Satellite imagery sees change but can't tell new gentrification from existing wealth. Crime velocity inverts late. The ESGI score requires three of the four to move in concert before it fires.
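The firing rule can be read as a quorum over the four streams. Below is a minimal sketch, assuming each stream has already been reduced to a per-cell boolean ("moving in the gentrification direction"). The `StreamSignals` type and the function names are hypothetical. The sketch also ignores the timing side of "in concert", since the streams lead by different amounts (see the lead-time column above).

```python
from dataclasses import dataclass

# Hypothetical per-cell readings: each flag means "this stream is moving in the
# gentrification direction" under whatever per-stream threshold applies upstream.
@dataclass(frozen=True)
class StreamSignals:
    environmental: bool
    permit_density: bool
    satellite: bool
    crime_inversion: bool

REQUIRED_STREAMS = 3  # "three of the four"

def streams_moving(s: StreamSignals) -> int:
    """Count how many of the four streams are currently moving."""
    return sum((s.environmental, s.permit_density, s.satellite, s.crime_inversion))

def esgi_fires(s: StreamSignals, required: int = REQUIRED_STREAMS) -> bool:
    """Fire only when at least `required` streams move together."""
    return streams_moving(s) >= required
```

For example, a cell with EPA, permit, and crime signals moving but no satellite change fires; a cell with only EPA and permit signals does not. Requiring a quorum rather than a weighted sum is what keeps a single noisy stream (unbuilt permits, dormant cleanup grants) from firing the score on its own.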
Why EPA data is the leading edge
Brownfield grants, HUD Section 108 loan guarantees, and Superfund delistings all mark public capital committed to a specific neighborhood. They are the earliest public signal that institutional money has decided a place is worth fixing. They are also massively underused: most CRE platforms don't ingest EPA data at all, and the ones that do treat it as a negative signal (contamination risk) rather than a positive one (remediation underway).
Our EPA bulk ingestion brought in 287K records spanning brownfield assessments, cleanup grants, ECHO enforcement, and Superfund site status. A neighborhood-string-resolution utility maps the messy address fields onto H3 cells. The result is a per-hex environmental-investment signal that leads everything else.
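A minimal sketch of the per-hex roll-up follows. It assumes address resolution reduces to normalizing a free-text location and looking it up in a precomputed table of H3 cell IDs. The `GAZETTEER` entries, the placeholder cell IDs, and `RECORD_WEIGHTS` are hypothetical, not the production utility or its weights. A real pipeline would geocode and then index with the H3 library.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical lookup from a normalized neighborhood string to an H3 cell ID.
# Real IDs are H3 index strings; these are placeholders.
GAZETTEER = {
    "east riverside": "cell_A",
    "mill district": "cell_B",
}

# Hypothetical weights: remediation activity counts as positive investment.
RECORD_WEIGHTS = {
    "brownfield_assessment": 1.0,
    "cleanup_grant": 2.0,
    "superfund_delisting": 3.0,
    "echo_enforcement": 0.5,
}

def normalize(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    return " ".join(cleaned.split())

def resolve_cell(raw: str) -> Optional[str]:
    """Map a messy location field to a cell ID, or None if it can't be resolved."""
    return GAZETTEER.get(normalize(raw))

def environmental_signal(records: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, float], int]:
    """Sum weighted EPA records per cell; unresolved rows are counted, not dropped silently."""
    scores: Dict[str, float] = defaultdict(float)
    unresolved = 0
    for raw_location, record_type in records:
        cell = resolve_cell(raw_location)
        if cell is None:
            unresolved += 1
            continue
        scores[cell] += RECORD_WEIGHTS.get(record_type, 0.0)
    return dict(scores), unresolved
```

Keeping a count of unresolved rows matters with a corpus this messy: a cell can look quiet simply because its records failed to resolve.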
HDBSCAN on permits: density is the signal
Total permit count is a vanity metric. What predicts gentrification is permit clustering — bursts of small-to-mid renovations in spatial proximity, especially when the cluster spans permit types (commercial change-of-use + residential remodel + sidewalk work).
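Whether a cluster spans permit types is a small post-processing step once each permit carries a cluster label. This sketch assumes the labels come from an HDBSCAN run, which marks unclustered noise points with `-1`, and that permit types are already normalized strings. The `min_types` cutoff is a hypothetical parameter. The sketch does not filter for small-to-mid job size, which would need a valuation field.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set

NOISE = -1  # HDBSCAN convention: label -1 marks points not assigned to any cluster

def cluster_type_span(labels: Sequence[int], permit_types: Sequence[str]) -> Dict[int, Set[str]]:
    """Map each non-noise cluster label to the set of distinct permit types it contains."""
    spans: Dict[int, Set[str]] = defaultdict(set)
    for label, ptype in zip(labels, permit_types):
        if label != NOISE:
            spans[label].add(ptype)
    return dict(spans)

def mixed_type_clusters(labels: Sequence[int], permit_types: Sequence[str], min_types: int = 2) -> List[int]:
    """Return cluster labels whose permits span at least `min_types` permit types."""
    spans = cluster_type_span(labels, permit_types)
    return sorted(label for label, types in spans.items() if len(types) >= min_types)
```

A cluster of twenty residential remodels and a cluster mixing change-of-use, remodel, and sidewalk permits can have the same size; only the second reads as a neighborhood shifting rather than one landlord renovating.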
HDBSCAN identifies these clusters without us pre-specifying how many to expect. The algorithm's minimum-cluster-size parameter is tuned per metro: NYC's baseline density forces us to a min of 25 permits; Nashville works at 8. The output is a cluster ID and a stability score per H3 cell.
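Below is a sketch of the per-metro parameterization and the roll-up from permits to H3 cells. In scikit-learn 1.3+, `sklearn.cluster.HDBSCAN(min_cluster_size=...)` exposes per-point `labels_` and `probabilities_` after `fit`. The function below takes those arrays as plain lists so it runs without the library. Using the mean membership probability of a cell's dominant cluster as the "stability score" is an assumption here. The standalone `hdbscan` package also reports per-cluster persistence (`cluster_persistence_`), which may be closer to what the production score uses.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# min_cluster_size values from the text; other metros need their own tuning.
MIN_CLUSTER_SIZE = {"nyc": 25, "nashville": 8}

def min_cluster_size_for(metro: str) -> int:
    return MIN_CLUSTER_SIZE[metro.lower()]

def rollup_to_cells(
    cells: Sequence[str], labels: Sequence[int], probabilities: Sequence[float]
) -> Dict[str, Tuple[int, float]]:
    """Per H3 cell: the most common non-noise cluster label among its permits, and the
    mean membership probability of permits carrying that label (assumed stability proxy)."""
    by_cell: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for cell, label, prob in zip(cells, labels, probabilities):
        if label != -1:  # skip HDBSCAN noise points
            by_cell[cell].append((label, prob))
    out: Dict[str, Tuple[int, float]] = {}
    for cell, members in by_cell.items():
        top_label, _ = Counter(label for label, _ in members).most_common(1)[0]
        probs = [p for label, p in members if label == top_label]
        out[cell] = (top_label, sum(probs) / len(probs))
    return out
```

Cells whose permits are all noise drop out of the roll-up entirely, which matches the idea that an isolated permit is not a cluster signal.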
Sentinel-2 as ground truth
Permits get filed; not everything gets built. Sentinel-2 imagery — free, every five days, 10m resolution — closes the loop. We pull change-detection between t-12mo and t-0 on impervious surface index and NDVI loss within each ESGI candidate cell. If permits filed but no physical change appears, the ESGI signal is downweighted. If satellite shows construction with no permit filed, that's its own (often more interesting) signal.
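A minimal sketch of the change check and the permit/satellite reconciliation. NDVI is the standard (NIR − Red) / (NIR + Red); on Sentinel-2 those are bands B8 and B4, both at 10 m. The impervious surface index is taken here as an already-computed fraction per cell. The thresholds and the 0.5 downweight are hypothetical values for illustration.

```python
from typing import Tuple

def ndvi(red: float, nir: float) -> float:
    """Normalized difference vegetation index from red (B4) and NIR (B8) reflectance."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom

# Hypothetical thresholds for "physical change" between t-12mo and t-0.
NDVI_LOSS_THRESHOLD = 0.10
IMPERVIOUS_GAIN_THRESHOLD = 0.05
PERMIT_ONLY_WEIGHT = 0.5  # hypothetical downweight for permits with no visible change

def physical_change(ndvi_before: float, ndvi_after: float,
                    imperv_before: float, imperv_after: float) -> bool:
    """True if vegetation loss or impervious-surface gain exceeds its threshold."""
    return (
        (ndvi_before - ndvi_after) >= NDVI_LOSS_THRESHOLD
        or (imperv_after - imperv_before) >= IMPERVIOUS_GAIN_THRESHOLD
    )

def reconcile(permits_filed: bool, changed: bool) -> Tuple[float, str]:
    """Return (multiplier for the permit signal, flag for the cell)."""
    if permits_filed and not changed:
        return PERMIT_ONLY_WEIGHT, "permits_without_change"  # downweight
    if changed and not permits_filed:
        return 1.0, "unpermitted_change"  # its own signal, surfaced separately
    if permits_filed and changed:
        return 1.0, "confirmed"
    return 1.0, "no_activity"
```

The unpermitted-change branch returns a flag rather than a score boost, so that signal can be reviewed on its own instead of being folded into the permit stream.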
Across 22 metros, Sentinel-2 sees 15–20% more active construction than permit records suggest. Some of it is legal (small jobs under permit thresholds); some isn't. Either way, the satellite-only signal flags the gap.
Crime velocity: inversion as a confirming signal
Annual crime rate is a lagging, often misleading indicator. What predicts neighborhood change is crime velocity: the year-over-year delta in 311-reported disorder (graffiti, abandoned vehicles, noise) and in NIBRS-reported offenses in the UCR Part I categories. A neighborhood mid-gentrification typically shows declining disorder complaints (residents leaving, new residents not calling), declining nuisance offenses, but flat or slightly rising property crime (more wealth to take).
This pattern — diverging directions across crime categories — is what we look for. Uniform decline often means the population is leaving entirely. Uniform increase means the area is destabilizing. The inflection is in the divergence.
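The divergence test can be sketched as a classification over year-over-year deltas. This assumes three hypothetical category series per cell (`disorder_311`, `nuisance`, `property`), nonzero prior-year counts, and a hypothetical ±5% band for "flat". The real mapping from 311 and NIBRS codes into these categories is not shown.

```python
from typing import Dict

def yoy_delta(prev: float, curr: float) -> float:
    """Year-over-year relative change; assumes prev > 0."""
    return (curr - prev) / prev

def classify_crime_velocity(prev: Dict[str, float], curr: Dict[str, float],
                            flat_band: float = 0.05) -> str:
    deltas = {k: yoy_delta(prev[k], curr[k]) for k in prev}
    if all(d < -flat_band for d in deltas.values()):
        return "uniform_decline"   # population may be leaving entirely
    if all(d > flat_band for d in deltas.values()):
        return "uniform_increase"  # area destabilizing
    if (deltas["disorder_311"] < -flat_band
            and deltas["nuisance"] < -flat_band
            and deltas["property"] >= -flat_band):
        return "divergent"         # disorder and nuisance fall, property flat or rising
    return "mixed"
```

Only the `divergent` outcome counts toward the ESGI quorum; the two uniform cases are exactly the patterns the text says to treat as something other than gentrification.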
Backtested performance
Top-decile ESGI cells outperformed the metro baseline by ~6.5×. The model isn't perfect — 14% of top-decile fires were false positives, mostly in metros with strong rent control (where the rent signal can't actually move). The structural failure mode is policy, not data.
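The article doesn't state the outcome metric behind the ~6.5× figure or exactly how a false positive is scored, so the sketch below only shows the arithmetic under stated assumptions. Lift is taken as the top-decile mean of some per-cell outcome (for example, rent appreciation) divided by the metro-wide mean. A false positive is taken as a top-decile cell whose predicted change was not realized. The test input is toy data, not backtest results.

```python
from typing import List, Tuple

def top_decile_stats(cells: List[Tuple[float, float, bool]]) -> Tuple[float, float]:
    """cells: (esgi_score, outcome, realized) per cell in one metro.
    Returns (lift, false_positive_share) for the top decile by score."""
    ranked = sorted(cells, key=lambda c: c[0], reverse=True)
    k = max(1, len(ranked) // 10)  # top decile, at least one cell
    top = ranked[:k]
    baseline = sum(c[1] for c in cells) / len(cells)  # metro-wide mean outcome
    top_mean = sum(c[1] for c in top) / k
    false_pos = sum(1 for c in top if not c[2]) / k
    return top_mean / baseline, false_pos
```

Computing lift per metro and then aggregating keeps high-appreciation metros from dominating the headline multiple.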
Where it gets it wrong
Three classes of false positive recur: (1) institutional anchor builds, where a single mega-project (hospital expansion, university dorm) trips the permit and environmental signals without any organic neighborhood shift; (2) rent-controlled jurisdictions, where the prediction is right but the rent never moves; (3) disaster recovery (post-hurricane rebuilds), which mimics the permit/satellite/environmental pattern but represents replacement, not appreciation.
The Explorer surfaces these as 'flagged but down-weighted' so analysts see them rather than getting silently filtered. Transparency in the failure mode is part of the score.
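A sketch of "flagged but down-weighted", assuming failure modes have already been detected upstream (detection itself isn't shown). The per-class multipliers are hypothetical. The design choice from the text is preserved: a flagged cell is never dropped; its score shrinks and the reasons travel with it.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical score multipliers per known false-positive class.
FP_MULTIPLIERS = {
    "institutional_anchor": 0.5,
    "rent_control": 0.6,
    "disaster_recovery": 0.4,
}

@dataclass
class ScoredCell:
    cell_id: str
    score: float
    flags: List[str] = field(default_factory=list)

def apply_failure_mode_flags(cell: ScoredCell, detected: List[str]) -> ScoredCell:
    """Down-weight rather than drop: the cell stays in the output, its score is
    multiplied down for each recognized failure mode, and each mode is recorded."""
    score = cell.score
    flags = list(cell.flags)
    for mode in detected:
        if mode in FP_MULTIPLIERS:
            score *= FP_MULTIPLIERS[mode]
            flags.append(mode)
    return ScoredCell(cell.cell_id, score, flags)
```

Returning a new `ScoredCell` instead of mutating the input keeps the raw score available alongside the adjusted one, which is what an analyst needs to judge whether a flag was deserved.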