Drug shortages cost the U.S. healthcare system an estimated $230 million per year in labor alone — not counting the clinical harm from delayed treatments, canceled surgeries, and rationed medications. The pharmaceutical supply chain is fragile, opaque, and poorly monitored.
We asked a simple question: can public FDA data predict which manufacturers are about to have a shortage?
The answer is yes — with surprising clarity.
The Discovery: A 40.6% One-Year Shortage Rate
Using our lead/lag survival engine, we analyzed 10,000 FDA warning letters and 10,000 drug shortage records. The result:
40.6% of companies that receive an FDA warning letter experience a related drug shortage within one year.
The signal is strong and it builds over time:
| Lag Window | Probability | Interpretation |
|---|---|---|
| 7 days | 5.0% | Immediate impact — facility shutdowns or recalls |
| 30 days | 14.8% | Short-term supply disruption propagating |
| 90 days | 23.8% | Remediation delays causing production gaps |
| 365 days | 40.6% | Systemic manufacturing fragility exposed |
This means a pharma supply chain manager could get a 3-12 month early warning on potential shortages — enough time to qualify alternative suppliers, adjust safety stock, or notify clinical teams.
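As a rough illustration of how a lag-window table like the one above could be produced, the sketch below computes, for each window, the share of warned companies whose first later shortage falls within that many days of their first warning letter. The function name, the input shape (dicts keyed by an already-resolved company ID), and the toy dates are assumptions made for this example, not the actual lead/lag survival engine. It is also a naive empirical fraction: a real survival analysis would need to handle censoring, meaning companies that have not yet been observed for a full year.

```python
from datetime import date

LAG_WINDOWS_DAYS = (7, 30, 90, 365)  # same windows as the table above


def lag_probabilities(warnings, shortages, windows=LAG_WINDOWS_DAYS):
    """Share of warned entities with a shortage within each lag window.

    warnings / shortages: dict mapping an entity ID to a list of dates.
    (Hypothetical input shape; entities are assumed to be resolved already.)
    """
    hits = {w: 0 for w in windows}
    for entity, warning_dates in warnings.items():
        first_warning = min(warning_dates)
        # Days from the first warning letter to each shortage that follows it.
        delays = [
            (s - first_warning).days
            for s in shortages.get(entity, [])
            if s >= first_warning
        ]
        if not delays:
            continue  # no shortage after the warning: counts as a miss
        shortest = min(delays)
        for w in windows:
            if shortest <= w:
                hits[w] += 1
    total = len(warnings)
    return {w: (hits[w] / total if total else 0.0) for w in windows}


# Toy data, invented for illustration only.
toy_warnings = {k: [date(2023, 1, 1)] for k in ("a", "b", "c", "d")}
toy_shortages = {
    "a": [date(2023, 1, 5)],    # 4 days after the warning
    "b": [date(2023, 3, 1)],    # 59 days after the warning
    "c": [date(2022, 12, 1)],   # before the warning, so ignored
}
print(lag_probabilities(toy_warnings, toy_shortages))
# {7: 0.25, 30: 0.25, 90: 0.5, 365: 0.5}
```

Because each window contains the shorter ones, the probabilities can only grow with the lag, which matches the cumulative shape of the table.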
Why Nobody Found This Before: The Entity Resolution Problem
This correlation stayed invisible, but not because the data is hidden: both warning letters and shortage notices are published by the same agency, the FDA. The problem is that the same company appears under different names in each database.
FDA warning letters might list "Pfizer Inc" while the shortage database lists "Pfizer." Or "Teva Pharmaceutical Industries Ltd" vs "Teva Pharmaceuticals USA." Or a warning letter addresses a specific manufacturing site while the shortage lists the parent company.
With exact name matching, our engine found zero connections. The probability at every lag window was 0.00.
We solved this with a cascading entity resolution strategy:
| Match Strategy | Priority | How It Works |
|---|---|---|
| Product match | 1 (strongest) | Warning letter subject mentions the same drug/generic name as the shortage |
| Normalized name | 2 | Strip Inc/LLC/Corp/Pharmaceuticals, lowercase, compare |
| Fuzzy Jaccard | 3 (catch-all) | Token-level similarity ≥ 0.6 after normalization |
The product-level match is the tightest causal link: a warning letter about a facility that manufactures Drug X, followed by a shortage of Drug X, is not a coincidence — it's a supply chain event propagating through the system.
The fuzzy name match catches the long tail: corporate restructuring, subsidiary vs parent mismatches, and the hundreds of ways companies format their legal names across government filings.
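The cascade described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production matcher: the stop-word list, the record fields (`company`, `subject`, `drug`), the function names, and the "Acme" company are all hypothetical. Only the three-stage priority order and the 0.6 Jaccard threshold come from the table.

```python
import re

# Assumed stop-word list; the real list of stripped suffixes is not published here.
LEGAL_SUFFIXES = {
    "inc", "llc", "corp", "corporation", "ltd", "co",
    "pharmaceutical", "pharmaceuticals",
}
JACCARD_THRESHOLD = 0.6  # from the match-strategy table


def normalize(name):
    """Lowercase, split into alphanumeric tokens, drop legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return [t for t in tokens if t not in LEGAL_SUFFIXES]


def jaccard(a, b):
    """Token-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def match(warning, shortage):
    """Return the first strategy that links the two records, or None."""
    # 1. Product match: the shortage's drug name appears in the letter subject.
    drug = shortage["drug"].lower()
    if drug and re.search(r"\b" + re.escape(drug) + r"\b", warning["subject"].lower()):
        return "product"
    # 2. Normalized name: identical token lists after stripping suffixes.
    wn, sn = normalize(warning["company"]), normalize(shortage["company"])
    if wn and wn == sn:
        return "normalized_name"
    # 3. Fuzzy Jaccard: enough token overlap to count as the same company.
    if jaccard(wn, sn) >= JACCARD_THRESHOLD:
        return "fuzzy_jaccard"
    return None


letter = {"company": "Pfizer Inc", "subject": "CGMP violations"}
notice = {"company": "Pfizer.", "drug": "Example Drug"}
print(match(letter, notice))  # normalized_name
```

Checking the stages in priority order means a record pair is labeled with the strongest evidence available. With this particular stop-word list, "Teva Pharmaceutical Industries Ltd" and "Teva Pharmaceuticals USA" share only one of three distinct tokens (Jaccard of about 0.33), so that pair would depend on the product-level match or on a different normalization than the one assumed here.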
The Bigger Pattern: Entity Resolution Is the Moat
This finding illustrates a structural gap in public-data intelligence that extends far beyond FDA:
- EPA enforcement → OSHA violations: the same facility appears with different names in EPA and OSHA databases
- Building permit surge → business openings: permits use addresses, POIs use Google Place IDs
- Federal rule proposed → permit activity change: rules reference CFR sections, permits reference local zoning codes
- Crime spike → business closure: crime uses H3 cells, businesses use street addresses
Every one of these correlations requires solving the same fundamental problem: determining that two records in different databases refer to the same real-world entity. The data is free and public. The insight is locked behind entity resolution.
This is why we built the Axiom platform with entity resolution as a core capability, not an afterthought. Our cascading match engine — exact ID, normalized name, fuzzy similarity, spatial proximity — runs across every data source we ingest.
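The spatial proximity stage is the one piece of that cascade not covered by name matching, so here is a hedged sketch of what it might look like: two facility records are treated as the same site when their coordinates fall within a small radius. The haversine formula is standard; the 100-meter threshold, the function names, and the assumption that both records carry latitude/longitude are illustrative choices, not details of the Axiom platform.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
MAX_DISTANCE_M = 100        # hypothetical threshold for "same site"


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_site(a, b, max_distance_m=MAX_DISTANCE_M):
    """a, b: (lat, lon) tuples. True if the points are within the threshold."""
    return haversine_m(a[0], a[1], b[0], b[1]) <= max_distance_m


# Toy coordinates, invented for illustration.
print(same_site((40.0, -75.0), (40.0005, -75.0)))  # True  (about 56 m apart)
print(same_site((40.0, -75.0), (40.01, -75.0)))    # False (about 1.1 km apart)
```

A fixed radius is a blunt instrument: large industrial campuses can span more than 100 meters and dense urban blocks can hold several unrelated businesses within it, which is one reason a proximity check fits better as a late stage behind ID and name matching than as a standalone test.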
What This Enables
The 40.6% finding is just the first event pair we've analyzed. We're now computing lead/lag survival curves across every cross-agency combination in our platform:
- Biomanufacturing Fragility Index: real-time supply-risk scoring for 1,263 pharmaceutical companies
- Facility Twin profiles: unified lifecycle graphs for 13,103 industrial facilities across EPA, OSHA, and FDA
- Permit sequence prediction: what permit type comes next at this address, with probability and timeframe
- Regulatory overhang tracking: how long from proposed rule to operational impact
The lead/lag engine transforms public records from a static archive into a predictive system. Not "what happened" — but "what usually happens next, and how long does it take?"