Drug shortages cost the U.S. healthcare system an estimated $230 million per year in labor alone — not counting the clinical harm from delayed treatments, canceled surgeries, and rationed medications. The pharmaceutical supply chain is fragile, opaque, and poorly monitored.

We asked a simple question: can public FDA data predict which manufacturers are about to have a shortage?

The answer is yes — with surprising clarity.

The Discovery: A 40.6% One-Year Shortage Rate

Using our lead/lag survival engine, we analyzed 10,000 FDA warning letters and 10,000 drug shortage records. The result:

40.6% of companies that receive an FDA warning letter experience a related drug shortage within one year.

The signal is strong and it builds over time:

Lag Window	Cumulative Probability of Shortage	Interpretation
7 days	5.0%	Immediate impact — facility shutdowns or recalls
30 days	14.8%	Short-term supply disruption propagating
90 days	23.8%	Remediation delays causing production gaps
365 days	40.6%	Systemic manufacturing fragility exposed

This means a pharma supply chain manager who tracks warning letters could get 3 to 12 months of early warning on potential shortages. That is enough time to qualify alternative suppliers, adjust safety stock, or notify clinical teams.
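To make the lag-window table concrete, here is a minimal Python sketch of how such probabilities could be computed. It is an illustration rather than the production engine: the records, the company names, and the lag_probability helper are all hypothetical, and it uses a plain fraction instead of a survival model that would account for censoring.

```python
from datetime import date

# Hypothetical toy records as (company, date) pairs. Not real FDA data.
warning_letters = [
    ("acme", date(2023, 1, 10)),
    ("borealis", date(2023, 2, 1)),
    ("cobalt", date(2023, 3, 15)),
    ("delta", date(2023, 4, 20)),
]
shortages = [
    ("acme", date(2023, 1, 14)),     # 4 days after the letter
    ("borealis", date(2023, 4, 2)),  # 60 days after the letter
    ("cobalt", date(2022, 12, 1)),   # before the letter, so it does not count
]


def lag_probability(letters, shortage_records, window_days):
    """Share of warning letters followed by a shortage at the same
    company within window_days (inclusive)."""
    if not letters:
        return 0.0
    hits = 0
    for company, letter_date in letters:
        for short_company, short_date in shortage_records:
            lag = (short_date - letter_date).days
            if short_company == company and 0 <= lag <= window_days:
                hits += 1
                break  # count each letter at most once
    return hits / len(letters)


# Same windows as the table above.
probabilities = {
    window: lag_probability(warning_letters, shortages, window)
    for window in (7, 30, 90, 365)
}
for window, p in probabilities.items():
    print(f"{window:>3} days: {p:.1%}")
```

Because every window counts from the letter date, the values can only stay flat or rise as the window widens, which is why the table reads as a cumulative curve.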

Why Nobody Found This Before: The Entity Resolution Problem

This correlation was invisible, but not because the data is hidden: both warning letters and shortage notices are published by the same agency, the FDA. The problem is that the same company appears under different names in each database.

FDA warning letters might list "Pfizer Inc" while the shortage database lists "Pfizer." Or "Teva Pharmaceutical Industries Ltd" vs "Teva Pharmaceuticals USA." Or a warning letter addresses a specific manufacturing site while the shortage lists the parent company.

With exact name matching, our engine found zero connections. The probability at every lag window was 0.00.

We solved this with a cascading entity resolution strategy:

Match Strategy	Priority	How It Works
Product match	1 (strongest)	Warning letter subject mentions the same drug/generic name as the shortage
Normalized name	2	Strip Inc/LLC/Corp/Pharmaceuticals, lowercase, compare
Fuzzy Jaccard	3 (catch-all)	Token-level similarity ≥ 0.6 after normalization

The product-level match is the tightest causal link: a warning letter about a facility that manufactures Drug X, followed by a shortage of Drug X, is not a coincidence — it's a supply chain event propagating through the system.

The fuzzy name match catches the long tail: corporate restructuring, subsidiary vs parent mismatches, and the hundreds of ways companies format their legal names across government filings.
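The cascade can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the engine itself: the record fields (subject, company, drug), the substring-based product check, and the example records are hypothetical, and the suffix list contains only the four terms named in the table.

```python
import re

# Assumed suffix list: only the terms named in the table above.
SUFFIXES = {"inc", "llc", "corp", "pharmaceuticals"}


def normalize(name):
    """Lowercase, split into alphanumeric tokens, drop corporate suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return [t for t in tokens if t not in SUFFIXES]


def jaccard(tokens_a, tokens_b):
    """Token-level Jaccard similarity: |A & B| / |A | B|."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def match(letter, shortage, threshold=0.6):
    """Return the first strategy that links the two records, or None."""
    drug = shortage["drug"].lower()
    # Priority 1: the letter's subject mentions the drug in shortage.
    if drug and drug in letter["subject"].lower():
        return "product"
    a, b = normalize(letter["company"]), normalize(shortage["company"])
    # Priority 2: names are identical after normalization.
    if a and a == b:
        return "normalized_name"
    # Priority 3: fuzzy token overlap at or above the threshold.
    if jaccard(a, b) >= threshold:
        return "fuzzy_jaccard"
    return None


# Hypothetical records for illustration.
letter = {"company": "Pfizer Inc", "subject": "CGMP violations at a sterile facility"}
print(match(letter, {"company": "Pfizer.", "drug": "cisplatin"}))
```

One consequence of these assumed settings is worth noting: "Teva Pharmaceutical Industries Ltd" and "Teva Pharmaceuticals USA" share only the token "teva" out of five distinct tokens, a Jaccard score of 0.2. A pair like that would not clear the 0.6 threshold on names alone, so it would depend on the product match or a richer suffix list.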

The Bigger Pattern: Entity Resolution Is the Moat

This finding illustrates a structural gap in public-data intelligence that extends far beyond FDA:

● EPA enforcement → OSHA violations: the same facility appears with different names in EPA and OSHA databases
● Building permit surge → business openings: permits use addresses, POIs use Google Place IDs
● Federal rule proposed → permit activity change: rules reference CFR sections, permits reference local zoning codes
● Crime spike → business closure: crime uses H3 cells, businesses use street addresses

Every one of these correlations requires solving the same fundamental problem: determining that two records in different databases refer to the same real-world entity. The data is free and public. The insight is locked behind entity resolution.

This is why we built the Axiom platform with entity resolution as a core capability, not an afterthought. Our cascading match engine — exact ID, normalized name, fuzzy similarity, spatial proximity — runs across every data source we ingest.
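The spatial proximity step is the least obvious of the four, so here is a minimal Python sketch of one way it could work: treat two facility records as the same site when their coordinates fall within a small radius. The haversine distance is a standard formula, but the 100-meter threshold, the record fields, and the same_site helper are assumptions made for illustration, not details of the platform.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_site(record_a, record_b, max_distance_m=100.0):
    """Assumed rule: records within max_distance_m are the same facility."""
    distance = haversine_m(record_a["lat"], record_a["lon"],
                           record_b["lat"], record_b["lon"])
    return distance <= max_distance_m


# Hypothetical facility records from two different agencies.
epa_record = {"name": "ACME CHEMICAL PLANT 2", "lat": 40.7128, "lon": -74.0060}
osha_record = {"name": "Acme Chem. (Plant #2)", "lat": 40.7132, "lon": -74.0060}
far_record = {"name": "Acme Warehouse", "lat": 40.7228, "lon": -74.0060}

print(same_site(epa_record, osha_record))  # roughly 44 m apart
print(same_site(epa_record, far_record))   # roughly 1.1 km apart
```

A fixed radius is a blunt instrument: dense industrial parks can hold several distinct facilities within 100 meters, which is one reason a step like this makes more sense as a late fallback in the cascade than as a first check.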

What This Enables

The 40.6% finding is just the first event pair we've analyzed. We're now computing lead/lag survival curves across every cross-agency combination in our platform:

● Biomanufacturing Fragility Index: real-time supply-risk scoring for 1,263 pharmaceutical companies
● Facility Twin profiles: unified lifecycle graphs for 13,103 industrial facilities across EPA, OSHA, and FDA
● Permit sequence prediction: what permit type comes next at this address, with probability and timeframe
● Regulatory overhang tracking: how long from proposed rule to operational impact

The lead/lag engine transforms public records from a static archive into a predictive system. Not "what happened" — but "what usually happens next, and how long does it take?"
