From molecule to market: introducing DrugCore and RegulatoryCore
Introducing DrugCore and RegulatoryCore: drugs and FDA/EMA approvals, harmonized and linked to the trials and literature you already query.
🆕 Two new Cores are live in the Amass API. DrugCore collapses the tangle of names, codes and salt forms behind a drug into one canonical record. RegulatoryCore puts FDA and EMA authorizations on a single, comparable schema. Both link straight into the clinical trials and literature you already query, so you can follow a molecule from its first paper to its approvals in either market without leaving the API.
Why we built them
Two questions sound simple and turn out not to be:
- "Give me everything on this drug."
- "Is it approved in the US and the EU, and is the status the same in both?"
What makes them hard has little to do with the science and everything to do with the shape of the data. Drug identity is scattered across brand names, research codes, and synonyms. Regulatory data lives in separate agencies that describe the same things with different vocabularies. The two new Cores exist to make each of those questions a single query.
DrugCore: one record per molecule
The same drug shows up under a dozen labels. Pembrolizumab is Keytruda on the shelf, MK-3475 in early research, Lambrolizumab in older papers, CHEMBL3137343 in ChEMBL — plus salt forms and fixed-dose combinations.
DrugCore harmonizes all of that into a single canonical record — name, modality, highest clinical stage reached, structure — and then links it out to the evidence behind it.
Each record is sourced from ChEMBL, 22K+ molecules in all, and carries drugType, maxClinicalStage, structure (InChIKey, SMILES) and every synonym folded in.
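To make the "one record per molecule" idea concrete, here is a minimal local sketch of alias resolution. It is not how DrugCore is implemented; it only illustrates the effect, using the pembrolizumab field values shown later in this post. The helper names `normalize` and `build_alias_index` are hypothetical, and the Amass ID is kept truncated exactly as the post shows it.

```python
def normalize(label: str) -> str:
    """Case-fold and drop punctuation/spaces so 'MK-3475' and 'mk 3475' compare equal."""
    return "".join(ch for ch in label.casefold() if ch.isalnum())


def build_alias_index(records: list[dict]) -> dict[str, str]:
    """Map every normalized name, trade name and synonym to its record's amassId."""
    index: dict[str, str] = {}
    for record in records:
        labels = [
            record["name"],
            *record.get("tradeNames", []),
            *record.get("synonyms", []),
        ]
        for label in labels:
            index[normalize(label)] = record["amassId"]
    return index


# Field values copied from the pembrolizumab record in this post.
# The ID is truncated in the post, so it stays truncated here.
records = [{
    "amassId": "AMDC_SXww…",
    "name": "PEMBROLIZUMAB",
    "tradeNames": ["Keytruda"],
    "synonyms": ["MK-3475", "Lambrolizumab", "SCH-900475"],
}]
index = build_alias_index(records)

for alias in ("keytruda", "mk 3475", "LAMBROLIZUMAB"):
    print(alias, "->", index.get(normalize(alias)))
```

Every spelling lands on the same ID, which is the behaviour the canonical record gives you without maintaining such an index yourself.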
RegulatoryCore: two agencies, one schema
A drug's regulatory life is split across the FDA in the US and the EMA in the EU. They publish different documents (FDA labels and review packages; EMA EPARs and SmPCs), use different procedure names (NDA/BLA vs. centralised authorisation), and run different expedited programs (Breakthrough Therapy vs. PRIME). Comparing a drug across both markets normally means reconciling two vocabularies by hand.
RegulatoryCore parses both agencies onto one shared schema — a unified authorization status, a common procedure field, expedited programs mapped onto shared comparison axes, and a single orphan flag — so US and EU records sit side by side without being forced to mean the same thing.
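As a rough picture of what "one shared schema" buys you, the sketch below models an authorization as a single Python dataclass and fills it with the Keytruda rows from the worked example later in this post. The class and attribute names are assumptions made for illustration, not the RegulatoryCore field names; check the API docs for the real schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Authorization:
    """Hypothetical shape for one authorization, FDA or EMA alike."""
    agency: str                    # "FDA" or "EMA"
    product: str
    procedure: str                 # procedure name as reported, e.g. "BLA"
    status: str                    # unified status vocabulary, e.g. "ACTIVE"
    authorized: date
    orphan: Optional[bool] = None  # single orphan flag; None = not stated here


rows = [
    Authorization("FDA", "Keytruda", "BLA", "ACTIVE", date(2014, 9, 4)),
    Authorization("EMA", "Keytruda", "Centralised", "ACTIVE", date(2015, 7, 17)),
    Authorization("FDA", "Keytruda Qlex", "BLA", "ACTIVE", date(2025, 9, 19)),
]

# Because both agencies share one shape, a cross-market view is a plain group-by.
by_agency: dict[str, list[Authorization]] = {}
for row in rows:
    by_agency.setdefault(row.agency, []).append(row)

for agency, items in sorted(by_agency.items()):
    print(agency, [(a.product, a.status, a.authorized.isoformat()) for a in items])
```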
The Cores are linked
Neither Core is an island. Every record carries the Amass IDs of its related records in the other Cores, so one lookup fans out to the matching trials, papers and approvals — no keyword-matching, no name disambiguation.
A worked example: pembrolizumab, from molecule to market
The fastest way to see what this unlocks is to walk one real drug end to end. We'll use pembrolizumab — the PD-1 checkpoint inhibitor better known as Keytruda.
1. Start at the molecule
One search returns the canonical record:
curl "https://api.amass.tech/api/v1/cores/drugcore/records?query=pembrolizumab" \
-H "Authorization: Bearer amass_YOUR_KEY"
{
  "data": [{
    "amassId": "AMDC_SXww…",
    "name": "PEMBROLIZUMAB",
    "drugType": "ANTIBODY",
    "maxClinicalStage": "APPROVAL",
    "tradeNames": ["Keytruda", "…"],
    "synonyms": ["MK-3475", "Lambrolizumab", "SCH-900475", "…"],
    "description": "Antibody drug, Approval stage — 9 approved and 146 investigational indications."
  }]
}
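If you would rather issue the search above from Python than from curl, a minimal standard-library sketch could look like this. The base URL, path, `query` parameter, bearer header and the top-level `data` array all come from this post; the function names are hypothetical, and the network call is defined but deliberately not executed here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.amass.tech/api/v1"


def search_url(core: str, query: str) -> str:
    """Build the records search URL for a Core, percent-encoding the query."""
    params = urlencode({"query": query}, quote_via=quote)
    return f"{BASE_URL}/cores/{core}/records?{params}"


def search(core: str, query: str, api_key: str) -> list[dict]:
    """Run the search and return the 'data' array (makes a network call)."""
    request = Request(
        search_url(core, query),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)["data"]


# Only the URL is built here; call search(...) with a real key to hit the API.
print(search_url("drugcore", "pembrolizumab"))
```

Passing `quote_via=quote` makes spaces encode as `%20` rather than `+`, matching the style of the curl examples in this post.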
Every alias you might have searched — Keytruda, MK-3475, Lambrolizumab — resolves to this one record. Ask for its cross-core links and the molecule fans out to its whole evidence base:
curl "…/drugcore/records/AMDC_SXww…?include=referencesRegulatoryCore&include=referencesTrialCore" \
-H "Authorization: Bearer amass_YOUR_KEY"
# referencesTrialCore → 2,408 trial IDs (resolve in TrialCore)
# referencesRegulatoryCore → 3 authorization IDs (resolve in RegulatoryCore)
💡 Coverage varies by field — always check the array length before relying on it. Here the trial links are dense (2,408) and there are 3 regulatory links, while biomed links aren't recorded for this molecule yet. An empty list means "no links recorded," not "no evidence."
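The length check is easy to wrap in a helper. The sketch below assumes each link field is an array of IDs on the record, as the tip implies. `referencesTrialCore` and `referencesRegulatoryCore` appear in this post; `referencesBiomedCore` is an assumed name by analogy, and the IDs in the toy record are placeholders.

```python
# The third field name is an assumption, modelled on the two the post shows.
LINK_FIELDS = ("referencesTrialCore", "referencesRegulatoryCore", "referencesBiomedCore")


def link_counts(record: dict) -> dict[str, int]:
    """Count linked IDs per field; a missing or null field counts as zero."""
    return {field: len(record.get(field) or []) for field in LINK_FIELDS}


def usable_links(record: dict, minimum: int = 1) -> list[str]:
    """Return the link fields that carry at least `minimum` IDs."""
    return [field for field, count in link_counts(record).items() if count >= minimum]


# Toy record with placeholder IDs; the biomed field is absent on purpose
# to mimic "no links recorded".
record = {
    "referencesTrialCore": ["trial-id-1", "trial-id-2"],
    "referencesRegulatoryCore": ["reg-id-1", "reg-id-2", "reg-id-3"],
}

print(link_counts(record))
print(usable_links(record))
```

Treating a missing field and an empty list the same way keeps downstream code from reading "no links recorded" as "no evidence".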
2. Follow it to market
Resolving those three referencesRegulatoryCore IDs gives the molecule's approvals across both agencies — already normalized, so you read them in one view:
| Agency | Product | Procedure | Status | Authorized |
|---|---|---|---|---|
| FDA · US | Keytruda | BLA | ACTIVE | 2014-09-04 |
| EMA · EU | Keytruda | Centralised | ACTIVE | 2015-07-17 |
| FDA · US | Keytruda Qlex | BLA | ACTIVE | 2025-09-19 |
Both markets are active, and the US authorization led the EU by about ten months, a fact you read straight off authorizationsByAgency, the cross-market link that every record carries. The 2025 Keytruda Qlex row is the newer subcutaneous fixed-dose combination (pembrolizumab with berahyaluronidase alfa), captured on the same schema.
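The "about ten months" figure is simple date arithmetic once the rows share one date field. Here is a small sketch using the three dates from the table; the dictionary keys and function names are illustrative, not the API's field names.

```python
from datetime import date

# Dates copied from the table above.
authorizations = [
    {"agency": "FDA", "product": "Keytruda", "authorized": date(2014, 9, 4)},
    {"agency": "EMA", "product": "Keytruda", "authorized": date(2015, 7, 17)},
    {"agency": "FDA", "product": "Keytruda Qlex", "authorized": date(2025, 9, 19)},
]


def first_authorization(rows, agency):
    """Earliest authorization date for an agency, or None if it has no rows."""
    dates = [row["authorized"] for row in rows if row["agency"] == agency]
    return min(dates) if dates else None


def lead_days(rows, leader, follower):
    """Days by which the leader's first authorization preceded the follower's."""
    first_leader = first_authorization(rows, leader)
    first_follower = first_authorization(rows, follower)
    if first_leader is None or first_follower is None:
        return None
    return (first_follower - first_leader).days


gap = lead_days(authorizations, "FDA", "EMA")
print(gap, "days, roughly", round(gap / 30.44, 1), "months")
```

The gap comes out at 316 days, which is where "about ten months" comes from.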
3. Compare expedited programs on shared axes
Keytruda reached the market fast, on three FDA expedited programs. RegulatoryCore maps each agency-native program onto a shared comparison axis: the native name is preserved, but the FDA program lines up with its EMA counterpart instead of being buried in agency-specific jargon.
| Shared axis | FDA program (Keytruda) | EMA equivalent |
|---|---|---|
| REVIEW_ACCELERATION | Priority Review | Accelerated Assessment |
| DEVELOPMENT_SUPPORT | Breakthrough Therapy | PRIME |
| EARLY_ACCESS_BASIS | Accelerated Approval | Conditional MA |
So "which drugs reached the market on an accelerated or breakthrough pathway?" becomes a filter (hasDesignation) that works across both markets at once.
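To see how a shared axis turns two vocabularies into one filter, here is a local sketch built from the table above. It is an illustration of the mapping only: it does not reproduce the semantics of the hasDesignation filter, and the function names are hypothetical.

```python
# Axis names and program names copied from the table above.
SHARED_AXES = {
    "REVIEW_ACCELERATION": {"FDA": "Priority Review", "EMA": "Accelerated Assessment"},
    "DEVELOPMENT_SUPPORT": {"FDA": "Breakthrough Therapy", "EMA": "PRIME"},
    "EARLY_ACCESS_BASIS": {"FDA": "Accelerated Approval", "EMA": "Conditional MA"},
}


def axis_for(agency, program):
    """Return the shared axis a native program maps onto, or None if unknown."""
    for axis, programs in SHARED_AXES.items():
        if programs.get(agency) == program:
            return axis
    return None


def counterpart(agency, program, other_agency):
    """Return the other agency's program on the same axis, or None."""
    axis = axis_for(agency, program)
    return SHARED_AXES[axis].get(other_agency) if axis else None


def has_axis(designations, axis):
    """True if any (agency, native program) pair falls on the given axis."""
    return any(axis_for(agency, program) == axis for agency, program in designations)


keytruda_fda = [
    ("FDA", "Priority Review"),
    ("FDA", "Breakthrough Therapy"),
    ("FDA", "Accelerated Approval"),
]

print(axis_for("FDA", "Breakthrough Therapy"))
print(counterpart("FDA", "Breakthrough Therapy", "EMA"))
print(has_axis(keytruda_fda, "EARLY_ACCESS_BASIS"))
```

Filtering on the axis rather than the native name is what lets one condition cover both markets.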
4. Go deeper — search inside the source documents
Structured fields only scratch the surface of a regulatory file. The substance lives in the prose: the boxed warnings, the immune-mediated reactions, the surrogate endpoints behind an accelerated approval. RegulatoryCore parses the full text of every label, review, SmPC and EPAR into addressable sections — the FDA Keytruda record alone carries 376 parsed sections (70 from the label, 306 from the review package).
A plain query sweeps all of that text, not just the metadata:
curl "…/regulatorycore/records?query=immune-mediated%20hepatitis" \
-H "Authorization: Bearer amass_YOUR_KEY"
# → surfaces Keytruda even though "hepatitis" is nowhere in its structured indication —
# the match lives in FDA_LABEL §5.2, "Immune-Mediated Hepatitis", returned with the exact text.
A clinical phrase that exists only deep in a warnings section still surfaces the drug, with the matching section and a direct link to the source PDF.
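Why does full-text search find what structured fields miss? The toy sketch below makes the point locally. The record shape (`structured`, `sections`, `document`, `number`, `title`) is an assumption for illustration and is not the RegulatoryCore response format; only the FDA_LABEL §5.2 title and the structured values are taken from this post.

```python
def in_structured_fields(record, phrase):
    """True if the phrase appears in any structured field value."""
    needle = phrase.casefold()
    return any(needle in str(value).casefold() for value in record["structured"].values())


def find_sections(record, phrase):
    """Return the sections whose title or text contains the phrase."""
    needle = phrase.casefold()
    return [
        section for section in record["sections"]
        if needle in section["title"].casefold()
        or needle in section.get("text", "").casefold()
    ]


# Hypothetical, heavily trimmed record: one section out of the hundreds
# a real record would carry, and no section text.
record = {
    "product": "Keytruda",
    "structured": {"name": "PEMBROLIZUMAB", "drugType": "ANTIBODY"},
    "sections": [
        {"document": "FDA_LABEL", "number": "5.2", "title": "Immune-Mediated Hepatitis"},
    ],
}

phrase = "immune-mediated hepatitis"
print(in_structured_fields(record, phrase))
print([(s["document"], s["number"]) for s in find_sections(record, phrase)])
```

A metadata-only search would stop at the first check; sweeping the sections is what brings the drug back.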
Why it matters: one field, two markets
The clearest payoff is a question that used to take a research afternoon. The same query across both agencies returns each market's status on a single field — so a US/EU divergence is impossible to miss:
| Product | FDA · US | EMA · EU |
|---|---|---|
| Aduhelm (aducanumab) | ACTIVE | WITHDRAWN_DURING_REVIEW |
| Leqembi (lecanemab) | ACTIVE | ACTIVE |
The same product, active in one market and withdrawn in another — visible in a single field, with no second query.
Two Alzheimer's drugs, same class, opposite regulatory outcomes in Europe. authorizationsByAgency is always populated and carries each market's own status, so you can tell instantly which is which.
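Spotting a divergence programmatically is then a one-line comparison. The sketch below assumes you have already reduced authorizationsByAgency to one status per agency per product; that reduced shape is an assumption, and the statuses are copied from the table above.

```python
# Statuses copied from the table above.
statuses = {
    "Aduhelm (aducanumab)": {"FDA": "ACTIVE", "EMA": "WITHDRAWN_DURING_REVIEW"},
    "Leqembi (lecanemab)": {"FDA": "ACTIVE", "EMA": "ACTIVE"},
}


def divergent(products):
    """Keep only the products whose agencies report different statuses."""
    return {
        name: by_agency
        for name, by_agency in products.items()
        if len(set(by_agency.values())) > 1
    }


for name, by_agency in divergent(statuses).items():
    print(name, by_agency)
```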
Get started
Both Cores are live now, behind the same auth, error format and query patterns as BiomedCore and TrialCore:
- DrugCore: GET /v1/cores/drugcore/records · 22K+ drugs and molecules from ChEMBL
- RegulatoryCore: GET /v1/cores/regulatorycore/records · 7K+ cross-agency FDA + EMA authorizations
Full reference and schemas are in the API docs. Base URL: https://api.amass.tech/api/v1.
The bigger picture
DrugCore and RegulatoryCore slot into a platform that already spans the literature and the clinic. Four Cores, one API, one connected graph — query any record and traverse to its related records in the others: a drug to its trials, papers and approvals, and back again.