TypeScript clients for NCBI APIs — PubMed, PMC, BLAST, SNP, ClinVar, PubChem, Datasets, and more.
Disclaimer: This is an unofficial, community-maintained SDK. It is not affiliated with, endorsed by, or related to the National Center for Biotechnology Information (NCBI) or the NCBI GitHub organization. For official NCBI tools and resources, visit ncbi.nlm.nih.gov/home/develop.
The National Center for Biotechnology Information (NCBI), part of the U.S. National Library of Medicine (NLM), maintains the world's largest collection of biomedical databases. These include PubMed (37M+ article citations), PubMed Central (PMC, 9M+ full-text articles), MeSH (controlled medical vocabulary), BLAST (sequence alignment), dbSNP (genetic variation), ClinVar (clinical variants), PubChem (chemical compounds), and many more. Researchers, clinicians, and developers rely on NCBI's public APIs to search, retrieve, and analyze biomedical data programmatically.
ncbijs provides typed, zero-dependency TypeScript clients for these APIs. This entire project is built and maintained by AI using Claude Code — no human-written code is accepted. See CONTRIBUTING.md for details.
It is designed for two audiences:
- Developers and researchers building biomedical applications, literature review tools, or clinical decision support systems.
- LLM and AI agents that need structured, programmatic access to biomedical literature for retrieval-augmented generation (RAG), entity extraction, and citation management.
Built for LLM consumption. Every package uses consistent naming and interfaces and exposes a self-documenting API with full JSDoc. The live-API MCP server (@ncbijs/http-mcp) exposes 27 tools that any LLM agent can call directly. The workflow table below and the "Which package do I need?" decision tree make it easy for agents to discover the right package without reading source code. 40 of 43 packages run in the browser, which suits agentic web apps that query NCBI without a backend.
| Workflow | Packages |
|---|---|
| Search PubMed and retrieve article metadata | @ncbijs/pubmed + @ncbijs/eutils |
| Fetch full-text articles from PMC | @ncbijs/pmc + @ncbijs/jats |
| Extract genes, diseases, chemicals from articles | @ncbijs/pubtator |
| Generate formatted citations (RIS, MEDLINE, CSL-JSON) | @ncbijs/cite |
| Convert between PMID, PMCID, and DOI | @ncbijs/id-converter |
| Expand MeSH terms for comprehensive searches | @ncbijs/mesh |
| Chunk full-text articles for RAG pipelines | @ncbijs/jats (toChunks) |
| Look up genes, genomes, and taxonomy | @ncbijs/datasets |
| Parse FASTA nucleotide/protein sequences | @ncbijs/fasta |
| Run BLAST sequence alignments | @ncbijs/blast |
| Look up SNP/variant data from dbSNP | @ncbijs/snp |
| Query clinical variant significance from ClinVar | @ncbijs/clinvar |
| Retrieve compound, substance, and assay data | @ncbijs/pubchem |
| Fetch protein sequences in FASTA or GenBank format | @ncbijs/protein |
| Fetch nucleotide sequences in FASTA or GenBank format | @ncbijs/nucleotide |
| Parse GenBank flat file records locally | @ncbijs/genbank |
| Look up genetic disorders from OMIM | @ncbijs/omim |
| Query medical genetics concepts from MedGen | @ncbijs/medgen |
| Search genetic tests from GTR | @ncbijs/gtr |
| Search gene expression datasets from GEO | @ncbijs/geo |
| Query structural variants from dbVar | @ncbijs/dbvar |
| Search sequencing experiment metadata from SRA | @ncbijs/sra |
| Look up 3D molecular structures from MMDB/PDB | @ncbijs/structure |
| Search conserved protein domains from CDD | @ncbijs/cdd |
| Search NCBI Bookshelf entries | @ncbijs/books |
| Look up journal/serial records from NLM Catalog | @ncbijs/nlm-catalog |
| Convert variant notations (HGVS, SPDI, VCF) | @ncbijs/snp |
| Get full compound annotations (GHS, patents) | @ncbijs/pubchem |
| Chain search-fetch pipelines via History Server | @ncbijs/eutils |
| Search clinical trials by condition/intervention | @ncbijs/clinical-trials |
| Get citation metrics and impact scores | @ncbijs/icite |
| Normalize drug names and find drug classes | @ncbijs/rxnorm |
| Look up drug labels, SPLs, and NDC packaging | @ncbijs/dailymed |
| Find literature linked to genetic variants | @ncbijs/litvar |
| Get annotated text with entity recognition | @ncbijs/bioc |
| Autocomplete ICD-10, LOINC, SNOMED codes | @ncbijs/clinical-tables |
| Store NCBI data locally in DuckDB | @ncbijs/store |
| Query stored data with the same package API | fromStorage() on domain packages |
| Build data pipelines (Source → Parse → Sink) | @ncbijs/pipeline |
| Load any NCBI dataset with one function call | @ncbijs/etl |
| Watch NCBI sources for updates and re-sync | @ncbijs/sync |
| Expose all tools to LLM agents via MCP | @ncbijs/http-mcp |
| Query local NCBI data via MCP | @ncbijs/store-mcp |
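
The History Server row above refers to a two-step E-utilities pattern: search once with `usehistory=y`, then page through the stored result set by `WebEnv` and `query_key`. The sketch below only builds the two request URLs. The endpoint and parameter names are the public E-utilities ones; the helper names (`buildSearchUrl`, `buildFetchUrl`, `HistoryRef`) are hypothetical and are not the @ncbijs/eutils API.

```typescript
// Sketch of the two requests behind a History Server pipeline.
// Helper names are hypothetical; only the URL parameters are NCBI's.
const EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

interface HistoryRef {
  webEnv: string; // WebEnv token returned by esearch
  queryKey: string; // query_key returned by esearch
}

// Step 1: search and ask NCBI to keep the result set on the History Server.
function buildSearchUrl(db: string, term: string): string {
  const params = new URLSearchParams({ db, term, usehistory: 'y', retmax: '0' });
  return `${EUTILS_BASE}/esearch.fcgi?${params.toString()}`;
}

// Step 2: fetch one page of the stored result set without resending IDs.
function buildFetchUrl(db: string, ref: HistoryRef, retstart: number, retmax: number): string {
  const params = new URLSearchParams({
    db,
    query_key: ref.queryKey,
    WebEnv: ref.webEnv,
    retstart: String(retstart),
    retmax: String(retmax),
    retmode: 'xml',
  });
  return `${EUTILS_BASE}/efetch.fcgi?${params.toString()}`;
}

// The WebEnv value here is a placeholder, not a real token.
console.log(buildSearchUrl('pubmed', 'CRISPR therapy'));
console.log(buildFetchUrl('pubmed', { webEnv: 'PLACEHOLDER_WEBENV', queryKey: '1' }, 0, 100));
```

In a real run the `webEnv` and `queryKey` values come from the esearch response, and `retstart` advances by `retmax` on each page.
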
The most effective way to consume ncbijs through an AI agent is to point the agent at this repository directly — clone it locally so the agent's working directory sits inside the repo, or give the agent read access on GitHub.
Every package ships two docs side by side:
- README.md: human-facing, what npm renders.
- CLAUDE.md: agent-optimised deep reference covering the full API surface, type metadata, cross-package wiring, common pitfalls, and "when NOT to use this".
Claude Code (and any agent that honours nested CLAUDE.md discovery) auto-loads packages/{name}/CLAUDE.md when the agent's working directory is inside that subtree. The root CLAUDE.md is the global router — workflow table, packages index, decision tree — so the agent lands on the right package without scanning the whole repo.
Clone, drop your agent in, and discovery just works.
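
To make the discovery rule concrete, here is a minimal sketch of which files an agent following the nested-CLAUDE.md convention might consider for a given working directory. The function name `candidateClaudeFiles` is hypothetical, and the exact lookup order inside any particular agent is an assumption; only the root CLAUDE.md and packages/{name}/CLAUDE.md locations come from this README.

```typescript
// Hypothetical helper: list the CLAUDE.md paths between the repo root and
// the agent's working directory, outermost first.
// Assumption: the input is a POSIX-style path relative to the repo root.
function candidateClaudeFiles(cwdRelativeToRepo: string): string[] {
  const segments = cwdRelativeToRepo
    .split('/')
    .filter((segment) => segment.length > 0 && segment !== '.');
  const files: string[] = ['CLAUDE.md']; // root router
  let current = '';
  for (const segment of segments) {
    current = current === '' ? segment : `${current}/${segment}`;
    files.push(`${current}/CLAUDE.md`);
  }
  return files;
}

const candidates = candidateClaudeFiles('packages/pubmed/src');
console.log(candidates);
```

Candidates that do not exist on disk would simply be skipped; in this repository layout the two that matter are the root file and packages/pubmed/CLAUDE.md.
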
| Package | Description | Version |
|---|---|---|
| @ncbijs/bioc | BioC API client — annotated PubMed and PMC articles (named entity… | |
| @ncbijs/blast | NCBI BLAST sequence alignment client. Async submit/poll/retrieve w… | |
| @ncbijs/books | Typed client for NCBI Bookshelf — search and fetch biomedical book… | |
| @ncbijs/cdd | Typed client for NCBI Conserved Domain Database (CDD) — search and… | |
| @ncbijs/cite | Citation formatting in 4 styles (RIS, MEDLINE, CSL-JSON, NLM Citat… | |
| @ncbijs/clinical-tables | Typed client for the NLM Clinical Tables Search API. Autocomplete… | |
| @ncbijs/clinical-trials | Typed client for the ClinicalTrials.gov v2 REST API. Search interv… | |
| @ncbijs/clinvar | NCBI ClinVar clinical variant data — search and fetch variants via… | |
| @ncbijs/dailymed | Typed client for the FDA DailyMed REST API v2 — drug name search,… | |
| @ncbijs/datasets | Typed client for the NCBI Datasets API v2 — gene reports, taxonomy… | |
| @ncbijs/dbvar | Typed client for NCBI dbVar — search and fetch structural variatio… | |
| @ncbijs/etl | Pre-wired NCBI dataset loaders. One function call to download, par… | |
| @ncbijs/eutils | Spec-compliant client for all 9 NCBI E-utilities (esearch, efetch,… | |
| @ncbijs/fasta | Zero-dependency FASTA format parser. Pure synchronous function: st… | |
| @ncbijs/genbank | Zero-dependency parser for the NCBI GenBank flat-file format. Spli… | |
| @ncbijs/geo | Typed client for NCBI Gene Expression Omnibus (GEO) — search and f… | |
| @ncbijs/gtr | Typed client for the NCBI Genetic Testing Registry (GTR) — search… | |
| @ncbijs/http-mcp | Model Context Protocol server exposing ncbijs domain packages as L… | |
| @ncbijs/icite | Typed client for the NIH iCite API. Retrieve citation metrics — Re… | |
| @ncbijs/id-converter | Batch conversion between PMID, PMCID, DOI, and NIH Manuscript ID v… | |
| @ncbijs/jats | Parser for JATS XML (NISO Z39.96) full-text articles with markdown… | |
| @ncbijs/litvar | LitVar2 client — links genetic variants (rsIDs) to PubMed/PMC lite… | |
| @ncbijs/medgen | Typed client for NCBI MedGen medical-genetics concepts — search an… | |
| @ncbijs/mesh | NLM Medical Subject Headings (MeSH) vocabulary — tree traversal, q… | |
| @ncbijs/nlm-catalog | Typed client for NLM Catalog — search and fetch journal and serial… | |
| @ncbijs/nucleotide | Typed client for the NCBI Nucleotide database. Fetches DNA/RNA seq… | |
| @ncbijs/omim | Typed client for NCBI OMIM (Online Mendelian Inheritance in Man) —… | |
| @ncbijs/pipeline | Composable streaming ETL primitive — Source → Parse → Sink. Zero d… | |
| @ncbijs/pmc | PMC full-text article retrieval over three NCBI surfaces (E-utilit… | |
| @ncbijs/protein | Typed client for the NCBI Protein database. Fetches sequences via… | |
| @ncbijs/pubchem | Typed client for the PubChem PUG REST and PUG View APIs — compound… | |
| @ncbijs/pubmed | High-level PubMed search and retrieval client. Fluent query builde… | |
| @ncbijs/pubmed-xml | Spec-compliant pure parser for PubMed/MEDLINE XML and MEDLINE plai… | |
| @ncbijs/pubtator | Client for the PubTator3 text-mining API — biomedical entity autoc… | |
| @ncbijs/rate-limiter | Zero-dependency token-bucket rate limiter and retry-aware fetch he… | |
| @ncbijs/rxnorm | Typed client for the NLM RxNav RxNorm REST API. Resolve drug names… | |
| @ncbijs/snp | NCBI dbSNP Variation Services API client (RefSNP reports, allele p… | |
| @ncbijs/sra | Typed client for NCBI SRA (Sequence Read Archive) — search and fet… | |
| @ncbijs/store | Storage interfaces (Storage / ReadableStorage / WritableStorage /… | |
| @ncbijs/store-mcp | Model Context Protocol server exposing locally stored NCBI data (D… | |
| @ncbijs/structure | Typed client for NCBI Structure (MMDB / PDB) — search and fetch ma… | |
| @ncbijs/sync | NCBI update detection and scheduled re-sync. Polls upstream source… | |
| @ncbijs/xml | Zero-dependency regex-based XML reader for NCBI formats — no HTTP,… | |

ncbijs is a data access layer for biomedical RAG pipelines. It provides the structured inputs that RAG systems need at every stage — you bring the embeddings model, vector database, and LLM:
| RAG stage | What ncbijs provides | Packages |
|---|---|---|
| Ingest | Fetch full-text articles, chunk into passages, extract entities | pmc, jats, pubtator, mesh |
| Retrieve | Expand queries via MeSH hierarchy, enrich with gene/compound metadata | mesh, datasets, pubchem |
| Generate | LLM calls MCP tools to verify claims, look up data, format citations | http-mcp, cite, pmc, id-converter |
ncbijs does not do embeddings, vector search, or re-ranking — those are infrastructure concerns that belong in your vector database and LLM orchestrator.
See RAG Integration Guide for a full architecture walkthrough with code examples and diagrams.
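
As a rough illustration of the "chunk into passages" step in the Ingest row, the sketch below splits plain text into overlapping word windows. It is a generic technique shown under simple assumptions (whitespace tokenization, fixed window size) and is not a description of how toChunks in @ncbijs/jats works; `chunkByWords` and `Passage` are hypothetical names.

```typescript
// Split text into overlapping word-window passages.
// Assumption: whitespace tokenization and a fixed window size.
interface Passage {
  index: number;
  text: string;
}

function chunkByWords(text: string, maxWords: number, overlap: number): Passage[] {
  if (maxWords <= 0 || overlap < 0 || overlap >= maxWords) {
    throw new RangeError('require maxWords > 0 and 0 <= overlap < maxWords');
  }
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const passages: Passage[] = [];
  const step = maxWords - overlap; // how far each window advances
  for (let start = 0; start < words.length; start += step) {
    passages.push({
      index: passages.length,
      text: words.slice(start, start + maxWords).join(' '),
    });
    if (start + maxWords >= words.length) break; // this window reached the end
  }
  return passages;
}

console.log(chunkByWords('a b c d e f g', 3, 1).map((p) => p.text));
```

The overlap keeps a sentence that straddles a window boundary retrievable from either neighbouring passage, at the cost of some duplicated tokens in the index.
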
ncbijs includes a composable pipeline system for processing bulk NCBI data. Wire any source, parser, and sink together with a single pipeline() call. The pipeline package is 100% browser-compatible — every export uses standard Web APIs (fetch, DecompressionStream).
```typescript
import { pipeline, createHttpSource, createSink } from '@ncbijs/pipeline';
import { parseMeshDescriptorXml } from '@ncbijs/mesh';

// Download from NCBI HTTP → parse → write to any destination
await pipeline(
  createHttpSource('https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/xmlmesh/desc2026.xml'),
  (xml) => parseMeshDescriptorXml(xml).descriptors,
  createSink(async (records) => {
    console.log(`Received ${records.length} MeSH descriptors`);
  }),
);
```

Or skip the wiring entirely with @ncbijs/etl, which downloads, parses, and sinks any dataset in one function call:
```typescript
import { load, loadAll } from '@ncbijs/etl';
import { createSink } from '@ncbijs/pipeline';

// Load a single dataset
await load(
  'mesh',
  createSink(async (records) => {
    console.log(`${records.length} MeSH descriptors`);
  }),
);

// Load all 6 datasets into any sink
await loadAll((dataset) =>
  createSink(async (records) => {
    console.log(`${dataset}: ${records.length} records`);
  }),
);
```

The pipeline has three phases (load, sync, and query):

```text
Phase 1: Initial Load          Phase 2: Watch & Sync       Phase 3: Query via MCP
NCBI FTP ──→ DuckDB            Poll NCBI → re-load         store-mcp ──→ Claude
(one-time bulk download)       (long-running process)      (zero rate limits)
```
```typescript
import { load, loadAll } from '@ncbijs/etl';
import { DuckDbFileStorage } from '@ncbijs/store';

const storage = await DuckDbFileStorage.open('ncbi.duckdb');

// Load a single dataset
await load('clinvar', storage.createSink('clinvar'));

// Or load all 6 datasets at once
await loadAll((dataset) => storage.createSink(dataset));
```

Once loaded, start a watcher to poll for upstream changes and re-load only what changed. createCheckers() picks the best detection strategy per dataset: MD5 checksums for ClinVar, Taxonomy, and PubChem; HTTP Last-Modified for all others.
```typescript
import { createCheckers, load } from '@ncbijs/etl';
import { SyncScheduler, InMemorySyncState } from '@ncbijs/sync';

// `storage` is the DuckDbFileStorage instance opened in the previous example.
const scheduler = new SyncScheduler(new InMemorySyncState(), createCheckers(), {
  checkIntervalMs: 3600_000,
  datasets: ['clinvar', 'genes'],
  onUpdate: async (dataset) => {
    await load(dataset, storage.createSink(dataset));
  },
});

await scheduler.start(); // checks immediately, then every hour
```

Once data is loaded, expose it to Claude (or any MCP-compatible agent) with @ncbijs/store-mcp:
```json
{
  "mcpServers": {
    "ncbijs-store": {
      "command": "npx",
      "args": ["-y", "@ncbijs/store-mcp"],
      "env": {
        "NCBIJS_DB_PATH": "/absolute/path/to/ncbi.duckdb"
      }
    }
  }
}
```

Now your agent can query the local data directly:
- "Search for pathogenic BRCA1 variants in ClinVar"
- "Look up the MeSH descriptor for Alzheimer's disease"
- "What genes are on chromosome 17 in the local store?"
- "Convert PMID 33024307 to a DOI"
No network, no rate limits, no API keys. See @ncbijs/store-mcp for the full list of 13 query tools.
See examples/data-pipeline/ for complete scripts covering all three phases.
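
The change detection in Phase 2 comes down to comparing a stored fingerprint of each remote file with a fresh one. The sketch below shows that comparison for the two strategies named above, MD5 checksums and HTTP Last-Modified. The `Fingerprint` type and `needsReload` function are hypothetical and do not reflect the actual checker interface in @ncbijs/sync.

```typescript
// Hypothetical fingerprint of a remote file: either a checksum or a
// Last-Modified header value. Not the @ncbijs/sync checker interface.
type Fingerprint =
  | { kind: 'md5'; value: string }
  | { kind: 'last-modified'; value: string };

// A dataset needs re-loading when nothing is stored yet, when the
// detection strategy changed, or when the fingerprint value differs.
function needsReload(previous: Fingerprint | undefined, current: Fingerprint): boolean {
  if (previous === undefined) return true;
  if (previous.kind !== current.kind) return true;
  if (current.kind === 'md5') {
    // Hex digests are case-insensitive.
    return previous.value.toLowerCase() !== current.value.toLowerCase();
  }
  // Compare Last-Modified values as timestamps. An unparseable date yields
  // NaN, which never equals itself, so the dataset is re-loaded to be safe.
  return Date.parse(previous.value) !== Date.parse(current.value);
}

const stored: Fingerprint = { kind: 'last-modified', value: 'Wed, 21 Oct 2015 07:28:00 GMT' };
const fresh: Fingerprint = { kind: 'last-modified', value: 'Thu, 22 Oct 2015 07:28:00 GMT' };
console.log(needsReload(stored, fresh));
```

Checksums catch content changes even when timestamps are unreliable, while Last-Modified needs only a HEAD request, which is one plausible reason to choose per dataset.
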
- @ncbijs/pipeline: Composable Source/Sink primitives built on AsyncIterable. HTTP and composite sources, streaming, backpressure, abort signals. Browser + Node.js.
- @ncbijs/etl: Pre-wired loaders for 6 NCBI bulk datasets. load('mesh', mySink) is all you need. Also exports createCheckers() for sync.
- @ncbijs/store: Storage interfaces with a DuckDB reference implementation. Node.js only.
- @ncbijs/sync: Watches NCBI FTP for updates via MD5 checksums or HTTP Last-Modified. Pluggable checkers, configurable interval, abort signal.
See Data Pipeline Guide for the full API walkthrough, streaming parsers, error handling, and sync scheduling.
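
For readers unfamiliar with the Source → Parse → Sink shape, the following is a minimal, self-contained sketch of the pattern using an AsyncIterable as the source. The types and the `runPipeline` function are hypothetical stand-ins written for illustration; they are not the @ncbijs/pipeline exports and make no claim about its internals.

```typescript
// Hypothetical stand-ins for the three roles; not the @ncbijs/pipeline types.
type Source<T> = AsyncIterable<T>;
type Parse<I, O> = (input: I) => O[];
type Sink<O> = (records: O[]) => Promise<void>;

// Pull chunks from the source one at a time, parse each, and await the sink
// before requesting the next chunk. Awaiting the sink is what provides
// backpressure: a slow sink slows the source down.
async function runPipeline<I, O>(
  source: Source<I>,
  parse: Parse<I, O>,
  sink: Sink<O>,
): Promise<number> {
  let total = 0;
  for await (const chunk of source) {
    const records = parse(chunk);
    await sink(records);
    total += records.length;
  }
  return total;
}

// Example source: an async generator yielding "symbol,geneId" lines.
async function* lineSource(): AsyncGenerator<string> {
  yield 'BRCA1,672';
  yield 'TP53,7157';
}

// Example wiring: parse each line into a symbol and log it from the sink.
runPipeline(
  lineSource(),
  (line) => [line.split(',')[0] ?? ''],
  async (symbols) => {
    console.log(symbols);
  },
).then((count) => console.log(`${count} records`));
```

Because every stage is a plain function or iterable, any of the three can be swapped without touching the other two.
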
ncbijs ships two MCP servers that give AI agents direct access to NCBI data. Pick the one that fits your use case — or use both:
| | Live API (http-mcp) | Local data (store-mcp) |
|---|---|---|
| Setup | Zero — just add the config | Load data first (Phases 1-2) |
| Network | Required (queries NCBI APIs in real time) | Offline after initial load |
| Rate limits | NCBI limits apply (3-10 req/s) | None |
| Data freshness | Always current | As fresh as last sync |
| Tools | 27 | 13 |
Query NCBI APIs in real time — PubMed, PMC full text, BLAST, ClinVar, PubChem, MeSH, and more. No data loading required.
```json
{
  "mcpServers": {
    "ncbijs": {
      "command": "npx",
      "args": ["-y", "@ncbijs/http-mcp"],
      "env": {
        "NCBI_API_KEY": ""
      }
    }
  }
}
```

27 tools covering: PubMed search, PMC full text, PubTator entity recognition, gene/genome/taxonomy lookup, BLAST alignment, SNP/ClinVar variant queries, PubChem compounds, citation formatting, ID conversion, MeSH vocabulary, iCite metrics, RxNorm drug data, and LitVar variant-literature linking.
Example prompts:
- "Search PubMed for recent CRISPR gene therapy reviews"
- "Get the full text of PMC7886120 and summarize the methods"
- "What genes and diseases are mentioned in PMID 33024307?"
- "Run a BLAST search for the sequence ATCGATCGATCG"
See @ncbijs/http-mcp for details. Get a free API key at ncbi.nlm.nih.gov/account/settings.
Query your local DuckDB database — MeSH, ClinVar, genes, taxonomy, PubChem, and ID mappings. No network needed after loading.
```text
Phase 1: load data ──→ Phase 2: sync ──→ Phase 3: query via store-mcp
(see Data pipelines)   (optional)        (this section)
```

```json
{
  "mcpServers": {
    "ncbijs-store": {
      "command": "npx",
      "args": ["-y", "@ncbijs/store-mcp"],
      "env": {
        "NCBIJS_DB_PATH": "/absolute/path/to/ncbi.duckdb"
      }
    }
  }
}
```

13 tools available: store-lookup-mesh, store-search-mesh, store-lookup-variant, store-search-variants, store-lookup-gene, store-search-genes, store-lookup-taxonomy, store-search-taxonomy, store-lookup-compound, store-search-compounds, store-convert-ids, store-search-ids, store-stats.
Example prompts:
- "Search for pathogenic BRCA1 variants in the local ClinVar data"
- "What compounds have an InChI key starting with BSYNRYMUT?"
- "How many records are loaded in each dataset?"
See @ncbijs/store-mcp for details. See Data pipelines above to load the data.
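
Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for the store-stats tool listed above. The helper name `buildToolCall` is hypothetical, and calling store-stats with no arguments is an assumption; check the tool's input schema in @ncbijs/store-mcp for the real parameters.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that assembles the request envelope.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  };
}

// Assumption: store-stats takes no arguments.
const statsRequest = buildToolCall(1, 'store-stats');
console.log(JSON.stringify(statsRequest));
```

Agents such as Claude build this envelope themselves; it is shown here only to clarify what "calling a tool" means on the wire.
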
40 of 43 packages work in both browsers and Node.js. Only 3 infrastructure packages require Node.js:
| Runtime | Packages | Why |
|---|---|---|
| Browser + Node.js | All HTTP clients, parsers, rate-limiter, xml, fasta, genbank, pipeline, etl, sync (40 packages) | Only uses fetch, DecompressionStream, and pure computation |
| Node.js only | @ncbijs/store | Requires @duckdb/node-api (native binding) |
| Node.js only | @ncbijs/store-mcp, @ncbijs/http-mcp | MCP server CLIs (stdio transport) |

Use ncbijs directly in frontend apps — search PubMed, look up genes, query MeSH, and more with zero server-side code:
```typescript
import { PubMed } from '@ncbijs/pubmed';
import { Datasets } from '@ncbijs/datasets';

const pubmed = new PubMed();
const articles = await pubmed.search({ term: 'CRISPR therapy', retmax: 10 });

const datasets = new Datasets();
const gene = await datasets.geneBySymbol('BRCA1');
```

```bash
npm install @ncbijs/pubmed
```

```typescript
import { PubMed } from '@ncbijs/pubmed';

const pubmed = new PubMed({
  tool: 'my-research-app',
  email: 'you@university.edu',
});

const articles = await pubmed
  .search('CRISPR gene therapy')
  .dateRange('2023/01/01', '2024/12/31')
  .freeFullText()
  .limit(10)
  .fetchAll();

for (const article of articles) {
  console.log(`${article.pmid}: ${article.title}`);
}
```

I want to...
│
├── Search biomedical literature
│ ├── High-level PubMed search ────────────────→ @ncbijs/pubmed
│ ├── Low-level Entrez queries ────────────────→ @ncbijs/eutils
│ └── Find literature by genetic variant ──────→ @ncbijs/litvar
│
├── Retrieve full-text articles
│ ├── PMC open-access articles ────────────────→ @ncbijs/pmc
│ └── Annotated text with NER ─────────────────→ @ncbijs/bioc
│
├── Extract entities from text
│ ├── Genes, diseases, chemicals ──────────────→ @ncbijs/pubtator
│ └── Annotated passages (BioC format) ────────→ @ncbijs/bioc
│
├── Work with citations
│ ├── Format citations (RIS, CSL, etc.) ───────→ @ncbijs/cite
│ ├── Convert PMID/PMCID/DOI ──────────────────→ @ncbijs/id-converter
│ └── Citation impact metrics (RCR) ───────────→ @ncbijs/icite
│
├── Work with genes and sequences
│ ├── Gene/genome metadata ────────────────────→ @ncbijs/datasets
│ ├── Protein sequences ───────────────────────→ @ncbijs/protein
│ ├── Nucleotide sequences ────────────────────→ @ncbijs/nucleotide
│ ├── Sequence alignment (BLAST) ──────────────→ @ncbijs/blast
│ ├── Parse FASTA format ──────────────────────→ @ncbijs/fasta
│ └── Parse GenBank format ────────────────────→ @ncbijs/genbank
│
├── Work with variants and clinical data
│ ├── SNP/variant lookup (dbSNP) ──────────────→ @ncbijs/snp
│ ├── HGVS/SPDI/VCF conversion ────────────────→ @ncbijs/snp
│ ├── Clinical significance (ClinVar) ─────────→ @ncbijs/clinvar
│ ├── Genetic disorders (OMIM) ────────────────→ @ncbijs/omim
│ └── Medical genetics (MedGen) ───────────────→ @ncbijs/medgen
│
├── Work with drugs and chemicals
│ ├── Compound properties ─────────────────────→ @ncbijs/pubchem
│ ├── Compound annotations (GHS, etc.) ────────→ @ncbijs/pubchem
│ ├── Drug normalization (RxCUI) ──────────────→ @ncbijs/rxnorm
│ ├── Drug classes (ATC, VA, MEDRT) ───────────→ @ncbijs/rxnorm
│ ├── NDC code lookup ─────────────────────────→ @ncbijs/rxnorm
│ └── Drug labels and SPLs ────────────────────→ @ncbijs/dailymed
│
├── Autocomplete medical codes
│ ├── ICD-10, LOINC, SNOMED ───────────────────→ @ncbijs/clinical-tables
│ └── RxTerms drug names ──────────────────────→ @ncbijs/clinical-tables
│
├── Search clinical trials ──────────────────→ @ncbijs/clinical-trials
│
├── Work with vocabularies
│ └── MeSH term expansion ─────────────────────→ @ncbijs/mesh
│
├── Search other NCBI databases
│ ├── Gene expression (GEO) ───────────────────→ @ncbijs/geo
│ ├── Structural variants (dbVar) ─────────────→ @ncbijs/dbvar
│ ├── Sequencing data (SRA) ───────────────────→ @ncbijs/sra
│ ├── 3D structures (MMDB/PDB) ────────────────→ @ncbijs/structure
│ ├── Protein domains (CDD) ───────────────────→ @ncbijs/cdd
│ ├── Genetic tests (GTR) ─────────────────────→ @ncbijs/gtr
│ ├── Books/textbooks ─────────────────────────→ @ncbijs/books
│ └── Journal records (NLM Catalog) ───────────→ @ncbijs/nlm-catalog
│
├── Store NCBI data locally ─────────────────→ @ncbijs/store
│
├── Query stored data with same API ─────────→ fromStorage() on domain packages
│
├── Data pipeline (Source → Parse → Sink) ───→ @ncbijs/pipeline
│
├── Load any NCBI dataset in one call ───────→ @ncbijs/etl
│
├── Watch NCBI sources for updates ──────────→ @ncbijs/sync
│
├── Expose tools to LLM agents (live API) ───→ @ncbijs/http-mcp
│
└── Query local data via MCP ────────────────→ @ncbijs/store-mcp
| Capability | Packages |
|---|---|
| Supports API key | eutils, pubmed, pmc, clinvar, snp, datasets, omim, medgen, gtr, geo, dbvar, sra, structure, cdd, books, nlm-catalog, protein, nucleotide (optional, for higher rate limits) |
| No API key needed | All others (non-NCBI APIs) |
| Rate-limited | eutils, datasets, blast, snp, clinvar, pubchem, clinical-trials, icite, rxnorm, dailymed, + all that depend on rate-limiter |
| Zero dependencies | pipeline, sync, cite, id-converter, mesh, fasta, genbank, litvar, bioc, clinical-tables |
| Async iterators | eutils (efetchBatches, searchAndFetch, searchAndSummarize), pubmed (batch), clinical-trials (searchStudies), cite (citeMany), pipeline (Source, streamParser) |
| XML parsing | eutils, pubmed-xml, jats, pubtator, xml |
| Bulk parsers | mesh, cite, id-converter, clinvar, datasets, pubchem, snp, icite, clinical-trials, litvar, medgen, cdd, pmc |
| Data pipelines | pipeline (Source → Parse → Sink), store (DuckDbSink), sync (update detection) |
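
The "Rate-limited" row relies on a token-bucket limiter. As a generic illustration of that technique, and not the @ncbijs/rate-limiter implementation, the sketch below implements a small bucket with an injectable clock. The class name, method name, and constructor parameters are all hypothetical.

```typescript
// Minimal token bucket. The clock is injected so behaviour is deterministic
// in tests. Generic technique, not the @ncbijs/rate-limiter code.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = now();
  }

  // Take one token if available; returns false when the caller should wait.
  tryAcquire(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = current;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Example: a bucket sized for 3 requests per second.
const bucket = new TokenBucket(3, 3);
console.log(bucket.tryAcquire());
```

A caller that receives `false` would typically sleep briefly and retry, which smooths bursts down to the configured rate.
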
```bash
pnpm install
pnpm build       # Build all packages
pnpm test        # Run all tests
pnpm lint        # Lint all packages
pnpm typecheck   # Type-check all packages
```

```bash
# Build or test a single package
pnpm nx run @ncbijs/pubmed:build
pnpm nx run @ncbijs/pubmed:test
```

E2E tests hit real NCBI APIs and require an API key:

```bash
cp .env.example .env
# Add your NCBI API key to .env
pnpm nx run ncbijs-e2e:e2e
```

Get an API key at ncbi.nlm.nih.gov/account/settings.