Bulk dataset downloads · Reference

Anonymous, auditable, gz-compressed. One URL per source per snapshot.

Fonteum mirrors registered source snapshots to S3 with 90-day rolling retention. The bulk surface gives buyers' development teams, academic researchers, and AI agents one canonical URL per source family and per snapshot date, with full 14-tuple provenance in the response headers, a sidecar manifest at manifest.json, and a cross-link to /verify/[snapshot-id] for a SHA-256 hash-match against Fonteum’s integrity attestation.


1. Endpoints

Per-source + top-level. All anonymous.

Three endpoints per source family, plus one top-level discovery index. All are anonymous (no Authorization header required) and all return the canonical X-Fonteum-* response headers.

  • GET /api/v1/bulk/<source>/latest.csv.gz: 302 redirect to the most recent S3-cached snapshot. Cache-Control: max-age=300, kept short because the target rolls forward as new snapshots are ingested.
  • GET /api/v1/bulk/<source>/<YYYY-MM-DD>.csv.gz: 302 redirect to a specific dated snapshot. Each dated snapshot is immutable (Cache-Control: max-age=86400, immutable). Pin a date when you need reproducibility, for example to cite a specific snapshot in a paper or to replay an analysis.
  • GET /api/v1/bulk/<source>/manifest.json: sidecar JSON listing every cached snapshot for the source, with sha256, size, verify_url, cached_at, and retention_expires per row.
  • GET /api/v1/bulk/manifest.json: top-level index across all registered source families. One URL for DataCite harvesters, dbt source registration, and evaluation scripts.
2. Source families

25 registered source families.

One row per source family registered in the cron-sources registry. Each maps to a source_id value usable in the URL paths above.

  • cms-pecos — CMS PECOS PPEF (Provider Enrollment, Chain & Ownership) · Weekly (Sunday) · license: US-Government-Works
  • oig-leie — OIG LEIE (List of Excluded Individuals/Entities) · Monthly (1st of month) · license: US-Government-Works
  • hrsa-hpsa — HRSA HPSA (Health Professional Shortage Areas) · Quarterly (1st of Jan / Apr / Jul / Oct) · license: US-Government-Works
  • bls-oews — BLS OEWS (Occupational Employment & Wage Statistics) · Annual (mid-May) · license: US-Government-Works
  • bea-regional — BEA Regional Economic Accounts (state GDP) · Annual (mid-October) · license: US-Government-Works
  • cms-nppes — CMS NPPES NPI Registry · Quarterly (per specialty, operator-triggered) · license: US-Government-Works
  • cms-care-compare — CMS Care Compare (per facility type) · Quarterly (per facility type, operator-triggered) · license: US-Government-Works
  • cms-open-payments — CMS Open Payments (Sunshine Act — General Payments) · Annual (operator-triggered ~July, post-publication) · license: US-Government-Works
  • cms-hcris-hospital-2552-10 — CMS HCRIS Hospital Cost Reports (form CMS-2552-10) · Annual (operator-triggered ~November) · license: US-Government-Works
  • cms-qpp-mips — CMS QPP MIPS Individual + Group Scores · Annual (operator-triggered ~July, post-performance-year scoring) · license: US-Government-Works
  • cms-provider-utilization — CMS Medicare Provider Utilization & Payment Data (Physician & Other Practitioners by Provider and Service) · Annual (operator-triggered, post mid-June release) · license: US-Government-Works
  • cms-inpatient-utilization — CMS Medicare Inpatient Hospitals by Provider and Service · Annual (operator-triggered, post mid-June release) · license: US-Government-Works
  • cms-outpatient-utilization — CMS Medicare Outpatient Hospitals by Provider and Service · Annual (operator-triggered, post mid-June release) · license: US-Government-Works
  • hrsa-uds — HRSA Uniform Data System (UDS) · Annual (May, post-grant-year reporting) · license: US-Government-Works
  • cms-pos — CMS Provider of Services (POS) — iQIES Facility Registry · Quarterly (operator-triggered; CMS publishes Q1–Q4) · license: US-Government-Works
  • cms-ncci-ptp-edits — CMS NCCI Procedure-to-Procedure (PTP) Edits · Quarterly (operator-triggered; CMS posts Q1–Q4) · license: US-Government-Works (edit data); CPT codes © AMA
  • cms-ncci-mue-edits — CMS NCCI Medically Unlikely Edits (MUE) · Quarterly (operator-triggered; CMS posts Q1–Q4) · license: US-Government-Works (edit data); CPT codes © AMA
  • sec-edgar — SEC EDGAR (public-company entities + filing index) · Daily (re-check; SEC EDGAR posts filings continuously intraday) · license: US-Government-Works
  • ofac-sdn — OFAC SDN + Consolidated (Sanctions List Service) · Daily (OFAC publishes the SLS files daily; intraday on enforcement actions) · license: US-Government-Works
  • eu-sanctions — EU Consolidated Financial Sanctions (FSF) · Daily (the EU regenerates the FSF export on each restrictive-measure change) · license: CC-BY-4.0
  • sec-iapd — SEC IAPD — Investment Advisers (Form ADV) · Daily (the IAPD compilation feeds regenerate on each business day) · license: US-Government-Works
  • un-sanctions — UN Security Council Consolidated Sanctions List · Daily (the UN regenerates the consolidated export on each committee listing change) · license: LicenseRef-UN-SC-Consolidated-List
  • uk-sanctions — UK Sanctions List (OFSI Consolidated List) · Daily (OFSI republishes the consolidated list on each designation change) · license: OGL-UK-3.0
  • fec — FEC Campaign Finance (committees, candidates, individual contributions) · Weekly (the FEC refreshes the bulk-download files weekly) · license: US-Government-Works
  • faa-registries — FAA Airmen + Aircraft Registries · Daily re-check (aircraft daily; airmen monthly) · license: US-Government-Works
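A mistyped source_id only surfaces as an HTTP error, so a client may want to fail fast before making a request. A minimal sketch: the set below is copied from the list above and will drift as families are registered, so the top-level manifest.json remains the authoritative list. REGISTERED_SOURCES and require_source are illustrative names.

```python
# The 25 source_id values listed on this page at the time of writing.
REGISTERED_SOURCES = frozenset({
    "cms-pecos", "oig-leie", "hrsa-hpsa", "bls-oews", "bea-regional",
    "cms-nppes", "cms-care-compare", "cms-open-payments",
    "cms-hcris-hospital-2552-10", "cms-qpp-mips", "cms-provider-utilization",
    "cms-inpatient-utilization", "cms-outpatient-utilization", "hrsa-uds",
    "cms-pos", "cms-ncci-ptp-edits", "cms-ncci-mue-edits", "sec-edgar",
    "ofac-sdn", "eu-sanctions", "sec-iapd", "un-sanctions", "uk-sanctions",
    "fec", "faa-registries",
})


def require_source(source: str) -> str:
    """Return source unchanged if it is a registered source_id, else raise ValueError."""
    if source not in REGISTERED_SOURCES:
        raise ValueError(f"unknown source_id {source!r}")
    return source
```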
3. Format

Phase 1 ships gzipped CSV. Parquet + JSON Lines queued.

Each archive is the upstream source CSV exactly as captured at ingestion time, gzip-compressed (application/gzip). Header rows and column ordering match the upstream source; Fonteum does not normalize, deduplicate, or transform the data.

Phase 2 format alternatives queued separately: Parquet (§sprint3-bulk-export-parquet), JSON Lines (§sprint3-bulk-export-jsonl), FHIR Bundle (§sprint3-bulk-export-fhir-bundle), partitioned per-state / per-vertical (§sprint3-bulk-export-partitioned).
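Because each archive is the upstream CSV as captured, a reader should take column names from the file's own header row rather than hard-coding them. A minimal sketch for an archive already on disk; iter_rows is an illustrative name, and utf-8 is an assumption, since upstream publishers do not all use the same encoding.

```python
import csv
import gzip
from collections.abc import Iterator


def iter_rows(path: str, encoding: str = "utf-8") -> Iterator[dict[str, str]]:
    """Stream rows from a downloaded .csv.gz without loading it into memory.

    Keys come from the archive's own header row, i.e. the upstream names
    and order. The default encoding is an assumption; check each source.
    """
    # newline="" hands line endings to the csv module, so quoted fields
    # that contain line breaks are parsed as a single value.
    with gzip.open(path, mode="rt", encoding=encoding, newline="") as fh:
        yield from csv.DictReader(fh)


# Usage (file name as in the shell example):
# n_rows = sum(1 for _ in iter_rows("pecos-latest.csv.gz"))
```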

4. Response headers

14-tuple provenance + hash-match cross-link.

Every 302 redirect carries:

  • X-Fonteum-Source — source_id of the dataset
  • X-Fonteum-Snapshot-Date — ISO-8601 date of the snapshot
  • X-Fonteum-SHA256 — 64-char lowercase hex; matches snapshot_attestations.content_hash
  • X-Fonteum-License — SPDX identifier (e.g. US-Government-Works, CC-BY-4.0)
  • X-Fonteum-Cite — citation format URL (/cite)
  • X-Fonteum-Verify — hash-match endpoint (/verify/<snapshot-id>)
  • Link — sidecar manifest URL with rel="describedby"
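A client can validate these headers before trusting a download. The sketch below checks the formats stated above (64-char lowercase hex, ISO-8601 date) and takes the snapshot ID from the last path segment of X-Fonteum-Verify. It assumes that value ends in /verify/<snapshot-id> and that the headers arrive as a plain dict; Provenance and parse_provenance are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import date

SHA256_RE = re.compile(r"[0-9a-f]{64}")


@dataclass(frozen=True)
class Provenance:
    source: str
    snapshot_date: date
    sha256: str
    license: str
    snapshot_id: str


def parse_provenance(headers: dict[str, str]) -> Provenance:
    """Validate the X-Fonteum-* headers of a bulk 302 response."""
    h = {k.lower(): v.strip() for k, v in headers.items()}  # names are case-insensitive
    sha = h["x-fonteum-sha256"].lower()
    if not SHA256_RE.fullmatch(sha):
        raise ValueError(f"malformed X-Fonteum-SHA256: {sha!r}")
    verify = h["x-fonteum-verify"].rstrip("/")
    return Provenance(
        source=h["x-fonteum-source"],
        snapshot_date=date.fromisoformat(h["x-fonteum-snapshot-date"]),  # raises if not ISO-8601
        sha256=sha,
        license=h["x-fonteum-license"],
        snapshot_id=verify.rsplit("/", 1)[-1],  # assumes .../verify/<snapshot-id>
    )
```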
5. Hash-match flow

Hash-match against the snapshot attestation.

The bulk surface is content-addressable: every snapshot has exactly one SHA-256, computed over the gzipped archive and attested in snapshot_attestations at ingestion time. The per-source manifest, the X-Fonteum-SHA256 response header, and the /verify/[snapshot-id] endpoint all return that same value, so a single consumer has three independent hash-match paths (defense in depth).

# 1. Read the manifest to find the latest snapshot date + SHA-256
curl -s https://fonteum.com/api/v1/bulk/cms-pecos/manifest.json \
  | jq '.snapshots[0] | {snapshot_date, sha256, cache_url}'

# 2. Download the gzipped CSV; -L follows the 302 to S3 and -D saves the
#    response headers of every hop (including the 302) to headers.txt
curl -sL -D headers.txt -o pecos-latest.csv.gz \
  https://fonteum.com/api/v1/bulk/cms-pecos/latest.csv.gz

# 3. Recompute the SHA-256 locally and compare to the header value from the
#    same request, so a snapshot rolling over between calls can't cause a false MISMATCH
EXPECTED=$(awk -F': ' 'tolower($1)=="x-fonteum-sha256" {print tolower($2)}' headers.txt \
  | tr -d '\r' | head -n 1)
ACTUAL=$(shasum -a 256 pecos-latest.csv.gz | awk '{print $1}')  # or sha256sum on Linux
[ "$EXPECTED" = "$ACTUAL" ] && echo "ok" || echo "MISMATCH"

# 4. Cross-check against the /verify endpoint (defense in depth)
SNAPSHOT_ID=$(awk -F'/' 'tolower($1)~/x-fonteum-verify/ {print $NF}' headers.txt \
  | tr -d '\r' | head -n 1)
curl -s -H 'Accept: text/plain' "https://fonteum.com/verify/$SNAPSHOT_ID"
# returns the 64-char hex hash; it should equal both $EXPECTED and $ACTUAL
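For archives already on disk, the same three-way comparison can be done in Python by hashing the file in fixed-size chunks, so even large annual files never need to fit in memory. A sketch, assuming the three reference values (manifest row, X-Fonteum-SHA256 header, /verify response body) have already been fetched; sha256_file and three_way_match are illustrative names.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of the gzipped archive bytes, read chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def three_way_match(local: str, manifest: str, header: str, verify: str) -> bool:
    """True only if the locally computed hash equals all three published values."""
    local = local.strip().lower()
    # strip() drops the trailing newline a text/plain /verify response may carry
    return all(local == ref.strip().lower() for ref in (manifest, header, verify))
```

Any single mismatch fails the check, which is the point of keeping the three paths independent.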
6. Python (3.10+ stdlib)

urllib + gzip + hashlib.

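# Python 3.10+ (stdlib only: urllib + gzip + hashlib)
import csv
import gzip
import hashlib
import io
import urllib.error
import urllib.parse
import urllib.request

URL = "https://fonteum.com/api/v1/bulk/cms-pecos/latest.csv.gz"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop at the 302: the X-Fonteum-* headers are on the redirect, not on S3."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 302 response


# 1. Request the bulk URL without following the redirect
try:
    urllib.request.build_opener(NoRedirect).open(URL)
    raise RuntimeError("expected a 302 redirect")
except urllib.error.HTTPError as err:
    if err.code != 302:
        raise
    headers = err.headers
expected_sha = headers.get("X-Fonteum-SHA256", "").lower()
snapshot_date = headers.get("X-Fonteum-Snapshot-Date", "")

# 2. Download the archive from the S3 location the redirect points to
with urllib.request.urlopen(urllib.parse.urljoin(URL, headers["Location"])) as resp:
    raw = resp.read()

# Verify the hash matches the attested value
actual_sha = hashlib.sha256(raw).hexdigest()
if actual_sha != expected_sha:
    raise ValueError(f"SHA mismatch: {actual_sha} != {expected_sha}")

# Decompress + iterate (newline="" lets csv handle quoted line breaks)
with gzip.GzipFile(fileobj=io.BytesIO(raw), mode="rb") as gz:
    reader = csv.DictReader(io.TextIOWrapper(gz, encoding="utf-8", newline=""))
    for row in reader:
        # ... your analysis here ...
        pass

print(f"Hash-matched snapshot {snapshot_date} ({len(raw):,} bytes gzipped)")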
7. R (4.0+)

readr + digest + httr.

# R 4.0+ (readr + digest + httr)
library(readr)
library(digest)
library(httr)

url <- "https://fonteum.com/api/v1/bulk/cms-pecos/latest.csv.gz"

# Stop at the 302: the X-Fonteum-* headers are on the redirect, not on S3
hop <- GET(url, config(followlocation = 0L))
if (status_code(hop) != 302) stop("expected a 302 redirect, got HTTP ", status_code(hop))
expected_sha <- tolower(headers(hop)[["x-fonteum-sha256"]])

# Fetch the archive from the redirect target
res <- GET(headers(hop)[["location"]])
stop_for_status(res)
raw <- content(res, "raw")

# Recompute SHA-256 over the gzipped bytes (same as the header)
actual_sha <- digest(raw, algo = "sha256", serialize = FALSE)
stopifnot(identical(expected_sha, actual_sha))

# gzcon() decompresses the in-memory bytes before readr parses them
df <- read_csv(gzcon(rawConnection(raw)))
message(sprintf("Hash-matched snapshot: %d rows, %d cols", nrow(df), ncol(df)))
8. Node.js (18+)

stdlib fetch + crypto + zlib.

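// Node 18+ (built-in fetch + crypto + zlib); run as an ES module for top-level await
import { createHash } from "node:crypto";
import { gunzipSync } from "node:zlib";

const BULK_URL = "https://fonteum.com/api/v1/bulk/cms-pecos/latest.csv.gz";

// Stop at the 302: the X-Fonteum-* headers are on the redirect, not on S3
const hop = await fetch(BULK_URL, { redirect: "manual" });
if (hop.status !== 302) throw new Error(`expected 302, got HTTP ${hop.status}`);
const expected = hop.headers.get("x-fonteum-sha256")?.toLowerCase();
const location = new URL(hop.headers.get("location"), BULK_URL);

const res = await fetch(location);
if (!res.ok) throw new Error(`HTTP ${res.status}`);
const buf = Buffer.from(await res.arrayBuffer());

const actual = createHash("sha256").update(buf).digest("hex");
if (actual !== expected) throw new Error(`sha mismatch ${actual} != ${expected}`);

// Non-empty lines minus the header row; quoted fields with embedded newlines inflate this
const lines = gunzipSync(buf).toString("utf-8").split(/\r?\n/).filter((l) => l.length > 0);
console.log(`Hash-matched ${lines.length - 1} data lines`);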
9. License + citation

Per-source SPDX. Cite Fonteum when used in publications.

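US federal sources (CMS, OIG, HRSA, BLS, BEA, SEC, OFAC, FEC, FAA) are US-Government-Works: public domain, redistribution allowed. The NCCI edit files also contain CPT codes, which remain © AMA. Non-US sources carry their publisher's license: CC-BY-4.0 for EU sanctions, OGL-UK-3.0 for UK sanctions, and LicenseRef-UN-SC-Consolidated-List for the UN list. Fonteum-derived datasets carry CC-BY-4.0, which requires attribution. The X-Fonteum-License header surfaces the SPDX value on every response.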

For papers, theses, dashboards, or commercial products: cite Fonteum per /cite (APA + AMA + BibTeX). Pin the snapshot_date when reproducibility matters; a dated URL is immutable.
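The X-Fonteum-License value can drive a simple attribution check in a pipeline. A sketch under stated assumptions: the mapping covers only the identifiers listed on this page and reflects each license's general attribution term (CC-BY-4.0 and OGL-UK-3.0 both require attribution); anything else, including the UN LicenseRef and the NCCI files with AMA-copyrighted CPT codes, is flagged for manual review. Attribution and attribution_for are hypothetical names.

```python
from enum import Enum


class Attribution(Enum):
    REQUIRED = "required"
    NOT_REQUIRED = "not_required"
    REVIEW = "review"  # unknown or custom terms: read the publisher's license text


# Assumption: each license's general attribution term, for the identifiers on this page only.
LICENSE_ATTRIBUTION = {
    "CC-BY-4.0": Attribution.REQUIRED,
    "OGL-UK-3.0": Attribution.REQUIRED,
    "US-Government-Works": Attribution.NOT_REQUIRED,
}

# NCCI edit files embed CPT codes (© AMA) on top of the public-domain edit data.
THIRD_PARTY_CONTENT = frozenset({"cms-ncci-ptp-edits", "cms-ncci-mue-edits"})


def attribution_for(license_header: str, source: str) -> Attribution:
    """Map X-Fonteum-License (plus X-Fonteum-Source) to an attribution requirement."""
    if source in THIRD_PARTY_CONTENT:
        return Attribution.REVIEW
    return LICENSE_ATTRIBUTION.get(license_header.strip(), Attribution.REVIEW)
```

NOT_REQUIRED describes only the upstream license; the request to cite Fonteum via /cite in publications applies regardless.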

Phase roadmap

Phase 1 ships gzipped CSV. Phase 2 — formats + partitions.

  • Phase 1 (shipped): latest, dated, and manifest endpoints for every registered source family, plus the top-level manifest and bulk-download links surfaced on the /data catalog pages.
  • §sprint3-bulk-export-parquet (queued): Parquet format alternative — same URLs with .parquet suffix.
  • §sprint3-bulk-export-jsonl (queued): JSON Lines format — .jsonl.gz.
  • §sprint3-bulk-export-partitioned (queued): per-state + per-vertical splits.
  • §sprint3-bulk-export-fhir-bundle (queued): FHIR Bundle format for FHIR-aligned consumers.
  • §sprint3-datacite-bulk-listing (queued): DataCite metadata harvester registration so federal catalogs auto-discover the bulk surface.

Built on the authoritative federal record

The primary sources, named on every page.

These are the federal agencies whose public datasets Fonteum ingests and attributes — the issuing authorities, not customers or partners. Every figure on the site links back to one of them.

  • CMS
  • HHS-OIG
  • HRSA
  • FDA
  • NLM
  • NUCC
  • Census
  • BLS
  • BEA

See the full source registry, with license and refresh cadence for each →

Reproducible by design

Every figure traces to its federal source.

14-tuple provenance

Every rendered fact ties to a source URL, dataset ID, snapshot date, row key, and SHA-256 — the full chain-of-custody record.

Reproducible SQL

Each study ships the exact query behind its figures, run against the cited federal snapshot. Re-run it yourself.

Daily count checks

Published counts are checked against the upstream federal datasets on a daily cadence, with drift logged.

Named medical review

Reviewed by Jennifer Montecillo, MD, non-practicing medical reviewer.

Read the full provenance and attestation methodology →

Two doors

Use the free API and open data

Query providers, facilities, sanctions, and quality scores — each field carrying its federal source. Self-serve, no call to start.


Talk to us

Managed pilots, enterprise terms, and audit-ready, signed attestation packages for compliance, risk, and research teams.


© 2026 Fonteum LLC. All rights reserved.

hello@fonteum.com

The U.S. healthcare graph AI can cite — every fact carries its source.

Every fact Fonteum serves carries a signed, re-checkable trust mark: the source, the as-of date, and an Ed25519 signature travel with the data. Re-check any fact at fonteum.com/verify. Trust-mark standard: W3C Verifiable Credentials 2.0, C2PA-aligned.

The substrate, by the numbers

  • 9.2M graph entities: providers, organizations, owners, and facilities
  • 15.7M linked identifiers: NPIs, CCNs, LEIs and more, resolved to entities
  • 5M graph edges: source-attested relationships between entities
  • 44 federal source families: distinct CMS, OIG, HRSA, FDA and peer datasets
  • 35 dataset pages: citable, downloadable /data catalog pages
  • 13 reproducible studies: each shipping the SQL behind its figures