Editorial standards

How Virisource works

Virisource is a self-expanding longevity knowledge graph. This page describes the pipeline that builds it, the scoring model that weights evidence, and the editorial policies that govern what appears publicly.

Methodology last updated: 2026-04-21

What it is

A knowledge graph, not a curated list

The Virisource catalog is not written by hand. It is assembled by an automated pipeline that ingests primary sources — peer-reviewed papers, podcast transcripts, and researcher publications — extracts structured claims, scores them by evidence quality, and promotes entities to public visibility only when sufficient evidence has accumulated.

The pipeline runs continuously. New papers published today can produce new compound or researcher entries tomorrow. Contradictions are detected automatically and surfaced to users. No human operator needs to be watching — the system does the discovery. This design is deliberate: it ensures the directory stays current without manual effort, and it means the catalog can scale beyond what any editorial team could maintain.

The pipeline

10 pipeline stages, ~20 scheduled crons

Each stage has a dedicated cron job (or several). They run in dependency order: ingestion before extraction, extraction before weighting, weighting before promotion.

Source ingestion

Raw content arrives from three source types: PubMed API (biomedical literature), RSS feeds (journals, preprint servers, longevity blogs), and podcast transcript feeds (Huberman Lab, FoundMyFitness, Lifespan, The Drive, and others). Each item is hashed to prevent re-processing.

ingest-pubmedingest-rssingest-podcasts

Pre-filtering

Items pass a keyword pre-filter before expensive extraction. A compound or food term must appear in the title, abstract, or transcript chunk for the item to proceed. This keeps the extraction step focused on genuinely relevant content.

pre-filter

Claim extraction

A language model reads each item and extracts structured claims: what compound or entity is discussed, the direction (supporting / contradicting / neutral), evidence type (human RCT, animal study, in vitro, observational, researcher opinion), and a verbatim excerpt. Claims are linked back to the source item for provenance.

extract-claims

Claim weighting

Each claim receives a numeric weight. The formula is: source_authority × evidence_type_weight × temporal_decay. A peer-reviewed human RCT from Nature Medicine carries more weight than a podcast comment. Claims older than 36 months decay linearly. Contradicting claims apply negative weight to the entity's composite score.

weight-claims

Entity linking

Claims are linked to catalog entities (compounds, foods, researchers) by slug and synonym matching. A claim mentioning "NR" is linked to the Nicotinamide Riboside compound record. Unresolved entities enter a candidate queue for human review before being added to the catalog.

link-entities

Promotion gating

Entities advance through tiers as evidence accumulates. The promotion gate runs after each extraction batch. Tier thresholds are intentionally conservative to prevent thin or speculative content from reaching Tier 2+ (publicly visible).

publish-gate

Content generation

When an entity reaches Tier 2, a content generation cron synthesizes a definition, mechanism description, evidence summary, FAQ set, and open research questions from the accumulated claims. This content is stored in auto_content_json and rendered on entity pages.

content-regenfaq-generator

Enrichment

Researcher records are enriched with h-index, citation count, institutional affiliation, and biography from OpenAlex (CC0). Compound records are enriched with CAS numbers, molecular structure references, and synonym lists from public chemistry databases.

enrich-researchersenrich-compounds

Contradiction detection

When a new contradicting claim arrives for an entity that already has strong supporting evidence, a contradiction flag is set. The entity page displays a review banner. Contradicted entities are excluded from "trending" and "featured" surfaces until the contradiction is resolved.

detect-contradictions

Publication & indexing

New Tier 2+ entities surface immediately in the sitemap (revalidated every 5 minutes via ISR) and in the /llms.txt index (regenerated hourly). Search engines and LLM crawlers discover them through these surfaces without requiring a redeploy.

sitemap-refreshllms-txt-refresh

Evidence scoring

Claim weight formula

claim_weight = source_authority × evidence_type_weight × temporal_decay(age_months)

Each extracted claim receives a numeric weight between 0 and 1. Weights are summed across all claims for an entity to produce a composite confidence score. Contradicting claims apply their weight as a negative. Temporal decay is linear: claims older than 36 months lose 50% of their weight; older than 60 months, 80%.

Source authority weights by evidence type

Human RCT (peer-reviewed)

Nature Medicine, NEJM, Cell

1.0

Human observational study (peer-reviewed)

JAMA, PLOS ONE

0.7

Animal / model organism study

Cell Metabolism, Aging Cell

0.5

In vitro / cell culture

Biochemistry journals

0.3

Credible researcher synthesis

Sinclair, Attia, Patrick podcasts

0.6

Preprint (unreviewed)

bioRxiv, medRxiv

0.25

Expert commentary / editorial

Journal editorials

0.2

Promotion gates

Tier 0 through Tier 4

Entities are not manually approved for publication. They advance automatically when their composite evidence score crosses a tier threshold. Tiers 0 and 1 are never shown to users — they exist to prevent thin or speculative entries from entering the public catalog.

Tier 0 — Candidatenot public

Entity has been extracted from at least one source but has fewer than 3 weighted claims and no human verification. Not publicly visible.

Tier 1 — Provisionalnot public

Entity has 3+ weighted claims from at least 2 distinct sources. Still not publicly visible. Eligible for content generation.

Tier 2 — Published

Entity has 6+ weighted claims, appears in at least one peer-reviewed source, and has auto-generated content. Publicly visible and indexed.

Tier 3 — Well-evidenced

Entity has 12+ weighted claims, multiple peer-reviewed sources, and human-reviewed content. Eligible for featured placement.

Tier 4 — Core

Entity has 25+ weighted claims across diverse source types, replicated human trials, and manually curated content. The highest confidence tier.

Interpreting the data

Confidence scores and contradiction flags

Confidence: high / medium / low

A compound or food tagged high confidence has a composite evidence score above the Tier 3 threshold, with multiple peer-reviewed sources and no active contradiction flags. Medium indicates Tier 2 evidence with fewer or lower-weight sources. Low indicates Tier 2 with predominantly non-RCT sources, or sources older than 5 years without recent replication.

Contradiction flags

When a new claim contradicts an entity's prevailing evidence direction, a contradiction flag is set and a banner appears on the entity page. The entity is excluded from featured and trending surfaces until the flag is resolved (either by new supporting evidence or by manual review). Contradiction flags are not permanent — they reflect the current state of evidence.

Evidence section on entity pages

Each compound and researcher page shows the raw claims extracted from source material — sorted by claim weight, labeled by direction and evidence type, with links to the source item. This is not a curated citation list; it is the raw output of the extraction pipeline. Users can evaluate the quality of each claim directly.

Primary sources

Where the data comes from

Biomedical literature

PubMed / NCBI
bioRxiv (preprints)
medRxiv (preprints)
PubMed retractions feed

Researcher output

Huberman Lab podcast
FoundMyFitness (Patrick)
Lifespan Podcast (Sinclair)
The Drive (Attia)
Longevity Podcast (Longo)

Database enrichment

OpenAlex (researcher metrics)
USDA FoodData Central
Phenol-Explorer
FooDB

Grower registry

USDA Farmers Market Directory
USDA Organic Integrity DB
State extension programs

Editorial policy

What we will and won't do

We do not manually seed data

If a compound or researcher is not in the catalog, it is because the pipeline has not yet ingested sufficient evidence — not because someone forgot to add it. We do not manually insert entries to shortcut the evidence process.

We do not suppress contradictions

When evidence contradicts a prevailing claim, the contradiction is surfaced publicly. We do not hide conflicting data to make an entity look more favorable.

We distinguish AI-generated content

Content generated by the AI synthesis step (definitions, mechanisms, FAQs) is labeled "Generated by AI from pipeline evidence." It is derived from extracted claims — not invented — but it is not the same as a human-written review.

We are not affiliated with tracked researchers

Virisource tracks Sinclair, Attia, Patrick, Huberman, Longo, and others as primary sources. We have no financial relationship, endorsement, or affiliation with any of them. Their positions are tracked as data, not as recommendations.

Corrections

If you identify a factual error — a wrong affiliation, an incorrectly attributed claim, a PubMed ID linking to the wrong paper — contact us at corrections@virisource.com. We investigate and correct within 14 days.

This is not medical advice

Nothing on Virisource constitutes medical advice. Longevity research is an active and contested field. What appears here reflects the current state of published evidence, not clinical recommendations. Consult a licensed physician before making dietary or supplementation decisions.