What it is
A knowledge graph, not a curated list
The Virisource catalog is not written by hand. It is assembled by an automated pipeline that ingests primary sources — peer-reviewed papers, podcast transcripts, and researcher publications — extracts structured claims, scores them by evidence quality, and promotes entities to public visibility only when sufficient evidence has accumulated.
The pipeline runs continuously. New papers published today can produce new compound or researcher entries tomorrow. Contradictions are detected automatically and surfaced to users. No human operator needs to be watching — the system does the discovery. This design is deliberate: it ensures the directory stays current without manual effort, and it means the catalog can scale beyond what any editorial team could maintain.
The pipeline
10 pipeline stages, ~20 scheduled crons
Each stage has a dedicated cron job (or several). They run in dependency order: ingestion before extraction, extraction before weighting, weighting before promotion.
Source ingestion
Raw content arrives from three source types: PubMed API (biomedical literature), RSS feeds (journals, preprint servers, longevity blogs), and podcast transcript feeds (Huberman Lab, FoundMyFitness, Lifespan, The Drive, and others). Each item is hashed to prevent re-processing.
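The dedup step can be sketched as follows — function names and the in-memory `seen` set are illustrative assumptions; a production pipeline would persist hashes in a database rather than process memory:

```python
import hashlib

def content_hash(source_id: str, text: str) -> str:
    """Stable fingerprint of an ingested item, used to skip re-processing."""
    return hashlib.sha256(f"{source_id}\n{text}".encode("utf-8")).hexdigest()

seen: set[str] = set()  # a real pipeline would persist this, e.g. in a DB table

def should_process(source_id: str, text: str) -> bool:
    """True only the first time a given (source, content) pair is seen."""
    h = content_hash(source_id, text)
    if h in seen:
        return False
    seen.add(h)
    return True
```

Hashing the source ID together with the content means a corrected abstract re-enters the pipeline as a new item while an unchanged re-fetch is skipped.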
Pre-filtering
Items pass a keyword pre-filter before expensive extraction. A compound or food term must appear in the title, abstract, or transcript chunk for the item to proceed. This keeps the extraction step focused on genuinely relevant content.
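A pre-filter like this can be a word-boundary keyword match. The term list below is a small illustrative sample, not the real catalog vocabulary:

```python
import re

# Illustrative sample of catalog terms; the real vocabulary is much larger.
CATALOG_TERMS = ["nicotinamide riboside", "resveratrol", "spermidine", "NR"]

# Longest terms first so multi-word names win over short abbreviations.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(CATALOG_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def passes_prefilter(title: str, abstract_or_chunk: str) -> bool:
    """Cheap keyword gate run before expensive LLM extraction."""
    return _PATTERN.search(f"{title} {abstract_or_chunk}") is not None
```

The `\b` word boundaries matter for short abbreviations: "NR" should match "a trial of NR supplementation" but not the letters inside "enrichment".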
Claim extraction
A language model reads each item and extracts structured claims: what compound or entity is discussed, the direction (supporting / contradicting / neutral), evidence type (human RCT, animal study, in vitro, observational, researcher opinion), and a verbatim excerpt. Claims are linked back to the source item for provenance.
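An extracted claim record might be modeled like this — field names are assumptions based on the description above, not the actual schema:

```python
from dataclasses import dataclass
from typing import Literal

Direction = Literal["supporting", "contradicting", "neutral"]
EvidenceType = Literal[
    "human_rct", "animal", "in_vitro", "observational", "researcher_opinion"
]

@dataclass(frozen=True)
class Claim:
    entity_mention: str        # compound/food/researcher the claim is about
    direction: Direction       # supporting / contradicting / neutral
    evidence_type: EvidenceType
    excerpt: str               # verbatim quote, kept for provenance
    source_item_id: str        # link back to the ingested source item
```

Keeping the record frozen and carrying `source_item_id` on every claim is what makes the provenance chain auditable downstream.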
Claim weighting
Each claim receives a numeric weight. The formula is: source_authority × evidence_type_weight × temporal_decay. A peer-reviewed human RCT from Nature Medicine carries more weight than a podcast comment. Older claims are progressively down-weighted; the decay schedule is given under Evidence scoring. Contradicting claims apply negative weight to the entity's composite score.
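A sketch of the formula, using a stepwise reading of the decay schedule from the Evidence scoring section; the function signature and the choice to fold direction into the sign are assumptions:

```python
def claim_weight(source_authority: float,
                 evidence_type_weight: float,
                 age_months: float,
                 direction: str) -> float:
    """source_authority × evidence_type_weight × temporal_decay,
    negated for contradicting claims."""
    if age_months <= 36:
        decay = 1.0          # recent claims keep full weight
    elif age_months <= 60:
        decay = 0.5          # older than 36 months: lose 50%
    else:
        decay = 0.2          # older than 60 months: lose 80%
    w = source_authority * evidence_type_weight * decay
    return -w if direction == "contradicting" else w
```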
Entity linking
Claims are linked to catalog entities (compounds, foods, researchers) by slug and synonym matching. A claim mentioning "NR" is linked to the Nicotinamide Riboside compound record. Unresolved entities enter a candidate queue for human review before being added to the catalog.
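A minimal sketch of slug/synonym resolution — the synonym map and queue handling below are illustrative stand-ins for the real catalog tables:

```python
# Illustrative synonym map; the real pipeline reads these from catalog tables.
SYNONYMS = {
    "nr": "nicotinamide-riboside",
    "nicotinamide riboside": "nicotinamide-riboside",
    "resveratrol": "resveratrol",
}

def link_entity(mention: str, candidate_queue: list):
    """Resolve a mention to a catalog slug; unresolved mentions are
    queued for human review instead of being auto-created."""
    slug = SYNONYMS.get(mention.strip().lower())
    if slug is None:
        candidate_queue.append(mention)
    return slug
```

Routing misses into a review queue, rather than auto-creating entities, is what keeps a garbled LLM extraction from silently minting a new catalog entry.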
Promotion gating
Entities advance through tiers as evidence accumulates. The promotion gate runs after each extraction batch. Tier thresholds are intentionally conservative to prevent thin or speculative content from reaching Tier 2+ (publicly visible).
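The gate itself reduces to a threshold lookup. The threshold values below are invented placeholders — the real values are deliberately conservative and not stated in this document:

```python
# Placeholder thresholds — the real values are conservative and unpublished.
TIER_THRESHOLDS = {1: 1.0, 2: 5.0, 3: 12.0}

def promoted_tier(composite_score: float) -> int:
    """Highest tier whose threshold the entity's composite score meets.
    Tier 0 = not yet publicly visible; Tier 2+ = published."""
    tier = 0
    for t in sorted(TIER_THRESHOLDS):
        if composite_score >= TIER_THRESHOLDS[t]:
            tier = t
    return tier
```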
Content generation
When an entity reaches Tier 2, a content generation cron synthesizes a definition, mechanism description, evidence summary, FAQ set, and open research questions from the accumulated claims. This content is stored in auto_content_json and rendered on entity pages.
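The stored auto_content_json might look roughly like this; the key names are guesses from the field list above, not the actual schema:

```python
# Rough shape of auto_content_json; key names are guesses from the
# field list above, not the actual schema.
auto_content = {
    "definition": "One-paragraph definition of the entity.",
    "mechanism": "How the compound is thought to act.",
    "evidence_summary": "Synthesis of the weighted claims.",
    "faq": [{"q": "Is it safe?", "a": "Summary of safety evidence."}],
    "open_questions": ["Unresolved research questions."],
    "label": "Generated by AI from pipeline evidence",
}
```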
Enrichment
Researcher records are enriched with h-index, citation count, institutional affiliation, and biography from OpenAlex (CC0). Compound records are enriched with CAS numbers, molecular structure references, and synonym lists from public chemistry databases.
Contradiction detection
When a new contradicting claim arrives for an entity that already has strong supporting evidence, a contradiction flag is set. The entity page displays a review banner. Contradicted entities are excluded from "trending" and "featured" surfaces until the contradiction is resolved.
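The trigger condition can be sketched as a simple predicate; the "strong evidence" threshold here is an illustrative placeholder:

```python
def should_flag_contradiction(composite_score: float,
                              new_claim_direction: str,
                              strong_threshold: float = 5.0) -> bool:
    """Set a contradiction flag when a contradicting claim arrives
    against already-strong supporting evidence. The threshold value
    is an illustrative placeholder."""
    return (new_claim_direction == "contradicting"
            and composite_score >= strong_threshold)
```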
Publication & indexing
New Tier 2+ entities surface immediately in the sitemap (revalidated every 5 minutes via ISR) and in the /llms.txt index (regenerated hourly). Search engines and LLM crawlers discover them through these surfaces without requiring a redeploy.
Evidence scoring
Claim weight formula
Each extracted claim receives a numeric weight between 0 and 1. Weights are summed across all claims for an entity to produce a composite confidence score. Contradicting claims apply their weight as a negative. Temporal decay is stepwise: claims older than 36 months lose 50% of their weight; claims older than 60 months lose 80%.
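Putting the per-claim weights together — a sketch of the composite score under a stepwise reading of the decay schedule (the claim tuple shape is an assumption):

```python
def composite_score(claims: list) -> float:
    """Sum signed, decayed weights into one confidence score.
    Each claim is a (base_weight, direction, age_months) tuple;
    base_weight is the authority × evidence-type product in [0, 1]."""
    def decay(age_months: float) -> float:
        if age_months <= 36:
            return 1.0
        if age_months <= 60:
            return 0.5   # older than 36 months: lose 50%
        return 0.2       # older than 60 months: lose 80%

    total = 0.0
    for base, direction, age in claims:
        signed = -base if direction == "contradicting" else base
        total += signed * decay(age)
    return total
```

For example, one fresh supporting RCT at weight 1.0 plus a 4-year-old contradicting claim at base weight 0.5 nets out to 1.0 − (0.5 × 0.5) = 0.75.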
Source authority weights by evidence type
Interpreting the data
Confidence scores and contradiction flags
Confidence: high / medium / low
A compound or food tagged high confidence has a composite evidence score above the Tier 3 threshold, with multiple peer-reviewed sources and no active contradiction flags. Medium indicates Tier 2 evidence with fewer or lower-weight sources. Low indicates Tier 2 with predominantly non-RCT sources, or sources older than 5 years without recent replication.
Contradiction flags
When a new claim contradicts an entity's prevailing evidence direction, a contradiction flag is set and a banner appears on the entity page. The entity is excluded from featured and trending surfaces until the flag is resolved (either by new supporting evidence or by manual review). Contradiction flags are not permanent — they reflect the current state of evidence.
Evidence section on entity pages
Each compound and researcher page shows the raw claims extracted from source material — sorted by claim weight, labeled by direction and evidence type, with links to the source item. This is not a curated citation list; it is the raw output of the extraction pipeline. Users can evaluate the quality of each claim directly.
Primary sources
Where the data comes from
Biomedical literature
- PubMed / NCBI
- bioRxiv (preprints)
- medRxiv (preprints)
- PubMed retractions feed
Researcher output
- Huberman Lab podcast
- FoundMyFitness (Patrick)
- Lifespan Podcast (Sinclair)
- The Drive (Attia)
- Longevity Podcast (Longo)
Database enrichment
- OpenAlex (researcher metrics)
- USDA FoodData Central
- Phenol-Explorer
- FooDB
Grower registry
- USDA Farmers Market Directory
- USDA Organic Integrity DB
- State extension programs
Editorial policy
What we will and won't do
We do not manually seed data
If a compound or researcher is not in the catalog, it is because the pipeline has not yet ingested sufficient evidence — not because someone forgot to add it. We do not manually insert entries to shortcut the evidence process.
We do not suppress contradictions
When evidence contradicts a prevailing claim, the contradiction is surfaced publicly. We do not hide conflicting data to make an entity look more favorable.
We distinguish AI-generated content
Content generated by the AI synthesis step (definitions, mechanisms, FAQs) is labeled "Generated by AI from pipeline evidence." It is derived from extracted claims — not invented — but it is not the same as a human-written review.
We are not affiliated with tracked researchers
Virisource tracks Sinclair, Attia, Patrick, Huberman, Longo, and others as primary sources. We have no financial relationship, endorsement, or affiliation with any of them. Their positions are tracked as data, not as recommendations.
Corrections
If you identify a factual error — a wrong affiliation, an incorrectly attributed claim, a PubMed ID linking to the wrong paper — contact us at corrections@virisource.com. We investigate and correct within 14 days.
This is not medical advice
Nothing on Virisource constitutes medical advice. Longevity research is an active and contested field. What appears here reflects the current state of published evidence, not clinical recommendations. Consult a licensed physician before making dietary or supplementation decisions.