MIT's VaxSeer AI Outperforms WHO in Forecasting Dominant Flu Strains, Retrospective Study Shows

In the high-stakes world of influenza vaccine manufacturing, experts have long relied on surveillance and lab assays to pick which viral strains to include each season, a process that often yields vaccine effectiveness of only around 40 percent. Now MIT researchers have built an artificial intelligence model called VaxSeer that, in retrospective tests, beat the World Health Organization’s own recommendations at identifying the most protective strains in most of the seasons studied.

Twice a year, the WHO convenes global influenza experts to recommend which viral strains should go into the upcoming seasonal vaccines. These recommendations must be locked in months before the flu season begins so manufacturers can produce and distribute hundreds of millions of doses. The long lead time, combined with the virus’s notorious ability to mutate, means that even well-matched seasons often see vaccine effectiveness in the 40–60 percent range, and in mismatched years protection can fall into the low double digits.

VaxSeer is designed to address that gap. Developed by a team at MIT, the system uses a blend of protein language models, epidemiological simulations, and antigenicity prediction to produce a single forward-looking metric called a “coverage score.” That score tells vaccine planners how well a candidate formulation is likely to protect against the viruses that will actually be circulating months later.

How VaxSeer Works: A Fusion of AI and Epidemiology

VaxSeer isn’t just another evolutionary forecasting tool. It integrates three distinct technical components into one cohesive pipeline.

1. A Protein Language Model
The model is trained on decades of hemagglutinin (HA) protein sequences from circulating influenza viruses. HA is the primary target of the immune system’s neutralizing antibodies, and small changes—mutations in key amino acids—can allow a virus to escape immunity. Instead of treating each mutation in isolation, VaxSeer’s protein language model learns how combinations of mutations influence viral fitness and competitive success. This combinatorial approach captures subtleties that simpler models miss.

2. A Dominance Predictor
Knowing which mutations will arise isn’t enough; the model must also forecast which viral lineage will become prevalent. VaxSeer’s dominance predictor simulates competition among co-circulating variants, using ordinary differential equations to model how frequencies shift over time. It accounts for selection pressures and existing population immunity to estimate the probability that a given lineage will dominate.

3. An Antigenicity Estimator
The third piece approximates the result of a classic laboratory test: the hemagglutination inhibition (HI) assay. The HI assay measures how well antibodies from a vaccinated person or animal can block the virus from binding to red blood cells. VaxSeer’s antigenicity predictor estimates, in silico, the cross-reactivity between a candidate vaccine strain and any emerging virus. By anchoring to a standard lab metric, the output speaks directly to virologists and public health labs.
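
The frequency dynamics behind the dominance predictor (component 2) can be illustrated with a toy model. The sketch below is not VaxSeer’s actual formulation. It assumes the textbook replicator equation, dx_i/dt = x_i (f_i - mean fitness), with constant, hand-picked fitness values where VaxSeer would presumably supply learned, sequence-dependent ones. The function name, parameters, and step size are all hypothetical.

```python
def simulate_frequencies(freqs, fitness, days, dt=0.1):
    """Euler-integrate the replicator ODE dx_i/dt = x_i * (f_i - mean_f).

    freqs   -- starting variant frequencies (should sum to 1)
    fitness -- assumed constant per-day growth advantage of each variant
    days    -- how far ahead to project
    """
    x = list(freqs)
    for _ in range(int(round(days / dt))):
        # Population-average fitness at the current frequencies.
        mean_f = sum(xi * fi for xi, fi in zip(x, fitness))
        # Variants fitter than average grow; the rest shrink.
        x = [xi + dt * xi * (fi - mean_f) for xi, fi in zip(x, fitness)]
        # Renormalise so rounding error cannot push the total away from 1.
        total = sum(x)
        x = [xi / total for xi in x]
    return x


# Illustrative values only: a rare variant with a small fitness edge.
projected = simulate_frequencies([0.9, 0.1], [0.0, 0.05], days=180)
print(projected)
```

With these made-up inputs, the variant that starts at 10 percent of samples ends up dominant within six months, which is why a forecast cannot simply assume that today’s most common lineage will still lead when the vaccine is in use.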

All three components feed into a mathematical framework that produces the coverage score. This score weights antigenic similarity by the predicted future dominance of each circulating strain, so a vaccine that covers a likely-dominant strain scores higher than one that only neutralizes rare variants. The scale tops out at zero: scores are zero or negative, and the closer a score is to zero, the better the expected antigenic match.
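
To make the weighting concrete, here is a minimal sketch of a dominance-weighted score. It is an illustration under stated assumptions, not the published formula: antigenic match is assumed to be a non-positive number per circulating strain (zero meaning a perfect match), and the function name, strain labels, and all numbers are hypothetical.

```python
def coverage_score(dominance, antigenic_match):
    """Dominance-weighted average antigenic match for one vaccine candidate.

    dominance       -- {strain: predicted future share}, assumed non-negative
    antigenic_match -- {strain: match of the candidate against that strain},
                       assumed <= 0 with 0 meaning a perfect match
    """
    total = sum(dominance.values())
    if total <= 0:
        raise ValueError("dominance weights must sum to a positive value")
    weighted = sum(share * antigenic_match[strain]
                   for strain, share in dominance.items())
    return weighted / total


# Hypothetical forecast: clade_A is expected to dominate next season.
dominance = {"clade_A": 0.7, "clade_B": 0.3}

# Hypothetical match values for two candidate vaccine strains.
candidates = {
    "candidate_X": {"clade_A": -0.5, "clade_B": -3.0},  # matches clade_A well
    "candidate_Y": {"clade_A": -3.0, "clade_B": -0.5},  # matches clade_B well
}

scores = {name: coverage_score(dominance, match)
          for name, match in candidates.items()}
best = max(scores, key=scores.get)  # closest to zero wins
print(scores, best)
```

Under these assumptions candidate_X ranks first, because it matches the strain expected to dominate, even though candidate_Y is the better match for the rarer clade.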

Retrospective Test Results: Nine Seasons Out of Ten

The MIT team evaluated VaxSeer against a decade of historical influenza surveillance data, and the findings are striking.

- For influenza A/H3N2, the subtype notorious for rapid drift and frequent vaccine mismatches, VaxSeer’s strain selections outperformed those made by the WHO in nine of ten seasons. The metric used was an empirical coverage score derived from observed dominance and HI test results.
- For A/H1N1, VaxSeer matched or outperformed the WHO’s choices in the majority of test seasons.
- In one instructive example, VaxSeer identified a strain in 2016 that the WHO added to vaccine recommendations only the following year, an early-warning signal that might have narrowed a protection gap.

These numbers are more than academic. H3N2 has historically driven the worst vaccine effectiveness years. A model that nudges H3N2 selection in the right direction across nearly every season tested appears to be picking up signal that the conventional process can miss, although retrospective results alone cannot confirm that.

Furthermore, VaxSeer’s predicted coverage scores correlated well with independently measured vaccine effectiveness and public health outcomes across multiple surveillance systems. That link between an in-silico score and real-world outcomes strengthens the case for evaluating the model in operational settings.
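
A check of this kind amounts to correlating one score per season with one effectiveness estimate per season. The helper below is a generic sketch, assuming a plain Pearson correlation. The article does not say which statistic the authors used, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold each season’s predicted coverage score and `ys` the matching vaccine effectiveness estimate. With only about ten seasons available, any such coefficient carries wide uncertainty.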

Why This Matters Operationally: Vaccine Timelines

Vaccine manufacturing is a race against time. After the WHO issues its February recommendations for the Northern Hemisphere, manufacturers must grow virus in eggs or cells, inactivate or attenuate it, purify, fill, and package. The process takes months, and if a late discovery reveals an antigenic mismatch, there’s little room to pivot without risking shortages or wasted doses.

VaxSeer’s forward-looking coverage score could enter the workflow early. National advisory committees could use ranked options to clarify trade-offs, procurement decisions could be made with quantified risk assessments, and manufacturers could plan alternative seed strains without committing to expensive changes. A model that provides earlier, evidence-based rankings could reduce the guesswork in a process that currently relies heavily on expert judgment and global consensus.

Critical Analysis: Strengths and What Could Go Wrong

Strengths
- Integration: Combining language models, epidemic dynamics, and antigenicity in a single end-to-end system is a principled advance. It addresses both “what will circulate” and “will it be recognized by the immune system.”
- Practical metric: The coverage score directly targets the operational decision vaccine planners face—not just which strain is likely, but which will provide population protection.
- H3N2 performance: Outperforming the WHO in nine of ten seasons on the hardest subtype to predict suggests the framework captures meaningful biological signal.
- HI assay compatibility: By aligning with existing lab standards, VaxSeer’s output is immediately interpretable to virologists.
- Extensibility: The modular design could be adapted to other fast-evolving pathogens, such as coronaviruses, given sufficient data.

Limitations and Risks
- Retrospective ≠ prospective: Influenza evolution is path-dependent. A model that fits past data perfectly may fail when a novel, unforeseen variant emerges. Prospective validation is essential.
- Data biases: VaxSeer relies on global surveillance data. Sequencing is uneven; biases in lineage prevalence can skew dominance predictions. Under-sampled regions could lead to blind spots.
- HI assay flaws: The HI assay has well-known limitations, especially for H3N2 viruses and human polyclonal responses. Relying primarily on HI proxies may miss antigenic nuances captured by neutralization assays or antigenic cartography.
- Interpretability: Protein language models are often “black boxes.” Public health agencies and manufacturers may demand explanations of why a strain is predicted to dominate, especially if recommendations conflict with expert consensus.
- Regulatory and governance hurdles: WHO’s process is consensus-driven and transparent. Integrating AI recommendations requires open validation, reproducible methods, and mechanisms to resolve model–expert disagreements.
- Dual-use concerns: Advanced forecasting of viral evolution could, in principle, be misused. Governance controls and responsible disclosure are necessary.
- Overfitting to historical protocols: Training on decades of HI data may bake in quirks of past assay protocols that do not generalize to modern labs.

The Path to Real-World Integration

Moving VaxSeer from a promising paper to a routine part of vaccine composition meetings isn’t just a technical challenge. It requires a deliberate, step-by-step approach:

- Prospective pilots: Run blind validations during real-time flu seasons in partnership with national influenza centers, comparing VaxSeer’s outputs against conventional selections ahead of official decisions.
- Transparency: Publish model code, training data schemas, and evaluation scripts. External replication is the gold standard for building trust.
- Stakeholder workshops: Engage WHO, national advisory groups, manufacturers, and regulators early to design workflows that complement existing decision-making, rather than replace it.
- Data equity: Strengthen sequencing and antigenic characterization in under-sampled regions. Use statistical corrections and uncertainty quantification when data are sparse.
- Regulatory confidence: Define criteria under which a model-informed recommendation could accelerate or support conventional processes, with clear risk thresholds for major decisions.
- Safeguards: Apply ethical governance frameworks to limit access to any model components that could materially assist harmful manipulation of viral sequences.
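
As one concrete form of the uncertainty quantification mentioned under data equity, a lineage’s observed share of sequenced samples can be reported with a confidence interval instead of as a bare percentage. The sketch below uses the standard Wilson score interval for a binomial proportion. Its use here is an assumption for illustration, not something the VaxSeer authors describe, and the function name is hypothetical.

```python
import math


def wilson_interval(count, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%).

    count -- sequences assigned to the lineage of interest
    total -- all sequences collected in that region and period
    """
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("need total > 0 and 0 <= count <= total")
    p = count / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total
                               + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# The same observed 30% share, from a sparse and a well-sampled region.
sparse = wilson_interval(3, 10)
dense = wilson_interval(300, 1000)
print(sparse, dense)
```

The sparse region’s interval is several times wider than the well-sampled one’s, which is the signal a dominance forecast should carry forward instead of treating both 30 percent figures as equally reliable.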

Broader Implications: Beyond Flu Season

If VaxSeer or similar frameworks prove robust in prospective studies, the ripple effects could extend far:

- Manufacturing resilience: Better forecasts could reduce last-minute production changes, cutting waste and stabilizing supply.
- Targeted vaccines: Quantitative predictions of dominant lineages could one day inform region-specific formulations for high-risk groups.
- Accelerated universal vaccine research: For next-generation broadly protective candidates, predictive models could prioritize antigen designs robust to likely evolutionary paths.
- Cross-pathogen toolkit: The conceptual architecture (sequence-aware ML, dominance modeling, and antigenicity estimation) could be transferred to other rapidly evolving viruses, creating reusable infrastructure for epidemic preparedness.
- Policy evolution: Introducing AI into strain selection will necessitate new international protocols for model validation, data sharing, and decision arbitration, reshaping global health governance.

Conclusion: From Reactive Guesswork to Informed Anticipation

VaxSeer represents a compelling fusion of modern machine learning and epidemiological modeling. Its retrospective ability to beat WHO’s strain selections in nine out of ten H3N2 seasons alone warrants serious attention. The framework’s practical, coverage-score approach directly addresses the operational bottleneck that has hampered influenza vaccine effectiveness for decades.

But the gap between retrospective triumph and real-world adoption is wide. Prospective validation, transparent methods, and careful integration with the WHO’s consensus-driven process are not optional—they are prerequisites. The history of influenza forecasting is littered with models that looked brilliant in the rearview mirror but stalled in prospective tests.

If VaxSeer’s promise holds, it could modestly but meaningfully improve vaccine match rates, reducing hospitalizations and deaths year after year. And perhaps more importantly, it could lay the groundwork for an era where AI transforms high-stakes public health decisions from reactive guesswork into data-driven anticipation.