
On June 2, 2025, a team led by Kexin Huang, Serena Zhang, Hanchen Wang, Yuanhao Qu, and Yingzhou Lu of Stanford University, in collaboration with Genentech, Arc Institute, the University of California, San Francisco, Princeton University, and other research institutions, published a paper titled “Biomni: A General-Purpose Biomedical AI Agent.” The paper is the first report on Biomni, a general-purpose biomedical AI agent that is free to register for and use at biomni.stanford.edu.
This system is capable of autonomously executing complex research tasks spanning multiple biomedical disciplines, including genetics, genomics, microbiology, pharmacology, and clinical medicine, marking a new phase in AI-driven scientific discovery.
Currently, biomedical research is facing unprecedented challenges: complex laboratory experiments, large-scale datasets, numerous analytical tools, and an explosive growth in the volume of literature. Traditional research workflows are often fragmented and highly repetitive, severely constraining the pace of discovery and hindering innovation. This underscores an urgent need for fundamentally new approaches—a novel paradigm capable of effectively augmenting scientific expertise, streamlining research workflows, and fully unlocking the potential of biomedical research.
Although artificial intelligence has already triggered revolutionary changes in fields such as software engineering, law, materials science, and healthcare, existing approaches in the biomedical domain rely primarily on specialized agent workflows tailored to specific tasks, which severely limits their generalizability across the broader biomedical landscape. Building AI agents that can handle a wide range of biomedical tasks still poses significant technical challenges, the most prominent of which is how to integrate advanced reasoning capabilities with the ability to execute highly specialized biomedical operations.
To address these challenges, the research team developed Biomni—a general-purpose AI agent specifically designed to automate and advance cross-disciplinary biomedical research. Biomni consists of two core components: Biomni-E1 (a unified biomedical software and data environment) and Biomni-A1 (an agent built upon this environment).
■ Biomni-E1: Unified Biomedical Software and Data Environment
To systematically construct a biomedical action space, the research team adopted an AI-driven approach. Based on the 25 disciplinary categories defined by bioRxiv, they selected the 100 most recently published papers from each category and then employed an action-discovery LLM agent to analyze these papers individually, extracting key tasks, tools, databases, and software resources required to reproduce or generate related research.
The Biomni-E1 environment integrates 150 specialized biomedical tools, 105 software packages, and 59 databases. All these tools have been rigorously validated by human experts, with a particular focus on those exhibiting complex characteristics, including intricate code implementations, domain-specific expertise, or specialized AI models. Regarding database integration, the team categorizes resources into two types: large relational databases accessed via web APIs (such as PDB, OpenTarget, and ClinVar), and databases downloaded into a data lake and preprocessed into structured formats.

■ Biomni-A1: General-Purpose Agent Architecture
Biomni-A1 combines three core design choices that allow it to operate effectively in biomedical research. First, the system introduces a large language model (LLM)-based tool selection mechanism that addresses the complexity and specialization of biomedical tools by dynamically retrieving a customized subset of resources according to the user's objective. Second, because biomedical tasks often require complex procedural logic, Biomni-A1 uses code as a universal action interface, allowing it to execute complex workflows involving loops, parallelization, and conditional logic. Third, the agent uses an adaptive planning strategy, formulating an initial plan based on biomedical knowledge and continuously refining it during execution.
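The first two ideas, goal-driven tool selection and code as the action interface, can be illustrated with a minimal Python sketch. Everything in it is hypothetical and is not taken from Biomni: the `Tool` class, the three toy tools, and the word-overlap scoring, which merely stands in for the LLM-based selector. The "agent-written" code is a hard-coded string so that the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """Hypothetical tool record: a name, a description, and a callable."""
    name: str
    description: str
    func: Callable


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


TOOLS = [
    Tool("gc_content", "compute gc content fraction of a dna sequence", gc_content),
    Tool("reverse_complement", "reverse complement of a dna sequence", reverse_complement),
    Tool("celsius_to_fahrenheit", "convert temperature celsius to fahrenheit",
         celsius_to_fahrenheit),
]


def select_tools(goal: str, tools: List[Tool], top_k: int = 2) -> List[Tool]:
    """Stand-in for the LLM selector: rank tools by word overlap with the goal."""
    goal_words = set(goal.lower().split())
    scored = [(len(goal_words & set(t.description.split())), t) for t in tools]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for score, tool in scored[:top_k] if score > 0]


def run_action(code: str, tools: List[Tool], inputs: Dict[str, object]) -> Dict[str, object]:
    """Run agent-written code with only the selected tools in scope.

    exec() on untrusted code is unsafe; a real system would need a sandbox.
    """
    namespace: Dict[str, object] = {tool.name: tool.func for tool in tools}
    namespace.update(inputs)
    exec(code, namespace)
    return namespace


# Code the agent might emit: a loop plus a conditional in a single action.
AGENT_CODE = """
results = {}
for seq in sequences:
    if not seq:  # skip empty inputs
        continue
    results[seq] = gc_content(seq)
"""

selected = select_tools("compute gc content of each dna sequence", TOOLS)
output = run_action(AGENT_CODE, selected, {"sequences": ["GGCC", "", "ATGC"]})
print([tool.name for tool in selected])
print(output["results"])
```

Because the action is ordinary code, one step can loop over many inputs and branch on intermediate results, which a fixed one-call-per-step tool interface cannot express as directly.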
■ Excellent Benchmark Performance
The research team evaluated Biomni on three challenging multiple-choice benchmarks: Humanity's Last Exam (HLE) and two subtasks of LAB-Bench, Database Question Answering (DbQA) and Sequence Question Answering (SeqQA).
On the HLE benchmark, Biomni achieved an accuracy of 17.3% across 52 questions spanning 14 biomedical subfields, outperforming base LLMs (6.0%), coding agents (12.8%), and literature agents (12.2%), and thereby demonstrating its generalization capability in unfamiliar biomedical domains.
In the LAB-Bench evaluation, Biomni achieved an accuracy of 74.4% on the DbQA task, essentially matching expert human performance (74.7%); on the SeqQA task, it attained an accuracy of 81.9%, significantly surpassing the human level (78.8%).

■ Generalization Capability for Real-World Tasks
To evaluate Biomni’s generalization performance in real-world research tasks, the research team meticulously designed eight novel biomedical benchmarks spanning genetics, genomics, microbiology, pharmacology, and clinical medicine. These benchmarks specifically include: variant prioritization, GWAS causal gene detection, CRISPR perturbation screen design, rare disease diagnosis, drug repurposing, single-cell RNA sequencing annotation, microbiome disease–taxon association analysis, and patient gene prioritization.
Across these tasks, Biomni shows substantial average relative performance gains: 402.3% over base LLMs, 43.0% over coding agents, and 20.4% over its simplified version, Biomni-ReAct.
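A relative gain of this kind is normally computed as the difference between two scores divided by the baseline score. The sketch below assumes that usual definition, which this article does not spell out. It applies the formula to the HLE accuracies reported earlier, because the per-task scores behind the eight-benchmark averages are not given here. The function name is illustrative.

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Percentage gain of `score` over `baseline`, relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (score - baseline) / baseline * 100.0


# HLE accuracies (in percent) as reported in this article.
biomni_hle = 17.3
hle_baselines = {"base LLM": 6.0, "coding agent": 12.8, "literature agent": 12.2}

for name, accuracy in hle_baselines.items():
    gain = relative_improvement(biomni_hle, accuracy)
    print(f"Biomni vs {name}: {gain:.1f}% relative improvement")
# Biomni vs base LLM: 188.3% relative improvement
# Biomni vs coding agent: 35.2% relative improvement
# Biomni vs literature agent: 41.8% relative improvement
```

A relative gain above 100% means the score more than doubled. This is how an improvement such as 402.3% can arise when the baseline accuracy is low.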

■ Wearable Sensor Data Analysis
In a real-world case study, researchers utilized Biomni to analyze 458 Excel files from 30 participants over several months, containing wearable sensor data (continuous glucose monitoring [CGM] and temperature records). Biomni autonomously generated and executed a 10-step analytical pipeline: inferring meal events from glucose peaks, extracting pre- and post-prandial temperature windows, performing cross-individual standardization, and comprehensively analyzing population-level trends. The agent successfully identified consistent patterns of postprandial thermogenic response, revealing an average temperature increase of 2.19°C, while also observing significant inter-individual variability, suggesting the presence of distinct metabolic phenotypes.
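The main steps of such a pipeline can be sketched in plain Python. This is a toy illustration under stated assumptions, not the pipeline Biomni generated: the readings are invented, the peak threshold (`min_rise`), the `lookback`, and the window size are arbitrary, and a meal is assumed to start a fixed number of samples before each qualifying glucose peak.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


def find_meal_onsets(glucose: List[float], min_rise: float = 30.0,
                     lookback: int = 3) -> List[int]:
    """Infer meal onsets from glucose peaks.

    A sample counts as a peak if it is a local maximum and sits at least
    `min_rise` above the reading `lookback` samples earlier; the earlier
    sample is taken as the (assumed) meal onset.
    """
    onsets = []
    for i in range(lookback, len(glucose) - 1):
        is_peak = glucose[i] > glucose[i - 1] and glucose[i] >= glucose[i + 1]
        if is_peak and glucose[i] - glucose[i - lookback] >= min_rise:
            onsets.append(i - lookback)
    return onsets


def temperature_delta(temp: List[float], onset: int, window: int = 3) -> Optional[float]:
    """Mean temperature in the window after the onset minus the window before it."""
    pre = temp[max(0, onset - window):onset]
    post = temp[onset + 1:onset + 1 + window]
    if not pre or not post:
        return None
    return mean(post) - mean(pre)


def zscore(values: List[float]) -> List[float]:
    """Standardize one participant's series so responses are comparable across people."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def participant_response(glucose: List[float], temp: List[float]) -> Optional[float]:
    """Average standardized post-meal temperature change for one participant."""
    standardized = zscore(temp)
    deltas = [temperature_delta(standardized, onset) for onset in find_meal_onsets(glucose)]
    deltas = [d for d in deltas if d is not None]
    return mean(deltas) if deltas else None


# Invented readings for two hypothetical participants: (glucose mg/dL, temperature °C).
participants: Dict[str, Tuple[List[float], List[float]]] = {
    "A": ([90, 92, 91, 95, 120, 140, 150, 138, 120, 105, 98, 95],
          [33.0, 33.1, 33.0, 33.1, 33.2, 33.6, 34.0, 34.2, 34.1, 33.9, 33.5, 33.2]),
    "B": ([100, 101, 99, 100, 125, 145, 160, 150, 130, 110, 102, 100],
          [34.0, 34.0, 34.1, 34.0, 34.1, 34.2, 34.3, 34.4, 34.3, 34.2, 34.1, 34.0]),
}

responses = {pid: participant_response(g, t) for pid, (g, t) in participants.items()}
valid = [r for r in responses.values() if r is not None]
print("per-participant response (SD units):", responses)
print("population mean:", mean(valid), "inter-individual spread:", pstdev(valid))
```

Standardizing each participant's temperature series before averaging keeps people with a higher baseline or a wider temperature range from dominating the population-level trend. The cost is that the result is expressed in standard-deviation units rather than degrees Celsius.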
■ Multi-omics Research on Skeletal Development
Researchers utilized Biomni to analyze a recently published multi-omics dataset of human skeletal development, which comprises 336,162 single-nucleus RNA sequencing and ATAC sequencing data points. The system autonomously planned and executed a ten-stage analysis pipeline to predict transcription factor–target gene regulatory links and screened regulators based on motif enrichment and chromatin accessibility correlations. Biomni not only recapitulated known regulatory relationships among key osteogenic transcription factors (such as RUNX2 and HHIP) but also identified several previously unreported transcription factors, including AUTS2, ZFHX3, and PBX1.
■ Experimental Protocol Design and Validation
The research team further evaluated Biomni’s performance in real-world experimental design, focusing on its capabilities in a core molecular biology task: gene cloning. In an open-ended cloning benchmark co-designed with experts in gene editing research, the experimental protocols generated by Biomni matched human expert levels in both accuracy and completeness. More importantly, during actual wet-lab validation, scientists strictly followed the protocols designed by Biomni and successfully completed gene cloning, with sequencing results confirming perfect sequence alignment.

To ensure that every scientist can harness the powerful capabilities of Biomni, the research team has developed an intuitive web platform at biomni.stanford.edu. Users need only submit queries in natural language to receive analysis results fully powered by the Biomni agent system. Whether designing complex cloning experiments, querying multi-omics databases, or generating scientific hypotheses from wearable device data, scientists can now easily obtain professional assistance from a general-purpose biomedical AI agent without any programming expertise.
The advent of Biomni marks a significant breakthrough in the field of biomedical research. Its robust generalization capabilities across multiple subfields have laid a solid foundation for AI agents to become indispensable partners in scientific discovery. By automating complex workflows that previously required expert knowledge and programming skills, Biomni enables researchers to devote more energy to formulating innovative hypotheses, designing novel experiments, and fostering interdisciplinary collaboration.
In the field of drug discovery, Biomni can autonomously perform target prioritization, perturbation screening design, and drug repurposing analysis, paving new pathways for more efficient and cost-effective drug development. In clinical applications, its outstanding performance in gene prioritization and rare disease diagnosis heralds the advent of more precise, personalized medical insights and streamlined diagnostic processes. In the consumer health sector, Biomni’s ability to integrate wearable device data with multi-omics analysis paints a promising picture for real-time, personalized health monitoring and precision interventions.
Biomni and its subsequent versions are poised to become the core infrastructure of an AI-driven biomedical ecosystem, seamlessly collaborating with human experts to uncover novel insights into health and disease. This human–machine collaboration model has the potential to fundamentally reshape the landscape of biomedical research by automating hypothesis generation, scaling up discovery pipelines, and driving medical innovation forward at unprecedented speed and scale. General-purpose agents like Biomni can not only accelerate scientific breakthroughs but also redefine the future paradigm of scientific exploration.
Kexin Huang*, Serena Zhang*, Hanchen Wang*, Yuanhao Qu*, Yingzhou Lu*, et al. "Biomni: A General-Purpose Biomedical AI Agent." bioRxiv, 2025. https://doi.org/10.1101/2025.05.30.656746.
Kexin Huang, Serena Zhang, Hanchen Wang, Yuanhao Qu, and Yingzhou Lu from Stanford University are co-first authors of this paper. The study was supervised by Jure Leskovec, Le Cong, and Michael Snyder from Stanford University, and Aviv Regev from Genentech.