The myth of Nüwa creating humanity is widely known. The name of the “Nüwa” series of large language models for life sciences carries a similarly profound meaning: it symbolizes an ambitious quest to uncover the fundamental laws of life.
In life sciences and drug R&D, the Nüwa Life Science series of large models plays a crucial role, from helping researchers understand how proteins, RNA, DNA, and small molecules interact within biological systems, to elucidating multi-protein structure–function relationships, to facilitating drug development. “Nüwa” has already achieved breakthroughs in areas such as gene regulation analysis, biofluid simulation, and prediction of dynamic protein structures.
At the 2025 VBEF Future Healthcare & Pharma Top 100 Expo · Forum on the Development of Precision Medicine and Molecular Diagnostics Industries, Guo Xin, Principal Investigator at the Shanghai Institute for Scientific Intelligence, provided an in-depth presentation on how the Nüwa Life Science series of large models is bringing new hope to life sciences research and drug development.

Guo Xin, Principal Researcher at the Shanghai Institute for Scientific Intelligence
The Deep Integration of AI and Life Sciences Is a Key Driver Advancing Life Sciences Research
The two-way synergy and deep integration of AI and life sciences are key drivers of life sciences research, and this convergence rests on two main developments.
One is the accumulation of big data in life sciences, brought about by breakthroughs in observational and testing technologies. For example, the emergence of a series of key biotechnologies (cryo-electron microscopy in 2013, next-generation sequencing in 2015, and epigenetic sequencing in 2017) has led to explosive growth in life sciences data. These data cover protein structures, gene sequences, epigenetic information, and more, and they provide abundant resources for developing large-scale life science models.
The other is the continuous emergence of new AI technologies, which lets researchers keep refining their understanding of the principles that govern living systems. For example, after the deep learning boom of 2012, CNNs excelled at image recognition, and the Transformer architecture later laid a crucial foundation for large models such as ChatGPT.
From 2017 to 2018, AI and life sciences began to integrate deeply for the first time. For instance, AlphaFold for the first time achieved protein structure prediction accuracy comparable to that of wet-lab experiments, giving researchers a new tool for elucidating protein structures. In 2022, the emergence of unsupervised pre-trained large language models, exemplified by ChatGPT, further accelerated the convergence of AI and life sciences by making it possible to learn biological principles from massive amounts of unlabeled data. By 2025, the release of the Evo 2 large model enabled genomic design across different species.
Guo Xin stated that the ultimate goal of systems modeling in life sciences is to decipher the human body as a complex, multi-omics, multi-molecular, and dynamically evolving system. The human body comprises trillions of cells, each composed of billions of molecules, and those cells evolve continuously. How to approach a system this complex is the most urgent challenge facing AI for life sciences.
The emergence and integration of the data, technologies, and large models described above make it possible to take a systems view of the new ideas and approaches that AI offers for complex life sciences research, and to identify the development trends that follow from them.
At the algorithmic level, Guo Xin summarized three existing trends. First, generative artificial intelligence enables the de novo design of multi-omics biomolecules, producing novel molecules that do not exist in nature. This provides new rational approaches and foundational models for drug discovery and the design of other functional molecules. Second, large-scale models represented by AlphaFold3 can perform high-precision spatial modeling of biomolecular complexes within a unified deep learning framework, which has led to numerous breakthroughs in the study of interactions within complex multimolecular assemblies. Third, iterative cycles between large-scale models and wet-lab experiments allow directed evolution designs to be refined continuously and more efficiently.
Large language models (LLMs), exemplified by ChatGPT, have had a significant impact on life sciences research. On one hand, LLM-style modeling can integrate massive amounts of imaging and omics data, which enables some interpretation of cells, the fundamental units of life, as well as simulation of biological processes. On the other hand, LLMs are themselves aggregations of human knowledge and can work in a manner analogous to human researchers. By exploring the literature and simulating scientific debate, they can generate novel research hypotheses, as demonstrated by Google’s AI Co-Scientist project.
Micro + Macro:
Nüwa Life Science Series of Large Models Brings New Approaches to Life Sciences
The Nüwa Life Science series of large models, developed by Guo Xin’s team, makes full use of existing big data resources. By mining and analyzing massive datasets, the team has built a multi-omics training model and achieved breakthroughs in multiple tasks, including gene sequence design, gene expression regulation, and elucidation of disease mechanisms.
For instance, the Nüwa multi-omics sequence large model integrates genomic and RNA transcriptomic data at a scale exceeding one billion entries and achieves state-of-the-art performance on various downstream nucleic acid design tasks. In innovative drug development in particular, it provides significant support for siRNA drug design by drawing on the team’s self-built siRNA database, the world’s largest. Meanwhile, the Nüwa Gene Navigation large model achieves high-precision, long-range prediction of gene regulatory relationships by modeling epigenomic data, providing new methods and tools for disease diagnosis and treatment.
The Nüwa Life Science series of large models is built around the nature of life as a cross-scale, multi-omics, dynamic complex system. It focuses on two fundamental scenarios: microscopic molecules such as genes and proteins, and macroscopic phenotypes. It aims to provide foundational model capabilities for product platforms including innovative gene-based drug development, dynamic protein design, and digital twin diagnostics and treatment, offering new perspectives and methodologies for life sciences research. With the Nüwa Life Science series of large models, researchers can design multi-omics biomolecules de novo by combining large language models with generative artificial intelligence, and can run dynamic simulations across the full molecular scale by integrating physical principles with advanced computational architectures.
At the micro level, “Nüwa” analyzes the origins and foundations of life in depth by modeling gene sequences and gene expression. Designing and optimizing gene sequences makes it possible to develop nucleic acid therapeutics, offering new hope for the treatment of rare genetic disorders, hepatitis B, HIV, and other diseases. Modeling gene expression, meanwhile, characterizes gene activity across diseases and physiological states, which provides critical support for elucidating disease mechanisms and identifying novel therapeutic targets. This approach even holds the promise of rendering previously “undruggable” targets druggable, addressing a key challenge in innovative drug development.
At the macro level, “Nüwa” builds a comprehensive analysis of human diseases by modeling human PET-CT imaging data together with molecular-level modeling results. For instance, by jointly modeling pathological images and spatial transcriptomics results, and by correlating MRI and CT imaging data from large biobanks with patients’ proteomic and genomic sequencing data, the model provides new perspectives and methodologies for disease diagnosis and treatment.
The Nüwa Life Science series of large models has not only achieved significant results in life sciences research but also brought about a shift in scientific paradigms. By integrating agent technology with large language models, the team has built an automated research platform that enables the iterative advancement of scientific discovery.
Guo Xin emphasized that the Nüwa Life Science series of large models will continue to deepen research in fields such as gene regulation, biofluids, and biological structures, constantly improving modeling capabilities for cross-scale, multi-omics, and dynamic complex systems, thereby providing more robust support for life sciences research. Meanwhile, the platform will be open-sourced to the scientific community, promoting the translation and application of research findings through collaboration with clinical medicine, drug development, and other sectors, thus bringing new opportunities and challenges to scientific research.