GeneLLM™: A Lightweight and Precise Foundation Model Powering the Science-Centric Evolution of AI in Genomic Diagnostics

Feb 24, 2025 09:46 CST Updated 09:46
OXTIUM

AI+Life Science Technology Developer

With the rapid advancement of artificial intelligence technologies, the field of biological sciences is undergoing a profound transformation. AI has not only introduced new tools and methods for drug discovery and disease diagnosis but has also redefined the fundamental research paradigm of the biological sciences through data-driven approaches. However, when confronted with massive datasets and complex systems, the computational resources and technologies traditionally used in scientific research are increasingly showing their limitations. Leveraging its independently developed GeneLLM™ large language model, OXTIUM is systematically delivering lightweight and precise solutions to this challenge by innovating across multiple dimensions, including hardware, algorithms, architecture, optimization, and data, thereby driving the deep integration of AI and the biological sciences.


In the DeepSeek Era, the Contradiction Between the Multidimensional Nature of Data and the Limitations of AI Computing Power Is Becoming Increasingly Prominent


Although general-purpose AI platforms such as DeepSeek have made training more economical by optimizing the underlying technology and integrating tens of thousands of NVIDIA A100 GPUs (cutting costs by approximately 50% compared with traditional distributed computing), domain-specific large models for bioscience still face severe computational bottlenecks and significant cost pressures when processing multidimensional biological data. The root of this contradiction is that the complexity, diversity, and scale of biological data impose exponentially growing demands on computational resources, posing technical challenges that far exceed those of conventional language models.


Taking genomics as an example, the raw data generated by whole-genome sequencing of a single human ranges from 100 GB to 200 GB (covering 3 billion base pairs and the associated sequencing information), while large-scale cohort studies such as UK Biobank need to process petabyte-scale data from over 500,000 samples. Proteomics data is even more complex: a single mass spectrometry experiment can generate tens of thousands to millions of peptide signals. When multi-omics data must be loaded simultaneously for computationally intensive tasks such as RNA 3D structure prediction and molecular dynamics simulation, computational complexity often grows nonlinearly, with memory requirements exceeding 1 TB.
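To put the cohort figure in perspective, the arithmetic can be checked directly. The short Python sketch below multiplies the per-genome sizes quoted above by a 500,000-sample cohort. It assumes every sample is whole-genome sequenced at those sizes and uses decimal units (1 PB = 1,000,000 GB), so it is an order-of-magnitude estimate, not a figure for any particular study.

```python
# Back-of-envelope estimate of raw storage for a large sequencing cohort.
# Per-genome sizes (100-200 GB) and cohort size (~500,000) come from the text.
# Assumptions: every sample is whole-genome sequenced, and decimal units are used.

GB_PER_PB = 1_000_000  # 1 PB = 10^6 GB in decimal units


def cohort_storage_pb(samples: int, gb_per_genome: float) -> float:
    """Return the total raw data volume of a cohort in petabytes."""
    return samples * gb_per_genome / GB_PER_PB


if __name__ == "__main__":
    for gb in (100, 200):
        total = cohort_storage_pb(500_000, gb)
        print(f"{gb} GB/genome x 500,000 samples = {total:.0f} PB")
```

At 100 to 200 GB per genome, 500,000 samples come to roughly 50 to 100 PB of raw data, consistent with the petabyte scale described above.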


In short, AI models for the biological sciences and language models do not compete in the same arena. They differ fundamentally in data types and computational requirements, and also significantly in storage methods, technical frameworks, algorithmic logic, and application scenarios. Language models primarily handle syntax, semantics, and contextual relationships in natural language, whereas bioscience models must integrate biology, chemistry, physics, mathematics, and other disciplines while processing complex biological data, and may even need to uncover new research paradigms from vast amounts of foundational biological information. Applying models in the biological sciences therefore calls for an interdisciplinary "scientific revolution" that breaks through the limitations of traditional computational methods, and even of fundamental research methodologies, thereby driving a revolutionary leap in productivity.


From this perspective, only innovative, proprietary foundation models for science can deliver more efficient research workflows at lower cost.


Since its establishment in 2022, OXTIUM has gained deep insight into three major pain points in biological research: high demand for computational resources, insufficient model generalizability, and complex, diverse data. Working from five angles (lightweight architecture, dual-configuration chips, low-level algorithm optimization, expert-level data filtering, and efficient storage technology), it has pioneered the lightweight multi-omics large language model GeneLLM™, designed to address each pain point as follows:


1. Pain Point 1: High Demand for Computational Resources: Training and inference for large-scale models require immense computational power, resulting in high costs that hinder widespread adoption. OXTIUM employs a multi-pronged strategy, deploying both cloud platforms and all-in-one inference appliances, with configurations based on both imported and domestically produced chips. Its core large language model, GeneLLM™, has been optimized to significantly reduce computational and storage requirements, with as few as 1.5 billion parameters. Using intelligent resource scheduling technology, GeneLLM™ runs efficiently on both cloud platforms and desktop all-in-one inference appliances, substantially lowering the computational cost of scientific research.


2. Pain Point 2: Insufficient Model Generalizability: Existing bioscience models are often designed for specific tasks and lack generalizability across domains and scenarios. GeneLLM™ is trained on raw biological data (such as sequencing data) and, through efficient compression techniques, can complete the analysis of a single disease with data from as few as one hundred cases.


3. Pain Point 3: Complex and Diverse Data: The diversity of biological data leads to suboptimal performance of deep learning algorithms in practical applications, making it difficult to meet the demands of innovative biological research. GeneLLM™ addresses this challenge through adaptive learning and multimodal data integration technologies, complemented by Bioford™, a platform that incorporates hundreds of biological models. Together they process complex biological data efficiently, enhancing the robustness and accuracy of algorithms to meet the diverse needs of innovative research.
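For a rough sense of why a model with 1.5 billion parameters can run on a desktop inference appliance, the memory needed just to store its weights can be estimated from the parameter count. The sketch below is illustrative only: the numeric precisions and bytes-per-parameter values are common conventions assumed here, not published GeneLLM™ specifications, and activations, caches, and training state are ignored.

```python
# Weights-only memory estimate for a model with a given parameter count.
# The 1.5-billion-parameter figure comes from the text. The precisions below
# are standard conventions ASSUMED for illustration, not GeneLLM(TM) specs.
# Activations, KV caches and optimizer state are deliberately ignored.

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16/bf16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    n = 1.5e9  # parameter count quoted in the article
    for name, b in BYTES_PER_PARAM.items():
        print(f"{name:>9}: {weight_memory_gb(n, b):.2f} GB")
```

Under these assumptions the weights occupy about 3 GB at 16-bit precision, small enough to fit in the memory of many single workstation GPUs.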


GeneLLM™: Sharing Peking University AI DNA with DeepSeek


GeneLLM™ is a large language model that integrates OXTIUM's core technologies. Aligned with the underlying architecture of DeepSeek, it emphasizes specialized applications in vertical scientific domains and is dedicated to advancing industry innovation. Professor Sha Lei, Co-founder of OXTIUM, graduated from the Institute of Computational Linguistics at the School of Computer Science, Peking University. He has long been dedicated to the optimization and industrial application of AI algorithms, having served as a Research Associate at the University of Oxford and as a Senior NLP Scientist at Apple Inc. Professor Sha not only provided substantial technical support for the development of GeneLLM™; his contributions to artificial intelligence also give the project unique advantages in promoting scientific and technological progress. Notably, Professor Sha shares an academic mentor with core R&D members of the DeepSeek team and maintains close scholarly ties with them; he is a senior fellow student of Luo Fuli and Dai Damai.


Bioford™, a one-stop biological research platform built on GeneLLM™, has integrated hundreds of large-scale biological science models. It covers multiple fields, including basic research, medical diagnostics, drug development, biomanufacturing, biological breeding, and environmental monitoring. Featuring a user-friendly modular interactive interface and providing an extensible model framework for individual tasks, it supports the adaptation and integration of cross-disciplinary tasks, significantly enhancing the versatility and practicality of the models across diverse application scenarios.


The core advantages of GeneLLM™ include:


1. Multi-domain, multi-dimensional data integration: Capable of processing multi-dimensional data from genomics, transcriptomics, proteomics, metagenomics, and epigenomics, it deeply integrates technological advancements in artificial intelligence algorithms, genomics, and bioinformatics to provide comprehensive research support.


2. Cross-Domain Knowledge Transfer: Through pre-training and fine-tuning, the platform model can adapt to diverse task requirements in basic research, medical diagnosis, biomanufacturing, biological breeding, environmental monitoring, and disease treatment, demonstrating high flexibility. It is also offered as lightweight inference devices and customized solutions tailored to different customer needs, lowering the technical barriers for small and medium-sized research institutions.


3. Efficient Inference Capability: GeneLLM™ can complete few-shot fine-tuning for a single disease within weeks, significantly enhancing research efficiency.


Built upon GeneLLM™, OXTIUM has also leveraged AI algorithms and models to accelerate the industrial application of biological science research. For instance, the Jinling series of reagent kits has achieved import substitution for high-end biological reagents through AI technology. This strategic initiative not only meets the localized demands of biological science research but also further addresses the cost challenges faced by small and medium-sized research institutions in terms of equipment investment and technology adoption.


AI Foundation for Biological Sciences — GeneLLM™ Leading the “Scientific Revolution” in Basic Research


Going forward, GeneLLM™ will continue to provide efficient research tools for the life sciences sector, fostering innovation in basic scientific research. For instance, GeneLLM™ has already demonstrated significant potential in cancer genomics and early risk assessment for Alzheimer’s disease. As the technology matures, it is expected to enable breakthroughs in basic research for a wider range of diseases, thereby substantially enhancing innovation efficiency for enterprises and research institutions while reducing commercialization costs.


In the long term, OXTIUM aims to establish the Bioford™ platform as standard infrastructure in the field of biological sciences, serving as a core support system for researchers and enterprises worldwide. By continuously expanding its capabilities, Bioford™ will encompass a broader range of application scenarios, including drug screening, environmental monitoring, and biological breeding. As a pioneer at the intersection of AI and biological sciences, OXTIUM has not only achieved significant breakthroughs in technological innovation but also demonstrated strong strategic foresight in grasping industry prospects.


Moving forward, OXTIUM will be dedicated to fostering cross-disciplinary innovation in AI and biosciences through collaboration with global partners. For instance, its strategic partnership with Peking University First Hospital has not only validated the clinical value of GeneLLM™ but also provided new solutions for early risk assessment of gastric and colorectal cancers. Furthermore, OXTIUM emphasizes corporate social responsibility by ensuring ethical compliance in technology applications and promoting a balance between technological innovation and social well-being. Looking ahead, OXTIUM will continue to uphold its mission of “Exploring the Mysteries of Life with AI Technology.” By deepening the application of GeneLLM™, the company aims to accelerate the development of new-quality productive forces and drive the intelligent transformation of global bioscience research.


Leading the pack is just the first step; our goal is to drive the rise of a trillion-dollar market sector.



About OXTIUM

 

OXTIUM is dedicated to providing one-stop AI-powered solutions for biological scientific research. Its self-developed multi-omics large language model, GeneLLM™, has completed pre-training with 1.5 billion parameters and 3.5 trillion base pairs of sequence data. Built upon GeneLLM™, OXTIUM has launched BioFord™, a one-stop scientific service platform that focuses on six core scenarios: medical diagnostics, drug development, biomanufacturing, basic research, biological breeding, and environmental monitoring. The BioFord™ platform encompasses nine specialized bioscience model libraries: a multi-omics foundation model, a protein model, an RNA 3D structure prediction model, a biomedical text processing model, a biomedical image processing model, a chemistry foundation model, a CRISPR-related prediction model, a single-cell analysis model, and a time-series prediction model. It provides advanced “AI for BioScience” bioinformatics computing services, cloud platform services, and inference all-in-one machines to both academic researchers and industry users. OXTIUM’s clients include leading domestic institutions such as BGI Group, Baidu PaddlePaddle, the Cancer Hospital of the Chinese Academy of Medical Sciences (CAMS), Shanghai Children's Medical Center affiliated with Shanghai Jiao Tong University School of Medicine, and the Chinese Research Academy of Environmental Sciences.


OXTIUM has established R&D centers in Shenzhen and Beijing. Its founding team is led by four University of Oxford alumni and brings together top scientists and engineers in the fields of artificial intelligence, bioinformatics, and bioengineering, who have published more than 60 papers in prestigious journals such as Nature and Nature Communications.


Guided by its mission to “Explore the Mysteries of Life with AI Technology,” OXTIUM will continue to push the technological boundaries of AI + bioscience, providing innovative momentum for biological research and industrial applications, thereby supporting national scientific and technological innovation and industrial upgrading.

