

A pioneer of large models in life sciences opens up an open, win-win future for the industry.
Today, BioMap announces the open-source release of xTrimoPGLM, the leading protein language model in its xTrimo V2 family. Seven models with different parameter counts have been released on Hugging Face and GitHub for global users to freely access and use.
https://huggingface.co/biomap-research
https://github.com/biomap-research/xTrimoPGLM
xTrimoPGLM is the world's first protein language model with 100 billion parameters. It outperforms previously industry-leading protein models such as ESM-2 and ProGen2, and shows broad application prospects in drug molecular design and optimization, antibody engineering and vaccine development, enzyme engineering and biocatalyst design, and other fields.
This open-source release means that advanced AI tools, previously available only to top pharmaceutical companies and leading laboratories, will now benefit more developers and bring new development opportunities to the entire life sciences industry.
This is an attempt to drive innovation with innovation. Open source itself, together with this era's trend toward free exchange and open sharing, is gathering "collective intelligence."
Artificial intelligence is still in its early stages, and its application in the life sciences is even more nascent. Fostering innovation through open-source ecosystems expands the pie and ultimately drives the prosperity of the entire industry; this represents BioMap's strategic foresight as a leader.
With the rise of the open-source wave of large models such as DeepSeek-R1, the pursuit of ultimate performance optimization and the inclusive spirit of open source have sparked a profound movement toward technological equality, and BioMap has chosen to sow the seeds for a more open and inclusive tomorrow.
The open-source release of the 100-billion-parameter xTrimoPGLM marks the DeepSeek moment for the industry.
Standing at the critical juncture of AI's deep integration into life sciences, BioMap, with a vision of inclusiveness, takes technological innovation as its foundation and ecosystem collaboration as its ladder to raise the intelligence level of the industry.
As BioMap presses the "accelerator" on technological implementation, the curtain is slowly rising on a major industry transformation.
Deep Cultivation of Large Models: BioMap's Evolution Theory
The protein field is the most fruitful and prominent pearl in the application of AI in life sciences.
As key molecules in living systems, proteins participate in almost all life processes, including metabolism, immunity, cell differentiation, and signal transduction. The complexity of their structure, function, interactions, and regulatory mechanisms has always been a key focus of scientific exploration.
From the debut of AlphaFold2 in 2020 to the 2024 Nobel Prize in Chemistry, AI has helped humanity decipher the "protein code" and has moved from the laboratory to industrial applications, covering new drug development, disease diagnosis, synthetic biology, and other fields, and showing huge market potential.
Because of this, protein models have gained far more attention and popularity than other models in the AI + life sciences field, and they are often the first battle in which enterprises or research teams prove their strength and gain industry recognition. Meta and DeepMind, both highly influential in the industry, have launched protein-related models.
As protein computing pioneer David Baker said: "Proteins are the machinery of life, and understanding their language will unlock the secrets of biology."
As one of the earliest companies globally to dedicate itself to the research and development of large models in the life sciences, BioMap's first academic open-source project, HelixFold-Single, focuses on the field of protein structure prediction and has been featured on the cover of a Nature sub-journal.

HelixFold-Single Model Framework Diagram
This model is the world's first high-speed protein structure prediction model that does not depend on MSA (multiple sequence alignment). It achieved a breakthrough in "folding with a large-scale protein language model," with a speed increase of over a hundred times in evaluation tasks, bringing a new leap to the field of protein structure prediction.
In 2023, the company's protein language model xTrimoPGLM introduced a more successful pre-training method that integrates two different types of tasks: protein understanding and protein generation. It achieved SOTA in 15 out of 18 tasks, with overall performance surpassing both the previous task-specific SOTA models and Meta's pre-trained ESM-2.
In the same year, BioMap's "ChatGPT capable of generating proteins" was also born. This AIGP (AI Generated Protein) platform, driven by xTrimo, can generate proteins in a targeted manner or design proteins through generative methods.

After years of accumulation, BioMap's large protein model has completed several rounds of self-evolution and improvement, with various data feeding back into the AI platform's training through an ecosystem loop, further enhancing the model's capabilities.

The First Large Protein Model with 100 Billion Parameters:
Big is Powerful, Bridging Understanding and Generation
Firmly choosing the large model direction, BioMap has expanded the boundaries of biological computing with its profound technical practice.
In the field of natural language, the Scaling Law has become the golden rule and the best-known principle in the artificial intelligence industry, and Microsoft CEO Satya Nadella regards it as the true driving force behind the AI revolution.
The reason is that the Scaling Law reveals the key to global AI competition, namely the intrinsic relationship between model performance and scale: performance is expected to improve roughly linearly as model parameters, data volume, and compute increase exponentially in proportion to one another.
The Scaling Law has become the cornerstone of a host of large models such as GPT, prompting players across the industry to build moats around data and computing power, and serving as the underlying belief of many who are convinced that AI can change the world.
The research results of xTrimoPGLM also validate the Scaling Law: as the compute invested in protein language models grows exponentially, performance on downstream tasks increases linearly.
This breakthrough demonstrates the necessity of large models for handling complex biological tasks and provides theoretical support for the development of large biological models.
Under the rule of "bigger is stronger," BioMap, with the largest parameter scale in the industry, has secured a leading position.
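The scaling relationship described above (linear gains in performance for exponential growth in compute) can be illustrated with a small fit of score against the logarithm of compute. This is only a minimal sketch: the compute values and task scores below are synthetic numbers invented for illustration, not results from xTrimoPGLM, and the function name fit_log_linear is a hypothetical helper.

```python
import math


def fit_log_linear(compute, score):
    """Least-squares fit of score = slope * log10(compute) + intercept."""
    xs = [math.log10(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(score) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, score))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, made-up data (NOT from the xTrimoPGLM paper):
# compute grows 10x per step, the score grows by a constant amount per step.
compute_flops = [1e20, 1e21, 1e22, 1e23]
task_score = [0.50, 0.55, 0.60, 0.65]

slope, intercept = fit_log_linear(compute_flops, task_score)

# Extrapolate one more 10x step in compute under the fitted trend.
predicted = slope * math.log10(1e24) + intercept
print(f"gain per 10x compute: {slope:.3f}, extrapolated score: {predicted:.3f}")
```

Under such a trend, every fixed gain in score costs ten times more compute than the previous one, which is why scale becomes the decisive resource.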
Of course, BioMap's technical advantages are reflected not only in the breakthrough in model scale, but also in its deep understanding and precise grasp of the complex systems of the life sciences.
Traditional protein language models are often limited by a single pre-training objective. They either excel at understanding tasks (such as the ESM series, mainly used for protein structure prediction) or focus on generation tasks (such as ProGen, which emphasizes protein generation), which reveals shortcomings in task adaptability and generalization.
Based on a deep understanding of protein data, BioMap researchers devised the xTrimoPGLM pre-training framework, which optimizes understanding and generation tasks together by combining the advantages of GLM (General Language Model) and MLM (Masked Language Model) objectives.
This unified framework enables xTrimoPGLM to provide precise amino-acid-level and sequence-level representations in understanding tasks, while in generation tasks it can produce novel protein sequences that are structurally similar to natural proteins.
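To make the difference between the two kinds of objective concrete, the sketch below builds one training example of each from a toy protein sequence: an MLM-style example that masks scattered residues to be predicted in place, and a GLM-style blank-infilling example that removes a contiguous span and asks the model to generate it after the context. This is an illustrative assumption about the general technique, not the xTrimoPGLM implementation; the special token names, the masking rate, the span length, and the toy sequence are all hypothetical.

```python
import random

# Hypothetical special tokens, chosen for illustration only.
MASK, SPAN, START, END = "[MASK]", "[SPAN]", "<sop>", "<eop>"


def mlm_example(seq, mask_rate, rng):
    """MLM-style: mask scattered residues; targets are predicted in place."""
    tokens = list(seq)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    targets = {p: tokens[p] for p in positions}
    for p in positions:
        tokens[p] = MASK
    return tokens, targets


def glm_example(seq, span_len, rng):
    """GLM-style: blank out one contiguous span and generate it left to right."""
    start = rng.randrange(0, len(seq) - span_len + 1)
    span = list(seq[start:start + span_len])
    # The whole span collapses into a single placeholder in the context.
    context = list(seq[:start]) + [SPAN] + list(seq[start + span_len:])
    # The model reads the context, then emits the span token by token.
    model_input = context + [START] + span
    model_target = span + [END]
    return context, model_input, model_target


rng = random.Random(0)
toy_sequence = "MKTAYIAKQRQISFVKSHFS"  # made-up 20-residue sequence

masked_tokens, mlm_targets = mlm_example(toy_sequence, 0.15, rng)
context, glm_input, glm_target = glm_example(toy_sequence, 5, rng)

print(" ".join(masked_tokens))
print(" ".join(glm_input), "->", " ".join(glm_target))
```

The in-place prediction favors per-residue representations useful for understanding tasks, while the span generation exercises the left-to-right decoding needed to produce new sequences; training one model on both is the idea behind a unified objective.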
By constructing an unprecedentedly large protein language training dataset and combining it with an innovative algorithm architecture to fully explore the potential value of massive parameters, BioMap's xTrimoPGLM demonstrates outstanding performance.
In protein understanding tasks, xTrimoPGLM excels in a range of evaluations covering protein structure, function, interaction, and developability, surpassing the previous SOTA models in 15 out of 18 tasks.
In addition, xTrimoPGLM has demonstrated outstanding performance in de novo design of protein sequences: it can generate proteins that are structurally similar to natural ones but divergent in sequence, offering more possibilities for drug design and protein engineering.
By customizing specific structures and biophysical properties through supervised fine-tuning, the "super alignment" capability of xTrimoPGLM will further unleash its potential as a programmable model in exploring and synthesizing the vast protein space.
After continuous technological iteration and optimization, xTrimoPGLM has reached an internationally leading level in terms of model scale and performance metrics, establishing its benchmark status in the field of biological computing.
There is no doubt that the open-source release of xTrimoPGLM will provide strong momentum for both academia and industry. This choice echoes the practice of DeepSeek, promoting the widespread application of AI in life sciences and accelerating global research progress.
Technology Ideals Become Reality:
BioMap Empowers Global Customers, Leading the Innovation Ecosystem
The development of large models is like a vigorously growing tree, with its roots in the continuous innovation of underlying technologies and its lush branches and leaves symbolizing the thriving development of the entire ecosystem.
The open-source release of xTrimoPGLM is just the beginning. Looking back at BioMap's journey over the past five years, one cannot help but feel that the path of large life science models, though fraught with challenges, has already opened broad avenues in technology, commerce, and ecosystem.
Last year, BioMap released xTrimo V3, a model with 210 billion parameters covering seven major mainstream modalities in life sciences, including proteins, DNA, and RNA, making it the world's largest life science large model and the first to achieve full modality coverage.
This large model family can be applied across the different stages of the life science industry chain, from early molecular R&D and production scale-up to real-world clinical analysis and finally to drug marketing and sales, achieving full-chain coverage.
The construction of a full-modality system not only provides end-to-end technical support but also establishes an innovative paradigm for multimodal fusion, demonstrating great potential in various scenarios.
For example, in target discovery, efficiency and accuracy can be significantly improved through multimodal collaboration at the cellular scale, combining proteins, cell characterization, and text-generated perturbation encoding, with final verification assisted by a biological vision model.
BioMap has successfully validated and licensed multiple results related to immunotherapy combination targets and tumor-specific targets, with projects entering the preclinical research stage.
Beyond that, with the help of a one-stop model platform, BioMap has built revolutionary infrastructure for the entire life sciences field in the AI era.
On the training side: the company developed a unified multimodal training framework for biological data, achieving cross-modal pre-training. Full-stack support for fine-tuning on downstream tasks significantly enhances the models' generalization and adaptability.
On the inference side: BioMap has customized a computing engine that deeply integrates biology and AI. Through algorithm optimization and hardware co-design, it has achieved a tenfold increase in inference performance.
This set of technical solutions has demonstrated significant application value in industrial practice. The xTrimo platform has been used in over 200 task models across fields such as AI target discovery, protein design, and strain modification. It has supported customers in achieving more than 20 validated antibody and enzyme designs and in obtaining licenses for over 10 innovative targets, among other breakthrough results.
On the key question of how to put AI solutions into practice and improve service efficiency, BioMap has distilled a systematic methodology from years of in-depth industry practice.
Drawing on the world's most comprehensive life science AI model library, a model customization platform, a model workflow management platform, a high-performance computing platform, and automated laboratory-data middleware, along with support from AI, bioinformatics, and structural biology experts, the company helps customers build core competitive advantages in the key areas where AI can truly enhance efficiency.
So far, BioMap has served more than 400 global users and 60 QS100 universities, with signed orders of potential value nearing 2 billion US dollars. Its customers include top pharmaceutical companies, research institutions, and biomanufacturing enterprises, spanning fields such as drug development, agrochemicals, and environmental protection.
Admittedly, xTrimo is not yet fully mature and still has open space for development waiting to be explored. It is precisely this characteristic that keeps the platform vibrant, allowing more enterprises, research institutions, and developers to participate. BioMap continues to invest in ecosystem construction, attracting more innovative forces to converge into a tide and jointly promote the iterative upgrading and value creation of the platform.
In June last year, BioMap established its first international innovation center (BioMap InnoHub) in Hong Kong and launched the "BioMap BioX Innovation Acceleration Program", which is expected to support more than fifty cutting-edge early-stage life science R&D projects over the next five years.
For the selected projects, BioMap will use its own large life science model, xTrimo, to provide technical support to researchers and entrepreneurs, and will help them connect with global flagship companies and investors to explore more application scenarios.
Positioning itself as "a world-leading provider of AI models in the life sciences," BioMap is gradually establishing full-chain support that runs from underlying algorithms and development kits to application scenarios, commercialization, and ecosystem.
Technology comes first, followed by the construction of a commercial closed loop and the connection of the ecosystem, ultimately building a collaborative acceleration network of "basic research - technology development - industrial application" that balances economic and social value and brings intelligent solutions to global business partners.
The rise of DeepSeek has been the hottest topic in the AI industry this year, and now this wave has surged beyond the shores of the IT industry and reached the high ground of life sciences.
As a leader in large life science models, BioMap has chosen to demonstrate its strategic foresight through action. The logic behind this open-source move is both well-timed and profound.
Humanity's exploration of understanding, designing, and even generating proteins is entering uncharted territory, and building an open and collaborative ecosystem is the only way to drive industry transformation.
After all, in the long race to decode the secrets of life, openness and sharing may well be the best accelerator.
Just as DeepSeek has broken through the limitations of "small yards with high walls" with its open approach, BioMap's open-source efforts likewise highlight the deep foundations of the company's technological innovation and vividly illustrate the spirit of openness, inclusiveness, collaboration, and win-win outcomes rooted in Eastern culture.
Today, BioMap takes AI as its root to build a full-modality, high-performance large model technology base, and uses a one-stop service platform as its trunk to achieve full-stack integration of large life science models, meeting diverse intelligent needs with systematic service capabilities.
In the near future, the ecosystem nurtured on this AI foundation will spread its leaves far and wide, working closely with upstream and downstream partners and giving rise to a myriad of blossoms in AI + life sciences.
