As 2021 approached, Beijing-based ByteDance posted a new job opening on a recruitment website: Bioinformatics Engineer. With a monthly salary of RMB 20,000–40,000 and 15 months’ pay per year, this compensation package was competitive even by Beijing standards.
According to the job description, the role's primary responsibilities include building NGS (next-generation sequencing) data pipelines; designing, advancing, and validating the performance of tumor NGS testing products; and identifying potential productization opportunities and research directions in the data. Simply put, ByteDance is preparing to enter the NGS field, with the ultimate goal likely being to extract valuable insights from the data.
In late 2019, Huawei also posted two job openings related to biomedicine. One was for a Genomics R&D Algorithm Engineer, with core responsibilities more focused than those at ByteDance, directly targeting the development of deep learning algorithms for genomic data analysis. The other position was for a Drug Discovery Algorithm Engineer, focusing on small-molecule drug design using Computer-Aided Drug Design (CADD) methods.
At the beginning of 2021, BioMap (Baitu Shengke), a company spearheaded by Baidu founder Robin Li, also launched two talent initiatives: the “Million-Dollar Leading Talent Program” and the “Million-Yuan Young Leading Talent Program.” These programs aim to attract interdisciplinary professionals who combine biotechnology with AI, offering annual salaries of $1 million and RMB 1 million, respectively, along with technical platform support.
With ByteDance, Huawei, and Baidu now all in the game, and considering Alibaba and Tencent’s deep footholds in cloud platforms and biopharma, as well as the High-Performance Computing Center of the Institute of Computing Technology at the Chinese Academy of Sciences, the mass entry of internet giants into the biopharmaceutical sector has become an unmistakable trend. But how exactly will these tech companies empower this aging, conservative industry?
The surge in the biopharmaceutical sector over the past two years has been evident to all. Internet giants, which have long eyed the healthcare industry, have naturally seized this prime opportunity. However, pain points in the biotechnology-driven biopharmaceutical industry continue to mount. In terms of outcomes, the “three highs” of R&D (high capital investment, long development timelines, and high failure rates) and the “three similarities” among products (similar trial data, similar indications, and similar efficacy) remain unresolved by advancements in biotechnological R&D. On the contrary, the limited scope of applications targeted by biotechnology has intensified industry competition. The solution lies in the realm of information technology, with artificial intelligence (AI) appearing to be the long-sought answer for the pharmaceutical sector. AI applications in new drug development have moved beyond the proof-of-concept stage into widespread implementation, empowering lifecycle management of pharmaceuticals. From underlying pharmaceutical databases to real-world studies at the data application level, and from early-stage compound screening to patient recruitment during clinical trials, the presence of AI technology is evident throughout.
The primary opportunity for High-Performance Computing (HPC) to penetrate the biopharmaceutical sector lies in the vast amount of data the industry has accumulated over many years. The Fourth Plenary Session of the 19th Central Committee of the Communist Party of China, held in 2019, recognized data as a new factor of production, and in 2020 this concept was written into the "Opinions of the CPC Central Committee and the State Council on Building a More Complete System and Mechanism for Market-Based Allocation of Production Factors," thereby affirming the value of data at the national level. The "Opinions" also explicitly called for "promoting the opening and sharing of government data," "enhancing the value of social data resources," and "strengthening data resource integration and security protection." Consequently, organizing and mining data will be a central theme across industries in the near future.
The healthcare industry, due to the unique nature of its diagnostic and therapeutic applications, has accumulated vast amounts of user/patient data. Structuring these data and leveraging deep learning algorithms for mining can yield valuable insights for the healthcare sector. This explains the sustained momentum in the medical big data industry in recent years.

Consequently, the output of AI-assisted new drug R&D enterprises has gradually increased over the past two years. The value of “HPC + AI + Medical Big Data” is beginning to materialize, and application scenarios are expanding from compound discovery to other domains. The new drug development process encompasses multiple stages, including molecular discovery, preclinical research, clinical trials, and post-marketing studies. Accordingly, companies specializing in AI-driven drug discovery are progressively expanding downstream from their initial focus on early-stage molecular discovery.
Currently, numerous AI-assisted drug discovery companies are exerting efforts across various stages. In these specialized scenarios of AI-driven new drug development, the application of High-Performance Computing (HPC) is widespread. Notably, over 95% of these companies utilize AI in the compound discovery phase of preclinical research, which is the most well-known application within the industry. Traditionally, compound discovery relied on researchers manually drawing molecular models, a process that was inefficient and costly. In contrast, under the paradigm of AI-driven new drug development, leveraging Computer-Aided Drug Design (CADD) technology and deep learning from drug molecule databases, AI algorithms can perform multi-layered screening based on molecular mechanisms and drug-likeness from vast molecular libraries. This approach reduces the time required for early-stage molecular screening—traditionally taking one to two years—to approximately one month.
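The multi-layered screening described above can be illustrated with a minimal sketch. It is not any company's actual pipeline: the `Molecule` record, the descriptor values, and the `predicted_score` field (a stand-in for a model's activity prediction) are all hypothetical. The first layer applies Lipinski's rule of five as a simple drug-likeness filter, tolerating at most one violation; the second layer ranks the survivors by the assumed model score.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Molecule:
    """Hypothetical record; descriptors are assumed to be precomputed."""
    name: str
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient
    h_donors: int           # hydrogen-bond donors
    h_acceptors: int        # hydrogen-bond acceptors
    predicted_score: float  # assumed model output, higher is better


def lipinski_violations(m: Molecule) -> int:
    """Count how many of Lipinski's four criteria the molecule violates."""
    return sum([
        m.mol_weight > 500,
        m.logp > 5,
        m.h_donors > 5,
        m.h_acceptors > 10,
    ])


def screen(library: List[Molecule], top_k: int, max_violations: int = 1) -> List[Molecule]:
    """Layer 1: drug-likeness filter. Layer 2: rank by predicted score."""
    drug_like = [m for m in library if lipinski_violations(m) <= max_violations]
    ranked = sorted(drug_like, key=lambda m: m.predicted_score, reverse=True)
    return ranked[:top_k]


# Illustrative values only; these are not real compounds.
library = [
    Molecule("cand_a", 320.4, 2.1, 2, 5, 0.91),
    Molecule("cand_b", 712.9, 6.3, 6, 12, 0.97),  # violates all four criteria
    Molecule("cand_c", 455.0, 4.8, 1, 8, 0.62),
    Molecule("cand_d", 510.2, 3.0, 3, 7, 0.88),   # one violation, tolerated
]

shortlist = screen(library, top_k=2)
print([m.name for m in shortlist])
```

In this toy library, `cand_b` has the highest score but is removed by the first layer, which is the point of filtering before ranking: the expensive or less reliable prediction step only decides among candidates that already look drug-like.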
AI-driven compound discovery in preclinical research, as a relatively mature sector, has attracted Chinese companies such as XtalPi, DeepIntel, and IceStone Biotech, which have entered this field and are now capable of providing services to global multinational pharmaceutical companies. For instance, XtalPi announced a strategic partnership with Pfizer as early as 2018, while DeepIntel signed a comprehensive strategic cooperation agreement with Sinopharm in 2019.
In fact, high-performance computing (HPC) is widely applied in healthcare beyond drug R&D, supplying computing power to stakeholders across the industry, including medical institutions, pharmaceutical and medical device companies, and insurance providers. For instance, many of the AI-based medical imaging products launched in 2020 combine HPC with AI algorithms: during product development, deep learning models are trained on medical imaging datasets using HPC resources.
However, in the field of drug development, only AI-driven compound discovery has achieved relative maturity.
Recently, AI achieved a new breakthrough in predicting how proteins fold. Enumerating every possible structure of a protein would take longer than the age of the universe. Backed by powerful algorithms and computing power, DeepMind has cut the computation time from months to hours. AI has brought an unprecedented gain in efficiency to biology, with far-reaching significance for tackling difficult diseases such as cancer. It also shows the industry the opportunities that new technologies create, and it promises to greatly increase both the variety and the speed of drug discovery.
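The claim that brute-force enumeration would outlast the universe can be checked with a back-of-the-envelope calculation in the spirit of Levinthal's paradox. The figures below are illustrative assumptions, not values from the article: a 100-residue chain, three conformations per residue, and a sampling rate of 10^13 conformations per second.

```python
# Back-of-the-envelope estimate; all inputs are illustrative assumptions.
RESIDUES = 100              # assumed chain length
STATES_PER_RESIDUE = 3      # assumed conformations per residue
SAMPLES_PER_SECOND = 1e13   # assumed sampling rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate

conformations = STATES_PER_RESIDUE ** RESIDUES
years_needed = conformations / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

print(f"conformations to enumerate: {conformations:.2e}")
print(f"years needed at assumed rate: {years_needed:.2e}")
print(f"multiples of the age of the universe: {years_needed / AGE_OF_UNIVERSE_YEARS:.2e}")
```

Under these assumptions the search space is on the order of 10^47 conformations, so exhaustive enumeration is out of reach by many orders of magnitude, which is why prediction methods that avoid enumeration matter.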
To achieve significant scientific breakthroughs in the era of data deluge, analyze genomic data, and apply these insights to drug development, disease detection, and personalized therapy, we must rely on novel technologies that enable faster and more convenient analysis and processing of large datasets. Over the past decade, the analytical and computational technologies at our disposal have lacked the power necessary to process these critical data. The milestone of protein structure prediction underscores that achieving breakthrough progress in the life sciences requires leading high-performance computing (HPC) systems capable of analyzing and computing complex, fragmented, and unstructured biomedical big data.
“Big data is not just about the sheer volume of data. Big data serves as the foundation, but it must be coupled with the capability to mine data in order to ultimately generate insights,” Zhao Yu, Deputy Director of the Turing-Darwin Laboratory and COO of Zheyuan Technology, told VCBeat. In the process of data mining, artificial intelligence (AI) technologies (algorithms) provide tools for data interpretation; however, the demand for improved algorithmic efficiency has become increasingly pressing, thereby highlighting another critical element: “computing power.” High-performance computing (HPC) is one of the primary sources of such computing power.
As a computing power infrastructure, High-Performance Computing (HPC) is widely applied in cloud computing and supercomputing centers. In specific applications, cloud computing is better suited for scenarios involving massive concurrent tasks with relatively low individual computational complexity, whereas supercomputing centers excel at solving single, highly complex problems.
Over the years, China’s supercomputing sector has ranked among the world’s leaders, with the top global supercomputing spot alternating between China and the United States. The field of supercomputing is gradually transitioning from the research phase to comprehensive application. For instance, two decades ago, the Institute of Computing Technology (ICT) of the Chinese Academy of Sciences designated “biomedical big data recognition” as a foundational strategic research direction, leveraging its world-class supercomputing technologies to drive transformation in the healthcare industry.
With most companies still focused on preclinical research, stepping outside that comfort zone is not easy, but leading enterprises have already begun to try. For instance, the globally renowned company Insilico Medicine is no longer confined to the niche of compound discovery and has expanded to cover the more complex end-to-end drug development process.
In 2016, Insilico Medicine published a paper in *Molecular Pharmaceutics*, showcasing its research and application of deep neural networks. The study proposed that transcriptional response data could be used to predict categories of molecular therapeutics, which brought the company significant acclaim. From 2016 to 2019, Insilico Medicine consistently produced research outcomes and enjoyed smooth fundraising in the primary market. In 2018, WuXi AppTec led a strategic financing round for Insilico Medicine and entered into collaborations with the company in areas such as target identification, drug discovery, and anti-aging research.
(Image from the official website of Insilico Medicine)
Insilico Medicine’s business has now expanded from compound discovery to cover the entire end-to-end process of new drug development. Its operations are structured into three core segments: early-stage target discovery, drug molecule discovery, and clinical trial prediction. As a company that has reached the industry’s top tier, Insilico is well-positioned to provide comprehensive AI-driven drug discovery and development services to the pharmaceutical industry. It is precisely for this reason that the German giant Merck Group chose Insilico as its partner, integrating Insilico’s platform into its own drug discovery projects.
The traditional model of new drug development is becoming increasingly challenging. The vast amount of accumulated research data is difficult to comprehensively cover manually; drug target development and indication selection are constrained by limited human experience and knowledge; the potential effects and side effects of lead compounds are difficult to predict manually; and the labor costs for multi-center clinical trials are rising.
The human experience-driven logic of new drug development is gradually disintegrating under the growing demand for precision and efficiency. Especially in the current context of intensifying competition in the innovative drug sector, these challenges are compelling pharmaceutical companies to seek new technological breakthroughs. There is a noticeable trend of R&D personnel shifting from biotechnology to IT departments, with the expectation of leveraging external IT expertise to resolve industry bottlenecks.
As IT giants enter niche segments of the biopharmaceutical industry, their inherent platform DNA naturally comes into play, primarily by leveraging high-performance computing (HPC) to support R&D service platforms, thereby empowering new drug development and genomic data mining. In particular, preclinical research in the field of AI-driven drug discovery has reached a relatively mature stage, marking a period of robust demand for computational power.
Among the internet giants actively expanding into this sector, all except ByteDance—which has not yet entered this business area—have leveraged their own cloud computing infrastructure to build service platforms. These include Baidu, Huawei, Tencent, Alibaba, and the Institute of Computing Technology (ICT). Their solutions cover a wide range of application scenarios, including drug molecule discovery, drug target screening, molecular dynamics simulation, neoantigen prediction, and genomic interpretation.
At this juncture, Baidu’s strategy is more focused. Different stakeholders in the healthcare industry share few needs, so a solution built for one role rarely carries over to another. Baidu has therefore chosen to decouple the services it offers to different stakeholders and concentrate its enablement capabilities. As a result, in the second half of 2020, Baidu launched BioMap (Baitu Shengke), entering the field through biological computing to empower the biopharmaceutical sector.
BioMap is not the first highly specialized healthcare offering launched by Baidu. Its predecessor, Lingyi Zhihui, has already carved out a niche in hospital settings, with core breakthroughs in fundus screening and primary care in particular. BioMap focuses on the pharmaceutical industry, positioning itself as a life sciences platform company driven by biological computing technologies. It is committed to accelerating the R&D of innovative drugs and precision life science products, such as those for early screening and diagnosis, by leveraging high-performance biological computing and multi-omics data technologies. The company aims to make more diseases predictable, controllable, and curable, in pursuit of its vision of healthy human lifespans of 100 years.
Given the maturity of AI-driven drug development, several tech giants have leveraged their own cloud platforms to support this field. On one hand, these cloud platforms open up their computing power to AI drug discovery companies, helping them achieve their R&D goals more rapidly. On the other hand, some tech giants are no longer satisfied with reaching external markets through partners and have begun building their own platforms to serve pharmaceutical companies directly.
In July 2020, Tencent officially launched its first AI-driven drug discovery platform, iDrug (Yunshen Zhiyao). Originating from the Machine Learning Center at Tencent AI Lab, iDrug provides customized services on top of its platform offerings, catering to pharmaceutical companies’ individual needs for specific targets or data systems.
Unlike other AI-driven drug discovery platforms, iDrug adds protein structure prediction to its small-molecule drug discovery capabilities. In 2020, AlphaFold, the system developed by Google’s DeepMind, made headlines at the 14th Critical Assessment of Protein Structure Prediction (CASP14), achieving prediction accuracy nearly comparable to experimental methods. Tencent AI Lab has itself been conducting research in this field for many years, and its collaborative findings were published in Nature Communications, a Nature portfolio journal, in November 2020. Tencent AI Lab named its protein structure prediction tool tFold, and it is this tool that iDrug currently offers in public beta.
Huawei EI Health is not limited to new drug development; its three core focus areas (genomic analysis, drug R&D, and clinical trials) are all key scenarios where high-performance computing (HPC) is currently being applied. These three application scenarios have been developing for many years and are relatively mature. Drug R&D needs no further elaboration. In genetic testing, Illumina and BGI both launched their respective genetic cloud platforms, BaseSpace and BGI Online, in China in 2018. In clinical research, imaging clouds have long been a standard feature of major cloud platforms, and biomarker discovery has emerged as one of the hot topics in the genetic testing industry over the past two years.
Among the artificial intelligence platforms established by several major tech giants, BioMap, the youngest among them, has unveiled the most ambitious vision. BioMap has structured its development into two phases. In the first phase, it leverages cutting-edge AI technologies to build a comprehensive biological computing platform. By collaborating with startups and research institutions that provide new data axes, advanced data analytics, and drug design tools, BioMap aims to establish a robust biological computing ecosystem. This ecosystem will offer life sciences companies and scientific researchers rich tool capabilities and complete solutions, ensuring high-quality service. In the second phase, BioMap will deeply participate in or lead the research and development of novel precision medicines and diagnostic products. Working hand-in-hand with partners, it strives to contribute highly innovative precision life science products to society.
At first glance, BioMap’s objectives appear no different from the approaches other cloud platforms have taken to enter the biopharmaceutical sector. However, “multi-omics data technology” sets BioMap apart from other internet giants, particularly in terms of the computational power required for data mining. The demand for computing resources grows exponentially with multi-dimensional data analysis. When the data scope ultimately encompasses comprehensive information across all stages of patient diagnosis and treatment, high-performance computing (HPC) based solely on multi-CPU parallelization may no longer suffice for data mining needs; instead, “supercomputing” is essential to achieve comprehensive exploitation of medical data.
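One way to see why multi-dimensional analysis is so demanding is to count the feature combinations an exhaustive analysis would have to consider. The sketch below is purely illustrative: the feature count is a hypothetical figure, not drawn from any real multi-omics dataset, and real pipelines prune this space rather than enumerate it.

```python
from math import comb
from typing import List


def combos_by_order(n_features: int, max_order: int) -> List[int]:
    """Number of distinct k-feature combinations for k = 1..max_order."""
    return [comb(n_features, k) for k in range(1, max_order + 1)]


# Hypothetical number of measured features after merging several omics layers.
N_FEATURES = 50_000

for order, count in enumerate(combos_by_order(N_FEATURES, 4), start=1):
    print(f"{order}-way combinations: {count:.3e}")
```

Each additional interaction order multiplies the candidate count by roughly the number of features divided by the order, so the workload quickly moves beyond what a small parallel cluster can enumerate.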
The Institute of Computing Technology (ICT) of the Chinese Academy of Sciences (CAS) was the earliest to establish its presence in this field, and its platform has already taken shape. As early as the late 1990s, ICT began strategically positioning itself in the life sciences sector. Starting with its participation in the 1% Human Genome Project, ICT has continuously accumulated expertise. Leveraging national research initiatives—including projects funded by the National Natural Science Foundation of China (NSFC), the National High-Tech Research and Development Program (863 Program), the National Basic Research Program (973 Program), major CAS projects, and the National Key R&D Program—ICT has deeply integrated information science with biomedicine, developing numerous core technologies. Under the leadership of Professor Tan Guangming, Director of the High-Performance Computing Center at ICT and Dean of the Western Advanced Technology Research Institute of CAS, the institute, acting as a national team, pioneered the concept of “Computational Medicine.” This approach is guided by systems theory, adopts data-intensive research paradigms, utilizes artificial intelligence as its methodology, and relies on high-performance computing as its foundation. Through a dual-drive mechanism combining knowledge models and data models, it contributes novel insights and solutions to the entire industrial chain of the biomedical and pharmaceutical sectors.
Zheyuan Technology is an artificial intelligence enterprise focused on the biopharmaceutical sector, incubated by the Institute of Computing Technology, Chinese Academy of Sciences. Its developed Computational Medicine Platform aims to establish a digital testing ground for drug discovery and development. Although the company has reserves of technology across the entire process, it has currently demonstrated value in three areas: 1) discovering novel drug targets; 2) establishing inclusion/exclusion criteria for clinical trials, designing combination therapy regimens, and rescuing failed Phase III clinical trials based on novel mechanistic biomarkers; and 3) expanding new indications for marketed drugs.
The services mentioned by Zheyuan precisely address the uncharted territory of novel drug biomarker development, an area that Insilico Medicine has yet to penetrate. Zheyuan is delving deep into the medical field to uncover disease mechanisms and investigate the compatibility between drugs and the human body in real-world settings. While most companies in the industry are still attempting to leverage biocomputing to extract insights directly from big data within specific niche sectors, Zheyuan employs its proprietary computational medicine platform to shift the focus from interpreting the functions of individual genes or proteins to elucidating systems biology—particularly cellular functions and signaling pathways. Through this approach, Zheyuan identifies pattern-based novel mechanistic biomarkers.
A breakthrough has been achieved in the transition from biological computing to computational medicine, significantly enhancing the capacity to generate novel insights from data. “Building on years of foundational work, our team has developed more than 400 foundational models of deterministic intracellular events, which can be combined to simulate countless distinct scenarios of tumor evolution, thereby enabling the construction of unique digital life equations for each disease,” said Zhao Yu.
Mechanistic biomarkers, as the name suggests, are not merely biomarkers but also reflect underlying mechanisms. Taking the hepatic arterial infusion (HAI) regimen of FOLFOX as an example, this protocol can significantly prolong overall survival (OS) in some patients with liver cancer; however, only 30% of patients exhibit clinical response. In this case, Zheyuan developed mechanistic biomarkers to precisely stratify the patient population and elucidate the mechanisms of drug resistance. Based on these insights, a novel combination therapy regimen was proposed (thereby providing new indications for the drugs), ultimately increasing the proportion of beneficiaries to 60–80%.
In the field of immunotherapy, Zheyuan has also demonstrated the multifaceted capabilities of its computational medicine platform, offering new perspectives for drug development. For instance, how can patients with EGFR mutation-positive non-small cell lung cancer (NSCLC) derive significant benefit from PD-1/PD-L1 monoclonal antibody therapy? This remains a "holy grail" question in the relevant fields. Based on a mechanistic understanding, Zheyuan proposed that combining PD-1/PD-L1 monoclonal antibodies with various other agents could help patients achieve clinical benefits. This capability to design combination regimens grounded in mechanistic insights provides solutions for pharmaceutical companies heavily concentrated in immunotherapy R&D. Guided by the computational medicine platform, different pharmaceutical companies can conduct more focused clinical trials targeting specific indications. This approach not only improves the success rate of clinical trials but also helps companies identify their own niche indications, facilitating patient recruitment and accelerating the regulatory submission process.
In response to market demands and industry pain points, major players are increasingly building their own service platforms through substantial investments in infrastructure. In summary, service platforms that can truly achieve breakthroughs in the industry must possess the following characteristics:
(1) Profound insight into the development trends of the healthcare industry and an understanding of why addressing its pain points matters;
(2) The ability to understand and digitally characterize the nature of diseases and drug mechanisms, thereby transcending the limitations of human experience and knowledge;
(3) The ability to establish an end-to-end AI algorithm platform that provides comprehensive tools for drug discovery and development, covering drug target identification, compound design, biomarker development, and optimal indication selection;
(4) The ability to build high-performance computing (HPC) infrastructure, directly integrating computational architectures, platforms, and applications into medical practice.
Expansion of the knowledge graph is fundamental. When focusing solely on molecular discovery, a company’s knowledge graph need only cover research data related to potential drug molecules to adequately support the molecular discovery process. However, as its scope gradually expands to encompass clinical research, the knowledge graph must correspondingly evolve from a pharmaceutical knowledge graph into a more comprehensive medical knowledge graph.
Expansion and control of computing power are prerequisites. As knowledge graphs expand into the medical domain, the volume of data requiring analysis increases significantly. Data mining at this scale therefore requires more computing power and parallel optimization techniques drawn from supercomputing.
Iterative algorithm updates are the methodology. Only after establishing a foundation of knowledge graphs and computing power can enterprises begin to uncover insights from big data and continuously iterate their algorithms through ongoing research.
Currently, major tech giants have established the computational infrastructure necessary for artificial intelligence applications through massive investments in cloud computing and high-performance computing (HPC) centers. They are dedicated to building cloud service platforms on top of this infrastructure. As medical big data continues to expand, pharmaceutical companies gain deeper insights, and demand grows for AI applications in highly specialized scenarios, the need for computational power will increase exponentially. This trend will further elevate expectations for the tool-like capabilities of service platforms, thereby necessitating continuous performance enhancements from cloud computing and HPC centers. In short, both technology giants and state-backed enterprises such as Zheyuan Technology will become vital forces in exploring and advancing various aspects of computational medicine, capitalizing on the trillion-dollar market opportunity in the biopharmaceutical industry.