Data has become a core economic resource and a fundamental factor of production in today’s world, and China has made corresponding strategic arrangements in this regard. The “Opinions of the Central Committee of the Communist Party of China and the State Council on Building a More Complete System and Mechanism for Market-Based Allocation of Factors of Production,” released in April 2020, explicitly recognized data as a new type of production factor for the first time, placing it on par with traditional factors such as land, labor, capital, and technology.
On November 25, 2021, the Shanghai Data Exchange was officially established. It will focus on key common challenges in data trading, such as difficulties in confirming data rights, pricing, establishing mutual trust, market access, and regulation. The exchange will advance the formulation of standards and the development of systems for defining data ownership, promoting open sharing, enabling trading and circulation, and strengthening supervision and management.
Although the industry widely recognizes the immense value of healthcare data, such data were not among the first categories of tradable data. One major reason is that data trading is still at an early, exploratory stage; another is that the usability of healthcare data in China still has significant room for improvement. In response, domestic healthcare data companies are taking concrete steps to unlock the value of their data. For instance, ClinBrain, a leading medical big data enterprise, uses “big data + artificial intelligence” solutions to put data to work for healthcare institutions.
Generally, medical artificial intelligence leverages deep learning to process two types of data: imaging and text. Although imaging data has attracted greater interest in the capital markets and benefits from more mature technology, text-based data—primarily electronic health records (EHRs) and prescriptions—is ubiquitous in clinical scenarios and has been widely integrated into healthcare information systems.
For text data, the industry typically builds knowledge graphs and develops natural language processing (NLP) technologies so that artificial intelligence can automatically recognize, fill in, audit, correct, and analyze such data. As these technologies mature, the industry is also exploring the use of text data to provide diagnostic decision support for physicians.
AI companies integrate raw data scattered across healthcare information systems, convert it into standardized, structured data suitable for research and clinical use, and combine it with the knowledge and experience of medical experts. By embedding these elements into algorithms, they build disease models, specialty disease databases, and disease research networks. This AI foundation can support medical research, clinical diagnosis and treatment, and hospital operational management.
At its core, artificial intelligence is a data processing tool: machine learning needs large volumes of data. Medical big data, which fits this requirement well, had already begun to take shape by 2016.
However, not all data are suitable for machine learning; artificial intelligence imposes stringent requirements on data quality. For historical reasons and out of long-standing habit, China’s medical sector has prioritized clinical practice over data management. As a result, its healthcare data are voluminous but of poor quality, lack unified standards, and are fragmented into silos across medical institutions. These issues have significantly hindered the development of big data in health and medicine.
According to Qin Xiaohong, co-founder of ClinBrain, applying existing medical big data to artificial intelligence faces several challenges.
First, the volume of raw data in existing medical datasets is substantial, reaching the petabyte (PB) scale; however, there are significant issues with data standardization. The first issue is non-standardized data structures. Due to the lack of corresponding mandatory standards, data structures vary considerably across different vendors and hospitals. The second issue is non-standardized data content. In the absence of unified templates, physicians from different hospitals, or even within the same hospital, may use varying descriptions for the same disease when documenting medical records.
For example, the “Grade 1” in “Grade 1 hypertension” may be recorded as “Level 1,” “Class I,” and so on. A human reader easily recognizes these expressions as equivalent, or tells them apart when the context requires, but a machine learning model cannot make such distinctions on its own, so the data must be normalized before they are used.
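The normalization step described above can be illustrated with a synonym table plus a pattern match. This is a minimal sketch, assuming a hand-built variant list; `GRADE_SYNONYMS` and `normalize_hypertension_grade` are hypothetical names, not ClinBrain’s, and a production terminology service would cover far more variants and take context into account.

```python
import re

# Hypothetical synonym table: every recorded variant maps to one canonical grade.
# These few entries are illustrative only.
GRADE_SYNONYMS = {
    "grade 1": "Grade 1",
    "level 1": "Grade 1",
    "class i": "Grade 1",
    "grade 2": "Grade 2",
    "level 2": "Grade 2",
    "class ii": "Grade 2",
}

# Build one case-insensitive pattern "<variant> hypertension".
# Longer variants are listed first so the longest match is preferred.
_VARIANTS = sorted(GRADE_SYNONYMS, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in _VARIANTS) + r")\s+hypertension\b",
    re.IGNORECASE,
)


def normalize_hypertension_grade(text: str) -> str:
    """Rewrite any known grade variant to its canonical form."""

    def repl(m: re.Match) -> str:
        canonical = GRADE_SYNONYMS[m.group(1).lower()]
        return f"{canonical} hypertension"

    return _PATTERN.sub(repl, text)
```

With this in place, “Level 1 hypertension” and “Class I hypertension” both become “Grade 1 hypertension”, so a downstream model sees a single label instead of several.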
Second, although China’s healthcare industry is developing rapidly, severe data silos persist between hospitals and between departments, making health and medical data extremely difficult to use. The government has begun to address the problem, but a comprehensive solution will take time.
Finally, medicine is a highly specialized field. Even a company with strong data governance capabilities can hardly use these data to empower clinical practice or research without the corresponding medical expertise.
Integrating medical big data with artificial intelligence therefore has a high barrier to entry. Big data companies need not only data mining capabilities but also deep expertise in data analysis and governance, along with a thorough understanding of the characteristics and needs of the medical industry. As a pioneer in “Medical Big Data + AI,” ClinBrain has inherent advantages in this respect.
Over years of medical big data governance, ClinBrain has mapped non-standard data to standards, post-structured unstructured data, and cleaned dirty data, building a comprehensive library of healthcare industry standards and a medical terminology database. This approach has effectively resolved data quality issues.
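The data-cleaning step mentioned above can be illustrated with a small sketch. It assumes vital-sign rows have already been extracted into a common shape; the `Vital` record, the measure names, and the bounds in `PLAUSIBLE_RANGES` are hypothetical, chosen only to show the pattern of dropping exact duplicates and setting implausible values aside for review rather than silently altering them. It does not describe ClinBrain’s actual pipeline.

```python
from dataclasses import dataclass

# Hypothetical plausibility bounds; real rules would come from clinical review.
PLAUSIBLE_RANGES = {
    "systolic_bp_mmhg": (50, 300),
    "heart_rate_bpm": (20, 250),
    "temperature_c": (30.0, 45.0),
}


@dataclass(frozen=True)  # frozen makes records hashable, so they can go in a set
class Vital:
    patient_id: str
    measure: str
    value: float
    recorded_at: str  # ISO 8601 timestamp string


def clean_vitals(rows: list[Vital]) -> tuple[list[Vital], list[Vital]]:
    """Drop exact duplicates, then split rows into (clean, rejected) by plausibility."""
    seen: set[Vital] = set()
    clean, rejected = [], []
    for row in rows:
        if row in seen:
            continue  # exact duplicate, e.g. the same reading exported twice
        seen.add(row)
        bounds = PLAUSIBLE_RANGES.get(row.measure)
        if bounds is None or not (bounds[0] <= row.value <= bounds[1]):
            rejected.append(row)  # unknown measure or out-of-range value
        else:
            clean.append(row)
    return clean, rejected
```

A temperature entered as 365.0 instead of 36.5, a typical decimal-point slip, lands in the rejected list, where a reviewer can correct it.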
Leading Grade A tertiary hospitals, including West China Hospital, Ruijin Hospital, the First Affiliated Hospital of Naval Medical University, Southwest Hospital of Army Medical University, Fudan University Shanghai Cancer Center, and Shanghai Mental Health Center, have partnered with ClinBrain to build medical big data governance and application platforms and have spoken highly of its solutions.

ClinBrain’s Three Major Data Center Product Lines (Image from ClinBrain)
To break down data silos within hospitals, ClinBrain developed ClinData, a data middle platform refined through continuous R&D and multiple iterations in practice. Even when a hospital’s HIS (hospital information system) and EMR (electronic medical record) systems offer no open interfaces, ClinBrain can integrate data from hundreds of systems supplied by dozens of vendors into a unified data middle platform without modifying any interfaces, connecting the hospital’s internal “data silos.”
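One common way to unify data from many vendors is a per-vendor field map that translates each source row into a single shared schema. The sketch below assumes rows have already been read out of each source system; the vendor names, field names, date formats, and the `LabResult` schema are all hypothetical, and the article does not say how ClinData performs its integration.

```python
from dataclasses import dataclass
from datetime import date, datetime


# Hypothetical unified record used by the middle platform in this sketch.
@dataclass(frozen=True)
class LabResult:
    patient_id: str
    test_name: str
    value: float
    unit: str
    taken_on: date


# Hypothetical per-vendor field maps: unified field name -> vendor field name.
VENDOR_FIELD_MAPS = {
    "vendor_a": {"patient_id": "PAT_NO", "test_name": "ITEM", "value": "RESULT",
                 "unit": "UNIT", "taken_on": "SAMPLE_DATE"},
    "vendor_b": {"patient_id": "pid", "test_name": "lab_item_name", "value": "val",
                 "unit": "val_unit", "taken_on": "collected"},
}
# Hypothetical date format used by each vendor.
VENDOR_DATE_FORMATS = {"vendor_a": "%Y%m%d", "vendor_b": "%Y-%m-%d"}


def to_unified(vendor: str, row: dict) -> LabResult:
    """Translate one vendor-specific row into the unified LabResult schema."""
    fields = VENDOR_FIELD_MAPS[vendor]
    raw_date = str(row[fields["taken_on"]])
    return LabResult(
        patient_id=str(row[fields["patient_id"]]),
        test_name=str(row[fields["test_name"]]).strip(),
        value=float(row[fields["value"]]),
        unit=str(row[fields["unit"]]).strip(),
        taken_on=datetime.strptime(raw_date, VENDOR_DATE_FORMATS[vendor]).date(),
    )
```

In this design, supporting another vendor means adding one entry to each mapping table rather than writing new conversion code.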
ClinBrain has also accumulated extensive experience in the healthcare industry and built a large professional medical team. These medical experts work closely with the data developers, collaborating effectively on the extraction and processing of clinical data.
ClinBrain currently has a strategic presence in specialized fields such as natural language processing, knowledge graphs, optical character recognition, automated machine learning, and clinical decision support systems, and it empowers hospitals with solutions for rare disease clinical decision-making and for intelligent prevention and management of venous thromboembolism (VTE).
Core Foundation: Natural Language Processing and Knowledge Graphs
Since 2014, ClinBrain has been building its position in medical big data and AI, exploring how to combine the two. At that time, using medical data involved numerous pain points, and medical AI was still in its infancy. ClinBrain’s forward-looking philosophy and focus on practical implementation won industry recognition and support, allowing it to keep iterating on and optimizing its underlying technologies.
Over the past eight years, ClinBrain has refined its proprietary natural language processing (NLP) technology and developed a post-structuring system, which converts already-written free text into structured data, covering many types of text, including medical records, CT reports, ultrasound reports, MR reports, and pathology reports.
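To make post-structuring concrete: the input is free text that has already been written, and the output is a set of named fields. The sketch below is a deliberately simple rule-based stand-in, assuming an ultrasound-style sentence; ClinBrain’s system is described as NLP-based, and the field names and regular expressions here are hypothetical.

```python
import re

# Hypothetical target fields for an ultrasound report. A production system would
# use trained NLP models rather than hand-written patterns like these.
PATTERNS = {
    "organ": re.compile(r"\b(liver|thyroid|kidney|gallbladder)\b", re.IGNORECASE),
    "size_mm": re.compile(r"(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*mm", re.IGNORECASE),
    "echo": re.compile(r"\b(hypoechoic|hyperechoic|anechoic|isoechoic)\b", re.IGNORECASE),
}


def post_structure(report: str) -> dict:
    """Pull a few structured fields out of a free-text report (first match wins)."""
    result = {}
    if m := PATTERNS["organ"].search(report):
        result["organ"] = m.group(1).lower()
    if m := PATTERNS["size_mm"].search(report):
        result["size_mm"] = (float(m.group(1)), float(m.group(2)))
    if m := PATTERNS["echo"].search(report):
        result["echo"] = m.group(1).lower()
    return result
```

Fields that are not found are simply absent from the result, so a report with no findings yields an empty dictionary rather than guessed values.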
ClinBrain has also tailored its products to the specific needs of the healthcare sector. “Hospitals need in-house data structuring tools for research, but the elements to be structured differ from one disease to another. In addition to general-purpose models, ClinBrain offers an annotation platform on which physicians can annotate text themselves, so that personalized models can be trained automatically,” said a member of ClinBrain’s AI department.
In addition, much of the medical literature and many clinical guidelines are distributed as PDF files, creating an urgent need to extract and use their content. Using natural language processing and its accumulated models, ClinBrain can recognize and extract content from PDFs of medical literature and clinical guidelines, including not only body text but also more complex elements such as tables and flowcharts.
ClinBrain’s medical knowledge graph is built around domain-specific medical knowledge. By linking medical entities through relationships, it organizes textual knowledge systematically so that machines can better understand and process it. This supports data search, mining, and analysis, provides a knowledge foundation for artificial intelligence applications, and offers knowledge and tool resources to the industry. ClinBrain has also built an extensive library of healthcare industry standards and a comprehensive medical terminology database that currently contains more than 1.6 million terms.
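At its simplest, a knowledge graph is a set of (head, relation, tail) triples indexed for lookup in both directions. The toy class below shows that idea; `MedicalKG` and the `has_phenotype` relation name are hypothetical, and a production graph would use a dedicated graph store and a curated terminology rather than in-memory dictionaries.

```python
from collections import defaultdict


class MedicalKG:
    """A toy in-memory knowledge graph storing (head, relation, tail) triples."""

    def __init__(self) -> None:
        # Forward index: (head, relation) -> set of tails.
        self._out = defaultdict(set)
        # Reverse index: (tail, relation) -> set of heads.
        self._in = defaultdict(set)

    def add(self, head: str, relation: str, tail: str) -> None:
        """Record one relationship between two medical entities."""
        self._out[(head, relation)].add(tail)
        self._in[(tail, relation)].add(head)

    def tails(self, head: str, relation: str) -> set:
        """Entities reached from `head` via `relation` (a copy, safe to modify)."""
        return set(self._out.get((head, relation), set()))

    def heads(self, relation: str, tail: str) -> set:
        """Entities that point to `tail` via `relation`."""
        return set(self._in.get((tail, relation), set()))
```

For example, after adding ("Gaucher disease", "has_phenotype", "Splenomegaly"), the call `kg.heads("has_phenotype", "Splenomegaly")` returns `{"Gaucher disease"}`. The reverse index is what lets a user start from a clinical finding and reach candidate diseases.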
ClinBrain currently applies knowledge graphs to the diagnosis and treatment of rare diseases, an area with three major difficulties. First, rare diseases are by definition uncommon in routine clinical practice, so they are hard to diagnose, and missed diagnoses and misdiagnoses are frequent. Second, clinical intervention is difficult: of the more than 7,000 known rare diseases, only about 400 have available treatments. Finally, the rare disease literature is updated rapidly, making it hard for clinicians, who already carry heavy workloads, to keep up with the latest research and advances in diagnosis and treatment.
Rare disease experts worldwide have long agreed, however, that earlier diagnosis helps prevent disease progression, enables therapeutic intervention, and reduces the burden on families. ClinBrain draws on extensive medical knowledge and an artificial intelligence analytics engine, integrating structured, semi-structured, and unstructured medical information to improve the quality and efficiency of rare disease decision-making through human-computer interaction.
Building on its foundational natural language processing capabilities, ClinBrain continues to expand its clinical decision support system. By combining big data and AI with knowledge graphs, the system can provide assistive functions such as disease prediction and treatment recommendation across diverse clinical scenarios.
Typical Applications of ClinBrain’s Big Data and Artificial Intelligence
Building on these foundational AI technologies, ClinBrain has launched “big data + artificial intelligence” solutions for multiple specific diseases. Two deserve special mention: the VTE Intelligent Prevention and Management Platform and the Clinical Decision Support System for Rare Diseases.
Venous thromboembolism (VTE) is a condition in which blood clots (thrombi) form within veins and may break loose and travel through the bloodstream, causing embolism. Emboli that reach the lungs cause pulmonary embolism, which can be fatal. Fatal in-hospital pulmonary embolism has become a latent threat to medical quality and safety and a serious challenge for clinicians and hospital administrators. Patients in many clinical departments are at risk of VTE. Because its onset is insidious and its clinical manifestations are non-specific, VTE is easily misdiagnosed or missed, and once it occurs it carries high rates of mortality and disability.
However, VTE is preventable: proactive, effective prevention can significantly reduce its incidence, and standardized diagnosis and treatment can markedly lower its case fatality rate. In clinical practice, though, VTE prevention remains suboptimal. To strengthen comprehensive in-hospital VTE prevention and control and to raise awareness of its severity among healthcare professionals in every department, ClinBrain has developed an intelligent VTE prevention and management platform.
The platform draws on the hospital’s big data middle platform and AI models to give clinical decisions a high-quality basis. It offers standardized and customizable assessment scales and an AI-powered automated decision engine for disease evaluation and treatment recommendations. Through three components (a diagnosis and treatment assistance system for clinicians, a quality control management system for diagnosis and treatment, and a patient education and follow-up system), it manages VTE prevention and control end to end.

ClinBrain VTE Intelligent Prevention and Management Platform
The platform raises the quality of VTE prevention and control through a multi-dimensional, three-tier prevention system. First, it automatically screens for VTE risk factors, alerts healthcare providers to patients’ VTE risks, and establishes primary prevention strategies that target underlying causes, reducing VTE incidence. Second, it runs dynamic monitoring for secondary prevention, identifying high-risk patients early and promptly prompting physicians to intervene, which further lowers incidence. Finally, it standardizes prophylactic treatment pathways based on risk assessment results and implements tertiary prevention to halt disease progression and deterioration, improve patients’ quality of life, extend survival, and reduce mortality.
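The first two tiers (automated screening of risk factors and repeated reassessment during the stay) can be sketched as an additive risk score with alert thresholds. The factor names, weights, and cut-offs below are hypothetical and do not reproduce any validated scale such as Caprini or Padua; they only illustrate the mechanics, not the platform’s actual rules.

```python
from dataclasses import dataclass

# Hypothetical risk factors and weights; NOT a validated clinical scale.
RISK_WEIGHTS = {
    "age_over_60": 2,
    "major_surgery": 2,
    "active_cancer": 3,
    "prior_vte": 3,
    "immobility_over_72h": 2,
}
# Hypothetical cut-offs separating the risk tiers.
MODERATE_CUTOFF = 3
HIGH_CUTOFF = 5


@dataclass
class Assessment:
    score: int
    tier: str
    alert: bool  # True when the physician should be notified


def assess(risk_factors: set[str]) -> Assessment:
    """Score a patient's recorded risk factors and assign a tier."""
    unknown = risk_factors - set(RISK_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown risk factors: {sorted(unknown)}")
    score = sum(RISK_WEIGHTS[f] for f in risk_factors)
    if score >= HIGH_CUTOFF:
        tier = "high"
    elif score >= MODERATE_CUTOFF:
        tier = "moderate"
    else:
        tier = "low"
    return Assessment(score, tier, alert=tier != "low")


def reassess_stay(daily_factors: list[set[str]]) -> list[Assessment]:
    """Re-run the assessment once per day of the stay (continuous monitoring)."""
    return [assess(factors) for factors in daily_factors]
```

Running `reassess_stay` once per day mirrors the shift from a single admission assessment to continuous monitoring: a patient who scores low on admission but acquires new risk factors after surgery is flagged again automatically.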
By deploying the ClinBrain VTE Intelligent Prevention and Management Platform, major hospitals have achieved fully automated, end-to-end VTE assessment for inpatients. This has reduced healthcare professionals’ workload for VTE assessment and management by an average of 4 hours per patient. The VTE risk assessment rate rose from 46.14% to 93.22%, and assessment shifted from a one-time evaluation to continuous monitoring throughout the hospital stay. Average inpatient costs were cut in half. The platform also identifies high-risk patients in real time and issues proactive alerts, so staff can promptly recognize patients at moderate to high risk; awareness of these patients reached 100%, whereas previously no data were available to measure it.
The Clinical Decision Support System for Rare Diseases is ClinBrain’s latest effort to combine its foundational artificial intelligence technologies in a single application. It comprises a rare disease decision interaction system, a disease phenotype analysis system, and a rare disease decision engine. By jointly assessing clinical phenotypes, disease knowledge, and other relevant information, it generates a list of candidate rare diseases to support diagnosis and treatment. Its core function is to take a patient’s disease phenotypes and evaluate them against a database of more than 7,000 known rare diseases, producing scores that help clinicians reach accurate diagnoses.
On the one hand, knowledge graphs store, manage, and provide access to vast amounts of disease-related information efficiently, so clinicians can quickly look up known rare diseases and the latest research on them. On the other hand, thanks to advances in artificial intelligence algorithms and models, an intelligent rare disease decision engine can rapidly synthesize, archive, identify, and distinguish the information needed for diagnosis, perform a preliminary assessment of the patient’s symptoms, and relieve clinicians of repetitive, tedious tasks, leaving them more time to identify, diagnose, and treat genetic disorders.
ClinBrain’s Rare Disease Decision Support System innovates technically in two ways: it unifies and extends the Chinese term mappings for concepts across multiple knowledge graphs, and it maps commonly used clinical descriptions to standard disease phenotype concepts. On this basis, it calculates the phenotype-based similarity between any two diseases using several similarity algorithms to support differential diagnosis. Once the phenotypes of a new case have been captured, the system computes the case’s similarity to each rare disease in the knowledge graph and suggests candidate rare disease diagnoses.
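The matching and ranking steps described above can be sketched with Jaccard similarity between phenotype sets, one of the simplest such measures. The article does not name the algorithms ClinBrain uses, and ontology-aware measures that give rarer phenotypes more weight are common in practice. The description map and the two disease profiles below are small illustrative assumptions, far smaller than a real knowledge graph, and the function names are hypothetical.

```python
# Hypothetical map from free-text clinical descriptions to standard phenotype concepts.
DESCRIPTION_TO_PHENOTYPE = {
    "enlarged spleen": "Splenomegaly",
    "splenomegaly": "Splenomegaly",
    "enlarged liver": "Hepatomegaly",
    "hepatomegaly": "Hepatomegaly",
    "low platelets": "Thrombocytopenia",
    "thrombocytopenia": "Thrombocytopenia",
    "burning pain in hands and feet": "Acroparesthesia",
    "angiokeratoma": "Angiokeratoma",
}

# Illustrative disease -> phenotype sets (a tiny subset of each disease's features).
DISEASE_PHENOTYPES = {
    "Gaucher disease": {"Splenomegaly", "Hepatomegaly", "Thrombocytopenia"},
    "Fabry disease": {"Acroparesthesia", "Angiokeratoma"},
}


def map_descriptions(descriptions: list[str]) -> set[str]:
    """Map clinical descriptions to standard phenotype concepts, skipping unknown ones."""
    keys = (d.lower().strip() for d in descriptions)
    return {DESCRIPTION_TO_PHENOTYPE[k] for k in keys if k in DESCRIPTION_TO_PHENOTYPE}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(descriptions: list[str]) -> list[tuple[str, float]]:
    """Score every disease against the case's phenotypes, best match first."""
    case = map_descriptions(descriptions)
    scores = [(name, jaccard(case, phenos)) for name, phenos in DISEASE_PHENOTYPES.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Because `jaccard` accepts any two phenotype sets, the same function also yields the disease-to-disease similarity the paragraph mentions for differential diagnosis.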
The system can also visualize the hospital’s rare disease data in charts and graphs and quickly archive cases of different rare diseases together with their associated information, helping physicians conduct scientific research efficiently.
Applying artificial intelligence, big data, and related technologies to advance medicine and improve human health is a key focus of national strategy, and the state has continually introduced supporting policies and refined the corresponding top-level design. As a result, “medical big data” has remained a hot topic in recent years, and the industry consensus is that “those who master data master the future.” ClinBrain has spent nine years building China’s leading “ClinBrain Data Brain.”
Through its deep engagement in medical big data, ClinBrain has built a competitive barrier based on “big data + artificial intelligence.” Going forward, ClinBrain plans to explore and deploy more medical application scenarios like its intelligent decision support systems for VTE and rare diseases, improving the quality of scientific research and clinical practice. By advancing medical data applications and providing data-driven intelligence to the healthcare industry, ClinBrain ultimately aims to build the “ClinBrain Medical Brain,” a comprehensive solution serving all aspects of healthcare.