Although clinical research and application development have entered the age of AI, China's vast stores of medical imaging data have yet to be transformed into structured, actionable big data. Standardized medical imaging datasets remain scarce, which slows progress in related scientific research and industry.
Now, this situation is about to change.
On July 5, 2022, the Capacity Building and Continuing Education Center of the National Health Commission (hereinafter referred to as the “Continuing Education Center”) issued the Notice on Publicizing the Results of Project Approval for the Radiological Imaging Database Construction Initiative, officially ushering in the systematic development of imaging databases.
The Radiological Imaging Database Construction Project, hosted by the Continuing Education Center, will lead and coordinate the systematic development of the database. Its key tasks include data collection, data processing, quality control, scientific research, product development, technology transfer, and training on medical data standards.
According to the notice, the first batch of approved radiological imaging database projects comprises 13 items. These include databases for major diseases that seriously threaten the health of Chinese residents, such as the cardiovascular and cerebrovascular imaging database, the chronic liver disease and primary hepatocellular carcinoma imaging database, the nuclear medicine multimodal imaging database for ischemic heart disease, the gastrointestinal disease imaging database, and the emergency imaging database. A further eight proposed projects have been placed in a reserve pool and are expected to be included in future batches.

List of Approved Projects for the 2022 Radiology Imaging Database

List of Reserve Topics for the Radiology Imaging Database
This round of database construction classifies disease types more finely. It also sets out clear plans and requirements for the standards governing every stage, from project initiation through image data collection, standardization, and quality control.
Meanwhile, the project places strong emphasis on the integrating role of engineering teams. The Continuing Education Center has assembled an interdisciplinary medical-engineering team to prepare for the entire process, from groundwork to full implementation. This preparation covers the integration of multi-source heterogeneous data; technical safeguards for secure multi-center data collection; a distributed data collection system; general-purpose and customized annotation platforms; a technical roadmap for disease-specific databases; and compliance with China's classified cybersecurity protection standards, electronic medical record (EMR) data standards, and the openEHR standard framework. Together, these lay the technical groundwork for database construction. In addition, original AI algorithms developed in China will be brought in at appropriate stages to support data organization, image extraction, lesion reconstruction, and rapid validation of research directions. The initiative as a whole is therefore expected to accelerate the application of, and breakthroughs in, both database and AI technologies in radiological imaging.
To understand the background, challenges, and future value of database development, VCBeat reviewed the relevant documents and interviewed Professor Liu Shiyuan, Chairman of the Expert Committee for the “Radiological Imaging Database,” on each of these three topics.

Liu Shiyuan, Chief Physician, Professor, Doctoral Supervisor
Director, Department of Radiological Diagnosis, Shanghai Changzheng Hospital
Chairman of the Chinese Society of Radiology
Chairman of the Board, China Medical Imaging AI Industry-Academia-Research-Application Innovation Cooperation Platform (Alliance)
Chairman of the Expert Committee on Radiology Imaging Databases
Two years ago, AI products for medical imaging began to achieve breakthroughs, and a few obtained Class III medical device registration certificates from the National Medical Products Administration. On the whole, however, digitized and structured medical imaging data had not yet been organized into a coherent system. The lack of medical imaging databases remained a major constraint on AI development, and the approval of a handful of single-task AI products seemed minor against the breadth of clinical needs.
To break through the critical bottleneck of missing medical imaging datasets, the Capacity Building and Continuing Education Center of the National Health Commission launched the construction of a radiological imaging database in 2020. Professor Liu Shiyuan was appointed Chairman of the project's Expert Committee and leads the effort to build a national-level, high-standard medical imaging database.
“Whether in scientific research, clinical education, or the development of artificial intelligence, we need large-scale databases that are diverse, standardized, and richly annotated, yet such databases are currently in short supply. Furthermore, the laws and regulations governing data ownership, security, ethics, and related issues still lag behind,” stated Professor Liu Shiyuan.
Against this backdrop, Liu Shiyuan's team chose the relatively mature field of pulmonary nodules for its first project, aiming to support lung nodule screening and the precise differentiation of benign from malignant lesions by building a standardized medical imaging database for pulmonary nodules. Through this work, the team also sought to reach expert consensus on key aspects, including the essential elements of datasets, construction processes, development standards, annotation protocols, and quality control, thereby providing a reference framework for the development of subsequent databases.
By October 2021, the team had used a data platform built on medical big data and AI technologies to complete data extraction, processing, and transformation, establishing a high-quality specialized imaging database for pulmonary nodules. Both the dataset and its construction methodology have since gained recognition in the medical community.
Following the successful initial pilot, the systematic development of the imaging database was launched.
As stated in the Implementation Plan for the Construction of a Medical Imaging Database for Major Diseases, the project will be carried out over a five-year period in three phases. Phase I, from the date of contract signing through the end of December 2022, is the Standards Establishment Phase; Phase II, from 2023 to 2025, is the Data Platform Construction Phase; and Phase III, from 2025 to 2027, is the Development and Application Phase.
Specifically, the first phase will establish standards and consensus for medical imaging acquisition protocols (for single or multiple diseases based on anatomical sites or organs), image recognition criteria, image segmentation and annotation standards, and related database construction. It will also involve building a technical team for data development and setting up technological and management platforms for databases.
Once the standards are in place, the more critical stage is Phase II. Its tasks fall into four areas: building a multimodal, large-capacity, high-quality, and richly annotated medical imaging database that reflects the characteristics of the Chinese population and aligns with clinical diagnosis and treatment guidelines; building a secure research service platform for AI-based medical imaging across multiple diseases; developing a training system; and promoting the research, development, and application of related technologies.

Phase II Construction Tasks and Content for the Medical Imaging Database of Major Diseases
The implementation plan does not specify the detailed construction tasks and content for Phase III; however, if the tasks for Phases I and II can be completed within the allotted timeframe, medical imaging research and its derivative applications will receive sufficiently robust support, thereby enabling the transition to the next stage of development.
Overall, the project plans to establish, over the next three years, standardized protocols for single-disease or multi-disease medical image acquisition and recognition, image segmentation and annotation standards, and consensus on related database construction standards, all based on anatomical sites or organ-specific diseases. It aims to build a multimodal, large-scale, high-quality, and richly annotated medical imaging database that aligns with the characteristics of the Chinese population and clinical diagnosis and treatment guidelines, thereby leveraging high-quality national medical imaging data resources to support the Healthy China initiative.
The value of radiology imaging databases lies in their ability to address the current challenges facing imaging data in China.
“China is a major holder of medical data, with medical imaging accounting for 80%–90% of such data and continuing to grow at a rate of 30%,” Professor Liu Shiyuan told VCBeat. “However, the sheer volume of medical data does not mean that China has established a systematic framework for medical big data. More than 80% of this data is unstructured, making it impossible to extract its value.”
Building radiological imaging databases is a key part of the solution. Large-scale, standardized, and structured databases can set industry standards and break down barriers between hospitals. Once mature, they can also be applied to medical education and scientific research, helping to train professionals and drive innovation in precision diagnosis and treatment.
However, while the value of radiological imaging databases is easy to articulate, building them in practice is extremely difficult.
“The numerous calls for establishing standardized databases have yielded limited results, indicating that this is a highly complex and challenging endeavor.”
Data heterogeneity is the first challenge. According to Professor Liu Shiyuan, a standardized imaging database must draw on diverse, non-homogeneous data sources, and reconciling these “differences” under unified specifications or standards is one of the key difficulties. Another critical issue in data collection is integrating multimodal images, such as CT, mammography, and MRI, with important records like clinical histories and laboratory test results into a single data system that supports classification, extraction, and collaborative use.
Next comes standardized data annotation. “For each imaging sign, we must reach a consensus on methods for quantitative identification, segmentation, and classification. Annotators should be trained on this consensus before annotation begins. The annotation process must meet data traceability requirements and pass three levels of quality control, including arbitration review, before the data is stored in the database. This is how we ensure the accuracy of the standardization process.”
Finally, there is the management and updating of the database. “The database must remain dynamic, with a continuous increase in data volume and ongoing updates to its data composition, while ensuring data security throughout the entire process. This also requires our sustained effort.” Centralized and decentralized design architectures, as well as the advancement of related ethical frameworks, are also important areas of exploration in database development.
Given the complexity of the construction process, effective top-level design is particularly crucial. At the outset of database development, the personnel in charge must clearly define the purpose of the database (i.e., which disease types it will serve), its intended applications (for R&D, education, or scientific research), and quality control measures (including data quality control and process quality control). They must also establish data standards and annotation methods, and gradually plan the division of labor and collaboration within the entire team. Only after completing these steps can the personnel in charge proceed with implementation.
Beyond the high technical barriers of database construction itself, the process requires substantial manpower, funding, and effort from the stakeholders involved over a long period. As a result, most projects do not survive the early stages of database development, when returns are small and slow to arrive. State support and guidance for building radiological imaging databases are therefore essential, and enterprises, physicians, and scholars must also work closely together.
Professor Liu Shiyuan stated, “Only through the concerted collaboration of government, enterprises, and research institutions can we avoid fragmentation, disorder, substandard quality, and redundant construction, thereby establishing a radiological imaging database in the fastest and most effective manner.”
The medical imaging field has a large professional community. Who will benefit from the database can be examined from three perspectives: education, scientific research, and research and development (R&D).
First is education. As medical AI is widely deployed in radiology departments, the training model for new radiologists needs to adapt to technological advances, and continuing education for practicing radiologists must keep pace with the changes that new technologies bring to workflows and other areas. To date, however, the lack of standardized databases has prevented AI-based training of radiologists on standardized data, leaving physicians with little opportunity for independent hands-on practice.
In this context, the establishment of a standardized radiological imaging database serves as a significant complement to the existing imaging education system. Once established, the database will enable early- and mid-career radiologists to both construct models and perform validations during their learning process, while also facilitating the acquisition of clinical diagnosis and treatment experience based on case data. Furthermore, by adopting clinically driven applications as the research entry point, it is possible to explore patient-centered, disease-oriented, multi-center big data research for precision medicine that integrates imaging, pathology, biochemistry, and even genomics.
Second is scientific research and AI development. Data forms the foundation of AI. Currently, most AI-assisted diagnosis and detection systems rely on supervised learning. The product development, testing, and quality control processes all depend on large volumes of standardized imaging cases. Imaging datasets jointly established and endorsed by regulatory agencies, hospitals, enterprises, and educational and research institutions can support the upstream and downstream needs of AI product development, thereby facilitating the clinical deployment and adoption of more newly developed AI products.
Therefore, the development of a radiological imaging database, led by the National Health Commission and collaboratively built by multiple hospitals, is of paramount importance. Supported by a database that closely mirrors real-world clinical scenarios, both enterprises seeking regulatory approval for AI development and physicians conducting imaging-related research can more frequently undertake multicenter studies. This will help accelerate advancements in medical imaging research, effectively expand the scope of medical AI development and application, and propel the field of medical imaging into a period of rapid growth.
While building a medical imaging database is undoubtedly important, it is equally crucial to ensure that the database is actually put to use.
Professor Liu Shiyuan stated, “To ensure that databases are actively utilized rather than left idle, we need to establish third-party public platforms that fully account for compliance, security, and ethical considerations. These platforms should operate in an authoritative, impartial, neutral, and standardized manner, in accordance with national regulatory requirements. Meanwhile, we must continually explore compliant and secure operational models under the certification of regulatory authorities, thereby truly unlocking the academic, scientific, and social value of databases, and ensuring that people’s data serves their health.”
The future value of imaging databases is promising. As China's radiological imaging databases move toward disease-specific specialization and digital technologies such as medical imaging AI push into deeper exploration, data may reveal more logical correlations between imaging findings and diseases. This could truly unlock the multifaceted value of radiology, including “intelligentization,” “precision,” “clinical integration,” “pre-hospital application,” and “network connectivity.”