
Medical Big Data Platform
Recently, a commentary titled “Opportunities and Challenges of Large-Scale Medical Databases for Gastrointestinal Tumors” was published in the Chinese Journal of Digestive Surgery, a prominent domestic medical journal. Co-authored by Peking University Cancer Hospital and Yidu Cloud, the article analyzes problems common to Chinese hospitals, including low data accuracy, slow update cycles, and underutilization of data.
As stated in the article, a growing number of high-quality oncology studies are based on the analysis of data from tens of thousands of cases. From research into tumorigenesis mechanisms to clinical diagnosis and treatment, prevention, and monitoring, every stage involves data collection, management, and analysis. High-quality clinical data are therefore the essential foundation for future precision clinical decision-making and high-level clinical research.
Furthermore, the article highlights three prevailing issues with current clinical databases: “small,” “poor,” and “disordered.” “Small” refers to the limited scale of many databases, characterized by a small number of cases and sparse data entry structures. “Poor” denotes substandard data quality; in particular, due to high population mobility in China, follow-up data are difficult to collect, leading to data loss. “Disordered” indicates poor data consistency and low feasibility for database integration and sharing. Therefore, how to leverage big data technologies to address the “small,” “poor,” and “disordered” state of domestic medical databases has become a pressing challenge for researchers.
Exploration of the Gastric Cancer Disease Database
Yidu Cloud, a leading domestic medical big data company, has established a gastric cancer disease database using novel big data technology to ensure quality control of research data. First, Yidu Cloud integrates data from hospital EMR, LIS, PACS, and HIS systems to identify patients diagnosed with gastric cancer. It then processes the data through EMPI, mapping, structuring, and normalization techniques to create a disease database. Within this database, users can view, filter, and manage data, as well as perform double-entry verification.
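The integration step described above can be pictured with a minimal sketch. It assumes a deterministic match on a cleaned identity number, whereas a production EMPI typically combines several attributes and probabilistic matching. The field names (`id_number`, `diagnosis`), the sample records, and the keyword rule for spotting gastric cancer are all hypothetical and are not taken from Yidu Cloud's system.

```python
from collections import defaultdict

def master_key(record):
    # Assumed deterministic match key: ID number with spaces removed, upper-cased.
    return record["id_number"].replace(" ", "").upper()

def build_master_index(sources):
    """Group records from every source system under one master patient key."""
    index = defaultdict(lambda: defaultdict(list))
    for system, records in sources.items():
        for rec in records:
            index[master_key(rec)][system].append(rec)
    return index

def gastric_cancer_patients(index):
    """Return master keys whose EMR diagnoses mention gastric cancer or carcinoma."""
    hits = []
    for key, by_system in index.items():
        diagnoses = [r.get("diagnosis", "").lower() for r in by_system.get("EMR", [])]
        if any("gastric" in d and ("cancer" in d or "carcinoma" in d) for d in diagnoses):
            hits.append(key)
    return sorted(hits)

# Made-up extracts from three hospital systems.
sources = {
    "EMR": [
        {"id_number": "a 001", "diagnosis": "Gastric adenocarcinoma"},
        {"id_number": "B002", "diagnosis": "Gastric ulcer"},
    ],
    "LIS": [{"id_number": "A001", "test": "CEA", "value": 5.2}],
    "PACS": [{"id_number": "A001", "study": "abdominal CT"}],
}

index = build_master_index(sources)
cohort = gastric_cancer_patients(index)
print(cohort)
```

Here the EMR entry written as "a 001" and the LIS and PACS entries written as "A001" end up under the same master key, so laboratory and imaging records attach to the same patient as the diagnosis.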
Additionally, patients can be included or excluded based on specific requirements to establish research projects, with data collected in accordance with study protocols. When follow-up data are required, follow-up rules can be configured to remind physicians to track patient revisit appointments or enter data. Meanwhile, the Yidu Cloud disease database offers statistical functionalities such as descriptive statistics, univariate analysis, multivariate analysis, correlation analysis, and survival analysis, enabling users to view the distribution of disease indicators, analyze patient data, and preliminarily validate research hypotheses at any time. Furthermore, genomic and biobank data can be integrated with clinical data to continuously expand the breadth of the database, thereby supporting a wider range of research studies.
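The article does not say how the platform implements its survival analysis. As an illustration only, the sketch below computes a Kaplan-Meier estimate, one standard survival analysis method, in plain Python. The function name, the follow-up times, and the event flags are invented for the example.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each patient (e.g. months).
    events:    1 if the event (death) was observed, 0 if the patient was censored.
    Returns a list of (time, survival_probability) at each time with an event.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Handle all patients who share the same follow-up time together.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve

# Made-up follow-up data: months of follow-up and whether death was observed.
months = [6, 7, 10, 15, 19, 25]
died = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Censored patients, such as those lost to follow-up, still count in the risk set up to their last recorded visit, which is why complete follow-up data matter for this kind of analysis.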
Three Major Advantages of the Gastric Cancer Disease Database
Overall, the Yidu Cloud Gastric Cancer Disease Database has three major advantages over traditional databases:
1) NLP-based structured data processing
2) Normalization and standardization
3) Complex logical reasoning and computation
Medical data often contains large volumes of unstructured text, which cannot be directly subjected to statistical analysis and computation. Yidu Cloud leverages natural language processing (NLP) technology to perform structured extraction from such text. As illustrated in the figure below, the gastric cancer disease database can structurally extract and process descriptions of surgical procedures and medical record examination texts.

For the Borrmann classification, the gastric cancer disease database retrieves all candidate values from the following sources in priority order: diagnosis name, pathological conclusion, surgical procedure description, and examination findings (gastroscopy, endoscopic ultrasound, CT). After deduplication, it outputs the Borrmann classification, standardizing the various inconsistent descriptions into four types (Types 1–4).
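The priority-ordered extraction, deduplication, and normalization described above can be sketched roughly as follows. This is a simplified illustration rather than Yidu Cloud's NLP pipeline: the regular expression, the record field names, and the sample texts are assumptions, and a real system would need to handle many more phrasings.

```python
import re

# Assumed source fields, listed from highest to lowest priority.
SOURCE_PRIORITY = ["diagnosis", "pathology", "surgery", "examination"]

ROMAN = {"i": 1, "ii": 2, "iii": 3, "iv": 4}

# Matches e.g. "Borrmann type III", "Borrmann 3", "Borrmann-2".
PATTERN = re.compile(
    r"borrmann\s*(?:type|classification)?\s*[:\-]?\s*(iv|iii|ii|i|[1-4])\b",
    re.IGNORECASE,
)

def normalize(token):
    """Map a Roman or Arabic numeral to an integer type from 1 to 4."""
    token = token.lower()
    return ROMAN.get(token) or int(token)

def extract_borrmann(record):
    """Return (unique types in priority order, highest-priority type or None)."""
    found = []
    for source in SOURCE_PRIORITY:
        for match in PATTERN.finditer(record.get(source, "")):
            value = normalize(match.group(1))
            if value not in found:  # deduplicate while keeping priority order
                found.append(value)
    return found, (found[0] if found else None)

# Made-up record with inconsistent wording across sources.
record = {
    "diagnosis": "Gastric cancer",
    "pathology": "Ulcerative type, Borrmann type III",
    "surgery": "Tumor at antrum, Borrmann 3, size 4 cm",
    "examination": "Gastroscopy: Borrmann II lesion suspected",
}
result = extract_borrmann(record)
print(result)
```

In this record the pathology and surgery texts both normalize to Type 3 and are deduplicated, while the lower-priority gastroscopy finding contributes Type 2 as a conflicting value that a reviewer could inspect.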

Data vary across hospital departments, and some values are not recorded directly in the raw data. Yidu Cloud’s disease database can apply comprehensive medical logic to perform in-depth calculations on diagnostic and treatment data from different hospital production systems, thereby deriving values that are not explicitly present in the raw records. The following is an example of pathological staging.
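As a hedged illustration of deriving a value that is not written in the raw record, the sketch below computes the pathological N category from lymph node counts in a pathology report. The cutoffs used (0, 1 to 2, 3 to 6, 7 to 15, 16 or more positive nodes) are the commonly cited AJCC thresholds for gastric cancer and should be verified against the current staging manual. The report text, the "positive/examined" parsing rule, and the function names are hypothetical.

```python
import re

def pn_category(positive_nodes):
    """Map a count of metastatic regional lymph nodes to a pN category."""
    if positive_nodes < 0:
        raise ValueError("node count cannot be negative")
    if positive_nodes == 0:
        return "N0"
    if positive_nodes <= 2:
        return "N1"
    if positive_nodes <= 6:
        return "N2"
    if positive_nodes <= 15:
        return "N3a"
    return "N3b"

# Assumed report convention: "positive/examined", e.g. "2/10".
NODE_COUNT = re.compile(r"(\d+)\s*/\s*(\d+)")

def derive_pn(pathology_text):
    """Sum the positive counts found in the report; return None if none are found."""
    pairs = NODE_COUNT.findall(pathology_text)
    if not pairs:
        return None
    positive = sum(int(pos) for pos, _examined in pairs)
    return pn_category(positive)

# Made-up report in which the N category is never stated explicitly.
report = (
    "Lymph node metastasis: lesser curvature 2/10, "
    "greater curvature 1/8, subpyloric 0/5."
)
derived = derive_pn(report)
print(derived)
```

The same pattern would also match dates or other fractions written with a slash, so a real implementation would need to restrict matching to the lymph node section of the report.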
Standard Dataset for Gastric Cancer Disease Models Is Under Continuous Upgrade
Version 1.0 of the Gastric Cancer Disease Database references mainstream domestic and international oncology guidelines and standards, including but not limited to:
★ AJCC TNM Staging
★ WHO Pathological Classification
★ LOINC (Logical Observation Identifiers Names and Codes)
★ Common Terminology Criteria for Adverse Events (CTCAE)
★ INN International Nonproprietary Names
★ Guideline nomenclature, or expert consensus where no standard exists
The value range standards for many fields in the surgical treatment and pathology sections also refer to:
★ Japanese Gastric Cancer Treatment Guidelines, 5th Edition (adopted)
★ Japanese Classification of Gastric Carcinoma, 15th Edition
Currently, the standard dataset for gastric cancer disease models has been upgraded to version 2.0. Building upon the original model, Yidu Cloud has implemented comprehensive enhancements, including in-depth processing of fields with complex logic, incorporation of a broader range of reference standards, and improved completeness of data standard value domains. For instance, fields related to examination and treatment now incorporate references from the latest CSCO Guidelines for Diagnosis and Treatment of Gastric Cancer (2018, 1st Edition).
To date, Yidu Cloud has collaborated closely with leading experts in China to co-develop standardized datasets for disease models covering more than 40 conditions. These span multiple therapeutic areas, including gastrointestinal cancers, respiratory cancers, gynecological cancers, urological cancers, head and neck cancers, hematologic disorders, and cardiovascular diseases, with ongoing expansion and upgrades. In the field of gastric cancer, in addition to its partnership with Peking University Cancer Hospital, Yidu Cloud has established collaborations with top experts at institutions such as the First Affiliated Hospital of China Medical University and Nanjing Drum Tower Hospital. Through continuous accumulation of expertise and resources, Yidu Cloud aims to foster further collaborations with more experts on gastric cancer data platforms from a technical support perspective, thereby contributing to the development of big data infrastructure, related research, and standardized diagnosis and treatment protocols for gastric cancer in China.
According to industry experts, driven by national policies, the integrated application of data on resident health and social security will facilitate easier access to health information covering the entire life cycle of residents. By sharing data with existing hospital databases and building a big data platform tailored to China’s national conditions, it is possible to gradually establish large-scale disease databases with international influence. In the future, healthcare institutions are expected to leverage Yidu Cloud’s disease database to create a new ecosystem for health big data applications.