
For a long time, the development of medical big data in China has been in a state of “crossing the river by feeling the stones.” Hospitals have poor data infrastructure and relatively weak application capabilities; even at the level of data collection, there is still a long way to go.
In response, the state has repeatedly allocated funds over the past two years to push forward the informatization upgrade of hospitals.
In September 2017, the National Health and Family Planning Commission approved the preliminary designs and investment estimates for big data-related projects at four hospitals: West China Hospital of Sichuan University, Fuwai Hospital of the Chinese Academy of Medical Sciences, Peking University People’s Hospital, and Children’s Hospital of Fudan University. The projects total RMB 67.71 million, of which RMB 42 million comes from central budgetary investment, with the remaining funds to be raised by each hospital independently. Investment in information technology infrastructure at tertiary hospitals has accelerated significantly.
At present, the focus of construction in China’s Grade A tertiary hospitals has largely shifted to information integration platforms and interoperability platforms. To implement the “Guiding Opinions of the General Office of the State Council on Promoting and Regulating the Application and Development of Health and Medical Big Data” and the “Healthy China 2030” Planning Outline, data structuring and quality have become the core priorities for the next phase of hospital informatization development.
In addition to the growing national emphasis on data, the industry is also making frequent and significant efforts at the data level. The strong focus on data structuring and quality stems from the fact that a solid data foundation is crucial for translating the concept of medical big data into tangible benefits in the future.
If 2017 is called the inaugural year of medical artificial intelligence, few would likely object. When evaluating the capabilities of AI companies in the industry, it is common practice to weigh three key factors together: algorithms, computing power, and data. The prevailing consensus is that algorithms are unlikely to become the absolute core competitive advantage for enterprises in the future, and that the high cost of computing power is prohibitive for most companies. Consequently, data has become the focal point of competition among all stakeholders in the industry.
How can raw data be made professional, structured, and multidimensional, so that authentic and accurate artificial intelligence can be built on it in the future? A big data company named Suowen Boshi Technology has offered its own answer: "pre-structured" + "disease-specific".
What are the advantages of pre-structuring over post-structuring?
Suowen Boshi’s core product, Boshi Medical Cloud, is currently the largest disease-specific pre-structured electronic medical record platform in China in terms of application scale. Nearly 500 Grade A tertiary hospitals and nearly 4,000 departments already use applications based on the Boshi Medical Cloud platform.
In the past, clinicians often spent 98% of their research time organizing cases and retrieving medical records, with only 2% of their time generating effective scientific value. After adopting the disease-specific pre-structured electronic medical record service based on the Boshi Medical Cloud, physicians can save 95% of the time spent on data organization, allowing them to devote more time to patient diagnosis and treatment, as well as to the details of scientific research and manuscript preparation.
The most critical factor in the future development of medical artificial intelligence is high-quality, structured clinical data. The optimal approach to acquiring such data is through pre-structured entry based on disease-specific electronic medical records.
Healthcare IT in the United States has evolved to a stage where electronic health records (EHRs) serve as the core platform, integrated with other business units, forming a major trend toward pre-structured, disease-specific data. U.S. clinicians have largely adopted the use of pre-structured documentation for patient information, whereas China lags significantly behind in this regard.
Currently, a large number of enterprises in China, Boshi Medical Cloud among them, still use post-structuring to process existing information. In essence, however, post-structuring is only a transitional product in the development of Chinese medical data.
The advantage of post-structuring is that, as a mainstream data processing technology, it can help hospitals process existing data by extracting structured data from the vast amount of historical medical records, thereby supporting physicians in their clinical research.
Its drawbacks are obvious, mainly including the following four points:
1. The original medical records are incomplete and deficient, making it difficult to ensure the quality of existing data;
2. The various hospital systems must be integrated to achieve unified data consolidation;
3. Functional algorithms such as NLP have low reusability;
4. Operational costs for enterprises, such as manual review, are extremely high.
Compared with post-structuring, pre-structuring has the following advantages:
1. Physicians can directly input subjective information and upload it in real time;
2. Data is preserved with comprehensive content and dimensions from the outset;
3. The time and cost of data processing are reduced.

The transition from post-structuring to pre-structuring was not an overnight achievement for Suowen Boshi.
Starting in 2013, the Suowen Boshi team spent two years organizing, developing, and testing the prototype technical architecture of the entire Boshi Medical Cloud. In 2015, Boshi Medical Cloud positioned its services for Grade 3A hospitals and began launching into the market, with the Department of Thoracic Surgery at Peking Union Medical College Hospital being one of its most representative clients.
In 2015, the Department of Thoracic Surgery at Peking Union Medical College Hospital and other tertiary hospitals in China were still operating with traditional Hospital Information Systems (HIS) that combined electronic medical records with paper-based charts. This situation caused significant inconvenience for thoracic surgeons.
After the collaboration began, Suowen Boshi drew on its proprietary core technologies to rapidly help the Department of Thoracic Surgery at Peking Union Medical College Hospital develop a department-level, fully structured, disease-specific electronic medical record (EMR) system. This system strictly adheres to clinical pathways while integrating the department’s extensive clinical expertise.
At the initial launch of the EMR platform, Suowen Boshi used post-structuring (text extraction combined with natural language processing) to export, clean, and import five years’ worth of historical patient data from hospital systems into the Boshi Medical Cloud, thereby facilitating more efficient use by physicians.
However, during the cleanup of historical data, the Suowen Boshi team found that beyond the inconvenience of importing paper-based media, the most significant challenge was the inconsistency in clinical documentation. The individual stylistic variations in wording and terminology among different physicians posed substantial difficulties for subsequent natural language processing.
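The effect of wording variation on rule-based extraction can be illustrated with a small sketch. The notes, the field being extracted, and the regular expression below are all made up for illustration; this is not the team's actual NLP pipeline, only a toy showing how a rule written for one physician's phrasing misses the same finding written by others.

```python
import re
from typing import Optional

# Hypothetical free-text notes: three physicians describing the same finding.
NOTES = [
    "Tumor size: 3.2 cm, right upper lobe.",
    "RUL mass measuring approx 32 mm.",
    "Lesion in the right upper lobe, diameter three centimetres.",
]

# A naive rule that only understands one phrasing ("Tumor size: <number> cm").
SIZE_PATTERN = re.compile(r"tumor size:\s*(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE)


def extract_size_cm(note: str) -> Optional[float]:
    """Return the tumor size in cm if the note matches the rule, else None."""
    match = SIZE_PATTERN.search(note)
    return float(match.group(1)) if match else None


extracted = [extract_size_cm(note) for note in NOTES]
missed = sum(value is None for value in extracted)
print(extracted, f"missed {missed} of {len(NOTES)}")
```

Each additional phrasing needs another rule or a more capable language model, which is one way to read the low reusability and high review cost attributed to post-structuring.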
How to Solve This Challenge?
After studying international experience and validating its own products, Suowen Boshi Technology (Beijing) Co., Ltd. settled on a pre-structured approach: highly customized, fully structured forms let physicians enter subjective diagnostic data through point-and-click selection and text entry, and diagnostic results are generated directly from that input. This method effectively overcomes linguistic variations among different physicians while ensuring high-quality data input.
In 2017, building on the experience gained from processing historical hospital data, Suowen Boshi tailored numerous operational details and customized forms for the thoracic surgeons at Peking Union Medical College Hospital. Clinicians increasingly recognized the advantages of highly pre-structured electronic medical records in ensuring documentation uniformity.
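A minimal sketch of what pre-structured entry means at the data level is given below. The field names, the option list, and the 30 cm size bound are hypothetical and chosen only for illustration; the real forms on Boshi Medical Cloud are customized per department and are not described in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Lobe(Enum):
    """Controlled vocabulary for tumor location (illustrative values)."""
    RIGHT_UPPER = "right upper lobe"
    RIGHT_MIDDLE = "right middle lobe"
    RIGHT_LOWER = "right lower lobe"
    LEFT_UPPER = "left upper lobe"
    LEFT_LOWER = "left lower lobe"


@dataclass(frozen=True)
class LungLesionEntry:
    """One pre-structured record; every field is typed and validated on entry."""
    lobe: Lobe
    size_cm: float
    remarks: str = ""  # optional free text for anything the form does not cover

    def __post_init__(self) -> None:
        if not isinstance(self.lobe, Lobe):
            raise ValueError("lobe must be chosen from the Lobe options")
        if not 0 < self.size_cm <= 30:  # assumed plausibility bound
            raise ValueError("size_cm must be in (0, 30]")


# Two physicians pick the same option, so their records are identical.
entry_a = LungLesionEntry(lobe=Lobe.RIGHT_UPPER, size_cm=3.2)
entry_b = LungLesionEntry(lobe=Lobe("right upper lobe"), size_cm=3.2)
print(entry_a == entry_b)
```

Because the location is picked from a fixed option list instead of typed freely, no downstream language processing is needed to decide whether two records describe the same lobe.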
After more than a year of clinical application, physicians in the Department of Thoracic Surgery at Peking Union Medical College Hospital have been able to record nearly all current inpatient information in a pre-structured format on the Boshi Medical Cloud. This is not an isolated case among Boshi Medical Cloud’s clients.
How Are Disease-Specific Electronic Medical Records Superior to General Electronic Medical Records?
Pre-structuring offers another advantage: it enables electronic medical records (EMRs) to be tailored to specific diseases through structured design. Compared with traditional EMRs, disease-specific EMRs demonstrate significant advantages in terms of information dimensions, data quality, disease specialization, and applicability.
In terms of information dimensions, disease-specific electronic medical records (EMRs) can perform targeted data collection for specific conditions based on physicians' requirements, with the number of dimensions ranging from one to unlimited. In contrast, general EMRs only collect common information across diseases, resulting in significant limitations in data dimensionality.
In terms of data quality, disease-specific electronic medical records collect only information relevant to the specific condition, with over 80% of the content customizable through pre-structured formatting, ensuring standardized and uniform data collection.
Regarding system updates, disease-specific electronic medical records (EMRs) can be continuously updated and iterated in accordance with clinical guidelines and developmental trends for specific diseases. In contrast, general EMRs cannot be promptly updated in response to guideline changes or pharmaceutical advancements for one or a few specific diseases.
In terms of application, only high-quality, large-volume disease-specific data can summarize treatment outcomes, drive new drug development, and improve therapeutic efficacy. The collection of such disease-specific data can even be regarded as the cornerstone of artificial intelligence.
How to Create Structured Medical Records for Specific Diseases?
To establish structured medical records for specific diseases, two core issues must be addressed. The first is achieving rapid customization. There are few common requirements among tertiary hospitals, or even among individual physicians; instead, personalized needs predominate. In addition to clinical diagnosis and treatment, hospitals engage in extensive scientific research and disciplinary development related to disease progression.
Each hospital has its own accumulated experience in the same disease area, and on this basis, it simultaneously conducts many specific clinical research projects. Therefore, customization is a very important matter for the clinical departments of tertiary hospitals.
The second is the ability to iterate rapidly. In Suowen Boshi’s three years of experience serving nearly 4,000 clinical departments in Grade A tertiary hospitals across China, departments require updates to their electronic medical record (EMR) systems every 3–6 months on average. Such iterations incorporate physicians’ suggestions for functional improvements during daily use, as well as changes in clinical practice guidelines and research directions. Rather than occurring as abrupt, one-time overhauls, these updates typically unfold as a wave-like, continuous accumulation that ultimately leads to qualitative transformation.
Overall, disease-specific electronic medical records are a prerequisite for pre-structured electronic medical records. If electronic medical records cannot be specialized for specific diseases, they certainly cannot be structured.
The predominant text-based format of current mainstream electronic medical records (EMRs) is primarily due to the need for inter-departmental standardization during hospitals’ informatization upgrades. In practice, however, this standardization comes at the expense of disciplinary specialization.
It can be said that pre-structured electronic medical records for specific diseases yield the highest-quality medical data, which is a prerequisite for training artificial intelligence models and applying algorithms.
The following case offers some insights.
How Can Structured Electronic Medical Records for Specific Diseases Be Integrated with Artificial Intelligence?
In 2017, leveraging its technical expertise in structured electronic medical records (EMRs) for specific diseases, Suowen Boshi began to expand its efforts into the field of artificial intelligence.
In September this year, the Liver Tumor Center of the 302nd Hospital of the Chinese People's Liberation Army, together with the Molecular Diagnostics Medicine Professional Committee of the Chinese Research Hospital Association and Boshi Medical Cloud, released an algorithm-based achievement: an artificial intelligence model product for the diagnosis and survival probability prediction of cholangiocarcinoma. This product was developed by combining machine learning algorithms with specialized clinical medicine algorithms. The aim is to assist clinicians in tertiary and secondary hospitals in accurately distinguishing cholangiocarcinoma from hepatocellular carcinoma.
Cholangiocarcinoma and hepatocellular carcinoma both arise in the liver and appear similar on imaging; however, they originate from different cell lineages, and their treatment regimens differ. Accurate identification and diagnosis of cholangiocarcinoma, along with the formulation of rational therapeutic strategies, are of significant clinical importance for patients.

[Figure: EN achieves the optimal peak AUC]
Boshi Medical Cloud adopts a pre-structured approach, initially establishing an oncology dataset comprising thousands of cases. This structured dataset features nearly 2,000 field dimensions. Machine learning algorithms are first employed to rapidly reduce the dimensionality of these fields, efficiently compressing them into a reasonable range to meet the usability requirements of the model product.
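The article does not say which algorithm performs this reduction. As a stand-in, the sketch below uses a simple univariate filter that keeps the fields most correlated with the diagnosis label. The field names and values are made-up toy data, and the technique is an assumption chosen for brevity, not the method used on Boshi Medical Cloud.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_k_fields(columns: Dict[str, List[float]], labels: List[float], k: int) -> List[str]:
    """Keep the k fields whose values correlate most strongly with the label."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up toy data: 6 patients, 4 fields, binary diagnosis label.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "marker_a": [9.1, 8.7, 9.4, 2.0, 2.5, 1.8],   # tracks the label closely
    "marker_b": [5.0, 1.0, 4.0, 4.5, 1.5, 5.5],   # mostly noise
    "ward_code": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],  # constant, carries no signal
    "marker_c": [0.2, 0.3, 0.1, 0.9, 0.8, 0.7],   # inversely tracks the label
}
selected = top_k_fields(columns, labels, k=2)
print(selected)
```

On a real dataset with nearly 2,000 fields, a filter like this would only be a first pass; multivariate or regularized methods are usually needed to handle redundant fields.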
Subsequently, by integrating medical-specific algorithms and fitting the key variables identified by the machine learning models, the team developed an AI product whose core outputs include disease incidence probability and one-year survival rate.
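Models that output a probability like this are commonly compared by AUC, the area under the ROC curve. The sketch below computes AUC directly from its pairwise definition. The scores and labels are invented for illustration and are not results from the cholangiocarcinoma model.

```python
from typing import List


def auc(scores: List[float], labels: List[int]) -> float:
    """Chance that a random positive case outscores a random negative one (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted probabilities from two hypothetical models on six cases.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
model_b = [0.6, 0.3, 0.4, 0.5, 0.7, 0.1]
print(auc(model_a, labels), auc(model_b, labels))
```

An AUC of 1.0 means every positive case is ranked above every negative one, while 0.5 is no better than chance, so the model with the higher value separates the two diagnoses better on that dataset.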
From the perspective of clinicians, the aforementioned work can help them establish patient risk assessment criteria, which may potentially be incorporated into China’s clinical practice guidelines for the diagnosis and treatment of cholangiocarcinoma in the future. Meanwhile, Boshi Medical Cloud continuously iterates and updates its model algorithms, integrating them into its existing product platforms, including web and mobile interfaces. This enables a broader range of clinical hospitals to access and utilize the product through public channels, applying it to diagnostic and therapeutic practices.
Undoubtedly, by screening tumor cells and providing diagnostic and treatment recommendations, Suowen Boshi has opened up a new application scenario beyond medical imaging. Based on the data foundation established from structured electronic medical records for specific diseases, Suowen Boshi can, in the future, develop AI clinical applications tailored to clinicians' needs. By leveraging machine learning, deep learning, and even hybrid algorithms, it will enable AI deployment in a wider range of clinical scenarios.