With the implementation of national policies on health and medical big data, electronic medical records (EMRs), as one of the foundational databases, have been gaining increasing prominence within hospital information systems. Both startups and publicly listed companies are actively engaging in technological trials and explorations in this area.
PKU Healthcare IT, a well-known provider of medical information system solutions and services in China, has been established for more than 20 years. Backed by Peking University and Founder Group, it supports healthcare reform through continuous innovation in medical informatization technology.
Currently, PKU Healthcare IT serves more than 3,000 users in total across 28 provinces, municipalities, and autonomous regions in China, including more than 200 tertiary hospitals, and has become one of the largest providers of digital hospital and regional healthcare solutions and services in China.
PKU Healthcare IT defines the next stage of informatization development as the HIT 3.0 era, characterized by intelligence in the development and utilization of knowledge. In this stage, PKU Healthcare IT will actively explore derivative services based on medical big data, such as electronic health records, for drug research and development, health insurance, telemedicine, and health management.
Against this backdrop, VCBeat (WeChat Official Account: vcbeat) conducted an exclusive interview with Wang Qi, Big Data Director of PKU Healthcare IT, on the application of electronic medical records in health and medical big data.

Wang Qi, Big Data Director at PKU Healthcare IT
Four Stages in the Development of Electronic Medical Records
The concept of Electronic Medical Records (EMR) is broad, encompassing the Clinical Data Repository (CDR), Clinical Decision Support Systems (CDSS), medical vocabularies, Computerized Physician Order Entry (CPOE) systems, and clinical documentation data. Due to the wide variety of information systems used in hospitals, an integration platform is required to aggregate this information before transmitting it to the CDR.
Simply put, the CDR serves as a data repository into which all hospital data, whether operational or clinical, must be integrated. In contrast, electronic medical record (EMR) systems are more closely aligned with clinical practice, as most of their core content is generated during the inpatient care phase.
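The aggregation role of the integration platform can be pictured with a minimal sketch. It assumes each source system delivers records as dictionaries carrying a `patient_id` field; the function name `aggregate_to_cdr`, the field names, and the system labels are hypothetical and are not taken from any vendor's product.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def aggregate_to_cdr(feeds: Dict[str, Iterable[dict]]) -> Dict[str, List[dict]]:
    """Group records from several source systems by patient ID.

    `feeds` maps a source-system label (hypothetical, e.g. "LIS") to the
    records that system produced. Each record must contain "patient_id".
    """
    cdr = defaultdict(list)
    for system, records in feeds.items():
        for record in records:
            # Tag every record with its source so provenance is preserved.
            cdr[record["patient_id"]].append({**record, "source": system})
    return dict(cdr)


# Usage with made-up data from two hypothetical systems.
example_feeds = {
    "LIS": [{"patient_id": "p1", "test": "glucose"}],
    "PACS": [
        {"patient_id": "p1", "study": "chest x-ray"},
        {"patient_id": "p2", "study": "ct"},
    ],
}
patient_view = aggregate_to_cdr(example_feeds)
```

A real integration platform would also handle patient identity matching, message formats, and terminology mapping, none of which this sketch attempts.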
If the development of electronic medical records is divided into stages, following the classification proposed by Professor Xue Wanguo of PLA General Hospital (301 Hospital), Wang Qi believes there are broadly four stages:
Phase 1: Implement comprehensive clinical information systems within healthcare institutions. This was a phase of building from scratch and digitizing paper documents; the result was then called the CPR (computer-based patient record).
Phase 2: Achieve patient-centered information integration within healthcare institutions. Records evolved gradually from imaged documents to digitized data, captured in the form of electronic medical record (EMR) data. In this context, an EMR is not merely a product of manual data entry; rather, it is synthesized by extracting data from various systems and combining it with clinicians' subjective judgments, which is what truly embodies the concept of an electronic medical record.
Phase 3: Building on internal informatization within medical institutions, achieve patient information sharing among medical institutions and establish a regional electronic medical record (EMR) system. Only a regional EMR system constitutes a truly patient-centered information-sharing platform, which aligns with the concept of the Electronic Health Record (EHR) system currently being planned and developed by the state.
Phase 4: Electronic medical records in the era of cloud computing, big data, and the Internet of Things. To develop personalized treatment and precision medicine, services must be delivered on an individual basis. In addition to traditional patient information, electronic medical records (EMRs) need to expand their scope to include non-medical data related to personal health, such as diet, exercise, and environmental factors. At this stage, a new concept has emerged: the Personal Health Record (PHR).
Barriers to the Formation of Big Data in Healthcare
Healthcare Big Data encompasses information from the entire healthcare system, such as electronic medical records (EMRs), electronic prescriptions, and population health data. It is the integration of these diverse datasets that constitutes Healthcare Big Data. In particular, electronic medical records contain the most comprehensive clinical data.
In using electronic medical records to form big data, the main barriers fall into five aspects:
1. To generate comprehensive big data, interoperability must be achieved through standardization. Data from a single hospital cannot be classified as big data; only within a certain region can full-lifecycle health and medical big data be formed.
However, this presents a paradox in supply and demand: hospitals are the suppliers of big data, while the demand originates from external entities such as the government, pharmaceutical companies, and insurance firms. Although these external parties have a strong demand for hospital data, hospitals, as data providers, have relatively limited and narrow needs for external data, primarily focused on scientific research.
2. The level of electronic medical record (EMR) adoption in healthcare institutions is too low. Currently, the majority of hospitals in China at or below the secondary level, particularly those in remote areas, have extremely low levels of informatization. This results in a lack of data-based evidence for assessing and monitoring the health status of the population.
3. Deep data mining capabilities based on machine learning, deep learning, and related technologies are required. At this stage, clinicians' primary demand for electronic medical records (EMRs) remains focused on supporting scientific publications. The data utilized within hospitals essentially involves deep mining of existing datasets, which does not yet constitute big data in the true sense. Given that the data volume of a single hospital is inherently limited, their needs are more oriented toward in-depth mining and information extraction from finite data samples, a process that requires the support of machine learning algorithms.
4. The core is the electronic medical record covering the entire process and full lifecycle. This underscores the importance of ensuring high-quality data at its source. Electronic medical records (EMRs) consume a significant amount of physicians' time and effort in documentation. In China, EMR systems are designed based on templates; currently, many EMR products rely on drag-and-drop template functionalities. Products from different vendors generally adhere to fixed formats, requiring physicians to complete numerous data fields.
If a physician spends one to two hours completing a medical record, the quality of electronic medical records (EMRs) becomes difficult to control, given the poor ratio of output to time spent. Therefore, more intelligent and flexible solutions are needed to assist physicians in documenting the patterns of disease onset and progression.
5. The specialized nature of medical knowledge makes it difficult for health IT companies to determine the value of specific clinical data. For instance, in the case of hypertension data, given the numerous subtypes of hypertension, the electronic medical record (EMR) system must determine which specific categories of data to retrieve, including the identification of erroneous entries. Without intelligent analytical methods, it is difficult to accurately export the data truly required for research, thereby compromising the reliability of health big data analysis results and the validity of hospital research conclusions.
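The hypertension example in the fifth point, deciding which entries to export and which to treat as erroneous, can be sketched as a simple plausibility filter. This is only an illustration: the `BPReading` type, the function names, and the numeric limits are assumptions chosen for the example, not clinical rules and not anything described by the interviewee.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BPReading:
    patient_id: str
    systolic: int   # mmHg
    diastolic: int  # mmHg


def is_plausible(reading: BPReading) -> bool:
    """Reject obviously erroneous entries.

    The limits below are illustrative assumptions only.
    """
    return (
        50 <= reading.systolic <= 300
        and 20 <= reading.diastolic <= 200
        and reading.systolic > reading.diastolic
    )


def split_readings(readings: List[BPReading]) -> Tuple[List[BPReading], List[BPReading]]:
    """Return (usable readings, rejected readings) for a research export."""
    usable = [r for r in readings if is_plausible(r)]
    rejected = [r for r in readings if not is_plausible(r)]
    return usable, rejected


# Usage with made-up data: the second entry has its two values swapped,
# the third looks like a typing error.
sample = [
    BPReading("p1", 150, 95),
    BPReading("p2", 80, 120),
    BPReading("p3", 1400, 90),
]
usable, rejected = split_readings(sample)
```

Keeping the rejected entries instead of silently dropping them lets a clinician review whether the rule itself is too strict, which is where the domain knowledge mentioned above comes in.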
Two Major Solutions from PKU Healthcare IT
To address the current challenges in the development of electronic medical records, PKU Healthcare IT has adopted the following approach:
1. To address the issues of poor data interoperability and low levels of informatization, a three-tiered plan is proposed:
Regional Level: Under the leadership of the Guizhou Provincial Health and Family Planning Commission, PKU Healthcare IT has deployed a provincial-level electronic medical record (EMR) sharing platform, which is also the first provincial big data platform in China.
To date, PKU Healthcare IT has integrated data from a total of 199 hospitals, including the Guizhou Provincial People's Hospital. The data is being connected to the National Health and Family Planning Commission, and it is standardized in the process.
Regional platforms and shared infrastructure projects involve two key aspects: how data is aggregated upward and how systems are deployed downward. Therefore, PKU Healthcare IT has divided its electronic medical record (EMR) product into two components: data integration and collaboration, and business support services. This regional platform, named Smart Regional Health, comprises five major modules: primary care, public health, collaborative healthcare, health administration, and health management.

Large Hospitals: The traditional electronic medical record (EMR) information systems used internally within hospitals are also a critical component. By integrating with other systems, they form a comprehensive EMR framework. The second component is the government-led regional health information exchange platform. Its primary mission is data aggregation and interoperability, rather than extending operational business systems to lower levels.
Primary Healthcare Institutions: The third part is the Cloud Hospital Management Platform, a cloud-based SaaS product. This solution enables the centralized aggregation of hospital management information and electronic medical record (EMR) data from primary healthcare institutions, facilitating unified management and deployment.
Upstream data transmission is facilitated through a regional shared data platform, covering secondary and tertiary hospitals. Downstream transmission is managed via the Cloud Hospital Management System, extending coverage to primary healthcare institutions. This architecture enables the direct deployment of standards set by the National Health and Family Planning Commission and interoperability rules from a central hub to all terminal nodes, thereby achieving data unification.
The interoperability of electronic medical records (EMRs) offers patients the most direct benefit of no longer needing to carry bulky paper records when seeking medical care. Mutual recognition of medical records among hospitals allows patients to avoid numerous redundant tests. For hospital medical records departments, replacing paper-based records with digital ones maximizes cost savings on storage.
Currently, in addition to Guizhou, PKU Healthcare IT has participated as a technology provider in medical consortia in Yinchuan and Luohu District, Shenzhen, as well as in medical alliances in Zhangjiagang and Haidian District, Beijing. The PKU Healthcare IT SaaS Cloud Hospital has also been deployed in more than 400 primary healthcare institutions across China.
2. To address issues with the quality of electronic medical record (EMR) data, PKU Healthcare IT draws on established international data models and knowledge bases to optimize data processed through natural language processing (NLP), thereby constructing its own knowledge graph. By evaluating the value of data points within specific thematic areas, PKU Healthcare IT determines which data should be retrieved, ensuring the authenticity and validity of EMR data at the source.
Since a large amount of detailed patient information is stored in text form, and textual descriptions often contain ambiguities and numerous non-standardized expressions, converting such unstructured data into unified structured data is a critical step in medical information processing.
Natural language processing is one of the solutions. Transforming unstructured medical data into structured data requires a series of medical natural language processing techniques, including medical named entity recognition, automatic coding of diagnostic entities, named entity modifier recognition, and temporal information extraction from clinical text.
Through such technological approaches, PKU Healthcare IT is able to reduce the time physicians spend on completing electronic medical records.
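To make the four techniques named above concrete, here is a deliberately simple, rule-based sketch that turns a free-text note into structured findings. It is not PKU Healthcare IT's method, which the article does not describe in detail; production systems typically use trained models. The lexicon, the negation cues, the `Finding` type, and the function name are all hypothetical, and the diagnosis codes are included for illustration only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mini-lexicon: surface term -> (normalized name, illustrative code).
LEXICON = {
    "hypertension": ("Hypertension", "I10"),
    "high blood pressure": ("Hypertension", "I10"),
    "type 2 diabetes": ("Type 2 diabetes mellitus", "E11"),
}
# Hypothetical negation cues, standing in for modifier recognition.
NEGATION_CUES = ("no ", "denies ", "without ")
# Very rough temporal pattern: "for 5 years", "since 2015".
TIME_PATTERN = re.compile(r"(?:for|since)\s+(\d+\s+(?:day|week|month|year)s?|\d{4})")


@dataclass
class Finding:
    name: str            # normalized entity name (entity recognition)
    code: str            # diagnosis code (automatic coding)
    negated: bool        # modifier recognition
    time: Optional[str]  # temporal information extraction


def extract_findings(note: str) -> List[Finding]:
    """Extract structured findings from a free-text note, sentence by sentence."""
    findings = []
    for sentence in re.split(r"[.;]\s*", note.lower()):
        for term, (name, code) in LEXICON.items():
            pos = sentence.find(term)
            if pos == -1:
                continue
            # A cue counts only if it appears before the term in the sentence.
            negated = any(cue in sentence[:pos] for cue in NEGATION_CUES)
            match = TIME_PATTERN.search(sentence)
            findings.append(Finding(name, code, negated, match.group(1) if match else None))
    return findings


# Usage with a made-up note.
note = "Patient has high blood pressure for 5 years. Denies type 2 diabetes."
structured = extract_findings(note)
```

Even this toy version shows why non-standardized expressions are the hard part: every synonym, abbreviation, and phrasing of negation or time has to be recognized before the data can be treated as structured.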
Differences from the Development of Electronic Health Records in the United States
Epic Systems, the largest electronic health record (EHR) company in the United States, adopts a patient-centric business model for product development, ensuring that physicians can deliver efficient and cost-free medical services. Hospitals and healthcare institutions serve as the paying customers, generating revenue through software products, cloud services, telemedicine, and data sharing.
In terms of healthcare delivery models, there are significant differences in the application of electronic medical records (EMRs) between the United States and China. Most Americans access healthcare through primary care physicians. For them, health takes precedence over disease management, with health positioned at the apex of a pyramid. In contrast, China’s model is inverted: patients typically address health concerns only after falling ill.
In the United States, patients are not required to hand over their complete medical records to hospitals, as private physicians typically maintain long-term, periodic health data for their patients. In China, however, the family doctor system is still in its developmental stage, and medical services are primarily delivered within healthcare institutions, particularly due to the strong siphon effect of Grade A tertiary hospitals. Consequently, from a data perspective, hospitals remain the central hub of electronic medical records, rather than the patients themselves.
Furthermore, even within hospitals, the application of big data is not straightforward, as it involves hospital management systems and the overall health administrative mechanism. Since all hospital data are centralized in the information center, clinicians cannot directly access or review the data, even after the implementation of electronic medical record (EMR) systems. Access to and use of the data require review and approval by the Information Department, the Ethics Committee, and the Academic Committee. This established procedure is designed to ensure data security.
In summary, the application and development of electronic medical records (EMRs) in health and medical big data still require time to mature, as various conditions are not yet fully ready. However, with the implementation of top-level policies, the establishment of regional health platforms, and the continuous upgrading of hospital information systems, the potential of EMRs is being increasingly unleashed.