Han Yishun of Tsinghua-Qingdao Institute for Data Science Delivers Keynote on Norms for Open Sharing of Medical Data

Jul 07, 2017 15:12 CST Updated 15:12

Han Yishun currently serves as Executive Deputy Dean of the Tsinghua-Qingdao Institute for Data Science at Tsinghua University. He studied applied mathematics and economic management at Tsinghua University, then moved to the United States in 1986, where he studied and worked for many years. After returning to China, he worked in corporate management across several sectors, including high-tech and traditional industries, and has extensive experience at multiple levels of business operations and the real economy.


Recently, the “Tsinghua University Summit on Artificial Intelligence and Future Medical Imaging” was held at Tsinghua University. Han Yishun, Executive Deputy Dean of the Tsinghua-Qingdao Institute for Data Science, delivered a speech titled “Discussion on Standards for Open Sharing of Medical Data.” This article presents the key points of his address.

 

Han Yishun


I have attended many forums on medical big data, and my impression is that interest in the field is widespread: if we advance it together, China’s healthcare big data sector can make greater progress and develop at higher quality. In practice, however, we run into obstacles, and the main one is how to handle security, privacy, and related issues when data is shared.


The development of health and medical big data urgently needs norms for open sharing. Today I would like to share some reflections on the Medical Data Ethics Working Group that Tsinghua University plans to establish, in the hope of laying a solid foundation for this basic work.

 

First, our judicial practice should tilt toward technological innovation. Can China’s judicial practice truly accommodate non-malicious errors made in the course of innovation?

 

We should learn from foreign practices. Not long ago, a Tesla vehicle was involved in an accident; the case was ultimately resolved through an out-of-court settlement rather than a harsh penalty for Tesla. This illustrates why innovation is more vibrant in the United States: its judicial practice is more tolerant of innovation.

 

Second, we have always regarded data sharing as insecure and as a potential violation of patient privacy, yet China has not established clear standards for medical data privacy.

 

We need to establish a set of standards. Some concerns within the medical community are technical in nature; we advocate delegating such technical issues—such as non-standardized data formats—to technical experts for resolution, with the support of medical professionals.

 

Finally, we advocate working from both directions, top-down and bottom-up. Large-scale medical and health data platforms and medical clouds are currently being built at the national level and in many provinces and cities. These initiatives are predominantly top-down. I hope to promote bottom-up efforts as well, mobilizing the enthusiasm of physicians, department heads, and hospitals, so that healthcare professionals truly benefit from data mining and can demonstrate the value of open data sharing.

 

We deem it essential to vigorously promote the use of healthcare and medical data by third-party research institutions, under the premise of safeguarding the privacy of both healthcare professionals and patients.

 

Why is medical staff privacy specifically highlighted? As everyone knows, doctor-patient relations in China are somewhat tense. If doctors and hospitals are not adequately protected, problems will arise. Our definition of privacy is primarily based on the purposes of scientific research and teaching. We hope that Tsinghua University will adopt this standard in its future work.

 

I have outlined five specific aspects of the norms: de-identification, security, public credibility, traceability, and accountability. This list is not necessarily exhaustive; I offer it as a starting point for discussion.

 

De-identification: as is well known, this means stripping private information from data. Later in this talk I will try to specify which kinds of private data should be de-identified. As data desensitization techniques mature and the concept of desensitization is more widely adopted, data privacy protection can make significant progress.

 

Security: from storage to transmission to use, we need to put forward our own requirements and back them with corresponding technologies. Absolute security is not achievable, but we believe adequate security is feasible.

 

Public credibility: we have put forward proposals on information disclosure and data quality, which we believe will help enhance public credibility, provided that data quality is assured.

 

Traceability: we need to conduct multicenter studies that comply with international standards, and the data must ultimately be traceable. Not long ago, many problems in Chinese research were exposed, and some journals blacklisted Chinese researchers. The root cause is poor foundational practice, which has kept many researchers from performing to their full potential. It is crucial to build awareness of data traceability.

 

Accountability: in addition to defining privacy, we want accountability to be enforced. As mentioned earlier regarding data breaches, national regulations impose criminal penalties for such incidents, which has deterred many hospitals and physicians from promoting multicenter data sharing.

 

However, in my view, responsibilities must be clearly delineated. If all data leaving the hospital is required to be de-identified, then even if the aggregated data is stolen by hackers and disseminated, the risk of a personal privacy breach would be minimal.

 

Audit system: the aim is to identify and resolve problems rather than impose a one-size-fits-all mandate on everyone. A data audit system should be implemented to ensure compliance.
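The talk does not describe how traceability or auditing would be implemented. As one possible illustration only, the sketch below keeps a tamper-evident access log in which each entry stores the hash of the previous one, so that later edits to the log can be detected. The function names `append_entry` and `verify`, the entry fields, and the example values are all hypothetical, and a real system would also record timestamps and protect the log itself.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash used before the first entry
FIELDS = ("user", "dataset", "action", "prev_hash")


def _digest(body: dict) -> str:
    """Hash the entry fields in a stable (sorted-key) JSON encoding."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log: list, user: str, dataset: str, action: str) -> dict:
    """Append an access record that is chained to the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"user": user, "dataset": dataset, "action": action,
            "prev_hash": prev_hash}
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Return True only if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {field: entry[field] for field in FIELDS}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Illustrative usage with made-up values.
audit_log = []
append_entry(audit_log, "researcher_a", "stroke_registry", "export")
append_entry(audit_log, "researcher_b", "stroke_registry", "query")
log_is_intact = verify(audit_log)
```

An auditor who holds only the most recent hash can rerun `verify` to check the whole history, which is one way to investigate specific problems after the fact instead of restricting everyone in advance.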

 

Privacy issues: we have given this some thought. First, patient privacy is divided into three categories.


Category 1 is information unrelated to healthcare and disease, which can be filtered out completely to minimize potential disturbance to patients. For instance, patient names and telephone numbers are irrelevant to disease research and must not leave the hospital.

 

Category 2 is epidemiology-related information, for which we believe standardized encryption should be applied, so that it can still serve valuable research purposes, such as tracking and studying specific topics, while privacy is safeguarded.

 

Category 3 draws on U.S. health information protection standards and treats devices implanted in the human body, or devices that augment human capabilities, as personal private information. For instance, if a patient has an implanted micro drug-delivery pump, leaked data could endanger the patient’s life. This category of information also requires protection.

 

Why is this classified as Category 3? Because most such studies are commissioned by pharmaceutical or medical device companies to evaluate specific products. The resulting data may bear on drug efficacy and device reliability, and we aim to use it under encrypted protection.
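The three categories can be read as a per-field policy applied before a record leaves the hospital. The sketch below is a minimal illustration under stated assumptions, not the working group's method: the field names and the function `deidentify` are hypothetical, and keyed HMAC-SHA256 pseudonymization stands in for the "standardized encryption" mentioned in the talk. Unlike true encryption it cannot be reversed, but identical inputs map to identical tokens, so records can still be linked for research.

```python
import hashlib
import hmac

# Hypothetical field lists; the talk names only a few examples per category.
CATEGORY_1_DROP = {"name", "phone"}                    # unrelated to disease research
CATEGORY_2_PROTECT = {"home_district", "birth_date"}   # assumed epidemiology-related fields
CATEGORY_3_PROTECT = {"implant_serial"}                # implanted or augmenting devices


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed HMAC-SHA256 token (a stand-in for encryption)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, key: bytes) -> dict:
    """Return a copy of the record with the three-category policy applied."""
    out = {}
    for field, value in record.items():
        if field in CATEGORY_1_DROP:
            continue  # Category 1: filtered out, never leaves the hospital
        if field in CATEGORY_2_PROTECT or field in CATEGORY_3_PROTECT:
            out[field] = pseudonymize(str(value), key)  # Categories 2 and 3
        else:
            out[field] = value  # remaining clinical content stays usable
    return out


# Illustrative usage with made-up values; the key would stay inside the hospital.
hospital_key = b"example-key-kept-inside-the-hospital"
raw_record = {
    "name": "Example Patient",
    "phone": "000-0000",
    "home_district": "District A",
    "implant_serial": "PUMP-001",
    "diagnosis": "ischemic stroke",
}
shared_record = deidentify(raw_record, hospital_key)
```

Because the key never leaves the hospital in this sketch, a party that obtains `shared_record` sees tokens in place of the Category 2 and 3 values and no Category 1 fields at all.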

  

In addition to patient privacy, the privacy of healthcare providers should also be safeguarded. For instance, we aim to protect physicians’ basic personal information and family circumstances, details that are irrelevant to their professional duties, to minimize harassment of healthcare workers by malicious actors.

 

Drawing on the patient privacy framework, we divide healthcare professionals’ private information into two types. When data mining takes healthcare professionals as the subject of study, the relevant data must be encrypted. We propose safeguarding healthcare professionals’ occupational information to alleviate their concerns and encourage genuine sharing of valuable data for scientific research and data mining.

 

Finally, I would like to say: I have a dream. I aspire to build a multi-center research data platform that meets international medical research standards and to establish China’s single-disease databases. I believe that once such internationally compliant infrastructure is in place, leading medical experts and scientists worldwide will one day contribute their expertise to benefit patients in China.


The Tsinghua University Future Medical Imaging Summit is hosted by the Tsinghua University School of Medicine and the Tsinghua-Qingdao Institute for Data Science, and organized by the Future Medical Imaging Laboratory of the Tsinghua University School of Medicine. It brings together renowned scholars in artificial intelligence, clinical scientists from top-tier hospitals, experts in data science, and leaders in medical imaging technology to discuss hot topics and future directions in applying medical imaging, artificial intelligence, and big data technologies to clinical medical research.

At the forum, the Future Medical Imaging Laboratory of Tsinghua University School of Medicine launched the “AI+MI” initiative. By partnering with leading clinical hospitals and integrating medical imaging resources to establish a comprehensive database, and leveraging Tsinghua University’s robust engineering capabilities in artificial intelligence algorithms, high-performance computing, and large-scale storage, the laboratory aims to conduct AI-driven medical imaging research focused on cardiovascular and cerebrovascular diseases, neurodegenerative disorders, and respiratory diseases. The initiative welcomes additional partners to join.