Zhang Bo, born in 1935, is a native of Fuqing, Fujian Province, and an expert in computer science and technology. He is an academician of the Chinese Academy of Sciences, a professor, and a doctoral supervisor. He formerly served as Director of the Academic Committee of the Department of Computer Science and Technology, Director of the State Key Laboratory of Intelligent Technology and Systems, Vice Chairman of the Robotics Professional Committee of the Chinese Association of Automation, and Chairman of its Intelligent Control Professional Committee.
He is mainly engaged in theoretical research on artificial intelligence, artificial neural networks, genetic algorithms, fractals, and wavelets, as well as applied research that brings these theories to fields such as pattern recognition, knowledge engineering, intelligent robotics, and intelligent control.
On July 3, 2017, the “Tsinghua University Summit on Future Medical Imaging” was held at Tsinghua University. Zhang Bo, an academician of the Chinese Academy of Sciences, delivered a speech titled “Artificial Intelligence and Medical Image Recognition.” This article presents a comprehensive compilation of his insightful remarks.

Zhang Bo
Artificial intelligence is not as miraculous as legend has it. It cannot replace doctors, and even in the field of autonomous driving, there is still a long way to go. This is not to say that I oppose the development of driverless cars; rather, it means we must inform the public of the reality: under complex road conditions, current driverless vehicles cannot truly operate without human intervention. As is well known, countries and regions such as the United States and Germany legally require a human driver to be present in autonomous vehicles on public roads.
This demonstrates that while artificial intelligence can help humans solve many problems, it still has room for improvement when dealing with complex environments. In the field of medical imaging, what types of problems can AI currently address, and which ones remain unresolved for the time being?
To understand artificial intelligence, one must first grasp deep learning. As early as the 1960s and 1970s, the so-called artificial neural networks (which are essentially algorithms) emerged. We extract features from images and speech, then use these algorithms to classify the features or map them into a specific space.
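The two-step pipeline described above (extract features first, then classify them) can be sketched roughly as follows. This is only an illustration, not the method used in any system mentioned in the speech: the two features (mean brightness and mean horizontal edge strength), the nearest-centroid classifier, and the tiny "images" are all assumptions chosen to keep the example small.

```python
from statistics import mean

def extract_features(image):
    """Hand-crafted features chosen by a human (hypothetical choice):
    mean brightness and mean horizontal edge strength."""
    pixels = [p for row in image for p in row]
    edges = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    return (mean(pixels), mean(edges))

def train_centroids(samples):
    """samples is a list of (image, label) pairs.
    Returns a dict mapping each label to its mean feature vector."""
    by_label = {}
    for image, label in samples:
        by_label.setdefault(label, []).append(extract_features(image))
    return {label: tuple(mean(dim) for dim in zip(*feats))
            for label, feats in by_label.items()}

def classify(image, centroids):
    """Assign the label whose centroid is closest in feature space."""
    f = extract_features(image)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(f, centroids[label])))

# Toy grayscale "images" as nested lists of pixel values (made-up data).
flat = [[5, 5, 5, 5], [5, 5, 5, 5]]
striped = [[0, 9, 0, 9], [0, 9, 0, 9]]
centroids = train_centroids([(flat, "flat"), (striped, "striped")])

print(classify([[6, 6, 6, 6], [6, 6, 6, 6]], centroids))  # flat
print(classify([[1, 8, 1, 8], [1, 8, 1, 8]], centroids))  # striped
```

The classifier works here only because a person decided in advance that edge strength separates the two classes; choosing `extract_features` is the part that demands knowledge of the problem.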
This algorithm initially saw limited application, as it required a deep understanding of the specific problem at hand to determine which features to extract. In other words, one needed substantial domain-specific expertise to implement it effectively. Later, this algorithm evolved into what is now widely known as deep learning. So-called deep learning simply involves increasing the depth of neural network layers. What researchers did not anticipate was that this mere “deepening” would bring about “earth-shaking” changes in the algorithm’s performance.
What is this change? First, the improvement in performance exceeded everyone’s expectations. With the same network and only the number of layers increased, the performance of image and speech recognition can improve by double-digit percentage points.
Additionally, deep networks can automatically extract features without human intervention. Previously, to study images or speech, one had to have a certain understanding of them, knowing not only what something is but also why it is so. For example, to identify an individual such as Zhang San, manual feature extraction was previously required, necessitating explicit instructions to the machine regarding the appearance and specific features of Zhang San’s face. However, since we cannot clearly articulate the cognitive process by which we recognize Zhang San, it is difficult to determine which features should be extracted.
With deep learning, this problem is readily resolved; it suffices to provide a sufficient number of photographs of Zhang San, as the machine can automatically extract the required features during training. Therefore, deep learning enables the resolution of a vast array of problems where the phenomenon is observed but the underlying mechanism remains unknown. Applying deep learning to a specific domain does not require researchers to possess extensive expertise in that field. This is both an advantage and a significant drawback of deep learning.
Let's start with the advantage: no specialized knowledge is required. Anyone can therefore use deep learning; by simply feeding data into a deep network, it automatically extracts features and performs recognition. This has led to the widespread application of deep learning, with three areas having a significant impact on the general public. The first is image recognition, where Microsoft’s image recognition software has surpassed human performance on ImageNet. The second is speech recognition, where Baidu’s single-sentence speech recognition software has achieved accuracy exceeding that of humans.
Another well-known milestone is AlphaGo’s victory over the human Go champion. In the past, computers had already surpassed humans in many areas, such as numerical computation, which people took for granted since computers are inherently designed for calculation. However, their superiority in speech recognition, image recognition, and board games like Go came as a major shock. These three domains were long considered human strongholds, yet machines have now outperformed us. What accounts for this shift? I believe it stems from three key factors: data, computational resources, and artificial intelligence algorithms.
Let’s start with data. As mentioned earlier, deep learning is a democratized tool; anyone who has access to high-quality, large-scale datasets can potentially achieve better results than others. For example, one of my doctoral students founded a startup that uses medical imaging for disease diagnosis, particularly for diabetic retinopathy. (According to VCBeat, this company is Zhiyuan Huitu, and its CTO, Ding Dayong, was supervised by Professor Zhang Bo.)
He collaborated with over 40 physicians to annotate a dataset of nearly 400,000 images. Because of this large volume of data, his work yielded better results than others'. Typically, diabetic retinopathy screening results are categorized into three levels: no diabetic retinopathy, mild cases requiring no further in-depth examination or treatment, and cases requiring subsequent in-depth examination and treatment.
Zhiyuan Huitu classifies diabetic retinopathy into five categories according to international grading standards: no diabetic retinopathy, mild non-proliferative diabetic retinopathy (NPDR), moderate NPDR, severe NPDR, and proliferative diabetic retinopathy (PDR). It clearly identifies pathological features such as microaneurysms, intraretinal hemorrhages, and hard exudates in the fundus, thereby making the examination results more targeted. Differentiating between the presence and absence of diabetic retinopathy and determining the need for further in-depth examinations is relatively straightforward; however, subdividing the condition into five severity levels or annotating specific lesions presents a significantly greater challenge.
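The relationship between the five-level grading and the three-level screening result can be sketched as a many-to-one mapping. This is a minimal illustration rather than Zhiyuan Huitu's actual logic: the names `DRGrade` and `screening_level` are hypothetical, and the cut-off (moderate NPDR or worse requires follow-up) is an assumption, since the text does not say where the boundary lies.

```python
from enum import IntEnum

class DRGrade(IntEnum):
    """The five severity levels named in the text, ordered by severity."""
    NO_DR = 0
    MILD_NPDR = 1
    MODERATE_NPDR = 2
    SEVERE_NPDR = 3
    PDR = 4

def screening_level(grade):
    """Collapse a five-level grade into the three screening categories.
    Assumed cut-off: moderate NPDR or worse needs further examination."""
    if grade == DRGrade.NO_DR:
        return "no diabetic retinopathy"
    if grade == DRGrade.MILD_NPDR:
        return "mild, no further examination or treatment needed"
    return "further examination and treatment needed"

for grade in DRGrade:
    print(grade.name, "->", screening_level(grade))
```

Three distinct grades collapse into the last screening category, which is one way to see why the five-level task is harder: the model must separate cases that a screening-only system is allowed to treat as the same.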
Deep learning algorithms generally do not require specialized expertise; professional knowledge can only be conveyed to the algorithm through annotation. However, such annotations merely indicate whether the subject is diseased, presenting a binary “yes/no” outcome. At most, they may mark the location of lesions, but they cannot convey professional knowledge such as the underlying causes of the pathology. How can this issue be addressed?
VCBeat interviewed Sun Yuhui, CEO of Zhiyuan Huitu, who stated that in order to make the system better align with physicians’ clinical needs, the team is also continuously building up its own knowledge of medical fundus imaging, so that it can ask valuable questions when communicating with medical experts.
Experts may skip certain fundamental questions, not because they are unimportant, but because they are too basic. If you lack this foundational knowledge, you risk missing critical information. Similarly, experts sometimes possess IT-related knowledge. Ultimately, this reflects the need for interdisciplinary integration.
The second example is this year’s Data Science Bowl, which offered prize money totaling $1 million, with $500,000 awarded to the first-place winner. A total of 1,700 teams participated in the competition. The organizers provided medical imaging data comprising 1,600 training samples and 500 test samples, all of which were unlabeled. The objective was to identify which samples in the test set represented lung cancer cases. Ultimately, a team composed of three Ph.D. students from Tsinghua University secured first place. A key factor in their success was their prior exposure to neuroscience during their medical school studies, which fostered a strong interest in medical imaging and enabled them to closely integrate algorithms with practical application scenarios.
What are the consequences of a lack of domain expertise for deep learning? A critical challenge in applying deep learning to medical image recognition (or other medical problems) is the persistent scarcity of high-quality data. Diabetic retinopathy screening systems, for instance, are developed and trained on high-quality fundus images.
However, fundus images in real-world clinical practice are of inconsistent quality. As a result, high-quality fundus images can be accurately recognized, whereas low-quality ones cannot. Because deep learning models lack integration of domain-specific medical knowledge, their outputs are uninterpretable; consequently, clinicians are unaware of where the system fails, and technical engineers are equally perplexed. This raises a critical question: how can deep learning be effectively combined with professional medical expertise to meet the requirements of interpretability and comprehensibility? Addressing this challenge constitutes an important direction for future research.
The Tsinghua University Future Medical Imaging Summit is hosted by the School of Medicine, Tsinghua University, and the Tsinghua-Qingdao Institute for Data Science, Tsinghua University, and organized by the Laboratory for Future Medical Imaging, School of Medicine, Tsinghua University. It brings together renowned scholars in artificial intelligence, clinical scientists from top-tier hospitals, experts in data science, and leading figures in medical imaging technology to jointly discuss hot topics and future development directions in the application of medical imaging, artificial intelligence, and big data technologies in clinical medical research.
At the forum, the Future Medical Imaging Laboratory of Tsinghua University School of Medicine launched the “AI+MI” initiative. By partnering with leading clinical hospitals and integrating medical imaging resources to establish a comprehensive database, and leveraging Tsinghua University’s robust engineering capabilities in artificial intelligence algorithms, high-performance computing, and large-scale storage, the laboratory aims to conduct AI-driven medical imaging research focused on cardiovascular and cerebrovascular diseases, neurodegenerative disorders, and respiratory system diseases. The initiative welcomes additional partners to join.