Shanghai Jiao Tong University's Mingqi Team Unveils China's First Multimodal AI Model for Accurate Rare Disease Diagnosis

May 29, 2025 07:59 CST Updated 08:00

“Some people only discover they have Crohn’s disease shortly before passing away,” remarked Professor Wang Shuo from the LoCCS Laboratory at the School of Computer Science, Shanghai Jiao Tong University.

This rare disease, often referred to as “undying cancer,” has no known cause and can affect the entire gastrointestinal tract. As with most rare diseases, patients with Crohn’s disease face insufficient medical resources, prolonged diagnostic timelines, and difficulty obtaining a definitive diagnosis.

This has prompted a question at the School of Computer Science (School of Cyber Security) of Shanghai Jiao Tong University: now that large language models (LLMs) are closely intertwined with the healthcare industry and increasingly deployed in specific medical and health scenarios, can they enable efficient and precise diagnosis of rare diseases such as Crohn’s disease? The answer is yes, but the challenges remain formidable.

Specifically, the development of large models for the precise diagnosis of rare diseases faces at least three major challenges: First, the data challenge. This is a common hurdle for all medical large models at the current stage; however, compared with other diseases, data on rare diseases are scarcer, making data acquisition even more difficult. Second, the interpretability challenge, which is a key factor in determining whether large models can gain the trust of both physicians and patients. Third, the deployment cost challenge. The purpose of promoting precise diagnosis of rare diseases is to alleviate the shortage of physician resources and enable more primary-care hospitals to develop the capacity for diagnosing and treating rare diseases. Given the limited financial resources of primary-care hospitals, the cost of deploying large models has become a critical issue for their practical implementation and widespread adoption.

Professor Wang Shuo’s team developed Mingqi, the first multimodal large medical imaging model in China for the precise diagnosis of rare diseases, to address all three challenges: data scarcity, model interpretability, and deployment cost. It achieves a diagnostic accuracy rate exceeding 92% and reduces deployment costs to approximately RMB 100,000.

[Image: 上交大配1.png]

Image source: Official website of the Mingqi Multimodal Large Model

So, how does the Mingqi multimodal large model tackle the most challenging problem in medical AI: precise diagnosis of rare diseases? How does it make the model interpretable and reduce deployment costs? What are the plans for its future rollout? With these questions in mind, VCBeat contacted Professor Wang Shuo to hear him detail the capabilities and breakthrough journey of the Mingqi multimodal large model.

Building a Data Flywheel System to Overcome the Challenge of Scarce Rare Disease Data

According to Professor Wang Shuo, the Mingqi team adopted a “progressive” strategy to address the challenge of data scarcity. Specifically, in the first stage of large model training, the objective was merely to equip the large model with basic cognitive capabilities. During this phase, the Mingqi team first collected a large volume of publicly available gastrointestinal endoscopy images, and then combined self-supervised learning with customized optimization of the large model to enhance the capabilities of a lightweight large model in key diagnostic processes. This approach enabled the model to establish a foundational matrix of recognition and judgment capabilities, such as identifying the intestinal tract, intestinal polyps, and intestinal ulcers.

In the second phase, the team had to consider how to strengthen Mingqi’s capabilities in vertical-domain applications. In the interview, Professor Wang Shuo stated explicitly that this phase requires less data than the first. Data volume is therefore not the primary constraint on training at this stage; data precision matters more. In other words, the team focuses on identifying and filling gaps, ensuring that the training data precisely matches the specific capabilities the model needs to acquire.

In response, the Mingqi team has established a data flywheel system: First, the team developed a “golden data extraction” mechanism to produce thousands of high-quality golden data samples tailored to the key capabilities required for model training. Second, the team leverages the generative capabilities of large models for intelligent data synthesis. Finally, the Mingqi team trains large models to learn the annotation logic and expertise of human specialists, thereby equipping the models with high-quality data annotation capabilities that ultimately feed back into and enhance model training.
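The three steps of the flywheel can be pictured as one loop. The following Python sketch is purely illustrative: the function names (`extract_golden`, `synthesize`, `auto_annotate`), the quality threshold, and the toy records are assumptions made for this example, not details of Mingqi’s implementation, which the team has not published here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    image_id: str
    label: str
    quality: float  # hypothetical 0..1 score from expert review

def extract_golden(pool, min_quality=0.9):
    """Step 1: keep only samples that clear the (assumed) quality bar."""
    return [s for s in pool if s.quality >= min_quality]

def synthesize(golden, per_sample=2):
    """Step 2: stand-in for generative synthesis; derives variants of each golden sample."""
    return [
        Sample(f"{s.image_id}-syn{i}", s.label, s.quality)
        for s in golden
        for i in range(per_sample)
    ]

def auto_annotate(unlabeled_ids, golden):
    """Step 3: stand-in for a model that has learned expert annotation logic.
    This toy version simply assigns the most common golden label."""
    if not golden:
        return []
    labels = [s.label for s in golden]
    majority = max(set(labels), key=labels.count)
    return [Sample(image_id, majority, 0.5) for image_id in unlabeled_ids]

def flywheel_round(pool, unlabeled_ids):
    """One turn of the flywheel: golden data, synthetic data, then auto-labeled data."""
    golden = extract_golden(pool)
    return golden + synthesize(golden) + auto_annotate(unlabeled_ids, golden)
```

In a real system, the newly annotated samples would be reviewed and fed back into the pool, so that each round starts from a larger set of candidate golden data.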

As a result, Mingqi has established a robust data distribution system within the medical vertical sector, overcoming the challenge of data scarcity in model training and laying a solid foundation for its applications in this field. Its real-world performance has been impressive: it achieves a diagnostic accuracy rate exceeding 92% for gastrointestinal conditions such as Crohn’s disease. Experts from the Department of Gastroenterology at the Third Xiangya Hospital of Central South University stated, “Mingqi’s diagnostic accuracy has surpassed that of senior specialists.”

The 92% diagnostic accuracy relies not only on rich and diverse data but also on a dual-engine architecture called “Large Model Capability Matrix + Expert Routing Collaboration.”

Adopting a “Large Model Matrix + Transparent Diagnostic Cabin” to Push Diagnostic Accuracy Above 92%

Why have people generally perceived large language models as becoming smarter in recent years? Professor Wang Shuo explains that the underlying reason is the models’ enhanced ability to invoke “tools.”

A prime example is that, in the past, when users input a math problem, large language models might generate nonsensical responses and fail to produce any meaningful analysis. Now, however, these models first recognize the input as a mathematical problem; second, they invoke relevant tools to perform calculations; and finally, they provide the result. For more complex tasks such as data analysis, large language models integrate the outputs from various tools to deliver a comprehensive final result.
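The recognize, invoke, respond sequence described above can be sketched in a few lines of Python. This is a toy illustration only: `looks_like_math`, `calculator`, and `answer` are hypothetical names, and a real LLM decides when to call a tool through learned behavior rather than a character check.

```python
import ast
import operator

# Arithmetic operators the toy calculator tool supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expr: str):
    """Tool: safely evaluate an expression made of numbers, + - * / and parentheses."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr.strip(), mode="eval").body)

def looks_like_math(text: str) -> bool:
    """Step 1: recognize the input type (a crude stand-in for the model's judgment)."""
    return any(c.isdigit() for c in text) and all(c in "0123456789+-*/(). " for c in text)

def answer(text: str) -> str:
    """Steps 2 and 3: invoke the matching tool, then report its result."""
    if looks_like_math(text):
        return f"{text.strip()} = {calculator(text)}"
    return "No tool matched; answering from the model alone."
```

For example, `answer("12 * (3 + 4)")` hands the expression to the calculator instead of guessing at the arithmetic.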

Mingqi has also adopted a similar technical approach. According to Professor Wang Shuo, the Mingqi team first deconstructs the capabilities required for the precise diagnosis of rare gastrointestinal diseases, including Crohn’s disease. They then develop corresponding lightweight large language models (LLMs) based on these capabilities, and finally “integrate” these lightweight LLMs into a large model matrix via an expert routing protocol. This large model matrix provides the rich and essential “tools” needed to achieve precise diagnosis of rare diseases.

“Expert Routing Collaboration” refers to Mingqi’s integration of clinical guidelines and diagnostic expertise from clinical specialists to establish “diagnostic pathways,” also known as “diagnostic logic,” that align with evidence-based medicine and clinical requirements. For instance, in diagnosing Crohn’s disease, Mingqi adheres to a diagnostic pathway that prioritizes assessment of the lesion location, followed by evaluation of ulceration, and then examination of the margins, while also coordinating and planning the use of various tools throughout this process. This demonstrates Mingqi’s capability for clinical logical reasoning.
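To make the idea concrete, here is a minimal Python sketch of a capability matrix driven by an ordered diagnostic pathway. Only the step order (lesion location, then ulceration, then margins) comes from the description above; the expert functions, the capability names, and the case fields are hypothetical placeholders, not Mingqi components or its actual routing protocol.

```python
from typing import Callable, Dict, List, Tuple

# Each "expert" stands in for one lightweight model with a single capability.
# These functions are hypothetical placeholders for illustration only.
Expert = Callable[[dict], str]

def assess_location(case: dict) -> str:
    return case.get("segment", "unknown")

def assess_ulceration(case: dict) -> str:
    return "present" if case.get("ulcer") else "absent"

def assess_margins(case: dict) -> str:
    return case.get("margins", "not assessed")

# The "matrix": capability name -> expert that provides it.
MATRIX: Dict[str, Expert] = {
    "lesion_location": assess_location,
    "ulceration": assess_ulceration,
    "margins": assess_margins,
}

# Step order as described in the article: location, then ulceration, then margins.
CROHN_PATHWAY: List[str] = ["lesion_location", "ulceration", "margins"]

def run_pathway(pathway: List[str], case: dict) -> List[Tuple[str, str]]:
    """Call each expert in pathway order and keep a step-by-step trace."""
    trace = []
    for step in pathway:
        if step not in MATRIX:
            raise KeyError(f"no expert registered for {step!r}")
        trace.append((step, MATRIX[step](case)))
    return trace
```

Keeping the pathway as data separate from the experts means a different disease could reuse the same experts in a different order, which is one plausible reading of how routing and the model matrix divide the work.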

It is worth noting that to enhance Mingqi’s logical reasoning capabilities and strengthen trust among both physicians and patients, the Mingqi team has introduced a “Transparent Diagnostic Cabin” mechanism. This system visually presents each diagnostic step and reasoning process, providing three tiers of interpretable evidence for every diagnosis, including imaging annotations, diagnostic pathway decisions, and a reference library of similar medical cases. Specifically, in practical applications, Mingqi identifies suspected lesion areas through imaging annotations and provides a comprehensive chain of evidence. It then retrieves and compares similar historical cases for analysis and reflection, ultimately delivering a precise diagnosis.
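The three tiers of evidence map naturally onto a small record type, and the similar-case lookup onto a nearest-neighbor search. The Python sketch below assumes cases are represented as feature vectors compared by cosine similarity; that representation, the field names, and the function names are assumptions for illustration, since the article does not say how Mingqi retrieves similar cases.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class DiagnosticEvidence:
    annotations: List[Tuple[int, int, int, int]]  # tier 1: suspected regions as (x, y, w, h)
    pathway_decisions: List[str]                  # tier 2: decision taken at each pathway step
    similar_cases: List[str]                      # tier 3: IDs of similar historical cases

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query: List[float], library: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the IDs of the k library cases closest to the query vector."""
    ranked = sorted(library, key=lambda cid: cosine(query, library[cid]), reverse=True)
    return ranked[:k]
```

A report object built this way lets a physician inspect each tier separately: where the model looked, which decisions it took, and which past cases it compared against.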

“The black-box nature of decision-making in traditional AI healthcare makes it difficult for doctors and patients to trust AI diagnostic results; in other words, interpretability is key to gaining clinical acceptance. Mingqi’s ‘Transparent Diagnostic Cabin’ mechanism not only improves diagnostic accuracy but also resolves the trust deficit between doctors, patients, and AI, empowering physicians to confidently and willingly deploy AI in real-world clinical settings,” said Professor He Chaoxiang, a core member of the LoCCS Laboratory at the School of Computer Science, Shanghai Jiao Tong University, at the “Academic Conference on AI-Enabled Innovative Development of Precision Diagnosis and Treatment.”

In a nutshell, sufficient data and robust models have endowed Mingqi with the capability to diagnose rare diseases, while the mechanisms of “Expert Routing Collaboration” and “Transparent Diagnostic Cabin” have equipped it with diagnostic reasoning logic for such conditions. This dual strength of capability and logic has enabled Mingqi to achieve a diagnostic accuracy rate exceeding 92%, thereby earning the trust of both physicians and patients.

However, Mingqi’s empowerment of rare disease diagnosis extends beyond improved accuracy and strengthened trust between physicians and patients.

Optimizing Lightweight Models to Reduce Deployment Costs to Approximately RMB 100,000

As Professor Wang Shuo noted, Mingqi has adopted a technical approach based on the integration of lightweight large language models (LLMs). One of the key characteristics of lightweight LLMs is a significant reduction in training costs, which consequently lowers deployment expenses. However, it is important to note that adopting lightweight LLMs does not imply compromising model performance; rather, it fully amplifies the unique features and value of each model to maximize its utility. “Even with lightweight models, we are still leveraging large language models, and specifically, trustworthy ones,” stated Professor Sun Shifeng, a core member of the LoCCS Laboratory at the School of Computer Science, Shanghai Jiao Tong University.

Consequently, at this stage, Mingqi can run large language model inference on an all-in-one machine costing only around RMB 100,000. This means that county-level and primary-care hospitals can access expert-level diagnostic systems at an affordable price, helping high-quality medical resources reach lower-tier hospitals and advancing inclusive healthcare. According to Professor Wang Shuo, beyond continuing to optimize the model, the Mingqi team is also exploring a CPU+GPU deployment architecture that offloads part of the model’s computational workload to CPUs to cut deployment costs further.
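One common way to split a model between GPU and CPU is to keep as many leading layers as fit in GPU memory and run the remainder on the CPU. The Python sketch below shows that placement rule under stated assumptions: the layer sizes and memory budget are made-up numbers, `plan_placement` is a hypothetical helper, and the article does not describe how the Mingqi team actually divides the workload.

```python
from typing import List

def plan_placement(layer_sizes_mb: List[int], gpu_budget_mb: int) -> List[str]:
    """Assign leading layers to the GPU until the budget is exhausted.

    Once a layer does not fit, it and every later layer go to the CPU, so
    activations cross the GPU/CPU boundary only once per forward pass.
    """
    placement = []
    used = 0
    spilled = False
    for size in layer_sizes_mb:
        if not spilled and used + size <= gpu_budget_mb:
            placement.append("gpu")
            used += size
        else:
            spilled = True
            placement.append("cpu")
    return placement
```

With four layers of 4,000, 4,000, 4,000, and 1,000 MB and a 10,000 MB budget, the first two layers land on the GPU and the last two on the CPU. The trade-off is slower inference for the CPU-resident layers in exchange for a smaller, cheaper GPU.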

[Image: 上交大配图2.png]

Image source: Official website of the Mingqi Multimodal Large Model

As for disease coverage, Professor Wang Shuo stated that over the next three years, Mingqi will extend its coverage to 15 diseases listed in the National Catalogue of Rare Diseases, while continuing to enhance its diagnostic capabilities for core conditions such as Crohn’s disease. With Mingqi’s support, it is estimated that approximately one million misdiagnoses of rare diseases will be prevented annually, saving more than RMB 1 billion in healthcare costs. This would make a significant contribution to the high-quality development of China’s healthcare sector and the realization of inclusive healthcare.