Hangzhou-Based Medical AI Model Dominates Two Authoritative Benchmark Rankings

Mar 02, 2025 18:57 CST Updated 18:57

Amid the AI boom, competition among large language models keeps intensifying, and authoritative LLM evaluation platforms have become a critical battleground where tech giants prove their AI prowess.

MedBench, a well-known open evaluation platform for Chinese medical large language models, recently unveiled its latest rankings. Three Hangzhou-based companies took the top three spots on its self-assessment leaderboard: WeDoctor Holdings’ WeDoctor Medical Large Language Model ranked first, Ant Group’s Ant Medical Large Language Model (which powers Ant AI Health Assistant) ranked second, and Hangzhou Zhizhen Technology’s WiseDiag ranked third.

MedBench Self-Assessment Leaderboard Screenshot

The WeDoctor Medical Large Language Model ranked first in the self-assessment with a total score of 94.7, performing strongly across the evaluation dimensions of medical knowledge Q&A, medical language generation, complex medical reasoning, medical language understanding, and healthcare safety and ethics. The model, developed by WeDoctor Holdings, has been likened to a "top-performing student," and it has reportedly remained among the top three in the self-assessment rankings for the past six months.

MedBench is an authoritative evaluation platform launched by the Shanghai AI Laboratory and the Shanghai Digital Medicine Innovation Center. It draws on the expertise and knowledge of leading medical institutions to provide a fair and rigorous evaluation system for Chinese medical large language models, so a place near the top of the list amounts to strong recognition of a model's overall capabilities from the industry's "judges."

The CMB (Comprehensive Medical Benchmark in Chinese), a leading domestic evaluation platform for large medical models, is another key arena in this competition. Over the past six months, WeDoctor’s large medical model has repeatedly topped its rankings as well. Industry experts view leading both leaderboards as a testament to WeDoctor’s capabilities in AI-driven healthcare.

Competition among large medical AI models is intensifying: the models are competing not only in technological R&D but also in how quickly they reach practical use. AI-enabled healthcare is already benefiting many patients.

Public information indicates that the WeDoctor Medical Large Language Model has not only achieved technological breakthroughs but also demonstrated outstanding performance in practical applications. In the Health Community projects co-developed with local governments, including Tianjin, four major intelligent agents—AI Doctor, AI Pharmacist, AI Health Manager, and AI Intelligent Controller—developed on the foundation of the WeDoctor Medical Large Language Model, have been deployed at scale. This has established a closed-loop application for full-lifecycle health management, covering pre-diagnosis, during-diagnosis, and post-diagnosis stages.

Data show that in the six months ended June 30, 2024, WeDoctor Holdings’ “AI Doctor” achieved a 99.97% compliance rate with alerts for inappropriate prescriptions. Meanwhile, “AI Health Manager” increased the number of patients each health manager can handle from approximately 550 in 2022 to around 2,000 in the first half of 2024. Furthermore, from January 2023 to June 2024, with support from WeDoctor’s large medical model, the Tianjin Health Community saw the HbA1c target attainment rate among its managed diabetes members rise from 17.8% to 44.2%, the blood pressure target attainment rate rise from 19.5% to 61.5%, and the blood lipid target attainment rate rise from 24.8% to 27.9%.
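To put the figures above on a common footing, the short Python sketch below computes the percentage-point gain and the relative gain for each attainment rate, plus the ratio of patients per health manager. The input numbers are the ones quoted in this article; the function and variable names are illustrative only and do not come from WeDoctor or MedBench.

```python
# Attainment rates quoted in the article (percent), as (before, after) pairs
# for the Tianjin Health Community, January 2023 to June 2024.
rates = {
    "HbA1c": (17.8, 44.2),
    "blood pressure": (19.5, 61.5),
    "blood lipid": (24.8, 27.9),
}


def point_gain(before: float, after: float) -> float:
    """Absolute change in percentage points, rounded to one decimal."""
    return round(after - before, 1)


def relative_gain(before: float, after: float) -> float:
    """Change relative to the starting rate, in percent, rounded to one decimal."""
    return round((after - before) / before * 100, 1)


# Map each indicator to (percentage-point gain, relative gain in percent).
gains = {
    name: (point_gain(before, after), relative_gain(before, after))
    for name, (before, after) in rates.items()
}

# Patients per health manager: roughly 550 (2022) versus roughly 2,000 (H1 2024).
patients_ratio = round(2000 / 550, 1)

if __name__ == "__main__":
    for name, (points, relative) in gains.items():
        print(f"{name}: +{points} points ({relative}% relative)")
    print(f"patients per health manager: about {patients_ratio}x")
```

Separating percentage points from relative change keeps the three indicators comparable: the blood pressure rate gains 42.0 points, while the blood lipid rate gains 3.1 points.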

Media reports indicate that, as China’s largest provider of AI-driven healthcare solutions, WeDoctor Holdings is continuously optimizing its AI-based clinical healthcare solutions, leveraging its constantly iterated and upgraded AI capabilities.

On one hand, WeDoctor Holdings continuously integrates advanced large language model (LLM) capabilities, drawing on open academic resources and multimodal data to infer diagnostic and therapeutic possibilities and steadily strengthen its foundational analytical capabilities. On the other hand, WeDoctor’s proprietary technologies focus on deeply integrating real-world clinical diagnosis and treatment data with clinical decision-making pathways. Using engines for chronic and specialized disease clinical pathways, rational drug use, risk control across medical care, health insurance, and pharmaceuticals (the “Three-Medical Linkage”), and health management, together with expert deliberation, the company clinically validates its inference model solutions to ensure they are evidence-based and compliant.

The capabilities of these large models are then deployed through the four intelligent agents: AI Doctor, AI Pharmacist, AI Intelligent Controller, and AI Health Manager. This closes the loop from technical capability to commercial value and creates a data flywheel in which real-world use feeds back into training. By continuously optimizing its AI healthcare capabilities based on feedback such as efficacy and economic evaluations, WeDoctor aims to improve both quality and efficiency.

WeDoctor Holdings is a pioneer in AI-driven healthcare in China. As early as 2017, WeDoctor partnered with Zhejiang University to establish the Ruiyi Artificial Intelligence Research Center, aiming to advance the research and application of artificial intelligence technologies in the healthcare sector. Today, WeDoctor Holdings has scaled the application of Ruiyi’s research outcomes across various scenarios, including digital health communities.

To date, WeDoctor Holdings has filed four algorithms with the Cyberspace Administration of China: the WeDoctor Medical Large Model Algorithm, the WeDoctor Medical Assistant Large Model Algorithm, the WeDoctor Health Assistant Model Algorithm, and the WeDoctor Text Generation Algorithm.