Medlinker's MedGPT Achieves 96% Diagnostic Consistency with Grade A Tertiary Hospital Experts in China's First Generative AI Medical Milestone

Jul 04, 2023 08:00 CST Updated 08:00

If asked what the hottest topic of the first half of the year was, most people would likely include generative AI. On the last day of the first half of 2023, a major announcement regarding generative AI took the healthcare community by storm, serving as the perfect capstone to generative AI’s sustained prominence during that period.

On June 30, Medlinker conducted China’s first-ever consistency evaluation between AI doctors and human physicians in Chengdu and Beijing, streamed live throughout the event. The results showed that AI doctors powered by Medlinker’s MedGPT achieved a 96% consistency rate in evaluation scores with attending physicians from top-tier (Grade A tertiary) hospitals, a performance the reviewing experts described as “far beyond expectations.”

Generative AI Has Become the Foundational Infrastructure of Future Healthcare, with Application Exploration Rapidly Gaining Momentum

Since the rise of generative AI, represented by ChatGPT, earlier this year, it has demonstrated significant application potential across many industries and may bring disruptive innovation to numerous sectors in the future. According to a report by McKinsey, generative AI could add $2.6–$4.4 trillion to global GDP annually; by comparison, the UK’s GDP in 2021 was only $3.1 trillion.

As generative AI gradually demonstrates potential that exceeds expectations in its application exploration across various industries, research institutions’ forecasts for the global market size of generative AI have also risen sharply. According to the latest report from MarketsandMarkets, the global market size for generative AI was estimated at $11.03 billion in 2023 and is projected to reach $51.8 billion by 2028, representing a compound annual growth rate (CAGR) of 35.6%.
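For readers who want to check the arithmetic, a CAGR relates a starting value and an ending value over n years as (end / start)^(1/n) - 1. The short Python sketch below applies that formula to the figures quoted above; the function name `cagr` is illustrative only. With the numbers as quoted, the result comes out at roughly 36%, close to but slightly above the reported 35.6%, which may reflect rounding or a slightly different base-year figure in the original report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate, returned as a fraction (0.10 means 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures as quoted in the article (USD billions); 2023 to 2028 is five years.
implied = cagr(11.03, 51.8, 2028 - 2023)
print(f"Implied CAGR: {implied:.1%}")
```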

Against this backdrop, investment in generative AI within the primary market has become increasingly active. In the latter half of May alone, Hyro, a company whose core business is conversational bots, secured $20 million in Series B funding, while Hippocratic AI, which specializes in developing generative AI models for healthcare, raised $50 million in seed funding.

According to incomplete statistics from VCBeat, from January 1, 2022, to June 28, 2023, there were over 160 financing and investment events globally in the generative AI healthcare sector, with cumulative investment amounts exceeding $5.71 billion.

In the healthcare sector, generative AI is regarded as a powerful enabler with transformative potential. Its applications are already being implemented in drug discovery and development, as well as in medical imaging and diagnosis.

In fact, generative AI has been applied in new drug development for some time. It can learn the mapping relationship from protein sequences to protein structures and, leveraging its powerful computational capabilities, address complex high-dimensional data mapping problems, thereby enabling protein structure prediction that was previously nearly impossible. Furthermore, it can generate entirely novel proteins that do not exist in nature, based on predefined performance and structural criteria.

In its integration with medical imaging, generative AI can provide enhancements in several aspects. First, generative AI can generate synthetic data based on raw data and apply it to the generation of final results, thereby achieving image enhancement. This approach helps overcome the limitations imposed by the imaging principles and technologies of medical equipment, reducing image quality degradation caused by improper operations.

Second, generative AI can produce large volumes of synthetic imaging data for data augmentation in model training. This plays a critical role in scenarios with data scarcity, such as rare diseases or domains with uneven data distribution.

Third, generative AI can estimate patients' health status and disease risks based on existing data. For example, models have already been trained on how retinal blood vessels and nerves change over time across populations, so that they can predict subsequent changes in an individual examinee and thereby assess future risks of cardiovascular and cerebrovascular diseases. Explorations have also been conducted in areas such as predicting the risk of Alzheimer's disease and forecasting myopia progression.

Beyond these two domains, generative AI is also exploring integration into the entire clinical diagnosis and treatment workflow, aiming to empower physicians in their diagnostic and therapeutic practices and enhance patient experience.

In the pre-consultation phase, generative AI can leverage its powerful data retrieval and reasoning capabilities to enhance disease prediction for patients, thereby improving the accuracy of triage and patient guidance.

During the consultation phase, generative AI can leverage multimodal data—such as patients’ medical records, symptoms, and disease history—to provide physicians with assisted diagnosis, treatment guidance, and prognostic plans through data analysis and intelligent algorithms.

In the post-consultation phase, generative AI can alleviate the burden on healthcare professionals by providing 24/7 online responses to patient inquiries regarding their medical conditions, medication side effects, and preventive measures. It can also serve as an educational tool to impart accurate health knowledge and preventive strategies to patients.

For physicians, generative AI also serves as a convenient repository of medical guidelines, enabling them to stay abreast of the latest advances in medical research, evidence-based medicine, and clinical guidelines, thereby enhancing their professional expertise and promoting improvements in healthcare quality. Furthermore, the anthropomorphic capabilities of generative AI are significantly more advanced than those of previous human-computer dialogue systems, which will greatly improve the patient experience.

However, there is still a considerable gap before these clinical concepts can be fully implemented. Users of ChatGPT will have noticed that its most significant issue is “confidently generated nonsense”; asking the exact same question repeatedly yields inconsistent answers. Fundamentally, this stems from the fact that current generative AI systems are primarily built on general-purpose large language models (LLMs) similar to GPT, which rely heavily on statistical probabilities in text to generate responses, thereby failing to guarantee answer accuracy.
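The inconsistency described above can be illustrated with a toy example. The Python sketch below is not taken from any real model: the candidate words and their probabilities are invented for illustration. It contrasts sampling from a probability distribution, which can return different answers to the same question, with always picking the most probable option, which is repeatable but still offers no guarantee of being correct.

```python
import random

# Toy next-word distribution for a single prompt. The words and the
# probabilities are invented for illustration, not taken from a real model.
NEXT_WORD_PROBS = {"gastritis": 0.5, "ulcer": 0.3, "reflux": 0.2}


def sample_answer(rng: random.Random) -> str:
    """Draw one answer at random, weighted by its probability."""
    words = list(NEXT_WORD_PROBS)
    weights = list(NEXT_WORD_PROBS.values())
    return rng.choices(words, weights=weights, k=1)[0]


def greedy_answer() -> str:
    """Always return the single most probable answer."""
    return max(NEXT_WORD_PROBS, key=NEXT_WORD_PROBS.get)


rng = random.Random(0)  # fixed seed so the demonstration is repeatable
sampled = [sample_answer(rng) for _ in range(20)]
print("Distinct sampled answers:", sorted(set(sampled)))
print("Greedy answer:", greedy_answer())
```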

This is undoubtedly unacceptable in medical application scenarios where accuracy and consistency are the baseline. Addressing this issue requires fine-tuning and engineering optimization of existing general-purpose large language models, along with the establishment of corresponding review mechanisms, to ensure the delivery of services with practical utility and consistent disease diagnosis and treatment capabilities.

Medlinker Leads Generative AI Breakthrough with 96% Diagnostic Consistency with Grade A Tertiary Hospital Experts

Along this path, Chinese enterprises are also exploring and making initial strides. In April 2023, Medlinker announced the launch of MedGPT, a large language model built on the Transformer architecture and optimized for medical application scenarios. This model features up to 100 billion parameters, was trained on 2 billion medical text records and 8 million clinical diagnosis and treatment records, and underwent reinforcement tuning by 100 physicians.

To address the limitations of general-purpose large language models in medical application scenarios, MedGPT offers several specialized optimizations tailored for healthcare use cases.

First, MedGPT introduces a consistency verification mechanism for model algorithms. By incorporating a clinical medical rule validator, MedGPT undergoes clinical validation before generating formal responses for patients, thereby ensuring medical accuracy.

Secondly, Medlinker has established a multidimensional evaluation system for the diagnostic and treatment accuracy of MedGPT. For instance, the focus in the consultation scenario is on consultation accuracy, whereas in the diagnostic scenario, the emphasis is on the sufficiency of diagnostic evidence, disease diagnosis accuracy, and missed diagnosis rate. Through this evaluation system, the consistency and accuracy of MedGPT throughout the entire diagnosis and treatment process can be analyzed and assessed from multiple perspectives.
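Medlinker has not published how its clinical rule validator works, so the following Python sketch only illustrates the general idea of checking a draft answer against rules before it is released to a patient. Every name here (`Draft`, `RULES`, `validate`, `respond`) and both rules are hypothetical, and the drug names are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Draft:
    """A draft answer produced by the model (hypothetical structure)."""
    diagnosis: str
    drug: str
    patient_allergies: List[str]


# A rule returns an error message when violated, or None when satisfied.
Rule = Callable[[Draft], Optional[str]]


def no_allergy_conflict(d: Draft) -> Optional[str]:
    if d.drug in d.patient_allergies:
        return f"{d.drug} conflicts with a recorded allergy"
    return None


def diagnosis_present(d: Draft) -> Optional[str]:
    return None if d.diagnosis.strip() else "missing diagnosis"


RULES: List[Rule] = [no_allergy_conflict, diagnosis_present]


def validate(d: Draft) -> List[str]:
    """Run every rule and collect the violations."""
    return [msg for rule in RULES if (msg := rule(d)) is not None]


def respond(d: Draft) -> str:
    """Release the draft only if it passes all rules."""
    problems = validate(d)
    if problems:
        return "Held for review: " + "; ".join(problems)
    return f"Diagnosis: {d.diagnosis}. Suggested drug: {d.drug}."


print(respond(Draft("condition_x", "drug_a", ["drug_b"])))
print(respond(Draft("condition_x", "drug_b", ["drug_b"])))
```

The design point is simply that validation sits between generation and delivery, so a draft that fails a rule never reaches the patient unreviewed.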

These measures alone are still insufficient; evaluating MedGPT’s output also requires benchmarking it against real-world physicians, with the concordance reviewed by experts. This is precisely the objective of the Medlinker consistency evaluation: to assess the consistency between treatment plans generated by MedGPT and those provided by real physicians through single-blind testing, with the results evaluated by an expert committee.

To this end, Medlinker held China’s first consistency evaluation between AI doctors and human physicians at Chengdu High-Tech Haiersen Hospital on June 30, with a live stream broadcast around the clock. The one-day evaluation study involved more than 120 real patients and 10 attending physicians or above from the departments of Cardiology, Gastroenterology, Respiratory Medicine, Endocrinology, Nephrology, Orthopedics, and Urology at West China Hospital of Sichuan University.

On-Site Evaluation of Consistency Between Medlinker AI Doctors and Human Physicians

To ensure the rationality and scientific rigor of the evaluation, the consultation phase of this test was specially designed: After entering the examination room, patients communicate their medical conditions to a medical assistant. The assistant then transmits the patients' chief complaints via online text input to both human physicians and AI physicians, facilitating multi-round communication between the patients and the doctors.

After collecting sufficient decision-making factors, both human physicians and AI physicians issue test orders or diagnoses for patients, who can then complete the examinations on-site at the hospital. Subsequently, patients may return for follow-up visits with their test results, where human and AI physicians independently provide clinical diagnoses and treatment plans, which are then consolidated. This process enables human and AI physicians to conduct independent, non-interfering diagnoses under largely consistent conditions.

MedGPT provides a treatment plan after integrating multi-round inquiries and medical test results. (The aforementioned inquiries were impromptu questions raised during VCBeat’s on-site experience. The input chief complaints and test data may not be reasonable and do not represent patient data from this test.)

Of course, if patients participating in the test still have doubts about the results, they can also communicate face-to-face directly with the attending physicians from West China Hospital stationed on-site to ensure patient satisfaction.

Following the consultations, seven expert professors from Peking University People’s Hospital, China-Japan Friendship Hospital, Fuwai Hospital, and Beijing Friendship Hospital reviewed the 91 valid cases generated during the evaluation. They scored the AI doctor’s performance across seven dimensions: accuracy of consultation, diagnostic accuracy, accuracy of treatment recommendations, accuracy of auxiliary examination plans, accuracy of data analysis, provision of interpretable information, and natural language consultation and interaction.

After three hours of comparative analysis and evaluation, and incorporating the judgments and scores from all expert panel reviewers, human doctors achieved an overall score of 7.5, while AI doctors scored 7.2. The consistency between AI doctors and attending physicians from tertiary Grade A hospitals in scoring results reached 96%.
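The article does not spell out how the 96% figure was derived. One reading that is consistent with the two overall scores is that it is simply the ratio of the AI score to the human score; this is an assumption, not a confirmed methodology. The Python sketch below shows that arithmetic.

```python
# Overall scores as reported by the expert panel.
human_score = 7.5
ai_score = 7.2

# Assumed definition: consistency = AI score as a share of the human score.
ratio = ai_score / human_score
print(f"AI score relative to human score: {ratio:.0%}")
```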

This outcome exceeded all expectations and received high praise from the review experts. The review experts generally agreed that MedGPT collects sufficient information through multi-turn inquiries, advancing the consultation process with medical accuracy as a prerequisite, thereby minimizing the probability of misdiagnosis and missed diagnosis.

Surprisingly, MedGPT also diagnosed conditions outside the scope of the consulting department based on patients' chief complaints and provided other potential differential diagnoses. This is not easily achieved in routine specialist consultations. Based on this, the review experts concluded that MedGPT’s knowledge coverage has already surpassed that of some human physicians with limited clinical experience.

What’s more noteworthy is that MedGPT not only achieves a certain level of consistency, but also pioneers the ability to prescribe necessary medical tests when the diagnosis is still unclear, and then provides accurate disease diagnoses and designs subsequent treatment plans based on the patient’s returned medical test data. While this is routine practice for human physicians, it represents a major breakthrough for AI.

As early as May, MedGPT had already acquired capabilities across multiple medical testing and inspection modalities. By leveraging Medlinker’s various cloud-based services (such as “Cloud Testing”), it enables patients to complete the entire process of consultation, testing, diagnosis, and medication purchase without leaving their homes. Furthermore, after patients receive their medications, MedGPT proactively provides intelligent disease management services, including medication guidance and management, smart follow-up consultations, and rehabilitation guidance.

Currently, the Medlinker MedGPT plugin application platform has integrated over 1,000 proprietary and third-party multimodal medical capabilities, significantly enriching and enhancing the intelligent diagnosis and treatment experience across the entire patient journey. Furthermore, Medlinker is rapidly expanding its disease coverage—by the end of this year, MedGPT will increase the number of covered diseases (ICD-10 subcategories) from the current 100 to 300, raising the proportion of patient visits covered from 60% to 80%.

Although MedGPT is still in the testing phase, current progress indicates that its initial deployment to assist physicians is drawing nearer.

Accumulating Strength for a Breakthrough: Generative AI Streamlines the Entire Disease Management Process

The first release of a large language model dedicated to healthcare, the first realization of AI’s leap from online consultations to medical examinations, and the first completion of diagnostic consistency evaluations between AI doctors and human physicians with outstanding results... With every step forward, MedGPT is making history. Through a series of "firsts" in the field of healthcare-specific generative AI, Medlinker has firmly established itself as a leader in medical generative AI.

This achievement is no accident; it stems from Medlinker’s consistent accumulation and investment in this field over the years.

As early as 2017, when internet healthcare was still in the prevalent stage of online consultations and lightweight medical inquiries, Medlinker began to explore a more in-depth direction characterized by higher technical barriers, greater integration complexity, and accountability for patient outcomes. It sought to identify the true value alignment between the internet and healthcare, aiming to establish a more rational approach to serving patients. Ultimately, Medlinker committed to the pathway of whole-course disease management, benefiting a vast number of patients by providing comprehensive care that encompasses medical screening and testing, diagnosis and treatment, and rehabilitation.

Driven by this need, Medlinker gradually established and refined its capabilities in medical big data cleaning and structuring, laying the foundation for its subsequent development.

Since laying out its chronic disease management strategy in 2018, Medlinker has continuously broadened its disease coverage horizontally and refined and standardized its services vertically. Centered on digital discipline construction, and under the guidance of experts, Medlinker has gradually formed standard operating procedures (SOPs) for online disease management by integrating clinical guidelines and clinical pathways, thereby establishing a professional, standardized, and effective internet-based disease management system.

Leveraging artificial intelligence technologies such as NLP and CV, along with AIoT (Artificial Intelligence of Things), Medlinker has established a presence in fields including data mining, machine learning, deep learning, and knowledge graphs. The company has implemented a series of application scenarios across prevention, diagnosis, and rehabilitation stages, such as intelligent body fluid testing, smart triage, TMD-assisted diagnosis, oral imaging recognition, and intelligent medical assistants.

It was also in these application scenarios that Medlinker’s decision-makers directly witnessed the significant empowerment AI brings to healthcare, thereby further solidifying their future strategic plans.

In 2019, Medlinker began to establish AI diagnostic and treatment models for single diseases in stages. That year, Medlinker joined hands with The Third Affiliated Hospital of Sun Yat-sen University, the International Research Center for Pharmaceutical Management at Peking University, Sanofi, and other institutions and enterprises to jointly create Asia’s first AI model for early screening in the field of multiple sclerosis.

According to independent external tests conducted by Medlinker and expert teams, the validation results were highly consistent with the model’s performance metrics. This AI-based early screening model enables 61%, 51%, and 49% of multiple sclerosis patients to receive early warnings one, two, and three years in advance, respectively, thereby enhancing risk prediction and prevention capabilities for multiple sclerosis.

These research findings were also included in the 8th Joint ECTRIMS-ACTRIMS Congress and published in the specialty journal *Multiple Sclerosis and Related Disorders*.

By 2021, Medlinker had initially established an AI-driven diagnosis and treatment system based on its internet hospital. Leveraging AI technologies such as natural language processing, image recognition, and cognitive computing, this internet hospital system integrates online services, offline care, and medical teams, significantly enhancing consultation efficiency.

What others may perceive as time-consuming and labor-intensive is precisely Medlinker’s “own pace of development,” honed over years of exploration—adopting a deliberate, unhurried approach to achieve in-depth expansion in patient disease management.

Nevertheless, Medlinker was still unable to achieve seamless, end-to-end AI-driven disease diagnosis and treatment at that time. This was because artificial intelligence technologies represented by NLP and CV, despite their advantages of strong rule-based logic and controllability, faced barriers in natural language communication and were incapable of addressing systemic and complex issues.

Large language models built on the Transformer architecture possess significantly superior natural language communication capabilities and can perform high-concurrency, long-range learning and integration on massive volumes of medical texts and data, thereby achieving systematic integration for complex problems.

It is not difficult to see that without years of sustained, in-depth accumulation, Medlinker would not have achieved its current success in the field of generative AI. The phrase “Heaven rewards the diligent” aptly describes Medlinker’s trajectory of rising to prominence through long-term preparation.

Final Remarks

Currently, generative AI can replace humans in solving various engineering problems across many fields, such as code engineering and experimental automation. It has become a widely recognized trend that the rational application of generative AI will disrupt the existing industry landscape.

In the healthcare sector, numerous critical pain points urgently require resolution, such as the uneven distribution of medical resources and the difficulty for patients in remote areas to access high-quality care. This is precisely where Medlinker aims to leverage generative AI to effectively supplement medical resources, enhance the overall health and well-being of the population, address deficiencies in primary care services, and improve the efficiency of public health services. By doing so, it seeks to resolve the structural challenges of relative scarcity of high-quality medical resources and insufficient capacity in grassroots medical services.

This vision has moved a step closer with MedGPT, developed by Medlinker, successfully passing the consistency evaluation against human physicians with high scores. We also anticipate that as generative AI, represented by MedGPT, continues to mature and refine, it will provide more profound empowerment to physicians in the future and significantly enhance patient satisfaction.

References:

MarketsandMarkets: Generative AI Market worth $51.8 billion by 2028, growing at a CAGR of 35.6%: Report by MarketsandMarkets

Carl Franzen, venturebeat.com: McKinsey report finds generative AI could add up to $4.4 trillion a year to the global economy