Medical GPT: Fleeting Bloom or Transformative Revolution? Insights from WAIC 2023

Jul 09, 2023 08:00 CST Updated 08:00


At the recently concluded 2023 World Artificial Intelligence Conference (WAIC), Zhou Xiang, CEO of UNITED IMAGING, summed up large-model technology and the defining characteristics of its era in two words: “integration” and “emergence.”

The widespread recognition of the term “emergence” is attributed to Kevin Kelly’s classic book Out of Control, where it is used to describe the phenomenon in which unpredictable complex patterns arise from simple, predefined interactions among individuals within a system. This is precisely the case with today’s large language models: as model scale continues to increase, model capabilities maintain a roughly linear trend within a certain range (as seen in deep learning), and then exhibit explosive growth once a threshold is surpassed.
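To make the shape of that curve concrete, here is a minimal toy sketch. It is not a real scaling law: the function name `toy_capability`, the threshold, the slope, and the exponent are all hypothetical values chosen only to show roughly linear growth followed by a sharp rise once a threshold is passed.

```python
def toy_capability(scale: float, threshold: float = 100.0,
                   slope: float = 0.5, exponent: float = 2.0) -> float:
    """Hypothetical capability score for a model of a given scale.

    Below the threshold the score grows linearly with scale; above it,
    a super-linear term is added to mimic the "explosive growth" phase.
    All parameters are illustrative assumptions, not measured values.
    """
    linear_part = slope * min(scale, threshold)
    if scale <= threshold:
        return linear_part
    return linear_part + (scale - threshold) ** exponent


if __name__ == "__main__":
    # Doubling scale below the threshold doubles the score;
    # a small step past the threshold changes it far more.
    for scale in (25, 50, 100, 110, 120):
        print(scale, toy_capability(scale))
```

In this toy, going from scale 50 to 100 only doubles the score, while going from 100 to 120 multiplies it several times over, which is the qualitative pattern the term “emergence” refers to.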

The development path of “integration” for future industries also serves as an “emergence accelerator.” By leveraging the ecosystem model, it transcends the boundaries between software and hardware, between imaging and text, and between operational stages and processes, thereby generating cross-scenario multimodal data and multi-threaded capabilities. In this process, “integration” supplies diverse and massive amounts of data to fuel “emergence.”

“Integration” and “emergence” permeated the entire World Artificial Intelligence Conference, and the Health Summit likewise sought to explore the underlying logic behind these two concepts and apply it to the future of medicine.


A World of Difference: Medical Large Language Models Cannot Simply Adopt General-Purpose Large Language Model Frameworks

Compared with the general domain, large models in the medical field, while sharing certain similarities, differ fundamentally in aspects such as model design, training, and application.

“Positioning” is the greatest similarity between large medical models and general-purpose large models. Whether it is the machine learning of the past, the subsequent deep learning, or today’s generative AI and large language models, the essence of AI is a “tool for tools,” with “empowerment” as its path to value realization. AI—at least in its current stage—will not become a doctor, nor will it independently develop a drug; otherwise, Google and Microsoft would have long shed their labels as technology companies to become world-class pharmaceutical firms.

“Scenario Requirements,” “Training Data,” and “Application Targets” constitute the differences between medical large models and general-purpose large models; these three key points sharply distinguish the two types of models, initiating their respective development paths.

Let’s first discuss “scenario requirements.” In his speech, Zhou Xiang provided a comprehensive explanation of the differences in scenario requirements between the two. He argued that general-purpose large language models cannot fully meet the needs of medical scenarios for three reasons:

First, healthcare is a professional, high-stakes field. Medical scenarios tolerate very little error, so they impose higher standards on large language models (LLMs): the AI must give more professional and precise medical recommendations grounded in specialized medical corpora. Second, over 90% of current medical data originates from medical imaging. A practical medical AI large model capable of complex decision-making must therefore integrate multimodal information, including medical images, text, and even audio or video, to support the full range of healthcare scenarios. Finally, given hospitals’ actual deployment environments and data security requirements at this stage, “large models” cannot be infinitely “large”; whether a model can actually be deployed and used inside the hospital is a critical factor.

Next is “training data.” The multimodal nature of medical data effectively dilutes the entire dataset, dispersing massive volumes of medical big data across various scenarios. As a result, the size of each subset rarely reaches the threshold required for emergent behavior to occur within the system.
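The dilution argument is simple arithmetic, sketched below under stated assumptions. The scenario names, the sample counts, and the threshold are hypothetical numbers invented for illustration; the article gives no actual figures.

```python
def dilution_report(samples_per_scenario: dict, threshold: int) -> dict:
    """Compare the pooled dataset and each scenario subset against a threshold.

    `threshold` stands in for the (unknown) data volume at which emergent
    behavior would appear; it is an assumed value, not a measured one.
    """
    total = sum(samples_per_scenario.values())
    reaching = [name for name, count in samples_per_scenario.items()
                if count >= threshold]
    return {
        "total": total,
        "total_reaches_threshold": total >= threshold,
        "scenarios_reaching_threshold": reaching,
    }


# Hypothetical split of one million samples across four scenarios.
example = {
    "ct_imaging": 400_000,
    "mri_imaging": 250_000,
    "clinical_text": 300_000,
    "audio_video": 50_000,
}
report = dilution_report(example, threshold=500_000)
print(report)
```

With these made-up numbers the pooled dataset clears the threshold, yet no single scenario does, which is the situation the paragraph above describes.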

Most large medical models released so far focus on textual data. For instance, MedGPT, recently launched by YiLian, connects the various stages of text-based doctor-patient communication, such as chief complaints, follow-up visits, and medication purchases, but has not yet integrated imaging data into its large model framework.

In the long term, the full realization of large models’ value in the healthcare sector still requires the support of big medical imaging data. While algorithm-related technologies such as computer vision (CV) and privacy-preserving computation have already reached the application stage, infrastructure for computing power allocation and the volume of imaging data have become the key constraints on model development. Therefore, it remains essential to advance the construction of infrastructure—including computing power allocation systems and large-scale medical imaging databases—as well as the development of cross-hospital datasets. This necessitates a shift in mindset among data custodians, coupled with sustained effort over time, to achieve an organic “integration” within the healthcare domain.

Finally, there are the “application targets.” The vast majority of clients for large medical models are businesses and government bodies, a high-stakes domain that demands precise, evidence-based judgment. General-purpose models can get away with ambiguous answers or with images that merely pile up visual elements; large medical models must reach conclusions precise enough to support accurate decisions and recommendations.

Nine Directions: The “Emergence” and “Integration” of Large Medical Models

Clarifying how the two kinds of large models differ is what establishes their development paths. For medical large models, these differences mean they cannot simply replicate the “emergence” and “integration” achieved by general-purpose large models, but must instead forge their own path to improve the models and explore their value.

At the Health Summit, Min Dong, Deputy Director of the Cloud and Big Data Institute at the China Academy of Information and Communications Technology (CAICT), provided a comprehensive overview of nine potential application areas for large medical models, offering insights that could inspire the development of this emerging industry.

1. Assisted Diagnosis, Assisted Decision-Making

Compared with traditional clinical decision support systems (CDSS), large models draw on more extensive training data sources and have more efficient self-purification capabilities, and can therefore bring significant improvements to CDSS.

In assisted diagnosis, physicians must draw on extensive medical knowledge and keep track of vast amounts of patient information, which often leads to fatigue. Large language models can help physicians document information and so reduce that fatigue. Meanwhile, by learning from data such as electronic health records and medical literature, these models can converse with physicians in natural language, improving the accuracy and efficiency of diagnosis. Both pathways help raise the quality and efficiency of clinical diagnosis.

2. Treatment Plan Generation

Large language models can rapidly generate treatment plans after patient intake in areas such as emergency pharmaceutical care, orthopedics, and bacterial infections. Particularly in emergency settings, they can quickly formulate treatment strategies based on patient information during the rescue process, helping physicians make faster diagnoses and buying more time for patient care.

3. Quality Control

Automated data entry, formal quality control, and substantive quality control can be performed on structured medical documentation. Given that physicians have varying documentation styles and limited energy, large language models can rapidly generate standardized medical documentation templates. These templates feature clear quality control logic and rich content expression, enabling accurate, standards-compliant documentation entry and reducing the burden on physicians during writing and review processes.

4. Patient Services

Large language models can provide patient triage and answer questions using plain language. Traditional patient education requires physicians to invest significant effort in creating materials, striking a balance between professionalism and readability, and often addressing follow-up queries during subsequent communications. In contrast, large language models can generate relevant patient education materials tailored to the patient’s native language background and engage in conversations to deliver the information patients need.

5. Hospital Management

Large language models can generate the various forms required for hospital management and provide decision support for hospital administrators. They compile statistics covering multiple aspects, including physicians’ basic information, clinical competencies, hospital logistics, and hospital finance, and then generate dynamic management plans tailored to the hospital’s current status, enabling intelligent and efficient allocation of medical resources. Taking medical equipment management as an example, large language models can plan procurement and maintenance schedules for various medical devices, generate maintenance-related forms, and effectively improve management efficiency.

6. Teaching and Research

In terms of research, large language models can play a significant role in topic selection and project initiation, research design, result analysis, and manuscript preparation. It is important to note, however, that these models may still exhibit issues such as fabricating references, failing to accurately attribute scientific contributions, and lacking accountability for generated content, all of which remain to be addressed in future developments.

In the realm of medical education, large language models can assist physicians in developing teaching materials and address certain student inquiries. Physicians often devote substantial time to preparing lesson plans and responding to highly repetitive questions from students, which diverts their energy away from clinical and research responsibilities. The introduction of large language models can alleviate this burden by supporting the learning of first-year residents, thereby freeing physicians from routine teaching tasks and enabling them to focus on clinical practice and scientific research.

7. Traditional Chinese Medicine

Traditional Chinese Medicine (TCM) often faces challenges in making its medical knowledge explicit and structured, which hinders the inheritance of such knowledge. The introduction of large language models can facilitate data mining of TCM-related information, promote the construction of a systematic knowledge framework, and generate standardized diagnosis and treatment plans for patients.

8. Drug R&D and Sales

In R&D, large language models can enhance target discovery efficiency and construct complex molecules during drug discovery and preclinical research. They also provide support in clinical trials by offering recommendations on trial design and statistical methods, thereby significantly improving the efficiency of drug development.

In terms of sales, automated and intelligent methods can be used to connect with target users during drug promotion, reducing marketing costs and improving marketing efficiency.

9. Public Health

Large language models can assist in big data analysis and trend assessment in epidemiology. Because transmission modes and pathways are complex and stochastic, disease progression is highly uncertain and variable, which exceeds the capabilities of conventional algorithms. Large language models, by contrast, can effectively support epidemiological big data analysis and prediction and provide more accurate assessments. Numerous research institutions and hospitals, both domestically and internationally, are currently exploring this area and have achieved promising results.

Standards and Ethics: Constraining or Protecting Large Language Models?

It is difficult to predict which of the nine aforementioned directions will first yield high-quality large medical model outcomes. However, it is certain that fostering the robust development of large models requires creating an inclusive platform where enterprises, hospitals, universities, and research institutions can all contribute effectively. This necessitates regulatory bodies enacting legislation and industry experts establishing consensus-based standards to create a fair competitive environment for large models at an early stage, thereby guiding technological advancement toward beneficial ends.

At the Health Summit, the China Academy of Information and Communications Technology (CAICT), the Medical Management Service Guidance Center of the National Health Commission, the Shanghai Industrial Innovation Center of CAICT, iFlytek Medical Technology Co., Ltd., Peking Union Medical College Hospital, Institute of Intelligent Medicine at Fudan University, Tongji Medical College of Huazhong University of Science and Technology, the First Affiliated Hospital of University of Science and Technology of China, the National Clinical Research Center for Orthopedics, Sports Medicine and Rehabilitation, and the Cardiovascular Health Alliance jointly participated in the launch ceremony for the research on standards for large models in the healthcare industry, marking the first step toward promoting the standardized development of medical large models.

Going forward, institutions led by the China Academy of Information and Communications Technology (CAICT) will accelerate frontier research; develop a three-tier technical standard framework for AI large models (infrastructure layer, model layer, and application layer) tailored to the characteristics of healthcare industry applications; and, relying on laboratories, assess the compliance, safety, controllability, and reliability of medical AI large models from three aspects: data processing, algorithmic models, and service management, thereby promoting standardized development across the industry.

Beyond standards research, ethical issues surrounding generative AI were also a core topic of discussion at this year’s World Artificial Intelligence Conference. The “Initiative on Ethics and Governance of Generative AI” and the “Ethical Handbook for AI in Medical Imaging” were released successively. These initiatives aim, on one hand, to address lingering concerns regarding trustworthiness, privacy, and healthcare applications associated with contemporary AI; and on the other hand, to prepare for the advent of generative AI, preventing emerging technologies from falling into misuse.

Experts at the World Artificial Intelligence Conference did not agree on how the various standards should be formulated. Some argued that the healthcare industry should approach new technologies with caution, breaking the technology down through layered standards so that it is implemented only on a foundation of trustworthiness. This approach protects not only patients but also AI technology itself.

Some experts also argue that technology and regulation do not develop in tandem through a consistent upward spiral. Therefore, when confronting emerging technologies, efforts should be made to identify the most appropriate “degree” of regulatory intervention, avoiding insufficient oversight that may steer technological applications away from beneficial outcomes, as well as excessive regulation that could stifle technological adoption and innovation.

A Rational Perspective on Large Medical Models

Although every forum at the World Artificial Intelligence Conference was dominated by large language models and generative AI, some companies remained steadfast in their own strategic approaches, methodically advancing the development of their AI applications.

For instance, GE Healthcare released the “Report on Innovating for a New Future of Health 2023” at the forum and upgraded its Edison Digital Health Ecosystem to version 2.0. Mao Xinsheng, Chairman of Shukun Technology, discussed AI innovation in China, emphasizing the development of original AI products tailored to the characteristics of the Chinese population in areas such as cardiovascular and pulmonary diseases. Meanwhile, 91360 focuses on innovations in digital pathology, continuing its efforts to address screening challenges for common cancers, including breast cancer.

After all, the current “emergence” and “integration” of large medical AI models have not transcended the application scope of first-generation deep learning-based AI, nor have they demonstrated new commercial pathways to address the long-standing challenge of high costs and low returns associated with various AI technologies. These models still require time for accumulation, both to achieve technological self-emergence and to penetrate clinical practice, achieving deep integration with healthcare.

Before reaching that critical threshold, contemporary medical AI must not be abandoned.