What Kind of Large Language Model Do Hospitals Really Need? Insights from a Newly Filed Prospectus

Aug 31, 2023 08:00 CST Updated 08:00

The emergence of ChatGPT has reshaped how people imagine AI. In a short time, model developers and application builders have rushed into the field; within just three quarters, the number of healthcare-specific large language models on the market has reached double digits.

However, healthcare differs from other industries. History shows that whether an emerging technology can be deployed at scale in healthcare institutions depends not on how many companies enter the market, but on whether the products built on that technology can truly integrate into physicians’ workflows.

That is where the problem lies.

What do hospitals genuinely need from large language models, and how can those needs be met?

What Is the Significance of Large Language Models in Healthcare?

AI with commercialization potential typically exhibits two characteristics: high frequency and efficiency enhancement.

“High frequency” means that large medical models are used often and across many scenarios. According to Wang Tao, President of Winning Health, medical applications built on generative AI are poised to become indispensable personal assistants for healthcare professionals. Acting as a “super brain,” these systems will not only allocate and organize data resources efficiently but also reason autonomously, improving work efficiency and quality of care while providing effective decision support.

Next is efficiency enhancement, which delivers tangible benefits to those who purchase the algorithms. An ideal large healthcare model should be able to perform real-time quality control across hospital processes and meet policy-mandated IT requirements at minimal cost, thereby creating new possibilities for hospital development and operations.

There are numerous scenarios in hospitals that fit this profile. Two classic examples of AI application needs are doctor-patient communication and medical record documentation.

Let us first discuss doctor-patient communication. Unlike most other service industries, healthcare involves highly frequent “points of contact” between doctors and patients that span the entire course of diagnosis and treatment. However, because of the information asymmetry between doctors and patients, much of this communication is ineffective.

For healthcare providers, who are already in short supply, large volumes of fragmented and repetitive communication fill their working day, creating constant stress and eating into the time available for clinical care and scientific research. For patients, the pressure on some medical staff gets in the way of thorough communication, which degrades the care experience to varying degrees and may even trigger doctor-patient conflicts.

Next, consider medical record documentation. At a recent healthcare IT conference, many hospital presidents and officials from the National Health Commission discussed the future of large language models (LLMs) and related technologies: Can LLMs help with quality control? Can generative AI draft medical records automatically? They argued that reading and writing medical records is one of the most repetitive and time-consuming parts of a physician’s day, and that only by relieving doctors of these tedious tasks can they deliver greater value.

Therefore, the significance of large models for hospitals lies in identifying high-frequency hospital workflows and helping medical staff complete highly repetitive tasks efficiently. This lets staff play to their comparative strengths and focus on clinical diagnosis, treatment, and scientific research, ultimately promoting the sustainable development of the hospital’s overall capabilities.

Yet some may ask: Isn’t this a long-standing need that natural language processing (NLP) has already addressed? The answer is yes, but not entirely.

What Makes Large Language Models So Powerful?

In the early 1990s, researchers began trying to use IT to assist physicians in diagnosis and treatment and thereby reduce their workload. However, the “expert systems” of that time were limited: simple data mapping and basic algorithms proved inadequate even for relatively straightforward medical problems.

After all, medical diagnosis relies on the traditional principles of “inspection, auscultation and olfaction, inquiry, and palpation,” requiring physicians to gather information through multiple sensory channels, utilize various auxiliary methods such as laboratory and imaging tests, and integrate medical knowledge with logical reasoning to arrive at an effective diagnosis. Relying solely on a patient’s brief chief complaint, without comprehensively evaluating factors such as age, physical condition, and past medical history, renders such a “diagnosis” little more than a matter of chance.

As a result, “expert systems” never truly made the leap from theory to practice. Enthusiasm for intelligent technologies was not rekindled until the 2010s, when rule-based and statistical NLP techniques matured. In recent years, NLP-based systems for medical record quality control and clinical decision support systems (CDSS) have been widely adopted in both healthcare management and clinical practice.

However, this type of NLP also has its limitations.

“Although earlier NLP techniques based on rules or statistical methods enhanced the analytical capabilities of artificial intelligence, they did not break away from the logic of ‘input information–database association–conclusion search.’” Zhao Daping, CTO of Winning Health, stated in an interview. “This reasoning approach only considers the ‘preceding context’ while neglecting the ‘subsequent context.’ In contrast, physicians’ deductive processes not only encompass data provided by various reports but also infer possibilities beyond the data based on prior experience.”

Strictly speaking, large language models are also a form of NLP; however, compared to rule-based or statistical NLP approaches, knowledge- and data-driven large language models possess greater capacity for self-evolution.

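Specifically, this evolutionary capability stems from stochastic gradient descent (SGD), the algorithm used to train neural networks. At each step, SGD uses a randomly sampled batch of training data to estimate which direction would reduce the model’s error, then nudges the model’s parameters a small step in that direction. Repeated over an enormous number of steps, this gradual, partly random search is, to some extent, analogous to genetic mutation across generations of a species.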
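The mutation analogy is loose, so it helps to see what SGD actually does. The following is a minimal sketch on a toy one-variable linear regression, not anything resembling a medical model; the function name, learning rate, batch size, and data are illustrative choices. Each step samples a random mini-batch, computes the gradient of the mean squared error on it, and moves the parameters slightly against that gradient.

```python
import random

def sgd_fit_line(xs, ys, lr=0.05, epochs=500, batch_size=4, seed=0):
    """Fit y ~ w*x + b with mini-batch stochastic gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # random sampling of batches is the "stochastic" part
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Move a small step against the gradient to reduce the loss on this batch
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy data generated from y = 3x + 1 with no noise
xs = [x / 10 for x in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = sgd_fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Every update is applied; what the evolution analogy calls "elimination" corresponds to later steps correcting earlier ones that increased the error. Large models are trained with the same basic principle, usually with more elaborate optimizers and billions of parameters instead of two.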

Under the pressure of the training objective, which plays the role of survival pressure in this analogy, errors are corrected while useful adjustments accumulate; over time, certain higher-order capabilities emerge. These include automatically extracting key information, understanding and generating text, and drawing on knowledge encoded in the model to fill in incomplete information, which together support comprehensive reasoning.

Such capabilities are particularly valuable in medical scenarios. For instance, in the context of assisting physicians with interpreting medical images, large-model-based artificial intelligence not only provides diagnostic support based on the given imaging data but also leverages prior learning and integrated knowledge to address image artifacts, thereby delivering more precise auxiliary diagnostic results at a faster pace.

Furthermore, the conversational abilities of LLMs, as demonstrated by ChatGPT, are now familiar worldwide. Adapted to vertical domains through pre-training and fine-tuning, these models can support doctor-patient interaction in applications such as intelligent triage, intelligent consultation, and intelligent follow-up. For the more complex task of medical record documentation, they let AI move beyond rigid templates and understand and generate electronic medical records with the proficiency of a seasoned clinician.

Where Do the Greatest Challenges in Deploying Large Medical Models Clinically Lie?

Drawn by this disruptive potential, numerous internet companies and leading healthcare IT firms have flocked to the LLM arena. Yet making LLMs clinically useful is no easy feat: training the models is arduous, and deploying them in practice is harder still.

Before examining the difficulties of deploying large models, let us first look at the DIKW knowledge hierarchy, whose origins are often traced to lines by the poet T.S. Eliot. Simply put, this model divides knowledge in the broad sense into a four-tier pyramid: Data, Information, Knowledge, and Wisdom.

Figure 1. The DIKW Knowledge Model and Contemporary Artificial Intelligence

In the traditional approach, data is continuously refined upward into information, knowledge, and even wisdom, which then serves as the foundation for applications. In the AI era, however, the focus shifts to the second layer, information, which is processed in two directions. Upward, the essential is distilled from the raw and raised to a higher level of abstraction to obtain knowledge, building the foundation for intelligence; downward, processes are standardized and information is reduced to data, building the foundation for digitalization. Knowledge and data, supplemented by algorithms and computing power, together make up a large model.
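The “one-up, one-down” processing of the information layer can be illustrated with a deliberately simplified sketch. Here, hand-written regular expressions and threshold rules stand in for what a large model would do; the note format, field names, and thresholds are hypothetical and illustrative only, not clinical guidance and not WiNEX or WiNGPT code.

```python
import re

# Hypothetical free-text note: the "information" layer
NOTE = "Patient reports cough for 3 days. Temperature 38.6 C. Heart rate 104 bpm."

def to_data(note):
    """Downward step: reduce free text to standardized, structured data fields."""
    fields = {}
    temp = re.search(r"Temperature\s+(\d+(?:\.\d+)?)\s*C", note)
    hr = re.search(r"Heart rate\s+(\d+)\s*bpm", note)
    if temp:
        fields["temperature_c"] = float(temp.group(1))
    if hr:
        fields["heart_rate_bpm"] = int(hr.group(1))
    return fields

def to_knowledge(data):
    """Upward step: distill structured data into higher-level findings (illustrative rules only)."""
    findings = []
    if data.get("temperature_c", 0) >= 38.0:
        findings.append("fever")
    if data.get("heart_rate_bpm", 0) > 100:
        findings.append("tachycardia")
    return findings

data = to_data(NOTE)
knowledge = to_knowledge(data)
print(data)       # {'temperature_c': 38.6, 'heart_rate_bpm': 104}
print(knowledge)  # ['fever', 'tachycardia']
```

In a model-based system, both directions would be handled by the model itself, which is what lets it cope with the varied phrasing of real clinical notes that fixed rules like these would miss.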

In an interview, Wang Tao compared the two approaches, arguing that the shift from elevating everything upward to the “one-up, one-down” approach of the large-model era is fundamentally a change in underlying logic. Within the healthcare system, this change means re-architecting from a “technology supports application” model to one of “parallel integration of technology and application.”

Therefore, the urgent issue to be addressed in the deployment of large language models is their compatibility with medical information systems.

Few healthcare IT companies currently offer hospital information management systems that support this “parallel integration of technology and application.” Only a handful of leading enterprises recognized the need for architectural change when deep learning surged in popularity, and went on to develop medical management systems designed to let AI perform at its best.

Taking Winning Health as an example, its next-generation healthcare management information system, WiNEX, was designed with architectural support for “Intelligence” in mind. Its embedded EA+AI intelligent architecture ensures that AI operations are supported at every layer.

Figure 2. Winning Health EA+AI Intelligent Architecture

WiNEX’s innovative architecture is well suited to the iterative use of big data in hospital settings. For large models to run smoothly once deployed and to keep learning, it is not enough merely to record data within information workflows; the data computation architecture must also be considered, and business processes and technical workflows must interact in real time. Therefore, compared with traditional Enterprise Architecture (EA), the fundamental change introduced by the EA+AI intelligent architecture is the restructuring of the business and technology domains. It runs GPU-based computing in parallel with CPU-based computing so that the two jointly support business operations, laying the foundation for a system environment in which large models can run.
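The idea of CPU and GPU computing running side by side can be sketched as a router that sends ordinary business transactions and model-inference jobs to separate worker pools, so that slow inference never blocks routine operations. This is a hypothetical illustration: the class and function names are invented, and both pools are plain thread pools, so no actual GPU is involved.

```python
from concurrent.futures import ThreadPoolExecutor

class ParallelComputeRouter:
    """Hypothetical router sending business work and inference work to separate pools."""

    def __init__(self, cpu_workers=4, gpu_workers=1):
        # Pool for ordinary transactional work (orders, billing, record saves)
        self.cpu_pool = ThreadPoolExecutor(max_workers=cpu_workers)
        # Stand-in for a GPU-backed inference service; here it is just another thread pool
        self.gpu_pool = ThreadPoolExecutor(max_workers=gpu_workers)

    def submit(self, kind, fn, *args):
        pool = self.gpu_pool if kind == "inference" else self.cpu_pool
        return pool.submit(fn, *args)

    def shutdown(self):
        self.cpu_pool.shutdown(wait=True)
        self.gpu_pool.shutdown(wait=True)

def save_order(order_id):
    return f"order {order_id} saved"

def draft_summary(note):
    # Placeholder for a call to a model server that would run on GPUs
    return "summary: " + note.split(".")[0]

router = ParallelComputeRouter()
f1 = router.submit("business", save_order, 42)
f2 = router.submit("inference", draft_summary, "Patient admitted with pneumonia. Responding to antibiotics.")
print(f1.result())  # order 42 saved
print(f2.result())  # summary: Patient admitted with pneumonia
router.shutdown()
```

Keeping the two kinds of work in separate pools is the design point: each side can be sized and scaled independently while both serve the same business processes.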

With an intelligent architecture in place, Winning Health has found it much easier to develop large language models. The company has built an intelligent service layer for its self-developed medical large model. Positioned between business applications and data processing, this layer lets the model interact in real time with business operations and data and supports the “one-up, one-down” processing of information at any step of a workflow, with the results fed back into the model’s iteration and upgrades.
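A service layer of this kind can be sketched as a mediator that forwards application requests to a model, returns the result, and logs each interaction so that physicians’ corrections can later feed the next round of training. This is a minimal hypothetical sketch: the class names, the toy model, and the feedback scheme are assumptions for illustration, not WiNGPT’s actual design.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Interaction:
    task: str
    model_input: str
    model_output: str
    physician_final: Optional[str] = None  # filled in once the physician accepts or edits the draft

@dataclass
class IntelligentServiceLayer:
    """Hypothetical mediator between business applications and a model backend."""
    model: Callable[[str, str], str]
    feedback_log: List[Interaction] = field(default_factory=list)

    def handle(self, task: str, text: str) -> Interaction:
        output = self.model(task, text)
        record = Interaction(task, text, output)
        self.feedback_log.append(record)  # every call is kept as potential training feedback
        return record

    def record_edit(self, record: Interaction, final_text: str) -> None:
        record.physician_final = final_text

    def training_pairs(self):
        """Pairs where the physician changed the draft: candidate data for the next model iteration."""
        return [(r.model_input, r.physician_final) for r in self.feedback_log
                if r.physician_final is not None and r.physician_final != r.model_output]

def toy_model(task, text):
    # Stand-in for a real model call
    return f"[{task}] {text.upper()}"

layer = IntelligentServiceLayer(model=toy_model)
r = layer.handle("summarize", "stable overnight")
layer.record_edit(r, "Stable overnight, no complaints.")
print(layer.training_pairs())  # [('stable overnight', 'Stable overnight, no complaints.')]
```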

Figure 3. Winning Health Medical Large Model WiNGPT Technical Framework

Of course, system architecture is only one of the many challenges of practical implementation. To keep LLMs running stably in hospitals, developers must also attend to details. For instance, they should aim for a “seamless” experience in which physicians can use LLM functions quickly without frequently switching contexts.

This seemingly routine requirement poses a major challenge for internet companies. Without their own business systems such as Hospital Information Systems (HIS) and Picture Archiving and Communication Systems (PACS), their large language models often exist only as external add-ons.

Although a plug-in architecture does not significantly affect model performance, it forces physicians to switch to a separate LLM interface every time they want to use it, which substantially degrades the user experience and workflow efficiency.

By comparison, medical IT companies clearly hold a distinct advantage. Leveraging WiNEX as its foundation, Winning Health enables physicians across various departments to access the extensive capabilities of large language models without altering their existing clinical workflows, thereby securing a leading position in practical implementation.
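The difference between a plug-in and an embedded model can be sketched with a simple event bus inside the clinical system: instead of the physician copying text into a separate tool, the model is subscribed to workflow events and writes its draft straight into the open record. The event name, record fields, and handler below are hypothetical, and the “model” is a string template.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class EmrEventBus:
    """Hypothetical in-process event bus inside an EMR workstation."""

    def __init__(self):
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        for handler in self._handlers[event]:
            handler(payload)

def draft_discharge_summary(payload: dict) -> None:
    # Placeholder for a model call; the draft is written into the open record,
    # so the physician reviews it on the same screen instead of switching tools.
    record = payload["record"]
    record["discharge_summary_draft"] = (
        f"Discharged after {record['length_of_stay_days']} days; "
        f"diagnosis: {record['diagnosis']}."
    )

bus = EmrEventBus()
bus.subscribe("discharge_ordered", draft_discharge_summary)

record = {"diagnosis": "community-acquired pneumonia", "length_of_stay_days": 5}
bus.publish("discharge_ordered", {"record": record})
print(record["discharge_summary_draft"])
```

The physician’s workflow (ordering a discharge) is unchanged; the model’s output simply appears where the physician is already working.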

Are Larger Large Language Models Always Better?

When it comes to general-purpose models, parameter count largely determines how much knowledge a model can acquire and how well it handles complex tasks. Consequently, general-purpose models launched by China’s internet giants have parameter counts in the hundreds of billions, while the GPT series is reported to have reached the trillion-parameter scale.

However, in vertical domains such as healthcare, larger model size is not always better. At times, excessively large models can become a burden that hinders their commercialization.

Because clinical data cannot leave hospital premises, LLMs must be packaged and deployed on-site. However, most hospitals’ IT infrastructure is built on general-purpose CPUs, and very few institutions have GPUs designed for graphics processing and parallel computing. Lacking a suitable deployment environment, hospitals must purchase GPUs along with the application to run an LLM, and must also ensure sufficient storage capacity and high-speed networking. If the model is too large, hospitals’ configuration costs rise sharply.

To address these challenges, Winning Health has consistently sought to control parameter scale while ensuring application quality. Zhao Daping stated, “Winning Health’s existing large language model, WiNGPT, which features 13 billion parameters, already meets the requirements of most medical scenarios. For a model of this size, the GPU and hardware configuration cost for a single department can be kept under RMB 100,000. Furthermore, hospital-wide deployment of the large model can adopt enterprise-level GPU parallelization solutions, with configuration costs not exceeding RMB 1 million.”

So, what scale of parameters is required to fully address a hospital’s overall needs?

Based on WiNGPT’s current performance, Wang Tao believes that Chinese medical large language models can be kept at approximately 15 billion parameters, while multimodal models combining language and imaging data can stay within 50 billion. WiNGPT targets 13 billion parameters, and for the company’s upcoming third-generation model, a multimodal medical imaging model, 30 billion parameters will suffice to meet many vertical-domain needs.
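Why parameter count drives hardware cost can be seen with a back-of-the-envelope calculation: holding the weights alone takes roughly (number of parameters) × (bytes per parameter). The sketch below applies this rule of thumb to the parameter counts mentioned above at common numeric precisions. It is an approximation that ignores the extra memory needed for activations, the attention key-value cache, and serving overhead, and it is not the basis of the cost figures quoted in the article.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Bytes per parameter at common precisions (half precision, 8-bit and 4-bit quantization)
PRECISIONS = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Parameter counts (in billions) mentioned in the text
for billions in (13, 15, 30, 50):
    sizes = {p: weight_memory_gb(billions * 1e9, b) for p, b in PRECISIONS.items()}
    print(f"{billions}B params: " + ", ".join(f"{p} {gb:.1f} GB" for p, gb in sizes.items()))
```

At half precision, a 13-billion-parameter model needs about 26 GB for its weights, while a 50-billion-parameter model needs about 100 GB. Because memory grows linearly with parameter count, every increase in model size translates directly into more or larger GPUs.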

Do Large Medical AI Models Need a "Killer App"?

To date, many enterprises have launched multiple applications based on large language models. Taking Winning Health as an example, its document generation capability can automatically extract key information, understand and generate documents such as electronic medical records (EMRs) and discharge summaries, and structure medical record sections, thereby reducing physicians’ workload. Its medical imaging interpretation feature assists physicians in analyzing medical images, such as X-rays, CT scans, and MRIs, and generates imaging reports and health examination reports, improving the accuracy and efficiency of diagnosis.
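One of the tasks mentioned above, structuring medical record sections, can be sketched in its simplest form: splitting a note into named sections. The version below assumes every section starts on its own line with an explicit “Header:” label, and the header list is hypothetical. Real notes rarely follow such a clean format, which is precisely where a language model’s understanding goes beyond what fixed rules like these can handle.

```python
import re

# Hypothetical section headers for an inpatient note
SECTION_HEADERS = ["Chief Complaint", "History of Present Illness", "Past History",
                   "Physical Examination", "Diagnosis", "Plan"]

def split_sections(note: str) -> dict:
    """Split a note whose sections start with 'Header:' at the beginning of a line into a dict."""
    pattern = re.compile(r"^(%s):\s*" % "|".join(map(re.escape, SECTION_HEADERS)), re.MULTILINE)
    matches = list(pattern.finditer(note))
    sections = {}
    for i, m in enumerate(matches):
        # Each section runs until the next header (or the end of the note)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[m.group(1)] = note[m.end():end].strip()
    return sections

note = """Chief Complaint: Cough and fever for 3 days.
History of Present Illness: Productive cough, no chest pain.
Diagnosis: Community-acquired pneumonia.
Plan: Oral antibiotics, review in 48 hours."""

print(split_sections(note))
```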

At first glance, most enterprises are still applying large models to traditional scenarios, failing to identify a killer application that would drive their widespread adoption.

However, this situation may open up a new path. In Wang Tao’s view, the positioning of large medical models should inherently be that of a “Copilot,” serving as an intelligent assistant to physicians, growing alongside them, and jointly addressing challenges in clinical diagnosis, treatment, and scientific research. This is also the vision behind Winning Health’s development of large models.


In an ideal future, the real “killer app” will be for every doctor to have a large language model that matches their individual style and is precisely tailored to their needs.