Just as deep learning once surged in popularity, large language models are now sweeping through the healthcare industry at remarkable speed. In less than six months, numerous leading hospitals have deployed the technology and begun exploring the next generation of medical artificial intelligence.
Under these circumstances, many research institutions have expressed confidence in the prospects of large medical models. Pointing to the successes of ChatGPT and Sora, they believe that large medical models can reach scalable deployment faster than deep learning did, and thereby drive the intelligent development of hospitals more effectively.
However, the strict adherence to evidence-based principles in medical decision-making prevents existing vertical large models from replicating the success trajectory of general-purpose large models. On one hand, current AI is limited in its ability to handle multimodal data. It struggles to integrate information into a comprehensive decision the way a real physician would, and it cannot accurately and independently articulate the reasoning behind its decisions.
On the other hand, healthcare institutions place a high priority on data privacy, adhering to the principle of keeping data within hospital premises whenever possible. This means that hospitals, as the buyers, must invest in their own computing infrastructure to run large language models.
IBM Watson’s exit served as a cautionary tale for the entire life sciences sector: when evaluating a potentially emerging technology, we must not rely solely on “subjective impressions” from non-healthcare industries to assess its disruptiveness and viability. Instead, we must also address practical considerations specific to healthcare applications, such as how it integrates into diagnostic and treatment workflows, navigates regulatory review and approval processes, and achieves commercialization.
So, can today’s large medical AI models overcome these challenges and carve out their own path to commercialization?
Whether it is the machine learning of the past, the subsequent deep learning, or today’s generative AI and large language models, the essence of AI remains a “tool for tools,” seeking to realize value through “empowerment.”
For such a new tool to take root in hospitals, it must identify the tools it serves and integrate as “seamlessly” as possible into physicians’ workflows. Doctors with busy schedules will not appreciate having to launch yet another application just to use an AI feature.
Furthermore, “high-frequency” usage is also a prerequisite for large medical models to deliver value. An ideal large medical model should be capable of autonomously extracting, processing, and analyzing various types of data, performing real-time quality control over hospital medical processes, and meeting all non-clinical needs of physicians at minimal cost. If a large model fails to perform these tasks with high frequency, it cannot truly empower hospitals, and naturally, no hospital would be willing to pay for it.
Within the vast hospital setting, it is not difficult to find scenarios that meet these requirements. In fact, AI systems from the NLP era have already established proven patterns that large language model applications can follow into practical deployment.
Pre-consultation is a typical scenario where large language models can deliver additional value. In the past, internet technologies and natural language processing (NLP) were combined to help physicians streamline this time-consuming yet essential process. However, the algorithms of that period were limited in capability and often failed to extract patients’ chief complaints clearly or to answer accurately the questions that arose during the consultation.
In contrast, the stronger reasoning ability of large language models enables them to integrate information across multi-turn dialogues and extract what is useful, and thereby to provide comprehensive recommendations that match patients’ intentions.
Medical record documentation is another scenario where large language models can deliver significant value at high frequency. In physicians’ daily workflows, reading and writing medical records is extremely repetitive and time-consuming. Large language models can free physicians from these tedious tasks so that they can focus on higher-value activities.
Although these applications have long been present in hospitals, this does not preclude large language models from reengineering them with greater efficiency and at lower cost. Moreover, precisely because clinicians are familiar with these scenarios, large language models can deeply integrate into clinical workflows without disruption, thereby facilitating scalable deployment.
Given that text-based vertical large language models (LLMs) in healthcare can achieve rapid breakthroughs by leveraging existing mature scenarios and demands, can multimodal vertical LLMs in healthcare also take root in the field of medical imaging, where machine learning has already achieved deep penetration?
Before answering this question, we must first determine the role that multimodal large models are meant to play in healthcare settings.
If the goal is to replace the AI-assisted diagnostic tools currently used in radiology and clinical departments, these models will clearly not make significant headway in the short term.
Auxiliary diagnostic and auxiliary therapeutic products rely strictly on clinical evidence: an algorithm must be able to reproduce a given result and provide the evidence for it. Judging by the current state of vertical large models, they can produce a definite output for a given input, but repeated runs on the same input often yield inconsistent outputs. In other words, when imaging inputs are highly complex and high precision is demanded, large models cannot reliably reproduce their own answers.
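The reproducibility requirement described above can be made concrete with a small check: submit the same case several times and measure how often the outputs agree. The Python sketch below is illustrative only. The names `agreement_rate`, `check_reproducibility`, and `make_stub_model` are hypothetical, the "model" is a canned stand-in rather than any vendor's system, and exact string matching is an assumed simplification of how agreement would be judged in a real evaluation.

```python
from collections import Counter
from typing import Callable, Sequence


def agreement_rate(outputs: Sequence[str]) -> float:
    """Return the fraction of runs that produced the most common output."""
    if not outputs:
        raise ValueError("need at least one output")
    # Assumption: outputs count as equal after trimming and lowercasing.
    normalized = [o.strip().lower() for o in outputs]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)


def check_reproducibility(model: Callable[[str], str], case: str,
                          runs: int = 5, threshold: float = 1.0) -> bool:
    """Send the same case `runs` times and compare agreement to `threshold`."""
    outputs = [model(case) for _ in range(runs)]
    return agreement_rate(outputs) >= threshold


def make_stub_model(answers: Sequence[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: cycles through canned answers."""
    state = {"calls": 0}

    def model(_case: str) -> str:
        answer = answers[state["calls"] % len(answers)]
        state["calls"] += 1
        return answer

    return model


if __name__ == "__main__":
    stable = make_stub_model(["nodule, right upper lobe"])
    unstable = make_stub_model(["nodule, right upper lobe", "no finding"])
    print(check_reproducibility(stable, "case-001"))    # same answer every run
    print(check_reproducibility(unstable, "case-001"))  # answers alternate
```

With the default threshold of 1.0, a single disagreement across runs counts as a failure, which mirrors the strictness the text attributes to evidence-based products. A looser threshold would be a policy decision, not a technical one.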
Even if AI companies can overcome these technical challenges, they will still face a period of stagnation at the market access stage, because current review and approval documents do not specify the key approval criteria for products based on large language models.
In the past, AI imaging companies spent years working with the Center for Medical Device Evaluation to secure approval for their deep learning products. Even if products built on new algorithms benefit from that groundwork, the process is still estimated to take at least one year.
If, instead, such models are deployed in comprehensive departments such as pathology or in scientific research settings, they do have the potential for large-scale implementation at the current stage.
Enterprises have already developed specialized large language models for pathology. These models can generate results based on image findings, such as tissue distribution and microscopic examination descriptions, or offer suggestions on pathologists’ diagnostic conclusions (without directly issuing final diagnoses). In theory, they can replace machine learning-based computer-aided pathological diagnosis software, improving diagnostic efficiency and reducing the rates of missed diagnoses and misdiagnoses.
To address the “black box” problem that is difficult to avoid in AI, some companies specializing in vertical large language models for pathology have added a logic layer beneath their models. This layer is designed to eliminate potential “hallucinations” in the model’s outputs and to visualize the model’s decision-making pathways. Through this approach, these companies may be able to resolve the traceability and interpretability issues of artificial intelligence.
Research scenarios are currently the most likely context for the large-scale implementation of multimodal medical large models.
Previously certified deep learning-based AI products could precisely delineate only specific lesions in particular organs such as the lungs, heart, and brain. Large models have broken through this limitation.
Nowadays, large models developed by some medical imaging AI companies can delineate and annotate any lesion in any medical image, effectively improving the efficiency of medical research and enabling physicians to rapidly conduct studies on rare or non-mainstream diseases at a low cost.
Given the high alignment between the capabilities of large language models (LLMs) and the informatization needs of hospitals, many leading medical IT companies have rapidly deployed LLMs by leveraging their existing hospital management systems. However, non-leading medical IT firms and developers of multimodal large models lack such a foothold for rapid application deployment. How can they overcome the barriers to market entry?
A review of the strategies adopted by existing large language model (LLM) companies shows that their methods for breaking through market barriers generally fall into two types. In the first, enterprises compete with established medical IT companies by developing LLMs as standalone products that are integrated externally into existing health information systems. Although this approach may slightly increase the operational burden on physicians, the added procedural complexity remains within an acceptable range, given that numerous software solutions have been implemented successfully through similar integration methods.
The second approach moves the payer away from hospitals, which makes implementation more challenging but also opens up greater potential. For instance, enterprises can partner with medical device manufacturers and commercial insurance companies to build an ecosystem around LLMs, achieving deployment by giving those partners' end users intelligent capabilities. In this model, the payer shifts from hospitals to business-to-business (B2B) clients, allowing startups to scale rapidly while mitigating certain implementation risks.
Last year, Baidu Health launched an AI-powered drug leaflet based on a large language model. Patients can not only read medication instructions but also ask questions via text or voice input. This saves them the time of reading lengthy materials and gives them direct access to accurate information from the leaflet, which is particularly helpful for elderly individuals who have difficulty reading printed documents.
However, Baidu Health does not expect to achieve profitability through its consumer-facing (C-end) services. What it values is the “critical communication channel between pharmaceutical companies and patients” generated during user engagement. For instance, Baidu can help pharmaceutical companies gather data on the usage, dosage, and contraindications of existing drugs, thereby guiding subsequent drug development and ultimately capturing the value of its large language models from pharmaceutical enterprises.
In the realm of multimodal large language models, a typical case study emerges from the collaboration between EVIDENT, an international optical technology enterprise, and DeepThinking. Specifically, EVIDENT’s microscope and camera hardware products are integrated with DeepThinking’s vertical large model, Dongni, to jointly create the “Huiyan” AI platform. This platform leverages AI technology to assist physicians in interpretation and remote consultation under microscopy, providing pathologists with convenient auxiliary tools that enhance quality and efficiency.
As previously mentioned, it is nearly impossible for multimodal large models to directly enter clinical settings. However, by collaborating with medical device manufacturers, they can bypass various regulatory approval processes to deliver auxiliary value and establish new B2B revenue streams.
Returning to the initial question, numerous large medical AI models have already achieved scaled deployment and attained a certain degree of commercialization. However, these applications remain fragmented, lack systematic integration, and require deeper clinical penetration. Consequently, the total addressable market that enterprises can reach through these solutions is limited, and that is not enough to show that the technology has found a viable path to commercialization.
Therefore, for numerous medical large language model (LLM) companies, the most critical priority at present is to further enhance the models’ capability to process multimodal data. In an ideal scenario, a multimodal LLM should not merely classify various types of medical data; it should also extract key insights from each modality and provide comprehensive recommendations.
It is worth noting that the vast majority of enterprises have been working with large language models for less than a year, so it is understandable that they have yet to identify a “killer app.”
Paving this new path is a long and arduous journey, but fortunately, they still have ample time.