Can ChatGPT Disrupt Medical AI? Insights from Recent IPO Filings

Feb 19, 2023 08:00 CST Updated 08:00

Since the start of 2023, the heated discussion surrounding ChatGPT has reignited market interest in medical AI.

In the past, artificial intelligence models in the healthcare sector were mostly limited to processing single-modality data and addressing relatively narrow medical issues, such as identifying nodules on chest CT scans. In contrast, ChatGPT can be trained on multiple data types, enabling it to provide comprehensive medical advice akin to that of a physician.

However, public perception of ChatGPT’s value varies. Some believe that large language models (LLMs) can revolutionize AI reasoning logic and optimize algorithms for interpreting medical images and texts. Others argue that this technology has been around for years and is merely a rehash of old ideas, with quantitative changes but insufficient qualitative breakthroughs.

To clarify whether ChatGPT can reshape the global landscape of medical AI and to explore the industry’s future development prospects, VCBeat engaged in dialogues with multiple industry experts, attempting to address these questions one by one.

"Approved, but not yet in clinical use"

IBM Watson’s exit served as a warning to the entire life sciences sector: when confronting a potentially emerging technology, we cannot rely solely on “subjective impressions” from non-healthcare industries to judge its disruptiveness and usability. We must also consider practical issues such as how it integrates into clinical diagnosis and treatment workflows, how it navigates regulatory review and approval, and how it achieves commercialization within the healthcare domain.

Regulatory review and approval are critical determinants of whether AI can enter the market, representing a core hurdle that ChatGPT cannot bypass in its clinical integration. Let us consider a hypothetical scenario: If an AI system based on ChatGPT were to provide auxiliary diagnosis as a medical device, what regulatory pathway would it need to follow, and which medical device standards would it need to meet?

MedTech Dive previously compiled comprehensive statistics on FDA-authorized AI products. As of October 5, 2022, the FDA had authorized a total of 521 AI/ML medical device applications. The vast majority were cleared through the 510(k) pathway, a small portion received Premarket Approval (PMA), and only 18 devices were authorized through the De Novo classification process. The 510(k) pathway dominates because it streamlines review for medical AI, which is particularly relevant for the many imaging equipment manufacturers whose AI applications may target only specific modules. As long as developers can demonstrate that their devices are “substantially equivalent” to those already on the market, they are generally not required to conduct new clinical trials.

The National Medical Products Administration (NMPA) has adopted a relatively cautious approach to the authorization of AI/ML-based medical devices, with no fast-track pathway analogous to the 510(k) clearance available. However, as the regulatory approval framework has continued to improve, a large number of Class II and Class III intelligent medical devices have emerged since 2018. In particular, after Keya Medical’s “DeepVessel FFR” obtained a Class III medical device registration certificate—marking the first time that “deep learning” was explicitly included in the basic information of a registration certificate—the approval of medical artificial intelligence products has experienced explosive growth.

Number of AI-based medical devices approved by the NMPA and FDA over the years (NMPA data includes only Class III medical devices)

Therefore, focusing solely on approval pathways, both the NMPA and the FDA are embracing valuable AI technologies. If a company integrates ChatGPT-based AI into its own devices and can demonstrate that they are “substantially equivalent” to already marketed devices, it is highly likely to achieve market clearance through the 510(k) pathway. The “Guiding Principles for Registration Review of Artificial Intelligence Medical Devices,” issued by the NMPA in March 2022, expanded the scope of approval for core AI algorithms. If LLMs can prove their value, they may also be able to enter the approval process under the existing regulatory framework.

Turning to ChatGPT’s potential application scenarios, the composition of products authorized by the National Medical Products Administration (NMPA) and the U.S. Food and Drug Administration (FDA) is broadly similar. As of October 5, 2022, among the 521 AI/ML-based medical device applications authorized by the FDA, over 75% were for computer-aided diagnosis products, and 13% were for computer-aided treatment products. Among the 70 AI/ML-based medical device applications authorized by the NMPA, over 71% were for computer-aided diagnosis products, and 24% were for computer-aided treatment products.

Computer-aided diagnosis and treatment products depend strictly on clinical evidence: the algorithm must reproduce a given result and supply evidence to support it. Current ChatGPT applications do generate a definite output from a keyword input, but repeating the same input does not reliably yield the same result. In other words, when the input is complex and high precision is required, ChatGPT cannot accurately reproduce its previous answers, which makes it difficult to apply in these two fields.
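The reproducibility problem can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a language model picks each output from a probability distribution; the candidate answers and probabilities below are invented and do not come from ChatGPT. Always taking the most probable option is repeatable, while random sampling is repeatable only if the random seed is fixed.

```python
import random

# Toy distribution over candidate answers (invented values, for illustration only).
ANSWER_PROBS = {"answer_A": 0.5, "answer_B": 0.3, "answer_C": 0.2}


def greedy_pick(probs):
    """Always return the most probable answer, so repeated calls agree."""
    return max(probs, key=probs.get)


def sampled_pick(probs, rng):
    """Draw an answer in proportion to its probability, so calls can differ."""
    answers = list(probs)
    weights = [probs[a] for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]


# Greedy selection: twenty calls collapse to a single distinct answer.
greedy_runs = {greedy_pick(ANSWER_PROBS) for _ in range(20)}

# Unseeded sampling: the same input may produce different answers.
unseeded_rng = random.Random()
sampled_runs = [sampled_pick(ANSWER_PROBS, unseeded_rng) for _ in range(20)]

# Seeded sampling: two generators with the same seed give identical sequences.
rng_a = random.Random(0)
rng_b = random.Random(0)
seeded_a = [sampled_pick(ANSWER_PROBS, rng_a) for _ in range(20)]
seeded_b = [sampled_pick(ANSWER_PROBS, rng_b) for _ in range(20)]

print("greedy:", greedy_runs)
print("unseeded sample:", sampled_runs)
print("seeded runs identical:", seeded_a == seeded_b)
```

Whether a deployed chatbot exposes this kind of control is a separate question; the sketch only shows why sampled outputs are hard to reproduce on demand.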

Next-generation clinical decision support systems (CDSS) are among the sectors most likely to be disrupted by ChatGPT. Next-generation CDSS relies on natural language processing (NLP) and can only process textual information. In contrast, the large language models (LLMs) underpinning ChatGPT incorporate not only NLP but also numerous other systems, enabling them to integrate electronic medical records, imaging data, laboratory test results, genomic data, and even microbiome sequence information.

VCBeat’s analysis of FDA-authorized AI products from 2020 to 2022 reveals that while AI for assisted diagnosis and treatment remains dominant, the number of authorized Clinical Decision Support System (CDSS) products has risen significantly compared to pre-2020 levels. (In China, CDSS products typically do not require review and approval by the National Medical Products Administration [NMPA]; only Senyi Intelligent’s VTE risk assessment software has obtained Class II medical device certification.)

2020–2022 FDA-Approved AI Medical Devices (Partial List)

For the healthcare system as a whole, AI-driven oversight and empowerment of primary care can effectively improve disease prevention efficiency. By promoting early treatment, this approach reduces long-term expenditures from medical insurance accounts. From this perspective, ChatGPT-based applications may hold potential for practical implementation.

Who Endorses ChatGPT’s Decisions?

Research published in the journal PLOS Digital Health by researchers at the U.S. startup Ansible Health indicates that ChatGPT performs at or near the passing threshold of approximately 60% on the United States Medical Licensing Examination (USMLE). Another study evaluated ChatGPT’s diagnostic performance using 45 clinical cases and found that it correctly identified the diagnosis in 39 cases (an accuracy rate of 87%), significantly outperforming previous symptom-checking tools as well as earlier versions of ChatGPT (82%). Consequently, many experts consider clinical decision support systems (CDSS) to be an effective practical application pathway for ChatGPT.
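The 87% figure follows directly from 39 correct diagnoses out of 45 cases. The short Python sketch below reproduces that arithmetic and, as an addition that is not part of either study, computes a 95% Wilson score interval to show how much uncertainty a 45-case sample leaves around the headline ratio.

```python
import math


def accuracy(correct, total):
    """Fraction of cases diagnosed correctly."""
    return correct / total


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


acc = accuracy(39, 45)  # figures quoted in the article
low, high = wilson_interval(39, 45)

print(f"accuracy: {acc:.1%}")  # rounds to the 87% quoted above
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```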

Supported by data, ChatGPT can clearly serve as an effective clinical decision-support tool; however, for true integration into clinical practice, AI must offer more than just a statistical ratio.

“Whether it’s Baidu or Google, when you pose a question, they return numerous web pages for you to sift through and evaluate yourself. ChatGPT, however, is different; it acts like an evolved search engine, providing you with a single, definitive answer,” Wang Shi, CTO of Huimei Technology, told VCBeat. “This is its advantage, but also a potential risk for its practical implementation.”

The CDSS currently used in hospitals consists of three core components: a human-computer interaction layer, an inference engine, and a knowledge base. The system uses NLP to understand doctors' inputs, which solves the interaction problem, but it does not replace doctors' decision-making with AI. This is not because AI cannot surpass doctors in certain specific scenarios, but because AI cannot be held accountable for potential errors.

Wang Shi stated, “We are witnessing the development of smart healthcare. Particularly between 2018 and 2020, the National Health Commission successively introduced policies such as the Electronic Medical Record (EMR) grading system, the Interconnectivity grading system, and the Smart Hospital grading system. These initiatives aimed to promote comprehensive digital transformation and upgrading in medical institutions through evaluation-driven construction. In this process, many emerging technologies have been applied. Among them, Clinical Decision Support Systems (CDSS), as one of the core components of high-level assessments, are subject to stringent requirements regarding their development mechanisms—namely, they must be based on evidence-based medicine.”

Therefore, the prompts and recommendations provided by Clinical Decision Support Systems (CDSS) are designed to assist physicians in decision-making by integrating guideline references, all while adhering to clinical practice standards. In contrast, while ChatGPT may sometimes provide superior answers to certain questions, it cannot cite sources to substantiate its responses, nor can it be held accountable for potential errors. Furthermore, no physician is willing to bear the liability for mistakes made by algorithms.
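The three-part CDSS structure (interaction layer, inference engine, knowledge base) and its requirement that every suggestion trace back to a guideline can be sketched in Python as follows. This is a minimal illustration, not any vendor's implementation: the rule contents, finding names, trigger phrases, and guideline titles are all hypothetical placeholders, and keyword matching stands in for real NLP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    finding: str         # structured finding that triggers the rule
    recommendation: str  # suggestion shown to the physician
    source: str          # guideline citation backing the suggestion


# Knowledge base: every entry carries its source (all entries are hypothetical).
KNOWLEDGE_BASE = [
    Rule("finding_x", "Consider follow-up test T1", "Hypothetical Guideline A, section 2.1"),
    Rule("finding_y", "Review medication plan", "Hypothetical Guideline B, section 4.3"),
]

# Interaction layer: simple phrase lookup standing in for NLP (hypothetical phrases).
PHRASE_TO_FINDING = {
    "marker x elevated": "finding_x",
    "symptom y present": "finding_y",
}


def parse_note(note):
    """Turn free text into a set of structured findings."""
    text = note.lower()
    return {finding for phrase, finding in PHRASE_TO_FINDING.items() if phrase in text}


def infer(findings):
    """Inference engine: return every rule whose trigger finding is present."""
    return [rule for rule in KNOWLEDGE_BASE if rule.finding in findings]


def suggest(note):
    """Produce suggestions with citations; the physician makes the decision."""
    return [f"{r.recommendation} (source: {r.source})" for r in infer(parse_note(note))]


suggestions = suggest("Lab report: marker X elevated on admission.")
for line in suggestions:
    print(line)
```

The design point is that a suggestion cannot be emitted without a `source` field, which is the property the article says ChatGPT's answers lack.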

This poses a critical test for the practical implementation of ChatGPT. Similar to IBM Watson in its day, ChatGPT’s disruptive potential lies in its ability to make decisions like a physician, whereas physicians prefer AI to handle information processing tasks within its scope while retaining decision-making authority themselves.

Cost: The Key Constraint on ChatGPT

Judging from the development trajectories of CNNs and NLP, technology developers have always been able to make trade-offs in applications so that the final products meet market demands. A developer that commits fully to building medical applications on LLM technology can therefore expect to achieve results eventually. However, not every startup can afford to invest massive amounts of capital in model training as OpenAI does.

Public data shows that OpenAI’s previously released large language model (LLM), GPT-3, has 175 billion parameters, with corresponding training costs reaching as high as $12 million (approximately $1.4 million per single run). Estimates of ChatGPT’s training costs vary, but they are generally believed to fall within the range of $2 million to $12 million.

Vertical sectors such as healthcare that seek to develop similar models must first possess a foundation model comparable to GPT. Only then can substantial time, effort, and capital be invested in long-term, continuous computation and data training on that foundation model to create new models. In China, only enterprises on the scale of BAT (Baidu, Alibaba, and Tencent) have the resources to meet these prerequisites.

Meanwhile, given the exorbitant training costs, even large enterprises are unable to make targeted adjustments to models that have already been trained. If a model of ChatGPT’s scale goes astray in its exploration of the healthcare sector, researchers seeking to further unlock the potential of large language models (LLMs) may have no choice but to wait for the emergence of the next generation of models.

Amid various influencing factors, the value of ChatGPT and other large language models (LLMs) in clinical medical practice may be quite limited. Focusing solely on the present, scenarios such as search-related health education and internet hospitals clearly hold greater potential. By stepping away from clinical settings, ChatGPT’s unique capabilities may open up new avenues for growth in these areas.

Overall, discussions regarding the clinical application of ChatGPT may be somewhat disappointing. ChatGPT was not designed specifically for healthcare, and AI systems based on it are unlikely to integrate into clinical workflows as deeply as well-established AI tools for computer-aided diagnosis and treatment that have been refined over many years.

However, in the long run, LLMs still possess the capability to disrupt existing AI paradigms. If they can integrate multimodal medical data—including electronic health records, medical imaging, and genomics—to build comprehensive analytical capabilities, they will undoubtedly break through the current bottlenecks facing AI and redefine its value.