AI Milestone! NMPA Releases Review Guidelines and Application Process for Class III AI Medical Devices

Dec 26, 2018 08:00 CST Updated 08:00

Yesterday, the “Public Welfare Training on Registration and Application for AI-Based Medical Devices” quietly kicked off in Beijing. Although the conference lasted only one afternoon, it revealed key approval information for Class III medical devices, a topic of common concern in the medical AI sector. Does this mark the beginning of a milestone victory?

At the conference, the National Medical Products Administration (NMPA) conducted a meticulous analysis of every stage in the approval process for medical artificial intelligence devices, providing detailed explanations for each metric. This event was truly an “AI Medical Device Registration and Submission Training.”

Meanwhile, it was announced at the conference that as of the end of November 2018, the National Medical Products Administration (NMPA) had received 1,054 applications for special examination and approval of innovative medical devices, with 192 approved for review under the special procedure. Fifty-one innovative medical devices had been granted market authorization through this special pathway. Regrettably, no information was disclosed at the conference regarding the approval status of AI-related products.

According to informed sources, the National Medical Products Administration (NMPA) has clarified its approach to the entire AI approval process, and the approval channel opened in mid-December. However, under these stringent standards, no company has yet submitted an application for Class III AI medical devices. Nevertheless, with the standards now in place, it is only a matter of time before submissions begin.

Here, VCBeat has summarized the entire conference content in an effort to help medical AI practitioners clarify the regulatory approval framework and key considerations of the National Medical Products Administration (NMPA). The summary primarily covers the following four aspects:

I. Approval Process;

II. Interpretation of Key Approval Points;

III. Clinical Trial Design and Considerations for Pulmonary Nodules and Diabetic Retinopathy;

IV. Application Materials and Other Information

Part I: Overall Mindset, Principles, and Process for Approval

Medical device registration is an administrative licensing system whereby the food and drug regulatory authority, upon application by a medical device registration applicant and in accordance with statutory procedures, conducts a systematic evaluation of the safety and efficacy studies and their results for the medical device intended for market launch, so as to determine whether to approve the application.

From a holistic perspective, the regulatory submission of medical devices is based on classified management and determined by risk levels to establish specific requirements for registration and filing. The detailed submission process is illustrated in the figure below:


Flowchart of Medical Device Approval by the National Medical Products Administration

Part II: Interpretation of Key Approval Points

The interpretation of key approval points was the core focus of this conference. The event provided a systematic explanation of the challenges faced by medical AI products during the regulatory approval process, including issues related to databases, data security, software updates, product applicability, and cloud computing services. It covered every element of AI-based products, even enabling researchers to establish clear scoring metrics based on these criteria.

Scope of Application

AI products can be classified based on the following three factors according to their scope of application:

Deep Learning-Assisted Decision-Making Medical Device Software:

Including medical device data, deep learning, clinical decision support, and medical device software.

Software Type:

It can be categorized into AI standalone software (AI software that is itself a medical device) and AI software components (AI software embedded in medical devices).

Intended Use of the Software:

Clinical decision support (assistive decision-making): auxiliary screening, identification, diagnosis, treatment, etc.;

Non-assistive decision-making: pre-processing, process optimization, and routine post-processing.

Risk Considerations

Risk assessment involves evaluating the risks associated with the use of AI products, with the aim of mitigating these risks and enhancing the reliability of AI products. This primarily includes considerations of the following two major categories of factors.

1. Clinical Use Risks

False Positives: Misdiagnosis and Risk of Overtreatment.

False Negatives: Missed Diagnosis, Risk of Rapidly Progressive Disease.

Imported Software: Differences Between China and Foreign Countries (Ethnicity, Epidemiology, Clinical Diagnosis and Treatment Guidelines).

2. Risk Management Activities

Elements: Intended Use (target disease, clinical application, importance, urgency), Usage Scenario (target patient population and users, setting, clinical workflow), Core Functions (object of processing, function type).

Measures: Design, Protection, Warning.

Requirements: Throughout the entire software lifecycle.

Requirements Analysis

Requirements analysis is guided by clinical needs and usage risks, integrating intended use, usage scenarios, and core functionalities, while comprehensively considering regulatory, standards, user, product, data, functional, performance, interface, user interface, and cybersecurity requirements. In this context, enterprises should focus on the following:

1. Data Collection: Epidemiological characteristics of the target disease, such as disease composition (classification, grading, staging), population distribution (health status, gender, age), statistical indicators (prevalence, cure rate), complications, and similar diseases.

2. Algorithm Performance: False Positive and False Negative Metrics, Repeatability and Reproducibility, Robustness

3. Clinical Use Limitations: Scenarios of Contraindications and Cautious Use

Software Validation

Software validation is the process of providing objective evidence to confirm that the software meets user needs and intended purposes, including a series of activities such as software validation testing (user testing), clinical evaluation (if applicable), and reviews.

Clinical evaluation is the primary method for validating this type of software, primarily encompassing two principles.

1. Software Guidance Principles: Clinical evaluation data based on clinical trials, i.e., clinical trial data for the software itself, or clinical trial data for products whose core algorithms are substantially equivalent to those of the software.

2. Imported Software: Assess the differences between China and foreign countries. If significant differences exist, clinical trials should be conducted in China, and the use of overseas clinical trial data shall meet the requirements of the corresponding guidelines.

Clinical Trials

Clinical trials must be designed as diagnostic studies based on the software’s intended use, usage scenarios, and core functions, with the following four key elements:

1. Trial Design: It is recommended to prioritize a non-inferiority controlled design using products of the same type or clinical reference standards; alternatively, a superiority controlled design comparing user decision-making assisted by software versus user decision-making alone may be selected. The determination of non-inferiority or superiority margins must be supported by sufficient clinical evidence.

2. Observation Indicators: Sensitivity, specificity, and ROC/AUC are used as the primary indicators; other metrics such as time efficiency may also be selected as evaluation criteria.

3. Inclusion and Exclusion Criteria: Based on the Epidemiological Characteristics of the Target Disease

4. Source Institutions: Distinct from the primary institutions in the training dataset, with as wide a geographic distribution as possible and as large a number of institutions as feasible.
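The primary observation indicators named above can be illustrated with a short calculation. The following is a minimal sketch, not a method prescribed at the conference: the function names, the toy labels and scores, and the 0.5 decision threshold are all hypothetical, and the AUC is computed by direct pairwise comparison, which is only practical for small samples.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label/prediction 1 means disease present."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: share of (diseased, healthy) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a diseased and a healthy case counts as half a correct ranking.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: reference-standard labels and software output scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

sens, spec = sensitivity_specificity(labels, preds)
area = auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason a trial protocol would typically report both.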

Retrospective Study

To encourage innovation and reduce the cost of clinical trials, clinical trials may utilize retrospective data. However, bias issues should be considered and strictly controlled during the design phase. In principle, the study should include concurrent data from multiple clinical institutions in different regions (excluding those that are the primary sources of the training data).
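The exclusion of primary training-data institutions can be sketched as a simple filter. This is an illustrative assumption only: the article does not say how a "primary" source is defined, so the 20% share threshold (`min_share`), the function names, and the hospital names below are all hypothetical.

```python
from typing import Dict, List


def primary_training_sites(training_counts: Dict[str, int], min_share: float = 0.2) -> List[str]:
    """Sites contributing at least `min_share` of training cases (hypothetical definition)."""
    total = sum(training_counts.values())
    return sorted(site for site, n in training_counts.items() if n / total >= min_share)


def eligible_retrospective_sites(
    candidates: List[str], training_counts: Dict[str, int], min_share: float = 0.2
) -> List[str]:
    """Candidate sites that are not primary sources of the training data."""
    excluded = set(primary_training_sites(training_counts, min_share))
    return sorted(site for site in candidates if site not in excluded)


# Hypothetical case counts per institution in the training dataset.
training_counts = {"Hospital A": 6000, "Hospital B": 3000, "Hospital C": 1000}
candidates = ["Hospital A", "Hospital C", "Hospital D", "Hospital E"]

primary = primary_training_sites(training_counts)
eligible = eligible_retrospective_sites(candidates, training_counts)
print("primary training sites:", primary)
print("eligible retrospective sites:", eligible)
```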

Usage Principles (Risk-Based); see the Software Guidelines for details on software safety level determination:

For high-risk software: retrospective studies may serve as pilot clinical studies or as a supplement to clinical trials.

For medium- and low-risk software: retrospective studies may serve as pilot clinical studies or as an alternative to clinical trials.

Basic Principles of Software Updates

Software updates should consider the impact (both positive and negative) on software safety and effectiveness. Note: software updates are one of the leading causes of software recalls.

Regulatory authorities will oversee updates of the following degrees:

Major Software Update: Requires an Application for a Change to Licensed Items.

Minor Software Update: Quality System Control, No Application for Registration Change Required.

Software Version Naming Convention:

Clearly define and distinguish between major and minor software updates, where major software updates shall enumerate all typical scenarios, covering both algorithm-driven and data-driven software updates.

Key Highlights of Major Software Updates

Common update types include algorithm-driven updates and data-driven updates. Algorithm-driven updates involve changes to the software’s algorithms, algorithmic architecture, algorithmic workflows, frameworks used, as well as inputs and outputs. Data-driven updates refer to software updates prompted solely by an increase in the volume of training data.

For major software updates, the determination shall adhere to the following principles:

1. Algorithm-driven software updates are generally classified as major software updates.

2. Data-driven software updates that result in significant changes to algorithm evaluation outcomes (compared with the previous registration) are classified as major software updates.

3. For the criteria for determining other types of major software updates, please refer to the Software Guidelines and Cybersecurity Guidelines.
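The three principles above can be read as a small decision rule. The sketch below is one possible encoding, not the regulator's own procedure: the `UpdateDriver` enum and `classify_update` function are hypothetical names, "significant change" is passed in as a flag because the article gives no numeric criterion, and treating a data-driven update without significant change as minor is an inference from principle 2 rather than an explicit statement.

```python
from enum import Enum


class UpdateDriver(Enum):
    ALGORITHM = "algorithm"  # change to algorithm, architecture, workflow, framework, inputs/outputs
    DATA = "data"            # change prompted solely by more training data
    OTHER = "other"          # anything else


def classify_update(driver: UpdateDriver, evaluation_changed_significantly: bool = False) -> str:
    """Classify a software update following the three principles in the text."""
    if driver is UpdateDriver.ALGORITHM:
        # Principle 1: algorithm-driven updates are generally major.
        return "major"
    if driver is UpdateDriver.DATA:
        # Principle 2: major only if evaluation outcomes changed significantly
        # versus the previous registration (assumed minor otherwise).
        return "major" if evaluation_changed_significantly else "minor"
    # Principle 3: other cases are decided under the separate guidelines.
    return "refer to Software Guidelines and Cybersecurity Guidelines"


print(classify_update(UpdateDriver.ALGORITHM))
print(classify_update(UpdateDriver.DATA, evaluation_changed_significantly=True))
print(classify_update(UpdateDriver.DATA))
print(classify_update(UpdateDriver.OTHER))
```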

Validation and Verification

Regardless of the type of software update, verification and validation activities appropriate to the type, content, and extent of the update shall be conducted in accordance with quality management system requirements. For both algorithm-driven and data-driven software updates, re-evaluation of algorithm performance and clinical re-assessment shall be performed.

Among them, clinical re-evaluation (risk-based) includes:

High-Risk Software: Changes to the intended use shall require clinical trials; in other cases, retrospective studies may generally be used.

Low- to Moderate-Risk Software: Retrospective Studies May Be Used.

Expansion of Scope of Application

As required, requirements analysis, data collection (if applicable), algorithm design, and software validation shall be conducted for all AI software functions; each AI software function shall undergo these activities independently.

For deep learning-based non-assistive decision-making software, the following steps must be followed:

1. Pre-processing: Algorithm performance evaluation, clinical evaluation.

2. Process Optimization: Algorithm Performance Evaluation.

3. Routine Post-processing: Algorithm Performance Evaluation and, When Necessary, Clinical Evaluation.

Third-Party Database

Third-party databases are considered a special form of retrospective study and can be used for algorithm performance evaluation, but they may not fully meet the requirements for software validation.

Third-party database types include evaluation databases and non-evaluation databases. Evaluation databases can be used for software validation, while non-evaluation databases (such as public databases) cannot be used for software validation.

Evaluation Database

The evaluation database must meet requirements for network and data security, scalability, and other factors. The specific requirements are as follows:

Authoritativeness: Data annotation should be conducted by appropriately accredited clinical institutions.

Scientific Rigor: Sample size and sample distribution should meet statistical requirements.

Standardization: Data governance should establish quality control procedures that are traceable.

Diversity: Data should be sourced from multiple clinical institutions.

Closed System: The system shall be managed as a closed loop, with the total sample volume significantly exceeding the volume required for a single test.

Dynamism: A certain proportion of data should be replaced on a regular basis.
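The "Closed System" and "Dynamism" requirements can be pictured with a toy pool of case identifiers. This is a minimal sketch under stated assumptions: the article gives no numbers, so the minimum pool-to-test ratio of 5 (`min_ratio`), the 10% replacement fraction, and all names and identifiers below are hypothetical.

```python
import random
from typing import List


def draw_test_set(pool: List[str], test_size: int, rng: random.Random, min_ratio: int = 5) -> List[str]:
    """Draw one test sample without replacement; the pool must be much larger than the sample."""
    if len(pool) < min_ratio * test_size:
        raise ValueError("pool is too small relative to a single test")
    return rng.sample(pool, test_size)


def rotate_pool(pool: List[str], fresh_cases: List[str], fraction: float, rng: random.Random) -> List[str]:
    """Retire a random fraction of the pool and replace it with fresh cases."""
    n_replace = round(len(pool) * fraction)
    if n_replace > len(fresh_cases):
        raise ValueError("not enough fresh cases for the requested replacement")
    retired = set(rng.sample(range(len(pool)), n_replace))
    kept = [case for i, case in enumerate(pool) if i not in retired]
    return kept + fresh_cases[:n_replace]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
pool = [f"case-{i:04d}" for i in range(1000)]
fresh = [f"new-{i:04d}" for i in range(150)]

test_set = draw_test_set(pool, 100, rng)
new_pool = rotate_pool(pool, fresh, 0.10, rng)
print(len(test_set), len(new_pool), len(set(new_pool) - set(pool)))
```

Keeping the pool far larger than any single test sample and refreshing part of it periodically makes it harder for a product to be tuned to the specific cases it will be tested on.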

Network and Data Security Process Control

Whether before or after a product’s market launch, in addition to strengthening the cybersecurity capabilities of the software itself, the company should also address network and data security process control requirements throughout the entire software lifecycle.

Key Considerations Include: De-identified Data Transfer, Closed and Open Network Environments, Data Interface Compatibility, and Data Backup and Recovery.

Cloud Computing Services and Mobile Computing Terminals

Cloud computing services shall clearly define the service model, deployment model, core functions, data interfaces, network security capabilities, and service level agreements (SLAs).

Mobile computing terminals shall define performance indicator requirements based on the terminal type, characteristics, and usage risks. For details, refer to the Guiding Principles for Mobile Medical Devices and the Guiding Principles for Cybersecurity.

Scope of Application

The scope of application for standalone AI software should:

1. Clarify the intended use, usage scenarios, and core functions.

2. Cover, including but not limited to, the intended subject, target disease, clinical indication, patient population, intended users, setting of use, requirements for data acquisition devices (if applicable), and clinical use limitations (if applicable).

The scope of application for AI software components may refer to the requirements for standalone AI software and should be reflected in the product’s scope of application.

Research Materials

Research documentation includes the software description document, cybersecurity description document, and software version naming convention.

The software description document requires that the core algorithm section provide corresponding algorithm research data in conjunction with these review points, as well as comparative analysis data on algorithm performance evaluation results from test sets, public databases, benchmark databases, retrospective studies, and clinical trials.

Other research materials shall include documentation on process controls for cybersecurity and data security, as well as basic information (e.g., name, creator, data volume, and data distribution) and usage details (e.g., volume of use, data distribution, proportion, and qualifications) of third-party databases (including those used for assessment purposes and publicly available ones).

Instructions for Use

The decision-support software shall clearly specify the scope of application, clinical use limitations, precautions, user training requirements, data acquisition equipment specifications, standard operating procedures for data acquisition, inputs and outputs, summary of algorithm performance evaluation (basic information on the test dataset, evaluation metrics and results), and summary of clinical evaluation (basic information on clinical data, evaluation metrics and results).

In addition to the aforementioned content, deep learning-assisted decision-making software shall also include a summary of algorithm training information (basic information on the training dataset, training metrics, and results).


Part III (Part 1): Clinical Trial Design and Considerations for Pulmonary Nodules

“Clinical Trial Design and Considerations for Pulmonary Nodules” was prepared by Director Liu Lunxu of West China Hospital, Sichuan University. Pulmonary nodules have long been a core research focus in the field of medical artificial intelligence (AI), with discussions on their approval standards having continued for nearly a year. However, the regulatory approval of AI-based solutions is inherently an extremely complex and rigorous endeavor. At the conference, Director Liu Lunxu summarized clinical issues in the surgical diagnosis and treatment of pulmonary diseases, aiming to establish a robust research design framework.

Based on several practical clinical issues currently existing in the field of thoracic surgery, researchers should construct an intelligent system for the diagnosis and treatment of lung cancer in thoracic surgery from three aspects: judgment of surgical indications, selection of surgical procedures, and prediction models for postoperative prognosis.

Indications for Surgery Include:

1. Intelligent localization and qualitative identification of pulmonary nodules;

2. Intelligent Localization and Qualitative Identification of Mediastinal Window Lymph Nodes;

3. Extraction of GGO Image Features and Correlation Analysis with Stages of Adenocarcinogenesis;

4. Identification and Differentiation of Imaging Spectrum Characteristics of Multiple Primary Cancers and Intrathoracic Metastases.

Selection of Surgical Approach:

1. Surgical Approach Selection for Pulmonary Segment Visualization and Small Nodule Localization Based on Image Segmentation and 3D Reconstruction

2. Precise Prediction of Lymph Node Metastasis Based on Neural Networks

3. Prediction of STAS and Micropapillary Components in Small Pulmonary Nodules Based on Chest Imaging Radiomics

Postoperative Prognosis Prediction

1. Construction of a Postoperative Complication Prediction Model Based on Neural Networks

2. Construction of Prognostic Prediction Models for Postoperative Recurrence and Metastasis Patterns of Lung Cancer Based on Multiple Data Types

3. Imaging Data-Driven Analysis of Gene Mutations and Immune Checkpoint Alterations

4. Unknown (due to missing PPT)

Construction of a Neural Network-Based Model for Predicting Postoperative Complications: An official study investigated 8,465 lung cancer patients who underwent surgical treatment, among whom 1,453 experienced postoperative complications. From this cohort, 250 patients with postoperative complications and 250 without were randomly selected to form the test set, while the remaining 7,965 cases constituted the training set used to train the neural network model. After data processing, the model demonstrated a dynamic recognition performance of 88.0%, with a recognition rate of 81.2%, a recall rate of 73.2%, and a precision rate of 87.14%. These figures represent ideal performance metrics currently achievable by artificial intelligence products in clinical practice.
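The reported figures can be cross-checked with simple arithmetic. The sketch below assumes, as the article does not state explicitly, that the recall and precision were measured on the balanced 500-case test set; under that assumption the confusion matrix can be reconstructed, and the resulting overall accuracy matches the reported 81.2% "recognition rate". The variable names are illustrative.

```python
# Figures reported in the article.
TOTAL_PATIENTS = 8465
TEST_POSITIVES = 250   # test-set patients with postoperative complications
TEST_NEGATIVES = 250   # test-set patients without complications
RECALL = 0.732
PRECISION = 0.8714

test_size = TEST_POSITIVES + TEST_NEGATIVES
training_size = TOTAL_PATIENTS - test_size

# Assumption: recall and precision were computed on the 500-case test set.
tp = round(RECALL * TEST_POSITIVES)   # complications correctly predicted
fn = TEST_POSITIVES - tp              # complications missed
fp = round(tp / PRECISION) - tp       # false alarms among uncomplicated cases
tn = TEST_NEGATIVES - fp              # uncomplicated cases correctly predicted

accuracy = (tp + tn) / test_size
specificity = tn / TEST_NEGATIVES

print(f"training cases: {training_size}")
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.1%} specificity={specificity:.1%}")
```

Under this reading the test set yields 183 true positives, 67 false negatives, 27 false positives, and 223 true negatives, with a training set of 7,965 cases as stated in the text.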

Part III (Continued): Issues and Considerations in the Clinical Trial Design of AI-Assisted Diagnosis for Diabetic Retinopathy

“Issues and Considerations in the Design of Clinical Trials for AI-Assisted Diagnosis of Diabetic Retinopathy” was presented by Director Yu Weihong from Peking Union Medical College Hospital. Three methods for clinical trials on AI for diabetic retinopathy were discussed at the conference, two of which are included here:

1. With product efficacy as the benchmark, AI products should in practice meet the criterion of “AI > Physician”; if the emphasis is on AI’s supportive role for physicians, then the criterion becomes “Physician + AI > Physician.”

Theoretically, this is a sound method for clinical evaluation; however, the actual outcomes are heavily dependent on physicians’ expertise. Under current clinical trial frameworks, companies predominantly select tertiary Grade A hospitals, which diminishes the perceived auxiliary role of AI. Given the uneven proficiency among physicians at primary care institutions and the difficulty in establishing uniform standards, the primary application scenarios for AI lie in auxiliary screening and diagnostic support at primary healthcare facilities and health examination centers.

2. Using a single-group target value as the reference, the primary objective is to assess whether the AI product’s performance aligns with its claimed performance; this approach mirrors the clinical trial methodology used for the FDA-approved IDx-DR product. Compared with effectiveness-based assessments, this method is less susceptible to human bias and offers greater objectivity.

In this category of AI products, enterprises must implement strict data controls while accounting for a variety of scenarios. These include differences in workflows for auxiliary screening, auxiliary diagnosis, and follow-up analysis; variations across settings such as tertiary hospitals, primary care facilities, and health examination centers; disparities in image quality under different scenarios and device models; and the need for assisted referral decisions (i.e., whether or not referral is required).
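The single-group target value approach described above can be illustrated with a one-sample check: the product passes if the lower confidence bound of its observed performance exceeds a pre-specified target. This is a minimal sketch, not the procedure presented at the conference; the Wilson score interval is one common choice among several, and the counts and target values below are hypothetical.

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower limit of the two-sided 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom


def meets_target(successes: int, n: int, target: float, z: float = 1.96) -> bool:
    """True if the lower confidence bound exceeds the pre-specified target value."""
    return wilson_lower_bound(successes, n, z) > target


# Hypothetical result: 270 of 300 reference-positive eyes flagged by the software.
lower = wilson_lower_bound(270, 300)
print(f"observed sensitivity=0.900 lower bound={lower:.3f}")
print("meets 0.85 target:", meets_target(270, 300, 0.85))
print("meets 0.88 target:", meets_target(270, 300, 0.88))
```

Because the comparison is against a fixed number rather than against physicians, the outcome does not depend on the skill level of the readers at the trial sites, which is the objectivity advantage noted in the text.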

Part IV: Pre-Acceptance Consultation, Application Materials, and Other Information

Pre-acceptance Consultation, fully known as Pre-acceptance Technical Issue Consultation for Medical Device Registration, primarily covers issues related to medical device registration submissions prior to acceptance, excluding issues arising during the technical review process.

Pre-acceptance consultation applications are accepted every Friday from 1:00 PM to 4:00 PM at the Administrative Acceptance Service Hall, Dacheng Plaza, Xuanwumen West Street, Xicheng District, Beijing.

Domestic applicants shall bring: a letter of authorization issued by the applicant covering the relevant entrusted matters, a valid personal identity document, and the Consultation Registration Form of the Center for Medical Device Technical Evaluation.

Overseas applicants shall bring: the letter of authorization designating a domestic enterprise legal person as their agent, the letter of authorization issued by the domestic agent covering the relevant entrusted matters, valid personal identification documents, and the Consultation Registration Form of the Center for Medical Device Technical Evaluation.

Power of Attorney from the Applicant (consistent with the entity whose seal appears on the application form) authorizing the entrusted agent and specifying the matters to be handled (see Announcement No. 169 on Handling Administrative Licensing Matters Such as Acceptance and Collection of Approval Documents, Annex D). The agent shall carry the original and copies of their identity proof, as well as the registration and submission materials.

Summary

There is currently limited review experience regarding innovative applications for AI-based medical devices. During the meeting, the National Medical Products Administration (NMPA) provided the following recommendations:

1. Standardized names of medical device products;

2. Clearly defined intended use, usage scenarios, core functions, and operating environment for the software;

3. The data used shall be sourced from clinical institutions, with the source institutions and collection requirements specified;

4. Provide relevant documentation on algorithm design, including algorithm selection and training;

5. Software validation documentation based on real-world clinical data;

6. Data and materials demonstrating that the product has significant clinical application value.

Due to the numerous approval requirements, some of the provisions may be relatively stringent for many current AI companies. These enterprises are still preparing the relevant materials. According to several leading AI firms, the current application process is characterized by intricate details and high standards. They are submitting documentation for filing purposes; however, due to the lack of precedents, it is difficult to make any further judgments regarding the subsequent outcomes.

Certainly, the rigorous oversight underscores China’s determination to develop AI. Products that successfully reach the market will inevitably be those capable of withstanding scrutiny from hospitals, physicians, and patients. Meanwhile, standardized approval guidelines provide companies with clear objectives to strive for. This raises the question: who will be the first to dare to try?