Lancet Digital Health Publishes Multicenter COVID-19 Study Led by Wuhan Tongji Hospital Validating Deep Learning Model Across Multiple Clinical Scenarios

Sep 27, 2020 10:58 CST

Infervision

Artificial Intelligence Product Developer

To this day, the COVID-19 pandemic continues to rage globally.

Although the epidemic in China has been effectively contained, Chinese healthcare workers continue to conduct in-depth research on coronavirus disease 2019 (COVID-19), aiming to contribute to the global response to the pandemic. In particular, large-scale, multicenter retrospective studies are of significant value for further understanding the virus, improving diagnosis and treatment, and even preventing a second wave of outbreaks.

Tongji Hospital Affiliated to Tongji Medical College of Huazhong University of Science and Technology (hereinafter Wuhan Tongji Hospital) sat at the epicenter of the outbreak. It ran three campuses to admit patients and also took over the Wuhan Optics Valley Fangcang Shelter Hospital, playing a pivotal role in Wuhan’s fight against the epidemic. Throughout this period, its physicians drew on data accumulated in frontline clinical practice to study COVID-19 in depth.

On September 23, a multicenter retrospective COVID-19 study conducted jointly by Wuhan Tongji Hospital and several other frontline hospitals in China was published in The Lancet Digital Health, a new journal in The Lancet family. The paper systematically examined the value of deep learning from several angles: COVID-19 diagnosis, clinical triage efficiency, disease monitoring, and the management of patients with mild or asymptomatic infection, offering an important reference for the global pandemic response.

The paper was a joint effort by Infervision and several major Chinese hospitals that played pivotal roles in the fight against the epidemic: Wuhan Tongji Hospital, Tianyou Hospital Affiliated to Wuhan University of Science and Technology, Xianning Central Hospital, the Second Xiangya Hospital of Central South University, and Shenzhen Third People’s Hospital. The co-first authors come from Wuhan Tongji Hospital, Infervision, and the Second Xiangya Hospital, and Wang Wei, President of Wuhan Tongji Hospital, is the corresponding author. The study is one of the few large-scale, multicenter clinical studies in China of deep learning models for COVID-19, and it adds further evidence of their value during the pandemic.

The study proposes a CT-based deep learning model that triages patients with suspected COVID-19 and automatically quantifies lesions in confirmed cases. Built on U-Net, a convolutional network architecture designed for biomedical image segmentation, the single model serves both research objectives: triaging suspected cases and analyzing disease progression in confirmed ones.
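The article does not describe how lesions are quantified. A common approach, assumed here, is to combine a lung segmentation mask with a lesion segmentation mask (the kind of per-voxel output a segmentation network such as U-Net produces) and report lesion volume plus the percentage of lung involved. The function name `lesion_burden`, the nested-list mask format, and the voxel-spacing argument are all hypothetical; this is a minimal sketch, not the study’s pipeline.

```python
from typing import List, Tuple

# Hypothetical mask format: slices x rows x columns, 1 = voxel inside the mask.
Mask = List[List[List[int]]]


def lesion_burden(lung: Mask, lesion: Mask,
                  spacing_mm: Tuple[float, float, float] = (1.0, 1.0, 1.0)
                  ) -> Tuple[float, float]:
    """Return (lesion volume in mL, percent of lung volume occupied by lesion).

    Lesion voxels outside the lung mask are ignored. Both masks are assumed
    to share the shape of the CT volume they were predicted from.
    """
    lung_voxels = 0
    lesion_voxels = 0
    for lung_slice, lesion_slice in zip(lung, lesion):
        for lung_row, lesion_row in zip(lung_slice, lesion_slice):
            for in_lung, in_lesion in zip(lung_row, lesion_row):
                if in_lung:
                    lung_voxels += 1
                    if in_lesion:
                        lesion_voxels += 1
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    dz, dy, dx = spacing_mm
    voxel_ml = dz * dy * dx / 1000.0  # 1 mL = 1000 mm^3
    return lesion_voxels * voxel_ml, 100.0 * lesion_voxels / lung_voxels


if __name__ == "__main__":
    # Toy 2-slice, 2x2 volume: the whole grid is lung, 2 of 8 voxels are lesion.
    lung = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
    lesion = [[[1, 0], [0, 0]], [[0, 1], [0, 0]]]
    volume_ml, percent = lesion_burden(lung, lesion, spacing_mm=(10.0, 10.0, 10.0))
    print(f"lesion volume: {volume_ml:.1f} mL, lung involvement: {percent:.1f}%")
```

Comparing the involvement percentage between two scans of the same patient is one simple way output like this could feed progression monitoring.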

The research team first collected 2,447 chest CT images (1,647 RT-PCR-confirmed positive and 800 RT-PCR-confirmed negative) for model training and 639 chest CT images (439 RT-PCR-confirmed positive and 200 RT-PCR-confirmed negative) for internal validation. On the internal validation set, the model achieved an area under the receiver operating characteristic curve (AUC), sensitivity, and specificity of 0.985 (95% CI 0.982–0.989), 0.973 (0.966–0.980), and 0.850 (0.827–0.875), respectively.
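AUC (area under the ROC curve) summarizes how well a model ranks positives above negatives across all thresholds, whereas sensitivity and specificity describe a single operating point. The sketch below computes both from per-case model scores and RT-PCR labels (1 = positive, 0 = negative), using the rank-based (Mann-Whitney) definition of AUC. The function names, scores, and threshold are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. O(P * N) pairwise comparison, fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Invented scores for four scans: two RT-PCR positive, two negative.
    scores = [0.9, 0.3, 0.6, 0.1]
    labels = [1, 1, 0, 0]
    print("AUC:", auc_mann_whitney(scores, labels))
    print("sens/spec at 0.5:", sensitivity_specificity(scores, labels, 0.5))
```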

Second, the research team consecutively collected 1,097, 820, and 203 sets of chest CT images from patients visiting fever clinics at Tianyou Hospital Affiliated to Wuhan University of Science and Technology (Wuhan; local incidence approximately 0.566%), Xianning Central Hospital (Xianning; approximately 0.034%), and the Second Xiangya Hospital of Central South University (Changsha; approximately 0.003%) to form an external validation set. With the patients’ CT imaging reports as the reference standard, the model achieved an AUC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.953 (0.949–0.959), 0.923 (0.914–0.932), 0.851 (0.842–0.860), 0.790 (0.777–0.803), and 0.948 (0.941–0.954), respectively.
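Unlike sensitivity and specificity, PPV and NPV depend on how common positive cases are in the population being tested, which is one reason validation across sites with very different incidence is informative. The sketch applies Bayes’ rule; `predictive_values` is a hypothetical helper. The 0.378 prevalence in the usage loop is our own back-calculation, which happens to reproduce the pooled PPV (0.790) and NPV (0.948) above from the reported sensitivity and specificity; it is not a figure reported by the study.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied where `prevalence` of cases are positive."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    tp = sensitivity * prevalence                  # true positives per tested person
    fn = (1.0 - sensitivity) * prevalence          # missed positives
    tn = specificity * (1.0 - prevalence)          # correctly cleared negatives
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false alarms
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Same sensitivity/specificity as the external validation, varying prevalence.
    for prev in (0.01, 0.10, 0.378, 0.50):
        ppv, npv = predictive_values(0.923, 0.851, prev)
        print(f"prevalence {prev:.3f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With these inputs, PPV drops below 0.1 at 1% prevalence while NPV stays above 0.99, so the same model behaves very differently as a rule-in and a rule-out tool depending on the setting.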

The research team also simulated integrating the AI model into clinical workflows to evaluate its effect on triage efficiency. Having the model alert senior physicians or clinicians directly to imaging findings significantly shortened the time needed to obtain imaging reports for positive cases (p<0.0001): the median (interquartile range) reduction was 15.73 (11.05–25.25) minutes when results went to senior physicians and 22.62 (15.12–38.63) minutes when they went to clinicians.
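The article does not describe the simulation protocol. The toy model below is entirely an assumption: a backlog of scans sits in a single reading queue with a fixed reading time per report, and first-come-first-served reading is compared with a queue in which AI-flagged scans are read first. It then reports the median and interquartile range of minutes saved for truly positive cases, the same summary statistics quoted above. `simulate_triage`, `completion_times`, and their parameters are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def completion_times(order: List[int], minutes_per_report: float) -> Dict[int, float]:
    """Minutes from the start of the backlog until each case's report is finished."""
    times = {}
    clock = 0.0
    for case_id in order:
        clock += minutes_per_report
        times[case_id] = clock
    return times


def simulate_triage(ai_flags: Sequence[bool], truth: Sequence[bool],
                    minutes_per_report: float = 5.0) -> Tuple[float, Tuple[float, float]]:
    """Median and (Q1, Q3) of minutes saved per truly positive case by AI prioritization.

    Needs at least two truly positive cases for the quartile calculation.
    """
    fifo = list(range(len(ai_flags)))
    # sorted() is stable: flagged cases move to the front, arrival order kept otherwise.
    prioritized = sorted(fifo, key=lambda i: not ai_flags[i])
    t_fifo = completion_times(fifo, minutes_per_report)
    t_prio = completion_times(prioritized, minutes_per_report)
    saved = [t_fifo[i] - t_prio[i] for i in fifo if truth[i]]
    q1, _, q3 = statistics.quantiles(saved, n=4, method="inclusive")
    return statistics.median(saved), (q1, q3)


if __name__ == "__main__":
    # Hypothetical backlog of 8 scans; True = flagged by the model / truly positive.
    flags = [False, False, True, False, True, False, False, True]
    truth = [False, False, True, False, True, False, True, True]
    median, (q1, q3) = simulate_triage(flags, truth)
    print(f"median saving {median:.1f} min (IQR {q1:.1f}-{q3:.1f})")
```

In the usage example, the positive case the model missed (index 6) is read later than under first-come-first-served, so its "saving" is negative; a simulation like this makes that trade-off visible alongside the gains for flagged cases.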

To evaluate the model in patients with mild or even asymptomatic disease, the research team additionally collected chest CT images from 761 mild or asymptomatic patients treated at Fangcang shelter hospitals. Of these, 618 (81%) showed COVID-19-related imaging findings, and the model’s sensitivity in this cohort was 0.886 (0.873–0.898). The team also collected 686 chest CT images acquired before the outbreak at Tianyou Hospital and Shenzhen Third People’s Hospital to test the model on non-COVID-19 cases that could be confused with COVID-19, where it achieved a specificity of 0.822 (0.808–0.836).
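Each proportion in this article comes with a 95% confidence interval, but the article does not say how those intervals were computed. As one standard option, and not necessarily the study’s method, the Wilson score interval for a binomial proportion can be sketched as follows; `wilson_interval` is a hypothetical helper name. The usage line applies it to the share of shelter-hospital patients with imaging findings (618 of 761), purely for illustration.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    low, high = wilson_interval(618, 761)
    print(f"618/761 = {618 / 761:.3f}, 95% Wilson CI {low:.3f}-{high:.3f}")
```

Unlike the simple normal-approximation (Wald) interval, the Wilson interval stays within [0, 1] and behaves better for proportions close to 0 or 1, such as sensitivities above 0.95.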

By drawing on multiple datasets from real-world clinical settings, the study validated the model’s performance in triaging suspected COVID-19 patients and analyzing lesions, and the results showed robust, stable performance across datasets. In the simulated clinical workflow, the model accelerated the triage of suspected cases, while its automated lesion analysis offers important support for monitoring and managing patients with COVID-19.

Finally, the study evaluated the model’s ability to assess disease progression from CT images, using the assessments of three radiologists as the reference standard. The model achieved a sensitivity of 0.962 (95% CI 0.947–1.000) and a specificity of 0.875 (95% CI 0.833–0.923), with substantial agreement between the model’s results and the reference standard (kappa = 0.839; 95% CI 0.718–0.940).
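The article reports a kappa statistic without naming the variant; for one model compared against one reference standard, Cohen’s kappa is the usual choice, and the sketch assumes it. Kappa discounts the agreement two raters would reach by chance, given how often each uses each label. `cohens_kappa` is a hypothetical helper, and the progression labels in the usage lines are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Invented progression labels for six follow-up scans.
    model = ["progressed", "progressed", "stable", "stable", "improved", "improved"]
    reference = ["progressed", "stable", "stable", "stable", "improved", "improved"]
    print(f"kappa = {cohens_kappa(model, reference):.3f}")
```

Here the two label sets agree on 5 of 6 scans (0.83 observed agreement), but because a third of that agreement is expected by chance, kappa is lower than the raw agreement rate.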