NVIDIA Clara: Democratizing Medical AI Development with an End-to-End Platform for Imaging and Genomics

Apr 03, 2020 15:00 CST Updated 15:00

NVIDIA

Artificial Intelligence Computing Service Provider

In 2003, when the SARS outbreak caused by a coronavirus swept across Asia, the internet was still in its infancy and had not yet been adopted in healthcare settings. Patients had no choice but to go to hospitals and wait for treatment. Radiology departments were constantly crowded, and many patients were infected while gathered in hospitals.

This year, a new and more transmissible coronavirus has swept across the globe. Yet in less than two months, China contained its spread beyond Hubei Province. With AI assistance, diagnostic efficiency in radiology departments in the affected areas rose rapidly, and crowding in hospitals was significantly reduced. Meanwhile, drug development companies used new gene analysis methods to study the virus’s RNA structure, which allowed them to screen structural data for more than 1,000 nucleoside/nucleotide inhibitor compounds targeting the viral RNA polymerase within just a few days.

As the pandemic has shown, technologies such as AI-assisted diagnosis and genomic analysis hold vast potential and are attracting a growing number of researchers. However, because the barriers to entry in this field are high, even experienced physicians and scholars often run into setbacks in their research.

To let more developers tap into NVIDIA’s computing power and work with data more easily, NVIDIA has built the Clara framework on top of EGX, DGX, and cloud computing services. The framework offers researchers capabilities such as federated learning and transfer learning, lowering the barriers around data and letting researchers refocus on their core research.

What Is Clara

NVIDIA officially launched the Clara platform at RSNA 2018. At the time, NVIDIA’s sole goal was to give medical imaging AI researchers a software development kit that would standardize imaging data and accelerate AI training.

NVIDIA later recognized that the genome is a far larger data source. Processing hundreds of millions of base pairs requires more efficient computing resources to keep experiments cost-effective. By the 2019 GTC conference in Suzhou, genomics had become another key frontier for Clara.

Today, NVIDIA Clara is positioned as an intelligent computing software platform for healthcare developers. Going forward, NVIDIA plans to integrate all of its healthcare industry solutions into Clara, aiming to build a comprehensive “healthcare Swiss Army knife” that gives pioneers in the healthcare sector efficient, user-friendly data analytics tools.

NVIDIA Clara

Overall, NVIDIA Clara consists of GPU-accelerated libraries, three SDKs, and a suite of reference applications. At present, the services Clara offers researchers focus primarily on medical imaging and genomic analysis, two fields developing rapidly on the wave of big data.

Medical Imaging Applications: Using AI to Help Developers Annotate 3D Images

In medical imaging, a hospital or company that wants to build a deep learning algorithm with AI technology and deploy it clinically must complete four steps.

Step 1 is data. After obtaining raw data, researchers must first annotate it, for example by labeling nodules in lung CT images. This step is crucial to developing a high-performing AI algorithm.

Step 2: with annotated data in hand, researchers feed it into their chosen AI models to develop deep learning algorithms tailored to their needs. In China, many researchers build on open-source algorithms or use transfer learning to adapt algorithms that have performed well in other domains.

Step 3 is validating the algorithm on a test set. Researchers also need to try the model in real-world scenarios to observe its actual performance. If the algorithm does not adequately meet requirements on the test set, researchers may need to adjust its parameters and retrain the model from scratch.
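As one illustration of Step 3, the sketch below scores predicted segmentation masks against ground truth with the Dice coefficient, a common overlap metric for medical image segmentation, and compares the mean score with an acceptance threshold. The metric choice, the 0.8 threshold, and the toy masks are assumptions for illustration; the article does not say how Clara validates models.

```python
from typing import List, Sequence, Tuple

def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks flattened to 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * intersection / total

def passes_validation(cases: List[Tuple[Sequence[int], Sequence[int]]],
                      threshold: float = 0.8) -> Tuple[float, bool]:
    """Return the mean Dice over the test set and whether it meets the (hypothetical) threshold."""
    scores = [dice_coefficient(p, t) for p, t in cases]
    mean = sum(scores) / len(scores)
    return mean, mean >= threshold

# Toy test set of (prediction, ground truth) masks; real masks would be 3D volumes.
test_set = [
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # perfect overlap
    ([1, 1, 0, 0], [0, 1, 1, 0]),  # partial overlap
]
mean_dice, accepted = passes_validation(test_set)
print(f"mean Dice = {mean_dice:.3f}, accept = {accepted}")
```

If the check fails, the workflow loops back to adjusting parameters and retraining, as described above.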

Step 4: once researchers have an algorithm that performs well in testing, they can deploy it on edge devices for inference in real clinical diagnostic settings. At this point, the AI development process is essentially complete.

NVIDIA developed the Clara AI application platform to standardize and simplify the aforementioned four steps, enabling researchers to focus more intently on medical research itself.

Take data annotation as an example. It is fundamentally labor-intensive, the unavoidable repetitive work of moving from “manual” to “intelligent” processes. Physicians cannot afford to spend large amounts of time tracing outlines pixel by pixel, so AI companies typically hire medical graduate students working in hospitals to perform segmentation, at a cost of 20 to 30 RMB per dataset. An intern usually needs 20 to 40 minutes to coarsely annotate a single dataset; more precise segmentation takes 1 to 2 hours.

This approach to data acquisition has two major problems. First, AI training requires large volumes of data, and it is difficult for companies to recruit enough interns to annotate it all, so costs become prohibitive. Second, image annotation demands considerable expertise, so interns often make mistakes such as missing nodules or mislabeling structures.

To address this, NVIDIA has integrated the AI-Assisted Annotation server, a deep-learning-based annotation component, into the Clara Train SDK, so developers can use it directly to annotate medical images.

NVIDIA’s experimental data show that with this toolkit, annotating a single pulmonary nodule takes 8 to 15 minutes, and physicians’ annotation efficiency improves 4- to 8-fold. Rough estimates also indicate that pancreas annotation can be sped up 4-fold and spleen annotation 10-fold.
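A rough, back-of-envelope sketch of what these figures could mean for a project: it combines the 1 to 2 hours of precise manual segmentation per case quoted earlier with the 4- to 8-fold speedup reported for assisted annotation. The 1,000-case dataset is a hypothetical assumption, and the two sets of figures come from different contexts in the article, so the output is illustrative only.

```python
def annotation_hours(cases: int, minutes_per_case: float, speedup: float = 1.0) -> float:
    """Total annotator-hours for a dataset, optionally divided by an assisted-annotation speedup."""
    return cases * minutes_per_case / speedup / 60

CASES = 1_000  # hypothetical dataset size, not taken from the article

# Precise manual segmentation: 60 to 120 minutes per case.
manual = (annotation_hours(CASES, 60), annotation_hours(CASES, 120))

# Assisted annotation: the best case pairs the fastest manual time with the 8x speedup,
# the worst case pairs the slowest manual time with the 4x speedup.
assisted = (annotation_hours(CASES, 60, speedup=8), annotation_hours(CASES, 120, speedup=4))

print(f"manual:   {manual[0]:.0f} to {manual[1]:.0f} annotator-hours")
print(f"assisted: {assisted[0]:.0f} to {assisted[1]:.0f} annotator-hours")
```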

In addition, DGX acceleration can shrink computing tasks that once took weeks down to hours, significantly lowering trial-and-error costs for medical AI companies and even letting them run multiple algorithm experiments in parallel. As a result, AI development can move considerably faster.

Medical Imaging Applications: Using Federated Learning to Overcome the Robustness Challenge in AI Models

Even solving these development challenges does not guarantee that AI will be practically viable. The steps above overlook a critical characteristic of medical data: security. Because medical data is closely tied to patients’ life and health information, its use can only be discussed on the premise that data security is ensured.

This means that enterprises or physicians training AI models cannot take data out of the hospital. Yet a mature AI algorithm typically has to prove itself across geographic variation through multicenter trials, and an AI model that lacks robustness has little value in clinical practice.

Although data cannot leave the hospital, models can. So can the models themselves be combined? The answer is yes. In federated learning, each participating institution trains a model on its own data, and these models are then aggregated into a single unified model. The data never leaves the hospital, and the resulting AI model is more robust.
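The article does not say which aggregation rule Clara uses. As an assumption, the sketch below uses federated averaging (FedAvg), a widely used scheme in which each site trains locally and a server averages the resulting weights, weighted by each site’s sample count. The three “hospitals”, their data, and the tiny linear model are hypothetical. With one full-batch gradient step per round, this weighted average reproduces centralized gradient descent on the pooled data exactly (up to floating-point rounding), even though no raw data is ever pooled.

```python
from typing import List, Tuple

Weights = List[float]
Site = List[Tuple[float, float]]  # (x, y) pairs that never leave the site

def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average each parameter across sites, weighted by the site's sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

def gd_step(weights: Weights, data: Site, lr: float) -> Weights:
    """One full-batch gradient step on mean squared error for y ~ w0 + w1 * x."""
    w0, w1 = weights
    g0 = g1 = 0.0
    for x, y in data:
        err = (w0 + w1 * x) - y
        g0 += err
        g1 += err * x
    n = len(data)
    return [w0 - lr * g0 / n, w1 - lr * g1 / n]

def federated_training(sites: List[Site], rounds: int, lr: float, local_steps: int = 1) -> Weights:
    weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = []
        for data in sites:
            local = weights
            for _ in range(local_steps):
                local = gd_step(local, data, lr)
            updates.append((local, len(data)))  # only weights and a count are shared
        weights = federated_average(updates)
    return weights

def centralized_training(sites: List[Site], rounds: int, lr: float) -> Weights:
    pooled = [pair for data in sites for pair in data]  # the pooling FL avoids
    weights = [0.0, 0.0]
    for _ in range(rounds):
        weights = gd_step(weights, pooled, lr)
    return weights

# Three hypothetical hospitals with different x ranges (non-IID) and small noise around y = 2x + 1.
sites = [
    [(0.0, 1.1), (0.5, 1.9), (1.0, 3.0)],
    [(1.5, 4.1), (2.0, 4.9)],
    [(2.5, 6.0), (3.0, 7.1), (3.5, 7.9), (4.0, 9.0)],
]
fed = federated_training(sites, rounds=500, lr=0.05)
cen = centralized_training(sites, rounds=500, lr=0.05)
print("federated:  ", [round(w, 4) for w in fed])
print("centralized:", [round(w, 4) for w in cen])
```

Raising `local_steps` above 1 cuts communication but breaks this exact equivalence when sites’ data differ, which is one reason heterogeneous participants make aggregation harder.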

Comparison of Federated Learning and Centralized Training Results

In a federated learning experiment involving 13 participating groups, NVIDIA obtained the results shown in the figure above. The red line shows model accuracy for centralized training in a data center, while the green line shows accuracy after aggregating the 13 models with federated learning. As the number of training iterations increases, the two curves nearly overlap, which to some extent demonstrates that federated learning is viable.

NVIDIA Engineers Explain Federated Learning at GTC 2019

New challenges keep emerging, however. If the participating models differ significantly, how can federated learning autonomously “refine and optimize” them? “Incremental learning” will be NVIDIA’s next key research focus.

In addition to the Clara Train SDK, NVIDIA has developed the Clara Deploy SDK to optimize existing PACS workflows. With the Clara Deploy SDK, physicians can deploy medical imaging AI models in clinical settings quickly and flexibly.

Genomics: Data Analysis and AI Architecture

Now to genomics, NVIDIA’s more recent focus. Since the first human genome was sequenced in 2003, the cost of whole-genome sequencing has fallen far faster than Moore’s Law would predict. From newborn genome sequencing to national population genomics initiatives, the field is flourishing and becoming increasingly personalized.

Advances in sequencing technology have triggered explosive growth in genomic data: the total volume of sequence data doubles every seven months. At this rate, by 2025 genomics could generate more than ten times as much data as other big data sources such as astronomy, Twitter, and YouTube combined.
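To make the seven-month doubling concrete, the sketch below converts a doubling period into a growth factor over a given time span and sets it beside Moore’s Law. The 24-month doubling period used for Moore’s Law is an assumption (18 months is also commonly quoted).

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Growth over `months` for a quantity that doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

GENOMIC_DOUBLING_MONTHS = 7   # figure from the article
MOORE_DOUBLING_MONTHS = 24    # assumed Moore's Law doubling period

for years in (1, 5):
    genomics = growth_factor(12 * years, GENOMIC_DOUBLING_MONTHS)
    moore = growth_factor(12 * years, MOORE_DOUBLING_MONTHS)
    print(f"{years} yr: genomic data x{genomics:,.1f}, Moore's law x{moore:,.1f}")
```

Over five years, a seven-month doubling compounds to roughly a 380-fold increase, while a two-year doubling gives less than six-fold.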

New sequencing systems, such as the DNBSEQ-T7 from BGI Group, the world’s largest genomics research organization, are driving widespread adoption of this technology. The system can sequence up to 60 genomes per day, producing 1 to 6 Tb of high-quality data per day.
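A quick sanity check of these throughput figures, as a sketch: dividing the upper end of the daily output by 60 genomes gives the data available per genome, and dividing that by the size of the human genome gives the approximate sequencing depth (coverage). Reading “Tb” as terabases and using roughly 3.1 billion bases for the human genome are assumptions.

```python
GENOME_SIZE_BASES = 3.1e9        # approximate haploid human genome size (assumption)
MAX_DAILY_OUTPUT_BASES = 6e12    # upper end of 1 to 6 Tb per day, read as terabases (assumption)
GENOMES_PER_DAY = 60             # figure from the article

bases_per_genome = MAX_DAILY_OUTPUT_BASES / GENOMES_PER_DAY
coverage = bases_per_genome / GENOME_SIZE_BASES

print(f"{bases_per_genome / 1e9:.0f} Gb per genome, about {coverage:.0f}x coverage")
```

About 100 Gb per genome corresponds to roughly 30x coverage, a depth commonly used for whole-genome sequencing.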

By combining BGI Group’s flow cell technology with acceleration from a pair of NVIDIA V100 Tensor Core GPUs, the DNBSEQ-T7 achieves a 50-fold increase in sequencing speed, making it the highest-throughput genome sequencer to date.

But the acceleration of sequencing is far from over. As scientists probe ever finer scales of biology, they keep raising new demands, and NVIDIA continues to explore ways to meet them.

To address the growing scale and complexity of genomic sequencing and analysis through accelerated and intelligent computing, NVIDIA created Clara Genomics.

With the Clara Genomics Analysis SDK in the Clara framework, researchers can accelerate the processing of sequencing reads and sequence alignment, reducing analysis costs and improving data quality.
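Sequence alignment is a dynamic-programming problem. As a minimal CPU sketch, and not the Clara SDK’s API, the function below computes a Needleman-Wunsch global alignment score using an assumed scoring scheme of +1 for a match, -1 for a mismatch, and -1 per gap. Cells on the same anti-diagonal of the DP matrix do not depend on one another, which is the kind of parallelism GPU aligners can exploit; this sketch fills the matrix row by row for clarity.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score, keeping only one previous DP row in memory."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning an empty prefix of `a` against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, or gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]

print(global_alignment_score("ACGT", "AGT"))
```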

Clara Genomics

NVIDIA has also acquired Parabricks, a CUDA-accelerated genomic analysis toolkit that performs variant discovery with results consistent with the industry-standard GATK Best Practices pipeline. The toolkit can speed up computation 30- to 50-fold and also supports deep-learning-based detection of genetic variants.
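Purely to illustrate what “variant discovery” means, the sketch below builds a pileup from reads that are assumed to be already aligned to the reference without indels, then flags single-nucleotide variants whose read depth and alternate-allele fraction clear hypothetical thresholds. This naive counting caller is not the GATK algorithm, which relies on local haplotype reassembly and statistical genotype likelihoods, and it is far simpler than the GPU pipelines in Parabricks.

```python
from collections import Counter
from typing import Dict, List, Tuple

def call_snvs(reference: str, reads: List[Tuple[int, str]],
              min_depth: int = 4, min_alt_fraction: float = 0.3) -> List[Tuple[int, str, str, float]]:
    """reads are (start, sequence) pairs already aligned to `reference` without indels."""
    pileup: Dict[int, Counter] = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(reference):
                pileup.setdefault(pos, Counter())[base] += 1

    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]
        # Most frequent non-reference base at this position, if any.
        alt_base, alt_count = max(((b, c) for b, c in counts.items() if b != ref_base),
                                  key=lambda bc: bc[1], default=(None, 0))
        if depth >= min_depth and alt_base and alt_count / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, alt_count / depth))
    return calls

reference = "ACGTACGTAC"
reads = [(0, "ACGTGCGT"), (1, "CGTGCGTA"), (2, "GTACGTAC"), (0, "ACGTACGT"), (3, "TGCGTAC")]
print(call_snvs(reference, reads))
```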

NVIDIA Parabricks: GPU-Accelerated GATK

Through a collaboration with BGI Group, Parabricks’ software can complete a whole-genome analysis in under an hour. Using a server equipped with eight NVIDIA T4 Tensor Core GPUs, BGI Group has shown that higher throughput can bring the cost of analyzing a genome down to $2, less than half the cost of existing systems.

NVIDIA is offering Parabricks free of charge to COVID-19 researchers.

The Future of AI Must Be Co-Created by All

As deep learning permeates more and more fields, NVIDIA not only provides computing power to a vast developer community but also builds a solid “foundation.” On this foundation, developers can devote more energy to exploring knowledge instead of wrestling with complex data analysis tools.

Clara now has a wide range of partners. In China, United Imaging Healthcare and Ande Medical Intelligence have both adopted the platform for joint development. Overseas, Thermo Fisher Scientific, Canon’s Vital Images, and Johns Hopkins University have established in-depth partnerships with NVIDIA.

So, do you want to help artificial intelligence advance faster and go further? Join NVIDIA in pioneering a new era of supercomputing.

Click here or scan the QR code to register and watch more in-depth videos on NVIDIA healthcare technologies.