Google Leverages Deep Learning to Aid Pathologists in Cancer Detection, Reaching an 89% Localization Score

Mar 07, 2017 11:34 CST Updated 11:34

Article by Martin Stumpe, Google’s Technical Lead, and Product Manager Lily Peng; translated and compiled by VCBeat.

The pathology report that a pathologist issues after examining a patient’s biological tissue samples is often the gold standard for diagnosing many diseases. For cancer in particular, the pathologist’s diagnosis is of great significance to the patient. However, reviewing pathology slides is a highly complex task that requires years of training to acquire the necessary expertise and experience; consequently, the number of pathologists falls far short of demand.

Even among rigorously trained pathologists, diagnoses of the same patient can differ, and this variability is a significant contributor to misdiagnosis. For instance, agreement among physicians diagnosing certain forms of breast and prostate cancer can be as low as 48%. This lack of concordance is not surprising, given the sheer volume of information that must be reviewed to reach an accurate diagnosis.

Typically, pathologists are responsible for examining all biological tissues visible on pathological slides. However, each patient may have numerous slides, and at 40x magnification, each slide contains over 10 billion pixels (10+ gigapixels). Imagine having to review more than 1,000 high-resolution images while being accountable for every single pixel. This requires processing vast amounts of data, yet physicians’ time is often insufficient.
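The comparison above is simple arithmetic, sketched here in Python. The slide size comes from the text; the 10-megapixel size of a "high-resolution image" is an assumption made for illustration, and the function name is hypothetical.

```python
def photos_equivalent(slide_pixels: int, photo_pixels: int) -> int:
    """Number of photos needed to hold as many pixels as one slide (rounded up)."""
    return -(-slide_pixels // photo_pixels)  # integer ceiling division


SLIDE_PIXELS = 10_000_000_000  # "over 10 billion pixels" per slide at 40x (from the text)
PHOTO_PIXELS = 10_000_000      # assumed: one 10-megapixel photo (not stated in the text)

photos_per_slide = photos_equivalent(SLIDE_PIXELS, PHOTO_PIXELS)
print(photos_per_slide)
```

Under that assumption a single slide already corresponds to about a thousand photos, and a patient may have many slides.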

To address the challenges of limited time and diagnostic accuracy, we are investigating the application of deep learning in digital pathology by developing an automated detection algorithm to assist pathologists. We trained the algorithm using images provided by the Radboud University Medical Center, which were also used in the 2016 ISBI Camelyon Challenge. This algorithm is optimized to determine whether breast cancer has metastasized to the lymph nodes or extended to the adjacent breast tissue.

So, what were the results? Standard “off-the-shelf” deep learning methods, such as Inception (also known as GoogLeNet), performed quite well on both tasks, although the resulting heatmaps of tumor probability predictions were somewhat noisy. With additional customization, including training the neural network to examine images at different magnifications (much as pathologists do), the models produced noticeably cleaner predictions.

Left: Images from two lymph node biopsies. Middle: Results of prior deep learning-based tumor detection. Right: Current results, with visibly less noise (potential false positives) than the prior results.

In fact, the prediction heatmaps generated by this algorithm have improved significantly, achieving a localization score (FROC) of 89%, which substantially surpasses the 73% diagnostic score achieved by pathologists without time constraints. We are not the only team to observe such promising results; other teams have also achieved scores as high as 81% on the same dataset.

Even more excitingly, our model performs exceptionally well, even on images acquired from different hospitals using different scanners. For details, please refer to our paper “Detecting Cancer Metastases on Gigapixel Pathology Images.”

Close-up of lymph node biopsy. The tissue contains breast cancer metastases as well as macrophages, which resemble tumor cells but are benign normal tissue. Our algorithm successfully identifies the tumor regions (bright green) without being confused by macrophages.

The Revolution Is Not Yet Complete: Key Points to Keep in Mind

Like most metrics, the FROC localization score is not perfect. The FROC score is defined as the sensitivity (percentage of tumors detected) at several preset average numbers of false positives per slide. Pathologists rarely produce false positives (misclassifying normal cells as tumors). For example, the aforementioned 73% score corresponds to 73% sensitivity with zero false positives. In contrast, the sensitivity of our algorithm increases when more false positives are allowed. At eight false positives per slide, our algorithm achieves 92% sensitivity.
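To make the definition concrete, here is a minimal Python sketch of how a FROC-style score might be computed. It is not the evaluation code used for the results above. It assumes the six operating points of the Camelyon16 evaluation (1/4, 1/2, 1, 2, 4, and 8 false positives per slide, averaged), assumes each detection has already been matched to a tumor or marked as a false positive, and assumes confidence scores are distinct. All names and the example data are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Average false positives per slide at which sensitivity is sampled.
# Assumption: the six operating points of the Camelyon16 evaluation.
FP_RATES = (0.25, 0.5, 1, 2, 4, 8)

# One detection: (confidence, id of the tumor it hits, or None for a false positive).
Detection = Tuple[float, Optional[str]]


def sensitivity_at(detections: Sequence[Detection], n_tumors: int,
                   n_slides: int, max_avg_fp: float) -> float:
    """Fraction of tumors found while staying within max_avg_fp false positives per slide."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    found = set()
    false_positives = 0
    for _, tumor_id in ranked:
        if tumor_id is None:
            false_positives += 1
            if false_positives / n_slides > max_avg_fp:
                break  # accepting this detection would exceed the budget
        else:
            found.add(tumor_id)  # repeated hits on one tumor count once
    return len(found) / n_tumors


def froc_score(detections: Sequence[Detection], n_tumors: int, n_slides: int,
               fp_rates: Sequence[float] = FP_RATES) -> float:
    """Mean sensitivity over the preset false-positive rates."""
    total = sum(sensitivity_at(detections, n_tumors, n_slides, r) for r in fp_rates)
    return total / len(fp_rates)


# Hypothetical algorithm output on 2 slides containing 4 tumors in total.
algorithm = [(0.99, "t1"), (0.95, "t2"), (0.90, None), (0.80, "t3"),
             (0.70, None), (0.60, None), (0.50, "t4")]
algorithm_score = froc_score(algorithm, n_tumors=4, n_slides=2)

# Hypothetical human reader: finds 3 of 4 tumors and marks no false positives.
reader = [(1.0, "t1"), (1.0, "t2"), (1.0, "t3")]
reader_score = froc_score(reader, n_tumors=4, n_slides=2)
```

In this toy data the reader's sensitivity is the same at every operating point because there are no false positives to trade away, which is why a pathologist's FROC score equals their sensitivity. The algorithm's sensitivity rises as the false-positive budget grows.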

These algorithms perform well on the tasks they were trained for but lack the breadth of knowledge and experience of human pathologists. For example, they cannot identify other abnormalities that the model has not been explicitly trained to classify (such as inflammatory processes, autoimmune diseases, or other types of cancer).

To ensure optimal clinical outcomes, these algorithms need to complement pathologists’ workflows and be continuously refined. We envision that algorithms such as ours can improve the efficiency and consistency of pathologists. For example, pathologists can reduce their false-negative rate (the percentage of undetected tumors) by examining the top-ranked predicted tumor regions, including up to eight false-positive regions per slide. Another example is that these algorithms enable pathologists to measure tumor size simply and accurately, a factor associated with prognosis.
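One way such a review aid could work is sketched below in Python: candidate regions are grouped by slide and the highest-probability ones are handed to the pathologist first. This is an illustration only, not a description of any deployed tool. The `Region` record, the `review_queue` function, and the fixed per-slide `budget` are all hypothetical; the text above speaks only of tolerating up to eight false positives per slide, not of a fixed number of regions.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Region(NamedTuple):
    """A candidate tumor region predicted by the model (hypothetical record)."""
    slide_id: str
    x: int
    y: int
    tumor_probability: float


def review_queue(regions: Iterable[Region], budget: int) -> Dict[str, List[Region]]:
    """For each slide, return up to `budget` regions, most probable first."""
    by_slide = defaultdict(list)
    for region in regions:
        by_slide[region.slide_id].append(region)
    return {
        slide_id: heapq.nlargest(budget, candidates, key=lambda r: r.tumor_probability)
        for slide_id, candidates in by_slide.items()
    }


# Hypothetical predictions for two slides.
predictions = [
    Region("slide_a", 120, 340, 0.97),
    Region("slide_a", 900, 210, 0.41),
    Region("slide_a", 455, 780, 0.88),
    Region("slide_b", 60, 75, 0.15),
]
queue = review_queue(predictions, budget=2)
```

With `budget=2`, the queue for `slide_a` contains the 0.97 and 0.88 regions in that order, and the lower-ranked 0.41 region is left out.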

Training models is only the first step in translating research findings into real-world products. From clinical validation to regulatory approval, we still have a long road ahead. However, we have taken the initial step, and by sharing our work, we hope to accelerate progress in this field.

Source: https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html