The Evolution of Medical Imaging Big Data and Artificial Intelligence: From Ancient Diagnostics to AI-Powered Precision Medicine

Sep 21, 2016 08:00 CST Updated 08:00

From Traditional Chinese Medicine’s Four Diagnostic Methods to Medical Imaging

Under normal circumstances, the organs and tissues within the human body are not visible to the naked eye. In ancient times, renowned physicians such as Bian Que and Hua Tuo diagnosed the internal causes of patients’ ailments through the four diagnostic methods: inspection, auscultation and olfaction, inquiry, and palpation. This represented the most “advanced” diagnostic approach of that era.

One day in 1816, French physician René Laennec was strolling down the street when he accidentally observed several children tapping one end of a piece of wood with a large nail, while other children listened to the sound by placing their ears against the other end. This observation greatly inspired Dr. Laennec. Upon returning home, he immediately commissioned the creation of a hollow wooden tube, which became the first stethoscope in human history. Later, the stethoscope was widely adopted in the fields of cardiology and obstetrics and gynecology.

French Physician Laennec Uses a Wooden Tube for Auscultation

In the modern era, physicians no longer rely solely on stethoscopes to observe patients’ internal conditions. The advent of computed tomography (CT) in 1971 marked the formal establishment of medical imaging. With advances in medical imaging technology, the Department of Medical Imaging, which evolved from radiology, has become the most rapidly developing discipline in clinical medicine. Its scope has expanded from conventional X-ray examinations to ultrasound, radionuclide imaging, X-ray CT, magnetic resonance imaging (MRI), digital imaging, and today’s most advanced PET-CT technology. Leveraging these new technologies, physicians can more profoundly “probe” into pathological changes within the human body.

Master of Imaging Data Fusion—The PACS System

The advent of medical imaging equipment has led healthcare institutions to increasingly rely on imaging examinations for diagnosis and treatment. Traditional methods of managing medical images—such as physical films, printed images, and paper records—have accumulated over the years, creating massive storage burdens that hinder efficient retrieval and review. Incidents of lost films and records are not uncommon in hospitals. Consequently, conventional file management systems are no longer adequate to meet the demands of managing the vast volume and broad scope of medical imaging data in modern hospitals.

With the advancement of database technology and computer communications, digital image transmission and electronic film have emerged. Many hospitals have undertaken informatization reforms, and as imaging equipment has gradually transitioned to digital formats and the internet has matured, filmless radiology departments and digital hospitals have become a reality. We will provide a detailed introduction to electronic film in the next article; for now, we will not delve into the specifics.

To enable unified storage and management of informational data from diverse medical imaging devices, the PACS system emerged as the master integrator of data across various platforms.

PACS, which stands for Picture Archiving and Communication System, is primarily tasked with digitally storing vast amounts of medical imaging data generated in daily practice. This includes images from magnetic resonance imaging (MRI), computed tomography (CT), ultrasound, various X-ray machines, infrared devices, microscopes, and other equipment. These images are archived via various interfaces (analog, DICOM, and network-based). When physicians require access, the system rapidly retrieves the data, functioning like a diligent steward and serving as an effective integrator among diverse medical devices.

Schematic Diagram of PACS System Application

A complete PACS system primarily consists of three core functions: image acquisition, data transmission and storage, and image analysis and processing.

There are three primary methods for image acquisition: direct digital acquisition, video capture, and film scanning.

Regarding information storage, PACS systems employ two distinct methods for storing structured and unstructured data, respectively. Databases are used to manage structured data such as patient information, while file systems are utilized to handle unstructured data like medical images. This is akin to a passenger traveling by air: the luggage is checked into the cargo hold, while the passenger sits in the cabin; the two proceed along separate paths without interfering with each other.
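The split described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular PACS product is built: the `studies` table, its columns, the `.img` file naming, and the helper names `open_archive`, `store_study`, and `load_images` are all hypothetical, and SQLite plus a temporary folder stand in for a production database and storage system.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def open_archive():
    """Create an in-memory metadata database and a temporary image folder."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE studies ("
        "patient_id TEXT, patient_name TEXT, modality TEXT, image_path TEXT)"
    )
    image_dir = tempfile.mkdtemp(prefix="pacs_sketch_")
    return db, image_dir


def store_study(db, image_dir, patient_id, patient_name, modality, pixel_bytes):
    """Save the image as a file and its metadata as a database row."""
    # Unstructured part: the raw image bytes go to the file system.
    digest = hashlib.sha256(pixel_bytes).hexdigest()
    image_path = Path(image_dir) / f"{digest}.img"
    image_path.write_bytes(pixel_bytes)

    # Structured part: patient information plus a pointer to the file.
    db.execute(
        "INSERT INTO studies (patient_id, patient_name, modality, image_path) "
        "VALUES (?, ?, ?, ?)",
        (patient_id, patient_name, modality, str(image_path)),
    )
    db.commit()
    return image_path


def load_images(db, patient_id):
    """Look up rows in the database, then read each image from disk."""
    rows = db.execute(
        "SELECT modality, image_path FROM studies WHERE patient_id = ?",
        (patient_id,),
    ).fetchall()
    return [(modality, Path(path).read_bytes()) for modality, path in rows]
```

The database row carries only a path to the image, so a query over patient information never has to touch the large image files until one is actually requested.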

Furthermore, medical imaging data files are often large in size; a standard CT scan is typically on the order of 10 MB, a chest X-ray can reach 20 MB, and cardiovascular angiography images may exceed 80 MB. Traditional storage methods generally rely on servers and optical discs, which are rigid and difficult to scale functionally. In contrast, emerging cloud computing and cloud storage technologies offer features such as rapid data retrieval, network sharing, and application scalability. Their integration with Picture Archiving and Communication Systems (PACS) represents a major future direction for medical image storage.

The underlying principle is straightforward: hospitals deploy their PACS systems on third-party cloud platforms, leveraging the platform’s distributed architecture and load-balanced cluster systems to achieve 24/7 medical image storage. The establishment of such cloud platforms also facilitates comprehensive integration across platforms and multiple terminals, including PCs and mobile devices, thereby fully realizing paperless, CD-less, and filmless medical imaging.

This entirely new model not only enhances the work efficiency and quality of every physician, but also enriches collaborative workflows among medical professionals. Furthermore, hospitals no longer need to make substantial investments in purchasing servers, thereby reducing cumbersome post-deployment maintenance and capacity expansion efforts, ultimately achieving cost savings.

With data storage resolved, data standardization emerges as the next challenge. Although hospitals can use PACS to exchange information among medical devices, equipment from different manufacturers and different PACS platforms follows different data standards, which makes data acquisition and transmission extremely difficult. It is like two people from different countries trying to talk: one speaks English, while the other opens with the Chinese greeting “Have you eaten?” Establishing unified standards for products from diverse manufacturers and regions became the most significant obstacle.

In this regard, the United States took the lead. In 1985, the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) jointly published a standard that defines the format for digital medical images and related information, as well as methods for exchanging them. Originally released as the ACR-NEMA standard, it was later renamed Digital Imaging and Communications in Medicine, abbreviated as DICOM. It redefined the medical image format for clinical data exchange.

Under the DICOM standard, imaging devices provide image data in a unified format to PACS systems. For external communications, PACS systems continue to use DICOM, thereby achieving maximum standardization. In simple terms, this enables instruments from various manufacturers to utilize a common interface, much like English serves as the global lingua franca.
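To make the idea of a common format concrete, here is a small sketch of how DICOM lays out data on disk. A DICOM file starts with a 128-byte preamble followed by the four characters `DICM`, and in the Explicit VR Little Endian encoding each short text element is a tag (group and element numbers), a two-letter value representation (VR), a length, and the value itself. The helper names `encode_element` and `decode_elements` are hypothetical, and the bytes produced here are deliberately simplified: they omit the file meta information and the long-length VRs, so this is not a conformant DICOM file. Real systems should use a proper DICOM library.

```python
import struct

# 128-byte preamble, then the "DICM" marker that identifies a DICOM file.
PREAMBLE = b"\x00" * 128 + b"DICM"


def encode_element(group, element, vr, text):
    """Encode one short text element in Explicit VR Little Endian form."""
    value = text.encode("ascii")
    if len(value) % 2:      # values must have an even length
        value += b" "       # text values are padded with a trailing space
    header = struct.pack("<HH2sH", group, element, vr.encode("ascii"), len(value))
    return header + value


def decode_elements(data):
    """Decode a run of short-VR elements into {(group, element): (vr, text)}."""
    if data[128:132] != b"DICM":
        raise ValueError("missing DICM marker")
    offset, result = 132, {}
    while offset < len(data):
        group, element, vr, length = struct.unpack_from("<HH2sH", data, offset)
        offset += 8                       # tag (4) + VR (2) + length (2)
        value = data[offset:offset + length]
        offset += length
        result[(group, element)] = (vr.decode("ascii"),
                                    value.decode("ascii").rstrip())
    return result
```

Because every device writes the same tags in the same layout, for example (0010,0010) for the patient's name and (0008,0060) for the modality, a receiver can read the data without knowing which manufacturer produced it.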

In 1993, DICOM successfully evolved to its third generation, known as the DICOM 3.0 standard. As medical device manufacturers from an increasing number of countries announced their support for the DICOM 3.0 standard, it gradually became the globally recognized standard in the medical imaging industry.

PACS systems were initially used primarily in radiology departments. As a core component of hospital HIS systems, they generally adhere to HL7 standards and IHE profiles when integrating into hospital information system networks. With the continuous improvement of HL7 standards and IHE profiles, PACS has evolved from simple image storage and communication among a few radiological imaging devices to enabling interoperability across all imaging equipment within hospitals and even among different hospitals. Consequently, various classifications have emerged, such as Mini PACS, departmental-level PACS, hospital-wide PACS, and regional PACS.

Mini-PACS: Refers to a system used exclusively with a single type of imaging modality, such as CT or MRI.
Department-Level PACS: Multiple imaging devices in the radiology department can share images and diagnostic reports.
Hospital-wide PACS: Integrates clinical attending physicians, radiologists, and specialists across all departments, along with various medical images, physician orders, and diagnostic reports, into a unified network.
Regional PACS: A PACS network spanning a local area and cross-regional wide area networks.

Schematic Diagram of Department-Level PACS System

In summary, the advent of PACS has addressed both image acquisition and data transmission and storage. As for image analysis and processing, which have not yet been discussed, we will elaborate on them later. Before that, let us first gain an understanding of big data in medical imaging.

Factors Contributing to the Formation of Big Data in Medical Imaging in China

Medical imaging big data is a new term, and it is no longer possible to verify whether it or healthcare big data emerged first. To explain it, however, two points must be clarified: first, the definition of medical imaging big data; and second, the reasons for its formation.

Definition of Big Data: A collection of data that cannot be captured, managed, and processed using conventional software tools within a given timeframe. It requires new processing models to enhance decision-making, insight discovery, and process optimization capabilities, thereby adapting to information assets characterized by massive volume, high growth rates, and diversity.

IBM has summarized the five V characteristics of big data: Volume, Velocity, Variety, Value, and Veracity.

Medical Imaging Big Data, in accordance with the definition of big data, refers to a large-scale, rapidly growing, multi-structured, high-value, and authentic/accurate collection of imaging data generated by medical imaging equipment such as DR, CT, and MRI, and stored within Picture Archiving and Communication Systems (PACS). It falls under the category of healthcare big data, alongside Hospital Information System (HIS) big data, Laboratory Information System (LIS) big data, and Electronic Medical Records (EMR).

The concepts of “multi-structure” and “high value” are easy to understand: they refer to structured and unstructured data with medical analytical and guidance value, generated by the growing variety of medical imaging equipment. The characteristics of “large scale” and “rapid growth,” however, need to be explained in the context of the broader macro environment.

The formation of China's big data in medical imaging is primarily driven by two factors: the market and the population.

In terms of market size, as of June 2015, there were 705 Grade A tertiary hospitals in China. Data from CHIMA 2014–2015 shows that the implementation rates of department-level PACS and multi-department or hospital-wide PACS systems in China had reached 60–70% and 50–60%, respectively, basically covering Grade A tertiary hospitals in first-tier cities across the country.

In terms of market growth rate, China's PACS market has been growing at an annual rate of over 25%. According to ACMR survey data, from 2012 to 2015, the Chinese PACS market continued to expand at a growth rate exceeding 20%.

Figure: PACS System Market

Data Source: ACMR

In terms of demographics, the primary factors influencing the formation of big data in medical imaging are the population base and age distribution. According to the main data bulletin of the Sixth National Population Census by the National Bureau of Statistics, China’s total population was approximately 1.37 billion. Regarding the growth rate and proportion of the elderly population, by the end of 2014, the number of people aged 60 and above in China had reached 212 million, accounting for 15.5% of the total population. It is projected that by the middle of this century, China’s elderly population will peak at over 400 million, meaning one in every three people will be elderly.

Figure: Proportion of the Elderly Population

Therefore, the widespread adoption of PACS and China’s large population constitute the massive foundational base for medical imaging big data in China; meanwhile, the rapid growth rates of PACS implementation and the aging population underpin the high-speed expansion of this data. Together, these factors account for the emergence of medical imaging big data in China.

As the final component of the 5V characteristics of big data, how should the veracity of medical imaging big data be achieved? This involves data processing technologies.

Data Processing and “Yuxiang Shredded Pork”

In simple terms, the data collected by PACS from various imaging devices often vary significantly in quality. The accuracy and reliability of data analysis and output results largely depend on the quality of the collected data. As the saying goes, “garbage in, garbage out.” Without ensuring data accuracy, big data analytics becomes nothing more than an empty promise.

Currently, post-processing methods for medical imaging mainly fall into two categories. The first is direct processing technology, whereby images are processed directly on the imaging equipment using software after the patient undergoes an imaging examination, such as performing angiography on CT and MRI scanners. The drawbacks of this approach are quite apparent: it does not allow for image modification, relying instead on physicians’ experience for pathological interpretation, which leads to inaccuracies in the resulting data.

For example, when tissue structures overlap in CT images, conventional image processing software often misinterprets the overlapping data as noise or other interference signals. Medical experts, however, require geometric affine invariance to be preserved at image boundaries and target contours (simply put, the image must stay intact). When it is not, clinical diagnosis faces unpredictable difficulties.

In addition to software processing on imaging equipment, another approach involves transmitting imaging data from the imaging devices to the PACS system for post-processing. For instance, the PACS system can perform image segmentation, registration, and clustering through multi-dimensional image fusion technologies (such as CT/MRI/PET-CT), thereby preserving the authenticity of the imaging data as much as possible.

Schematic Diagram of Multi-dimensional Image Fusion (CT/MRI/PET-CT)

Multidimensional Image Fusion: This “cutting-edge technology” primarily involves data preprocessing, image segmentation, feature extraction, and matching assessment. That may sound confusing, so let us take the steps one at a time. Medical imaging databases contain massive amounts of raw data from diverse sources, much of it blurred, incomplete, noisy, or redundant. Before data mining, this information must therefore be cleaned and filtered to ensure consistency and certainty and to transform it into a format suitable for mining. This is data preprocessing.

We are well aware that medical imaging databases contain vast amounts of image data. For illustrative purposes, we liken these image data to various raw ingredients, and the final processed information to the dish “Yuxiang Shredded Pork.”

Data preprocessing can be likened to the process of cleaning ingredients. To prepare Yuxiang Shredded Pork, you must first thoroughly wash the pork, carrots, green peppers, and even the scallions, ginger, and garlic, filtering out impurities and retaining only the essential components before proceeding to the next steps. This stage, which includes tasks such as image denoising, enhancement, smoothing, and sharpening, is collectively referred to as data preprocessing.
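Two of the “washing” steps can be sketched in plain Python. This is a minimal illustration under simplifying assumptions: the image is a small 2D list of grey values, a median filter stands in for denoising, and a linear contrast stretch stands in for enhancement. The function names `median_filter` and `stretch_contrast` are hypothetical, and production systems use far more sophisticated methods.

```python
from statistics import median


def median_filter(image, radius=1):
    """Denoise: replace each pixel with the median of its neighbourhood.

    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            row.append(median(window))
        output.append(row)
    return output


def stretch_contrast(image, out_max=255):
    """Enhance: linearly rescale pixel values so they span 0..out_max."""
    flat = [v for row in image for v in row]
    low, high = min(flat), max(flat)
    if high == low:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in image]
    scale = out_max / (high - low)
    return [[round((v - low) * scale) for v in row] for row in image]
```

A single bright speck surrounded by uniform tissue is an outlier in every 3x3 window it falls into, so the median filter removes it while leaving the surrounding values untouched.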

Once the “ingredients” are cleaned, they enter the stage of image segmentation and feature extraction, which can be likened to slicing or cutting the “ingredients” into strips or segments. Taking Huiyi Huiying, a well-known Chinese medical imaging company, as an example, computer algorithms automatically segment pelvic CT structures such as the bladder, prostate, and rectum with a segmentation error of less than 2 mm by leveraging multi-dimensional image fusion technology along with organ morphological models, image edge feature models, and neural network clustering models. This provides the essential image processing basis for subsequent intelligent matching and decision-making.
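The “cutting” step can be illustrated with a deliberately simple technique. The sketch below is not the method of any company mentioned here: it swaps in plain threshold segmentation with connected-region labelling, then extracts two basic features per region (area and centroid). The function names `segment_regions` and `region_features` and the threshold value are assumptions made for illustration.

```python
from collections import deque


def segment_regions(image, threshold):
    """Label 4-connected regions of pixels at or above the threshold.

    Returns (labels, count): labels has the image's shape, with 0 for
    background and 1..count for the regions found.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] < threshold or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] >= threshold and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def region_features(labels, count):
    """Return {label: (area, (centroid_row, centroid_col))} for each region."""
    sums = {k: [0, 0, 0] for k in range(1, count + 1)}  # area, sum_y, sum_x
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label:
                sums[label][0] += 1
                sums[label][1] += y
                sums[label][2] += x
    return {k: (a, (sy / a, sx / a)) for k, (a, sy, sx) in sums.items()}
```

Each labelled region plays the role of one “strip” of an ingredient, and its features are what the later matching stage works with.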

In the final step, we take the “ingredients” processed through the first two stages and stir-fry them with green onions, ginger, and garlic to make a dish of Yuxiang Shredded Pork. This represents the process of image matching and clustering. The core technology that PACS systems rely on at this stage is deep learning, a branch of artificial intelligence (AI). Next, we will explore how AI is applied in the field of medical imaging.
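As a rough picture of what clustering means, the sketch below groups feature vectors with plain k-means. This is a classical stand-in chosen for brevity, not the deep learning approach the article goes on to describe, and the function name `kmeans`, the two-dimensional features, and the starting centroids are all assumptions.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of numbers.

    Each round assigns every point to its nearest centroid, then moves each
    centroid to the mean of its members. Returns (assignment, centroids),
    where assignment is the one computed in the final round.
    """
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return assignment, centroids
```

Given features extracted from segmented regions, points that land in the same cluster are the ones the algorithm considers similar, which is the basis for matching a new image against previously seen cases.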

Imprecise Diagnosis and a Widening Gap: The Context for the Emergence of Artificial Intelligence

In August this year, the State Council issued the Notice on the “13th Five-Year” National Science and Technology Innovation Plan, which identifies artificial intelligence as a key priority. The Plan explicitly states that it will prioritize the development of human-like intelligent technologies and methods driven by big data; break through theories, methods, and key technologies for human-centered integration of humans, machines, and objects; and develop related equipment, tools, and platforms. It aims to achieve significant breakthroughs in big data analytics-based human-like intelligence, realizing human-like vision, hearing, language, and thinking, thereby supporting the development of the intelligent industry.

Before we explore how to apply artificial intelligence to medical imaging, it is essential to first understand the two challenges facing medical imaging in the absence of AI.

According to VCBeat, over 90% of medical data originates from medical imaging, yet the majority of this data still requires manual analysis. The drawbacks of manual analysis are evident: first, it lacks precision, as judgments rely heavily on experience, making misdiagnosis highly likely. Data on misdiagnosis released by the Chinese Medical Association indicates that approximately 57 million patients are misdiagnosed annually in China’s clinical practice, with an overall misdiagnosis rate of 27.8%. The misdiagnosis rate for organ ectopia reaches 60%, while the average misdiagnosis rate for malignant tumors stands at 40%, including cancers such as nasopharyngeal carcinoma, leukemia, and pancreatic cancer. The average misdiagnosis rate for extrapulmonary tuberculosis, such as hepatic and gastric tuberculosis, also exceeds 40%.

Second, there is a significant staffing gap. According to data from VCBeat’s Eggshell Research Institute, the annual growth rate of medical imaging data in China is currently around 30%, while the annual growth rate of radiologists is only about 4.1%, a difference of about 25.9 percentage points. The growth in the number of radiologists falls far short of the surge in imaging data. This means that radiologists will face increasingly heavy pressure in processing imaging data in the future, potentially far exceeding their capacity.
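Because both figures are annual growth rates, the pressure compounds year over year. The short calculation below is a back-of-the-envelope sketch that assumes both rates stay constant, which the source does not claim; the function name `workload_ratio` is hypothetical.

```python
def workload_ratio(years, data_growth=0.30, staff_growth=0.041):
    """Imaging data per radiologist after `years`, relative to today (1.0).

    Assumes data volume and radiologist numbers each grow at a constant
    annual rate, so the ratio compounds geometrically.
    """
    return ((1 + data_growth) / (1 + staff_growth)) ** years


if __name__ == "__main__":
    for year in range(0, 6):
        print(year, round(workload_ratio(year), 2))
```

Under these assumptions, the amount of imaging data per radiologist roughly triples within five years.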

Radiologist Workflow Diagram

This is also evident from the current working conditions of radiologists. In January this year, a survey was conducted among 1,241 medical imaging physicians, and one finding was particularly noteworthy: over 71% of imaging physicians expressed hope for the reinstatement of radiation leave.

Data in the report show that more than 50% of doctors work over eight hours a day, with 20.6% averaging more than 10 hours daily. Many physicians reported that radiation leave exists in name only. Numerous doctors left comments expressing their hope to one day enjoy the long-denied radiation leave and public holidays, so that they can spend more time with their families.

So, how can we address the current challenges of high misdiagnosis rates and significant shortages in medical imaging? The best answer is artificial intelligence.

AI's Cutting-Edge Technology: Multi-Layer Convolutional Neural Network Architecture

The application of artificial intelligence in medical imaging is primarily divided into two parts: the first is image recognition, which has been explained earlier; the second is deep learning, which constitutes the core component of AI applications. Both aspects are based on data mining and utilization derived from large-scale medical imaging datasets.

In 2006, Professor Geoffrey Hinton, a leading figure in the field of neural networks, and his doctoral students published papers in *Science* and related journals, introducing the concept of “Deep Belief Networks” for the first time. Unlike traditional training methods, Deep Belief Networks incorporate a “pre-training” phase, which efficiently initializes the network weights to values close to the optimal solution. Subsequently, “fine-tuning” techniques are employed to optimize the entire network. The application of these two techniques significantly reduces the training time for multi-layer neural networks. He coined a new term for the learning methods associated with multi-layer neural networks: “Deep Learning.”

In 2012, Professor Hinton’s research team participated in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), organized by Professor Fei-Fei Li and colleagues at Stanford University. The challenge comprised 1.2 million high-resolution images across 1,000 categories. Using a novel deep learning architecture based on multi-layer convolutional neural networks, Hinton’s team achieved a top-5 image recognition error rate of 15.3%, compared with 26.2% for the next-best entry. This breakthrough propelled deep neural networks into the medical and industrial sectors at an unprecedented pace, paving the way for the subsequent emergence of numerous medical imaging companies utilizing this technology.
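The building blocks of a convolutional layer are simple enough to write out by hand. The sketch below is a toy illustration in plain Python, not the architecture Hinton’s team used: it shows one convolution, one ReLU activation, and one max-pooling step on a tiny image. The kernel weights are hand-picked here to detect a vertical edge, whereas a real network learns its weights from data and stacks many such layers.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a
    complete window are dropped."""
    out_h, out_w = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            max(feature_map[y * size + i][x * size + j]
                for i in range(size) for j in range(size))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def layer(image, kernel):
    """One convolution -> ReLU -> pooling stage."""
    return max_pool(relu(conv2d(image, kernel)))
```

Applied to an image whose left side is dark and right side is bright, the edge kernel responds only where the brightness changes, and pooling keeps that response while shrinking the feature map.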

For example, Enlitic, an internationally renowned medical imaging company, and DeepCare, which recently secured RMB 6 million in angel financing from FreeS Fund in China, both leverage large volumes of accumulated imaging and diagnostic data to continuously train neural networks through deep learning, thereby improving the accuracy of physicians’ diagnoses.

Schematic Diagram of Enlitic's AI-Assisted Medical Imaging Diagnosis

Taking the malignant tumor detection system developed by Enlitic as an example, validation using lung cancer-related image databases—the Lung Image Database Consortium (LIDC) and the National Lung Screening Trial (NLST)—revealed that the system’s accuracy in detecting lung cancer was more than 50% higher than that of a radiologic technologist.

In summary, the integration of artificial intelligence (AI) with medical imaging offers numerous benefits, with patients, radiologists, and hospitals all standing to gain from its application. AI not only helps patients undergo health examinations—including X-rays, ultrasounds, and magnetic resonance imaging (MRI)—more rapidly, but also assists radiologists in reducing image interpretation time, improving efficiency, and lowering the probability of misdiagnosis by providing diagnostic support through alerts on potential adverse findings.

With the increasing adoption and application of artificial intelligence and big data in medical imaging, the two challenges described above, limited diagnostic accuracy and the gap between the growth of imaging data and the number of radiologists, can be effectively addressed. The integration of these technologies will become a key direction for the development of medical imaging.