Google Enters Bioacoustic AI with HeAR Model; FDA Grants First Approvals – The Sound-Based Diagnostics Sector Takes Flight

Aug 30, 2024 07:59 CST Updated 08:00

Diagnosing disease by listening to the sounds a patient's body makes is moving closer to reality.


Earlier this year, Google officially announced HeAR, a bioacoustic foundation model, and it recently announced the model's application to early tuberculosis screening. Also this year, Eko Health (hereafter Eko) obtained FDA approval for its artificial intelligence, considered the first AI to use sound to help doctors identify heart failure.


VCBeat's review found that bioacoustic biomarkers have progressed rapidly of late and are on the verge of significant breakthroughs.


Google Steps In, FDA First Approval, AI for Acoustic Biomarkers Advances Rapidly


In recent months, the emerging field of bioacoustic biomarkers has frequently been in the spotlight. Take Eko, one of the best-known companies in the field, as an example. This year it scored twice: its groundbreaking artificial intelligence algorithm was approved by the FDA in March, and it completed a $41 million Series D financing round in June.


Another well-known company, TytoCare, completed a $49 million financing round in August last year, and at the end of July its artificial intelligence algorithm received an additional FDA approval. The algorithm can be used to detect lung crackles in adults and in children over two years old, supporting further detection of potential lung diseases.


Now, Google, which is optimistic about this field, has also entered the market with a large model.


At the beginning of this year, Google released research findings on an AI model named HeAR (Health Acoustic Representations). This specialized bioacoustic foundation model was trained on an audio dataset of more than 300 million two-second clips, including approximately 100 million cough samples, all extracted and edited from 3 billion publicly available, non-copyrighted YouTube audio and video files. The aim is to advance the medical application of bioacoustic biomarkers.


Google's research team benchmarked HeAR on 13 health acoustic event detection tasks, 14 cough inference tasks, and 6 spirometry tasks, spanning six datasets, and showed that HeAR can identify medically relevant sound patterns very accurately across a wide range of tasks. Calling it one of the most powerful foundation models in bioacoustics to date is no exaggeration.


More exciting than its accuracy is its compatibility with audio pickup devices. The study used a range of audio input devices, from high-end smartphones to entry-level smartphones and even hidden microphones, and showed that HeAR can generalize across pickup devices and achieve high performance with limited training data.


At the end of August, Google announced HeAR's first application project: a collaboration with the Indian company Salcit Technologies to use HeAR for early tuberculosis screening.


The rapid detection of tuberculosis patients and the provision of timely treatment are the main ways to prevent the spread of tuberculosis bacteria. However, the mainstream diagnostic technologies currently used in clinical practice are still relatively "primitive." The most widely used sputum smear test is already a century old. This method often takes at least one month from sampling to obtaining results, with a positive rate of only about 30%. Its accuracy, efficiency, and speed are all relatively low.


Although the accuracy and efficiency of imaging examinations and the latest molecular biology diagnostic technologies have significantly improved, their promotion in grassroots hospitals is still restricted by cost and technical limitations. In contrast, the early screening of tuberculosis through the collection and analysis of patients' cough sounds via smartphones provides a highly promising universal non-invasive diagnostic method outside medical institutions, offering grassroots facilities much stronger screening capabilities than before.


Besides pulmonary tuberculosis, asthma and COPD are also targets for bioacoustic biomarkers. Australia's ResApp Health has a long-standing presence in this field and was listed in Australia. Of its two products, SleepCheckRx, used to identify obstructive sleep apnea, has received FDA approval; the other, ResAppDx, which helps diagnose lung diseases from coughing and breathing sounds, has obtained CE approval.


At the end of 2022, pharmaceutical giant Pfizer acquired ResApp Health for $179 million.


Google's large model is mainly used for respiratory system disease applications and is still in its infancy. In the utilization of sound biomarkers, cardiac diseases have made the fastest progress and achieved a breakthrough this year.


In May this year, Eko announced FDA approval of its AI software, the "Eko Low Ejection Fraction Tool" (ELEFT), which is paired with a digital stethoscope. Described as the first AI algorithm approved by the FDA to aid early heart failure screening, it is a significant medical innovation.


With Eko's artificial intelligence, doctors can detect low ejection fraction (or low EF) in the heart within 15 seconds using only heart sounds collected by a digital stethoscope—this indicator reflects the heart's ability to pump blood through contraction. Heart failure with reduced ejection fraction (HFrEF) is also a major type of heart failure. Statistics show that among the more than 6 million heart failure patients in the United States, half suffer from heart failure with reduced ejection fraction.
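
For readers unfamiliar with the indicator, the following minimal Python sketch shows how ejection fraction is conventionally defined from ventricular volumes measured by imaging such as ultrasound. It is not Eko's algorithm, which works from heart sounds. The function names are hypothetical, and the 40% cutoff is an assumption based on the common clinical convention for reduced ejection fraction.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes (mL)."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("volumes must satisfy 0 <= ESV <= EDV and EDV > 0")
    # Fraction of the filled ventricle's blood that is pumped out in one beat.
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Assumption: 40% is used here as the cutoff for "low EF" (common convention).
LOW_EF_THRESHOLD_PERCENT = 40.0


def is_low_ef(ef_percent, threshold=LOW_EF_THRESHOLD_PERCENT):
    """Flag an ejection fraction at or below the chosen cutoff."""
    return ef_percent <= threshold


if __name__ == "__main__":
    # Illustrative volumes only, not patient data.
    for edv, esv in [(120.0, 50.0), (150.0, 100.0)]:
        ef = ejection_fraction(edv, esv)
        print(f"EDV={edv} mL, ESV={esv} mL -> EF={ef:.1f}%, low EF: {is_low_ef(ef)}")
```

A screening tool of the kind described here outputs only a flag comparable to `is_low_ef`; the volumes themselves still have to be confirmed by follow-up imaging.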


In the past, ejection fraction testing required the use of ultrasound, which not only incurred high costs but also demanded skilled operators, making it unsuitable for routine examinations at grassroots healthcare facilities. Patients often only underwent further testing when they already exhibited obvious symptoms, leading to numerous cases where the golden opportunity for early intervention was missed.


Eko's AI algorithm, combined with its digital stethoscope, can identify heart murmurs, offering the potential to screen for patients with heart failure with reduced ejection fraction at the earliest stage of routine examinations. Screened patients can then undergo further testing for early intervention.


Eko's progress did not happen overnight. Founded in 2013, the company has consistently focused on putting acoustic biomarkers into practical use. As early as 2015, its digital stethoscope received FDA approval. However, at that time, the product was primarily aimed at remote transmission and lacked any auxiliary functions—it merely transmitted collected heart sounds via Bluetooth to a mobile phone, which then uploaded the audio data to the cloud for experts to conduct remote consultations.


However, this approach could integrate patients' audio data with electronic health records (EHR) to enable seamless referrals, documentation, and real-time condition monitoring. This laid the foundation for Eko to gradually build "the world's largest heart sound database." Eko later used these heart sound data to train artificial intelligence to identify early signs of heart disease, which is of great significance for the early diagnosis and treatment of cardiac conditions.


These efforts have finally borne fruit in recent years. Between 2020 and 2023, Eko received approval for multiple medical devices, including upgraded digital stethoscopes and artificial intelligence algorithms capable of generating and analyzing phonocardiograms.


Besides Eko, a number of enterprises worldwide have also made substantial progress in this field. For instance, the AI stethoscope developed by Japan's AMI (Acute Medical Innovation), which can assist in identifying early signs of valvular heart disease (including aortic valve stenosis), received Japanese medical device approval in October 2022.


The progress in this field is evident to all.


Bioacoustic Biomarkers Have Great Potential and Are Poised for Takeoff


The physical structure of human organs changes with physiological and pathological conditions, leading to specific alterations in the sounds produced by patients with different diseases as well as the sounds generated by the organs themselves. These sound characteristics can be regarded as "acoustic biomarkers" of diseases. A simple example is how one's voice becomes hoarse after catching a cold, which reflects such changes.


Gao Zheng, founder and CEO of Cosmic Resonance, who has long worked on bioacoustic biomarker research, explained the principle to VCBeat: "For example, lung cancer patients may develop a metallic cough and hoarseness due to compression of the bronchus caused by aortic aneurysms or mediastinal tumors. Pneumonia patients, on the other hand, may produce coughing sounds accompanied by rales because infection causes the alveoli in one or both lungs to fill with fluid or pus. Acoustic biomarkers of different diseases are distinctive and variable. By performing discriminative analysis of visualized features in Mel-spectrograms, significant differences in sound among lung cancer patients, tuberculosis patients, and healthy individuals can be identified, providing a novel perspective and method for disease diagnosis."


Figure: Mel-spectrograms of healthy individuals and patients with pulmonary diseases
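
As a rough illustration of what a Mel-spectrogram is, the sketch below computes one in plain Python: the signal is cut into overlapping frames, each frame's power spectrum is taken, and the spectrum is pooled into triangular bands spaced evenly on the mel scale. All names and parameter values (`n_fft`, `hop`, `n_mels`, the 8 kHz sample rate) are assumptions chosen for illustration. This is not the pipeline used by HeAR or by Cosmic Resonance, and real systems would use an FFT library rather than the naive DFT shown here.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    hz_points = [mel_to_hz(i * mel_max / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Return a list of frames, each a list of n_mels band levels in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + 1e-10) for e in energies])
    return frames


def peak_band(mel_frames):
    """Index of the mel band with the highest average level."""
    n_mels = len(mel_frames[0])
    averages = [sum(f[b] for f in mel_frames) / len(mel_frames) for b in range(n_mels)]
    return averages.index(max(averages))


def tone(freq_hz, sample_rate=8000, n_samples=1024):
    """Synthetic sine tone standing in for a recorded sound."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate) for t in range(n_samples)]
```

Calling `mel_spectrogram(tone(440.0), 8000)` and `mel_spectrogram(tone(3000.0), 8000)` gives two grids whose energy sits in different bands, which is the kind of visual difference the figure shows between healthy and diseased coughs.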

 

Precisely because of this, sound data has always been essential medical information within the medical field. The "listening" component of the "inspection, listening, inquiry, and pulse-taking" emphasized in traditional Chinese medicine refers to listening to the patient's voice and breath, which has also been proven over thousands of years of practice to effectively diagnose certain diseases.


Invented in 1816 and announced in 1819, the stethoscope represents further exploration by the medical community into bioacoustic markers. Due to its low cost and portability, the stethoscope became widely adopted. Before the advent of large medical equipment such as CT scanners, diagnosing diseases through changes in body sounds had been a primary method of medical examination.


However, limited by the precision of sound collection in traditional stethoscopes and by the limits of human hearing, stethoscope examination supports only rudimentary judgments and depends heavily on the doctor's experience. Nevertheless, its role in grassroots care should not be underestimated.


The advent of the digital age has breathed new life into the stethoscope, a device with over two hundred years of history. Digital stethoscopes use electronic technology to convert sound waves into high-precision digital electrical signals, which are then amplified and processed to produce sounds far clearer than those from traditional stethoscopes. Coupled with the rapid advancement of artificial intelligence in recent years, the use of bioacoustic markers in clinical settings is gradually shifting from what was once "out of reach" to something "within grasp."


Even so, this process is not a smooth path. Audio signals can be interfered with by environmental noise, and factors such as voices and outdoor noise can all affect the extraction and analysis of cough sound characteristics. Previously, limited by hardware performance, it was often difficult to capture high-quality sound signals.


The good news is that rapid progress in sensor technology in recent years has largely addressed this shortcoming. The new generation of electronic stethoscopes, which use piezoelectric ceramic sensors as sound pickup components, offers better signal quality and a more stable frequency response than earlier versions. These devices capture more accurate and clearer heart and lung sounds, and thus more accurate user health data.


More importantly, smartphones, the more widespread device for picking up acoustic biomarkers, have gone through years of fierce competition and technological evolution and have made tremendous progress in microphone pickup performance, which is now sufficient to meet the basic needs of audio signal acquisition.


"Smartphones actually meet clinical needs in terms of sound frequency requirements. Although different phones may cause some variations in sound, these differences can be minimized through domain generalization techniques and improvements in loss functions. Therefore, hardware now basically does not have a significant impact," Gao Zheng told VCBeat.


After overcoming the hardware barriers, the development of acoustic biomarkers is entering a fast track.


The Road to the West Was Not Easy, but Progress in China Is on Par with Giants


Although the hardware issues have been basically resolved, the application of bioacoustic biomarkers is not a smooth path and still requires solving a series of difficulties.


Gao Zheng said that the current application challenges of bioacoustic biomarkers lie mainly on the software side, namely artificial intelligence. Technically, open issues remain in how AI models analyze the target characteristics of patients' acoustic biomarkers, in model stability in complex environments and cross-device scenarios, and in learning from small samples with limited precisely annotated data.


Due to pathological factors, the characteristics of cough sounds in patients with respiratory diseases (such as tuberculosis) differ significantly from those of healthy individuals and are relatively easy to identify. However, patients with different lung diseases may exhibit similar symptoms, and the features of their cough sounds may overlap, making it difficult to determine the specific disease.


How to design precise target features for cough sounds of patients with specific diseases based on general audio characteristics, the phonation properties of coughing, and the pathological characteristics and symptom manifestations of specific diseases (such as tuberculosis), thereby enabling accurate differentiation of patients with specific diseases from healthy individuals and those with other lung diseases, remains a challenge that needs to be addressed.


Model stability in complex environments and cross-device scenarios is another challenge. Typically, the audio data used for model training is relatively ideal, but in practical applications, the audio signals that need to be identified are often subject to various interferences, which can affect the extraction and analysis of cough sound features. Moreover, performance differences among various audio capture devices may also lead to changes in the quality and characteristics of cough audio, impacting the diagnostic results of the model.
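
One common way to expose a model to this kind of interference during training is to mix recorded background noise into clean audio at a controlled signal-to-noise ratio (SNR). The Python sketch below illustrates that idea only. The function names and the SNR choices are hypothetical, and it does not represent the augmentation used by any company named in this article.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean audio, scaled so the result has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    clean_rms, noise_rms = rms(clean), rms(noise)
    if clean_rms == 0 or noise_rms == 0:
        raise ValueError("signals must not be silent")
    # SNR_dB = 20 * log10(clean_rms / scaled_noise_rms), solved for the noise gain.
    gain = clean_rms / (noise_rms * 10.0 ** (snr_db / 20.0))
    return [c + gain * n for c, n in zip(clean, noise)]


def augment(clean, noise, snr_choices=(20.0, 10.0, 5.0), rng=None):
    """Return one noisy copy of the clip at a randomly chosen SNR, plus that SNR."""
    rng = rng or random.Random()
    snr_db = rng.choice(snr_choices)
    return mix_at_snr(clean, noise, snr_db), snr_db


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for a cough clip and a background-noise recording.
    clean = [math.sin(2.0 * math.pi * 5.0 * t / 200.0) for t in range(200)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
    noisy, snr_db = augment(clean, noise, rng=rng)
    print(f"mixed {len(noisy)} samples at {snr_db} dB SNR")
```

Training on several noisy copies of each clip at different SNRs is a simple form of data augmentation. Handling differences between recording devices would need additional techniques that this sketch does not cover.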


"In the process of model training, how to mitigate the impact of noise through methods such as data augmentation and noise suppression, and eliminate device differences using domain generalization techniques to enhance the robustness of diagnostic models will be a key issue in improving the practical usability of bioacoustic biomarker models," said Gao Zheng.


Moreover, because audio data is difficult to collect and label, the model must maintain good detection performance even with a small amount of precisely labeled data. How to reconcile small sample sizes with complex model learning is also a research topic.


"Obtaining audio data is currently the most challenging part. In current medical detection methods, medical imaging data accounts for 90% of medical information, providing a foundation for model training. However, hospitals previously did not store audio data specifically, leaving artificial intelligence training without necessary resources," Gao Zheng told VCBeat.


"High-quality audio datasets are very scarce. If open-source data is used, cross-channel issues arise. Take Google as an example: the dataset used to train its model was clipped from YouTube audio and video data. In particular, converting video data into audio data requires multiple decoding passes, which may cause data loss. Similar data compression issues exist in WeChat voice messages. Our model training used 20 million acoustic data entries, and the biggest advantage is that all the data were recorded on mobile phones, with completely consistent channels," he added.


He stated that the lack of high-quality data could pose challenges for the development of related products: "Statistically, currently approved medical AI products have relatively high sensitivity (over 90%) and specificity (around 85%). However, sound is different from imaging, as there are significant differences between individuals. Models trained with only a small amount of labeled data will find it difficult to achieve the required levels of sensitivity and specificity when applied in real-world environments."
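
For reference, sensitivity and specificity are computed from a model's predictions as in the minimal Python sketch below. The function names are hypothetical and the labels are invented for illustration, chosen so the result matches the levels quoted above (90% and 85%). They are not data from any product.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = diseased)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and pred:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Invented example: 10 diseased and 20 healthy subjects.
    y_true = [1] * 10 + [0] * 20
    # The model misses 1 diseased subject and raises 3 false alarms.
    y_pred = [1] * 9 + [0] * 1 + [0] * 17 + [1] * 3
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity measures how many diseased subjects the model catches, and specificity how many healthy subjects it correctly clears. A screening tool needs both to hold up on recordings from real-world environments, not only on its training data.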


Because of this, Gao Zheng believes that large models can improve model stability and generalization in real-world scenarios, which will be key to whether acoustic biomarkers can be put into practical use.


This trend of large models has begun to emerge. Canary Speech, founded in 2016, mainly uses sound biomarkers to detect emotions, stress, and energy levels before obvious disease symptoms appear, but it hasn’t drawn much attention. Last May, Canary Speech announced a collaboration with Microsoft to integrate Microsoft’s large model technology to boost research and development. In June this year, the company received its first $13 million in Series A funding.


In the research of bioacoustic biomarkers, progress in China has also been quite remarkable, and media reports on the application of bioacoustic biomarkers are not uncommon. The First Affiliated Hospital of China Medical University has previously conducted research on intelligent diagnostic technology for carotid artery stenosis based on acoustic biomarkers, and the accuracy rate of this project's auxiliary diagnosis for carotid artery stenosis has now reached 97%. In addition, Beijing Chest Hospital, Capital Medical University, is also conducting clinical research on applying these acoustic biomarkers to the intelligent diagnosis of lung cancer and tuberculosis.


In conclusion


AI Intelligent Diagnosis Technology Based on Acoustic Biomarkers Has Broad Application Prospects


With the advancement of hardware and artificial intelligence-related technologies, the research progress of bioacoustic biomarkers is accelerating. It is believed that there will be more breakthroughs in related fields in the future. VCBeat will also keep an eye on this and welcomes industry experts to share their insights.

 

References:

Baur, Sebastien, et al. "HeAR - Health Acoustic Representations." arXiv:2403.02522 (2024).

Alqudaihi, K. S., et al. "Cough Sound Detection and Diagnosis Using Artificial Intelligence Techniques: Challenges and Opportunities." IEEE Access, vol. 9, pp. 102327-102344, 2021. doi: 10.1109/ACCESS.2021.3097559.

Vecchione, Anthony. "Salcit partners with Google on AI technology to detect disease based on coughs." mobihealthnews.com.