Voice-Enabled EHR Solutions Cut Documentation Time by 70%: How Google, Amazon, and iFlytek Are Transforming Clinical Workflows

Oct 09, 2018 08:00 CST Updated 08:00

The widespread application of voice technology in the healthcare sector has provided new solutions for the entry of medical information.

VCBeat (WeChat: vcbeat) has compiled a list of industry giants and startups offering related services, analyzing how they leverage voice technology to address the pain points of electronic medical record documentation.

Medscape surveyed 15,000 practicing physicians in the United States, and nearly two-thirds reported experiencing professional burnout (42%), depression (15%), or both (14%). The leading causes cited were the volume of complex paperwork clinicians must handle (56%) and the substantial time spent entering patient information into electronic health records (24%). Voice and artificial intelligence technologies are addressing this pain point.

And this will undoubtedly become a large market.

The Complex and Time-Consuming Entry of Electronic Medical Record Information: A Major Cause of Physician Burnout

Over the past decade, the healthcare sector has undergone significant changes with the widespread adoption of Electronic Health Records (EHRs) in the United States. Physicians work an average of 11 hours per day, spending six hours on EHR-related tasks and only 1.5 hours on paper documentation. However, most current EHR systems have been designed as bulky and complex billing platforms rather than patient-care–focused systems that integrate outpatient visits, inpatient care, pharmacy services, billing, and reimbursement. As a result, their usability and efficiency have been compromised.

The complexity of these systems and the time they consume are primary causes of physician burnout and job dissatisfaction, and represent one of the most urgent challenges facing today’s healthcare industry. A study published last September in Annals of Family Medicine showed that primary care physicians spend more than half of their total working hours on electronic health records (EHRs), meaning they devote the majority of their energy and attention to so-called “administrative” tasks.

Moreover, burnout leads to decreased patient satisfaction and quality of physician care, as well as increased rates of medical errors, risk of medical malpractice, and turnover among physicians and staff. It is also associated with physician substance abuse and suicide. The causes of burnout are multifaceted and include hospital acquisitions of healthcare institutions, rising drug prices, implementation of the Affordable Care Act, and the gradual shift in payment models toward value-based care. Among them, the cumbersome and time-consuming process of documenting patient visits directly hinders face-to-face communication with patients and the effectiveness of clinical treatment. Additionally, the explosive growth of medical data makes it difficult for physicians to access and manage valuable patient information, thereby impeding efforts to improve patient health outcomes.

Therefore, addressing the challenges physicians encounter throughout their workflow and optimizing the existing electronic health record (EHR) entry process are crucial to improving overall efficiency and the quality of medical services, as well as reducing healthcare costs. According to a report by market research firm Technavio, global hospital spending was projected to exceed $72 billion by 2020, with a compound annual growth rate (CAGR) of 6%, driven largely by the adoption of speech recognition technology in hospital initiatives.

A growing number of healthcare providers are increasing their investments in speech recognition technology. For instance, Premier Health, which operates five hospitals and two large medical centers, spent $1.6 million to develop speech recognition software integrated with Epic. This solution helps alleviate physicians’ workload, saving them approximately 90 minutes per day. Thanks to more efficient workflows, the software has enabled Premier Health to reduce medical costs by approximately $1.3 million.

Voice technology is an increasingly popular feature, particularly well-suited for the healthcare sector. According to a survey of 2,784 physicians conducted by DRG Digital | Manhattan Research, 23% reported using voice assistants such as Apple’s Siri and Amazon’s Alexa in their work. Among these users, 29% indicated that their voice assistant system was integrated into their Electronic Health Record (EHR) platform. These data suggest that as more developers create voice tools specifically designed for clinical workflows, voice technology will provide solutions for the transcription of medical information.

How Industry Giants and Startups Are Addressing the Pain Points of Electronic Health Record Entry

VCBeat has compiled a list of several major enterprises involved in voice-enabled electronic health record (EHR) documentation services—Google, Amazon, iFlytek, Unisound, and Nuance—as well as emerging competitors focused on this niche: Saykara, Suki, and Notable.

Landscape of Major Companies in the Medical Speech Technology Sector

[Image: major companies’ positioning in medical voice technology]

Startups in the Medical Speech Technology Sector

[Image: startups in medical voice technology]

E-commerce giant Amazon is exploring how to leverage voice technology to facilitate data entry and extraction in electronic health records (EHRs), thereby enabling efficient information exchange. The Alexa platform hosts lightweight medical applications from institutions such as the Mayo Clinic and Libertana, which can answer medical queries, send alerts in emergencies, and help users communicate with caregivers.

Voice assistant Alexa can also be integrated into electronic health records (EHRs) to serve as a passive recorder. Amazon is conducting trials in hospitals across the United States, including Northwell Health, Massachusetts General Hospital, and Boston Children’s Hospital. However, since Alexa is not yet HIPAA-compliant, its applications are generally limited to non-identifiable uses, such as surgeons’ checklists, patient disease and medication information, and general hospital information. If Alexa achieves HIPAA compliance, its scope of application could be further expanded.

Nuance, the world’s largest speech recognition technology company, has deployed its healthcare solutions across 72% of medical institutions in the United States. With clients in more than 30 countries worldwide, Nuance has accumulated over 300 million doctor-patient interaction records and provides services to more than 500,000 physicians and 10,000 healthcare facilities annually. Its flagship product, Dragon Medical One, is dedicated to providing clinical professionals with voice-navigated file systems and applications, aiming to revolutionize patient communication. The application of this technology has significantly improved diagnostic efficiency for physicians, enabling rapid, flexible, and accurate collection of patient clinical information.

In an ongoing AI research initiative, Google analyzed 216,221 hospitalizations involving 114,003 patients and over 46 billion data points to develop accurate and scalable predictions for various clinical scenarios. Building on this research, Google is also developing speech recognition systems for clinical documentation, leveraging automatic speech recognition (ASR) models to enhance the voice transcription process for electronic health records (EHRs).

In April 2017, iFlytek signed a comprehensive strategic cooperation framework agreement with the Chinese Academy of Medical Sciences and Peking Union Medical College, marking the official implementation of iFlytek’s smart healthcare technologies—such as the voice-enabled electronic medical record system for dentistry—at Peking Union Medical College.

Prior to the signing of this strategic cooperation agreement, the aforementioned voice-enabled electronic medical record (EMR) system for dentistry had already undergone pilot testing and practical implementation. The entire system comprises a medical-grade microphone that clips onto the physician’s collar, a transmitter that fits in the physician’s pocket, and a receiver that plugs into the physician’s work computer. During patient consultations, the physician simply dictates the patient’s medical history, and the system automatically generates a structured electronic medical record on the work computer. The physician then only needs to make minor edits and confirm the content before printing it for the patient and saving the electronic file.
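The step from dictated speech to a structured record can be illustrated with a minimal sketch. It assumes the physician speaks a section name before each part of the note; the cue words, field names, and function name below are hypothetical and are not taken from iFlytek’s system, which is not described at this level of detail.

```python
import re

# Hypothetical spoken cues mapped to record fields (illustrative only).
SECTION_CUES = {
    "chief complaint": "chief_complaint",
    "history": "history",
    "examination": "examination",
    "diagnosis": "diagnosis",
    "treatment plan": "treatment_plan",
}


def structure_dictation(transcript: str) -> dict:
    """Split a dictated transcript into named record fields at spoken cues."""
    # Try longer cues first so multi-word cues are matched as a whole.
    cues = sorted(SECTION_CUES, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(c) for c in cues) + r")\b[:,]?\s*",
        re.IGNORECASE,
    )
    matches = list(pattern.finditer(transcript))
    record = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its cue to the start of the next cue.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        field = SECTION_CUES[match.group(1).lower()]
        record[field] = transcript[match.end():end].strip(" .")
    return record


if __name__ == "__main__":
    note = (
        "Chief complaint: pain in the lower left molar for three days. "
        "Examination: deep caries on tooth 36. "
        "Diagnosis: pulpitis. "
        "Treatment plan: root canal therapy."
    )
    for field, text in structure_dictation(note).items():
        print(f"{field}: {text}")
```

A keyword split like this breaks as soon as a cue word appears inside a sentence, which is one reason the physician still reviews and edits the generated record before confirming it.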

Unisound Intelligent Medical Voice Entry System is built on a high-performance recognition engine tailored for the healthcare sector. It efficiently handles large volumes of text entry through voice input and interacts with hospital information systems (HIS), picture archiving and communication systems (PACS), and other platforms via voice commands and function keys on handheld devices. By adopting voice-based documentation, physicians can effectively avoid copy-and-paste operations, standardize medical record entries, and enhance the security of medical record input.

Currently, this system can effectively save doctors more than 38% of their time. Since the launch of its comprehensive healthcare solution, Unisound has officially deployed the system in over 20 representative large tertiary Grade A general hospitals across China, spanning Central, North, South, and Western regions. These institutions include Peking Union Medical College Hospital, Peking University People’s Hospital, Xijing Hospital of the Fourth Military Medical University, and The University of Hong Kong-Shenzhen Hospital, among others. Additionally, approximately 40 other hospitals are currently in the pilot trial phase.

Unlike the aforementioned large enterprises that have independently launched voice services, startups such as Saykara, Suki, and Notable are more focused on applying speech recognition technology to electronic health records (EHRs). Among them, Saykara, founded in 2015, boasts a team composed of former product leads, engineers, and machine learning experts from Amazon, Microsoft, Google, and Nuance. Saykara’s AI-powered voice assistant can automatically generate documentation, streamline workflows, and facilitate easier interaction between physicians and EHR systems. Data indicates that physicians using Saykara have reduced the time spent managing electronic health records by 70%, thereby enabling them to communicate more effectively with patients and deliver higher-quality medical care. Currently, Saykara has partnered with several medium-to-large healthcare systems in the United States, including OrthoIndy, a renowned orthopedic practice that served as one of its early pilot sites.

Suki, formerly known as Robin AI, launched an AI-powered voice assistant designed to alleviate physicians’ documentation burden and streamline the entry of information and data. Suki conducted 12 pilot projects in California and Georgia, spanning specialties such as internal medicine, ophthalmology, and plastic surgery. Preliminary results from these pilots, which involved using the product five days a week across three different electronic health record (EHR) systems, demonstrated that Suki reduced the time physicians spent on medical documentation by 60%. Furthermore, Suki has partnered with companies including Apple, Google, Salesforce, and 23andMe to deliver cutting-edge technological solutions to consumers, healthcare institutions, and large enterprises.

Notable’s products can automatically document physicians’ consultation notes and update electronic health records (EHRs). The company’s solution leverages natural language processing (NLP) and speech recognition technologies to automatically capture doctor–patient interactions, decipher physicians’ notes, and construct structured data to facilitate EHR entry. To ensure the system operates smoothly, researchers devoted substantial time to recording and monitoring over 2,000 doctor–patient interactions. Currently, Notable is developing products for the Apple Watch.

The Medical Speech Market: Challenging Yet Promising

Currently, the application of speech technology in the healthcare sector still faces three major challenges: accuracy, security, and standardization.

First, concerns among stakeholders about the accuracy of voice transcription for electronic medical records have held back improvements in the overall quality of medical transcription over the past few years. In response, companies are working on ways for speech recognition technology to better relieve physicians’ transcription burden.

For example, Google developed and evaluated two automatic speech recognition (ASR) methods to streamline physicians’ workflows. The first is a Connectionist Temporal Classification (CTC) model, which handles the positioning and sequencing of speech units itself: it maps a sequence of audio frames directly to the corresponding text, without requiring the alignment between frames and text to be labeled in advance.
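How a CTC model turns per-frame predictions into text can be shown with a minimal sketch of greedy CTC decoding. This is the standard textbook procedure, not Google’s implementation. The blank symbol, the toy alphabet, and the function name are assumptions made for illustration.

```python
BLANK = "_"  # assumed symbol for the CTC "blank" (no output) class


def ctc_greedy_decode(frame_probs, alphabet):
    """Decode per-frame probabilities into text with greedy CTC decoding.

    frame_probs: one list of probabilities per audio frame, aligned with alphabet.
    alphabet: list of output symbols, including BLANK.
    """
    # Step 1: pick the most likely symbol for every frame.
    best_path = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    # Step 2: merge consecutive repeats, then drop blanks.
    decoded = []
    previous = None
    for symbol in best_path:
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


if __name__ == "__main__":
    alphabet = [BLANK, "a", "b"]
    frames = [
        [0.1, 0.8, 0.1],  # a
        [0.1, 0.7, 0.2],  # a (repeat, merged)
        [0.9, 0.05, 0.05],  # blank separates the two a's
        [0.2, 0.7, 0.1],  # a
        [0.1, 0.1, 0.8],  # b
        [0.1, 0.2, 0.7],  # b (repeat, merged)
    ]
    print(ctc_greedy_decode(frames, alphabet))
```

The blank symbol is what lets the model emit a genuinely doubled character: two identical symbols separated by a blank survive the merge step, while an unbroken run collapses to one.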

The second is the LAS (Listen, Attend, and Spell) model, a multi-component neural network that converts speech into text one character at a time, selecting each character based on the audio and on the characters it has already predicted. Each model was trained on over 14,000 hours of anonymized medical conversations to improve the accuracy of speech transcription.

The research results indicate that the CTC model ultimately achieved a word error rate (WER) of 20.1%, with most errors occurring at the beginning and end of speech segments where the speaker’s utterance duration was less than one second. In contrast, the LAS model achieved a final WER of 18.3%, with most errors arising during conversational phases and unrelated to medical terminology.
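An error rate of this kind is the edit distance between a reference transcript and the system’s output, divided by the length of the reference. The sketch below is a minimal illustration that assumes simple whitespace tokenization. The function names are illustrative and do not come from Google’s evaluation code.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_token == hyp_token else 1
            current_row.append(
                min(
                    previous_row[j] + 1,  # deletion
                    current_row[j - 1] + 1,  # insertion
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row
    return previous_row[-1]


def error_rate(reference_tokens, hypothesis_tokens):
    """Edit distance normalized by the reference length."""
    if not reference_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(reference_tokens, hypothesis_tokens) / len(reference_tokens)


if __name__ == "__main__":
    reference = "patient reports chest pain since monday"
    hypothesis = "patient report chest pain monday"
    # Word-level rate: tokens are words.
    print(error_rate(reference.split(), hypothesis.split()))
    # Character-level rate: tokens are characters.
    print(error_rate(list(reference), list(hypothesis)))
```

Because the rate is normalized by the reference length, a transcript with many inserted words can score above 100%.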

Researchers stated, “With the widespread adoption of electronic health record (EHR) systems, the shortage of primary care physicians has intensified, and burnout rates have risen. By optimizing the process of information extraction and analysis, automatic speech recognition (ASR) technology can enhance voice transcription for EHRs, helping physicians alleviate the so-called administrative burden and deliver higher-quality, more focused medical care.”

Another key challenge in the application of voice technology in healthcare lies in safeguarding patient-generated data and ensuring compliance with HIPAA standards. The Health Insurance Portability and Accountability Act (HIPAA) is a U.S. federal law enacted in 1996. The U.S. Department of Health and Human Services (HHS) issues and enforces the rules that implement it, which cover the privacy and security of patients’ health information. HIPAA sets forth a set of standard measures for healthcare providers to protect patient privacy. Strict adherence to these regulations is mandatory when entering information into electronic health records.

Finally, there is the issue of standardization. In 2006, the Healthcare Information and Management Systems Society (HIMSS) published the white paper “Electronic Medical Records vs. Electronic Health Records: Yes, There Is a Difference,” which introduced the Electronic Medical Record Adoption Model (EMRAM). This model serves as a basis for evaluating the level of health information technology adoption in healthcare organizations. The HIMSS evaluation focuses on electronic medical record systems and comprises eight stages. Personalized medicine, evidence-based medicine, and evidence-based management all critically depend on the extensive and in-depth use of modern information technology.

In addition to imposing strict requirements on the documentation, terminology, and coding of electronic medical records (EMRs), China launched the “Graded Evaluation of EMR System Application Levels” in 2010. Under the relevant standards, EMR system application levels are classified into eight grades. The criteria for each grade encompass both localized requirements for the EMR system and overarching requirements for the entire health information system.

Although the application of voice technology in the field of electronic medical records still faces numerous obstacles, reliability, portability, and cost-effectiveness will become key factors for healthcare institutions in adopting transcription tools. The medical transcription industry is considered one of the most promising sectors within healthcare information management, as it is significantly influenced by continuously evolving technologies.

Most medical transcription devices consist of built-in speech recognition and memory storage systems. As automatic transcription technology becomes increasingly prevalent, it is expected to replace various analog devices in the near future. Factors such as the rising value of relevant healthcare professionals or in-house transcriptionists, along with the growth of outsourced medical transcription services, are projected to drive market demand in the coming years.