Nuance’s speech recognition technology, launched in 2011, has maintained a leading position in the industry. It gives clinical professionals voice-driven navigation of files, systems, and applications, with the aim of transforming patient communication. Its adoption has significantly improved physicians’ diagnostic efficiency by enabling rapid, flexible, and accurate collection of patient clinical information.
Nuance recently announced that physicians using its cloud-based clinical speech recognition technology now document an average of 100 million patient encounters per year. The company attributes this milestone to the rapidly growing adoption of web-based and embedded Nuance healthcare solutions across medical institutions, where usage has increased by 30% since the beginning of this year. Based on market feedback, Nuance projects that this growth trajectory will continue: 94% of healthcare organizations are either considering or strongly interested in adopting clinical speech recognition technology, and 89% regard portability as its greatest advantage.
Physicians Grapple with Mounting Pressure to Maintain Comprehensive Patient Insights
Clinicians are facing escalating pressure to utilize diverse clinical system tools and deliver care across multiple platforms, all while ensuring the integrity of electronic health records (EHRs). Generally, documentation of care is completed via mobile devices and other virtual environments. These systems must offer robust security and flexibility, enabling physicians to rapidly and accurately ascertain patients’ real-time clinical status, thereby supporting today’s constantly evolving clinical workflows.
“Compared with the past, doctors are moving more frequently both inside and outside the hospital. At the same time, patient data needs to be recorded as soon as possible. When patients have related needs, doctors must also be able to provide services at any time. Integrating Nuance’s Dragon Medical with Cerner’s physician documentation tools allows us to effectively meet these demands,” said Dr. Ehab Hanna, Chief Medical Information Officer (CMIO) of Universal Health Services (UHS).
Cerner has deployed this solution in 16 healthcare institutions across China. “Our goal is to integrate with Cerner’s electronic health records (EHR) system and transmit at least 30% of patient clinical information to Nuance’s speech recognition technology. Within just a few weeks, we were pleasantly surprised to see voluntary adoption rates among hospital patients reach 60% to even 90%. This demonstrates that such high adoption rates cannot be achieved without proactive improvements and the use of advanced tools.”
Dr. Hanna attributed this significant adoption rate to Nuance’s speech recognition and voice profile features, which allow physicians to seamlessly integrate the tool into their clinical workflows and retrieve necessary clinical data, thereby enhancing data portability and accuracy. “It understood me from the very first minute I started speaking,” he added, noting that physicians greatly appreciated being able to use the software without any training. Physician satisfaction and the software’s ease of use were also key factors in its high adoption rate, and the rollout served as an important channel for improving communication between UHS medical staff and IT personnel.
Beyond Nuance’s own cloud-based software applications, more than 3,000 companies have emerged in recent years that use speech recognition as a data entry point for their platforms. They include Cerner, Epic, and eClinicalWorks, all of which have developed mobile EHR apps.
An Analysis of Nuance's Global Product Project Research
Nuance Communications is a multinational computer software and technology development company headquartered in Burlington, Massachusetts, a suburb of Boston, United States. Its primary business focuses on providing voice and imaging applications. Current product offerings are centered on server-based and embedded speech recognition, call routing systems, automated directory assistance services, medical transcription software and systems, optical character recognition (OCR) software, and desktop image processing software.
The company also maintains a small division dedicated to providing software and system development for military activities of government agencies. In October 2011, unconfirmed research indicated that its servers supported the Siri voice recognition application on Apple’s iPhone 4S. As the world’s largest voice recognition company, how did Nuance stand out in fiercely competitive markets? The following provides a systematic overview of the company’s product portfolio, key executives, financing history, and business model.
1. Introduction to Nuance’s Products
Optical Character Recognition (OCR) is the mechanical or electronic conversion of images of handwritten or printed text into machine-encoded text. It is widely used by businesses as a form of data entry from printed records such as passports, invoices, bank statements, computerized receipts, business cards, mail, and other static printed documents.
Once digitized, the text can be edited, searched, stored, and displayed online in electronic form, and it becomes machine-readable. Common downstream applications include machine translation, text-to-speech conversion, and mining of key information and text.
OCR is an area of research within pattern recognition, artificial intelligence, and computer vision. Early versions of OCR had to be trained on images of each character and could only process one font at a time. With subsequent advances, OCR has achieved high recognition accuracy and can now handle most fonts. Some systems can even preserve layout formatting, producing output that closely replicates the original page’s appearance, including images, columns, and other non-textual elements.
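To make the idea of per-character, single-font training concrete, the following is a minimal sketch of template matching, one simple technique that fits the description of early OCR above. It is an illustration only, not Nuance’s method: the 3x5 bitmaps, the `TEMPLATES` table, and the function names are all hypothetical.

```python
# Hypothetical 3x5 bitmaps for a single "font": '#' is ink, '.' is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize_glyph(glyph):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], glyph))


def recognize_line(glyphs):
    """Recognize a sequence of already segmented glyph bitmaps."""
    return "".join(recognize_glyph(g) for g in glyphs)


# A scanned "T" with one pixel of ink missing still matches the "T" template best.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
print(recognize_line([TEMPLATES["L"], TEMPLATES["I"], noisy_t]))
```

Because every glyph is compared against stored images of one typeface, a second font would need its own set of templates, which is the limitation later systems overcame.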
Speech synthesis primarily refers to the artificial generation of human speech. Computer systems equipped with speech synthesis capabilities are known as speech computers or speech synthesizers, and speech synthesis functions can be implemented through software or hardware products. Text-to-Speech (TTS) systems refer to systems that convert standard written language into speech; other systems convert symbolic phonetic representations, such as phonetic symbols, into spoken output.
Synthetic speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units: a system that stores phones or diphones provides the widest output range but may lack clarity, whereas for specific application domains, storing entire words or sentences allows for high-quality speech output.
Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to generate fully “synthetic” voice output. The quality of a speech synthesizer is judged by how closely its output resembles the human voice. An intelligible text-to-speech program offers significant convenience for people with visual impairments or dyslexia, enabling them to listen to textual content on their home computers. Since the early 1990s, many computer operating systems have incorporated speech synthesis systems.
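The concatenation approach described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the `UNIT_DB` contents are made-up sample values standing in for real recordings, and real systems also smooth the joins between units, which is omitted here.

```python
# Hypothetical unit database: each stored unit maps to a short list of audio samples.
UNIT_DB = {
    "h": [0.1, 0.2],
    "e": [0.3, 0.3, 0.2],
    "l": [0.0, -0.1],
    "o": [0.4, 0.1, 0.0],
    # A whole-word recording: higher quality, but usable only for this one word.
    "hello": [0.1, 0.2, 0.3, 0.25, 0.0, -0.1, 0.4, 0.1],
}


def synthesize(units):
    """Look up each requested unit and join the stored samples end to end."""
    samples = []
    for unit in units:
        if unit not in UNIT_DB:
            raise KeyError(f"no recording stored for unit {unit!r}")
        samples.extend(UNIT_DB[unit])
    return samples


# Small units can be recombined into words that were never recorded as a whole.
from_small_units = synthesize(["h", "o", "l", "e"])
# A whole-word unit covers only the word that was recorded.
from_word_unit = synthesize(["hello"])
print(len(from_small_units), len(from_word_unit))
```

The trade-off in the text shows up directly: the four small units can spell many words, while the single whole-word entry covers exactly one.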
Speech Recognition (SR) is an interdisciplinary technology combining computer science and linguistics. It covers the methods and techniques, developed across linguistics, computer science, and electronic engineering, that enable computers and computerized devices, such as those categorized as smart technologies and robotics, to recognize spoken language and convert it into text. It is also known as “Automatic Speech Recognition” (ASR), “Computer Speech Recognition,” and “Speech-to-Text” (STT) technology.
Some SR systems employ “training,” a process in which a single speaker reads text or isolated words into the system; the system then analyzes the individual’s specific voice characteristics and uses this data to fine-tune speech recognition for that speaker, thereby improving accuracy. Systems that do not use such training are referred to as “speaker-independent” systems, whereas those that do are called “speaker-dependent” systems.
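The difference between the two kinds of system can be shown with a deliberately simplified sketch. Everything here is an assumption made for illustration: a single number stands in for a real acoustic feature vector, `GENERIC_TEMPLATES` is a made-up speaker-independent model, and the “training” step merely estimates one average offset for the speaker.

```python
from statistics import fmean

# Hypothetical speaker-independent model: one reference feature value per word.
GENERIC_TEMPLATES = {"call": 2.0, "home": 5.0, "stop": 8.0}


def recognize(feature, offset=0.0):
    """Return the word whose template is nearest to the offset-corrected feature."""
    corrected = feature - offset
    return min(GENERIC_TEMPLATES, key=lambda w: abs(GENERIC_TEMPLATES[w] - corrected))


def enroll(samples):
    """Estimate how far a speaker's features sit from the generic templates.

    `samples` holds (known_word, observed_feature) pairs collected while the
    speaker reads prompted text, which is the "training" step.
    """
    return fmean(observed - GENERIC_TEMPLATES[word] for word, observed in samples)


# A speaker whose voice shifts every feature upward by roughly 2 units.
offset = enroll([("call", 4.1), ("home", 6.9), ("stop", 10.0)])

utterance = 4.0  # this speaker saying "call"
print(recognize(utterance))          # speaker-independent guess
print(recognize(utterance, offset))  # guess after adapting to the speaker
```

Without the enrollment data, the shifted utterance lands closer to the wrong template; with it, the same recognizer picks the intended word.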
Applications of speech recognition include voice user interfaces, such as voice dialing (e.g., “Call Home”), call routing (e.g., “I want to make a collect call”), home automation and appliance control, search (e.g., finding a podcast where specific words are spoken), simple data entry (e.g., entering credit card numbers), preparation of structured documents (e.g., radiology reports), speech-to-text processing (e.g., word processors or email), and direct voice input.
From a technical perspective, speech recognition has a long history marked by several waves of major innovation. Most recently, the field has benefited from advances in deep learning and big data. This is reflected in a sharp rise in the number of academic papers on the research and design of speech recognition systems worldwide. Prominent companies in the field include Microsoft, Google, IBM, Baidu (China), Apple, Amazon, Nuance Communications, and iFlytek (China), many of which attribute breakthroughs in the core technology of their speech recognition systems to the widespread adoption of deep learning.
2. Nuance Team Members
Paul Ricci, CEO of Nuance
Having served as Nuance’s CEO for 13 years, he is well prepared to compete in the voice-assisted Web arena. Ricci is renowned in Silicon Valley for his relentless pursuit of objectives. During his tenure, he completed 60 mergers and acquisitions, several of which faced threats of patent litigation (Nuance’s official records cite eight such cases). In a Nuance conference room in Sunnyvale, Ricci stated, “I disagree with that assessment; numerous sellers have amassed significant wealth through these M&A deals and are highly satisfied with the outcomes.” He later added, “When running a company, the only thing you can do is create long-term value.”
Daniel Cheng, General Manager of Nuance Greater China
Holds a Master’s degree in Artificial Intelligence from the Department of Computer Science at the University of Essex, UK, and a Bachelor’s degree in Naval Architecture and Marine Engineering from Newcastle University. Previously served as Managing Director of Symantec China and Managing Director for Greater China at Business Objects. With nearly two decades of experience in the IT industry, he possesses profound insights into the Asian and Greater China markets.
Matt Revis, Vice President of Product Management, Nuance Mobile Business Unit
MBA from Columbia Business School; formerly served as Product Manager for Dragon NaturallySpeaking.
3. Nuance's Financing Status
From the late 1990s to the early 2000s, Nuance competed with other NLSR vendors, including Philips SpeechPearl, SpeechWorks, and a number of smaller companies. Later, Nuance licensed its technology to third parties, together with training and consulting services, enabling independent software vendors and interactive voice response (IVR) providers to build applications on an IVR platform.
In October 2011, Nuance Communications acquired Swype, a company known for its touchscreen input software. In December 2011, Nuance acquired Vlingo after multiple lawsuits alleging patent infringement by Vlingo. Based in Cambridge, Vlingo aimed to simplify application usage through voice technology, offering its own speech-to-text API for J2ME/BREW applications. In April 2012, Nuance acquired Transcend Services. Transcend leveraged its proprietary internet-based voice and data distribution technologies, customer-base technologies, and home-based medical language specialists to convert physicians’ dictations into electronic documents. Additionally, it provided customers with outsourced transcription and editing services on its platform.
June 2012 - Nuance acquired SAFECOM, a provider of print management and cost recovery software integrated with HP printing devices.
September 2012 - Nuance acquired Ditech Networks for $22.5 million.
September 2012 - Nuance acquired Quantim, the HIM business of QuadraMed, a provider of information technology solutions for the healthcare industry.
October 2012 - Nuance acquired J.A. Thomas & Associates (JATA).
November 2012 - Nuance acquired ACCENTUS.
December 2012 - Nuance acquired Jiran.
January 2013 - Nuance acquired VirtuOz.
May 2013 - Nuance acquired the Tweddle Group for $80 million.
July 2013 - Nuance acquired Cognitive Technologies, Inc.
October 2013 - Nuance acquired Varolii (formerly Par3 Communications).
4. Nuance’s Business Model
A global overview of the voice technology market reveals that over 80% of speech recognition systems utilize Nuance’s recognition engine technology. Nuance holds more than 1,000 patented technologies, and its voice products support over 50 languages, serving more than 2 billion users worldwide. In the financial sector, Nuance has over 500 clients; in the telecommunications industry, more than 10 of the top 15 companies are Nuance customers. Speech recognition is widely applied in areas such as call center customer service, GPS voice-enabled location search, electronic dictionary pronunciation, and speech-to-text translation into multiple languages.
Nuance is a publicly listed company in the United States, currently employing approximately 5,000 to 6,000 people worldwide. Its revenue last year exceeded $1 billion, and its current market capitalization stands at around $5 billion. From the perspective of the software industry, Nuance has experienced relatively rapid growth.
Nuance currently operates four business divisions: the Healthcare Division, which provides medical record management and transcription services; the Enterprise Division, which offers customer service and call center applications, particularly for users in the banking and telecommunications sectors; the Imaging Division, which delivers solutions for multifunction printer (MFP) scanning, PDF processing, and document automation; and the Mobile Devices Division, which provides command and control functions, voice search, and messaging applications for mobile phones and automotive systems.
As a global leader in speech technology, Nuance provides customers with comprehensive, multi-faceted technical solutions. Many speech recognition companies in China have not yet reached this level of maturity. Nuance continues to strengthen its investment in China, including recruiting local experts and technical personnel to refine and improve its offerings to better align with local user habits.
For instance, the company has an R&D center in Shanghai and an R&D team in Beijing. The Shanghai-based team primarily leverages Nuance’s advanced speech recognition technology for localized development, refining Chinese-specific details to better align with the usage habits of Chinese users. By drawing on the extensive expertise of its professional services team and global partner network, Nuance offers the industry the most comprehensive and diverse portfolio of voice, language, text, and image solutions worldwide. Through voice interactions, Nuance has built a robust user experience database, thereby helping clients maximize the potential of their devices, applications, and information systems. In this field, no other company anywhere possesses more extensive experience than Nuance.
Nuance integrates industry-specific natural human-computer interfaces with some of the world’s most sophisticated technologies, services, and processes, delivering a powerful, nearly effortless user experience to its customers. Today, thousands of companies in healthcare, mobile, telecommunications, and other industries, along with millions of users worldwide, leverage Nuance’s technology to convey critical information, boost productivity, and conduct business—simply by speaking.
Compiled by: Chen Kun
Editor: Zhang Nan